Gradient and Hessian via ForwardDiff.jl
Experimental support for computing the gradient and the Hessian of the objective function (i.e. negative twice the profiled log likelihood) via ForwardDiff.jl are provided as a package extension.
The core functionality is provided by defining appropriate methods for ForwardDiff.gradient
and ForwardDiff.hessian
:
ForwardDiff.gradient
— MethodForwardDiff.gradient(model::LinearMixedModel)
Evaluate the gradient of the objective function at the currently fitted parameter values.
Most of MixedModels.jl relies strongly on in-place methods in order to minimize the amount of memory allocated. In addition to reducing the memory burden (especially for large models), this practice generally speeds up evaluation of the objective. In-place methods, however, generally do not play well with automatic differentiation. For the automatic differentiation support provided here, the developers instead implemented alternative, out-of-place methods. These will generally be slower and much more memory intensive, so use of this functionality is not recommended for large models.
Compatibility with ForwardDiff.jl is experimental. The precise structure, including function names and method definitions, is subject to change without being considered a breaking change. In particular, the exact set of parameters included is subject to change. The θ parameter is always included, but whether σ and/or the fixed effects should be included is currently still being decided.
ForwardDiff.hessian
— MethodForwardDiff.hessian(model::LinearMixedModel)
Evaluate the Hessian of the objective function at the currently fitted parameter values.
Most of MixedModels.jl relies strongly on in-place methods in order to minimize the amount of memory allocated. In addition to reducing the memory burden (especially for large models), this practice generally speeds up evaluation of the objective. In-place methods, however, generally do not play well with automatic differentiation. For the automatic differentiation support provided here, the developers instead implemented alternative, out-of-place methods. These will generally be slower and much more memory intensive, so use of this functionality is not recommended for large models.
Compatibility with ForwardDiff.jl is experimental. The precise structure, including function names and method definitions, is subject to change without being considered a breaking change. In particular, the exact set of parameters included is subject to change. The θ parameter is always included, but whether σ and/or the fixed effects should be included is currently still being decided.
Exact zero at optimum for trivial models
using MixedModels, ForwardDiff
fm1 = lmm(@formula(yield ~ 1 + (1|batch)), MixedModels.dataset(:dyestuff2))
Linear mixed model fit by maximum likelihood
yield ~ 1 + (1 | batch)
logLik -2 logLik AIC AICc BIC
-81.4365 162.8730 168.8730 169.7961 173.0766
Variance components:
Column Variance Std.Dev.
batch (Intercept) 0.00000 0.00000
Residual 13.34610 3.65323
Number of obs: 30; levels of grouping factors: 6
Fixed-effects parameters:
───────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
───────────────────────────────────────────────
(Intercept) 5.6656 0.666986 8.49 <1e-16
───────────────────────────────────────────────
ForwardDiff.gradient(fm1)
1-element Vector{Float64}:
0.0
ForwardDiff.hessian(fm1)
1×1 Matrix{Float64}:
28.76868076413998
Approximate zero at optimum for non trivial models
fm2 = lmm(@formula(reaction ~ 1 + days + (1+days|subj)), MixedModels.dataset(:sleepstudy))
Linear mixed model fit by maximum likelihood
reaction ~ 1 + days + (1 + days | subj)
logLik -2 logLik AIC AICc BIC
-875.9697 1751.9393 1763.9393 1764.4249 1783.0971
Variance components:
Column Variance Std.Dev. Corr.
subj (Intercept) 565.51067 23.78047
days 32.68212 5.71683 +0.08
Residual 654.94145 25.59182
Number of obs: 180; levels of grouping factors: 18
Fixed-effects parameters:
──────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
──────────────────────────────────────────────────
(Intercept) 251.405 6.63226 37.91 <1e-99
days 10.4673 1.50224 6.97 <1e-11
──────────────────────────────────────────────────
ForwardDiff.gradient(fm2)
3-element Vector{Float64}:
-7.769368957610823e-5
0.0010019612167546654
7.373940113097888e-5
ForwardDiff.hessian(fm2)
3×3 Matrix{Float64}:
45.4126 35.9366 6.35492
35.9366 465.74 203.992
6.35492 203.992 963.952