Gradient and Hessian via ForwardDiff.jl

Experimental support for computing the gradient and the Hessian of the objective function (i.e. negative twice the profiled log likelihood) via ForwardDiff.jl are provided as a package extension.

The core functionality is provided by defining appropriate methods for ForwardDiff.gradient and ForwardDiff.hessian:

ForwardDiff.gradientMethod
ForwardDiff.gradient(model::LinearMixedModel)

Evaluate the gradient of the objective function at the currently fitted parameter values.

Large allocations

Most of MixedModels.jl relies strongly on in-place methods in order to minimize the amount of memory allocated. In addition to reducing the memory burden (especially for large models), this practice generally speeds up evaluation of the objective. In-place methods, however, generally do not play well with automatic differentiation. For the automatic differentiation support provided here, the developers instead implemented alternative, out-of-place methods. These will generally be slower and much more memory intensive, so use of this functionality is not recommended for large models.

ForwardDiff.jl support is experimental.

Compatibility with ForwardDiff.jl is experimental. The precise structure, including function names and method definitions, is subject to change without being considered a breaking change. In particular, the exact set of parameters included is subject to change. The θ parameter is always included, but whether σ and/or the fixed effects should be included is currently still being decided.

source
ForwardDiff.hessianMethod
ForwardDiff.hessian(model::LinearMixedModel)

Evaluate the Hessian of the objective function at the currently fitted parameter values.

Large allocations

Most of MixedModels.jl relies strongly on in-place methods in order to minimize the amount of memory allocated. In addition to reducing the memory burden (especially for large models), this practice generally speeds up evaluation of the objective. In-place methods, however, generally do not play well with automatic differentiation. For the automatic differentiation support provided here, the developers instead implemented alternative, out-of-place methods. These will generally be slower and much more memory intensive, so use of this functionality is not recommended for large models.

ForwardDiff.jl support is experimental.

Compatibility with ForwardDiff.jl is experimental. The precise structure, including function names and method definitions, is subject to change without being considered a breaking change. In particular, the exact set of parameters included is subject to change. The θ parameter is always included, but whether σ and/or the fixed effects should be included is currently still being decided.

source

Exact zero at optimum for trivial models

using MixedModels, ForwardDiff
fm1 = lmm(@formula(yield ~ 1 + (1|batch)), MixedModels.dataset(:dyestuff2))
Linear mixed model fit by maximum likelihood
 yield ~ 1 + (1 | batch)
   logLik   -2 logLik     AIC       AICc        BIC    
   -81.4365   162.8730   168.8730   169.7961   173.0766

Variance components:
            Column   Variance Std.Dev.
batch    (Intercept)   0.00000 0.00000
Residual              13.34610 3.65323
 Number of obs: 30; levels of grouping factors: 6

  Fixed-effects parameters:
───────────────────────────────────────────────
              Coef.  Std. Error     z  Pr(>|z|)
───────────────────────────────────────────────
(Intercept)  5.6656    0.666986  8.49    <1e-16
───────────────────────────────────────────────
ForwardDiff.gradient(fm1)
1-element Vector{Float64}:
 0.0
ForwardDiff.hessian(fm1)
1×1 Matrix{Float64}:
 28.76868076413998

Approximate zero at optimum for non trivial models

fm2 = lmm(@formula(reaction ~ 1 + days + (1+days|subj)), MixedModels.dataset(:sleepstudy))
Linear mixed model fit by maximum likelihood
 reaction ~ 1 + days + (1 + days | subj)
   logLik   -2 logLik     AIC       AICc        BIC    
  -875.9697  1751.9393  1763.9393  1764.4249  1783.0971

Variance components:
            Column    Variance Std.Dev.   Corr.
subj     (Intercept)  565.51067 23.78047
         days          32.68212  5.71683 +0.08
Residual              654.94145 25.59182
 Number of obs: 180; levels of grouping factors: 18

  Fixed-effects parameters:
──────────────────────────────────────────────────
                Coef.  Std. Error      z  Pr(>|z|)
──────────────────────────────────────────────────
(Intercept)  251.405      6.63226  37.91    <1e-99
days          10.4673     1.50224   6.97    <1e-11
──────────────────────────────────────────────────
ForwardDiff.gradient(fm2)
3-element Vector{Float64}:
 -7.769368957610823e-5
  0.0010019612167546654
  7.373940113097888e-5
ForwardDiff.hessian(fm2)
3×3 Matrix{Float64}:
 45.4126    35.9366    6.35492
 35.9366   465.74    203.992
  6.35492  203.992   963.952