Gradient and Hessian computation

Experimental support for computing the gradient and the Hessian of the objective function (i.e., negative twice the profiled log-likelihood) via ForwardDiff.jl and FiniteDiff.jl is provided as package extensions.
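
Both extensions activate automatically when the corresponding package is loaded alongside MixedModels.jl. As a quick sanity check on what "the objective" means here, the following sketch (assuming the packages and dataset used in the examples below) verifies that the objective at the fitted values is negative twice the log-likelihood:

using MixedModels, MixedModelsDatasets, ForwardDiff, FiniteDiff
m = lmm(@formula(yield ~ 1 + (1|batch)), MixedModelsDatasets.dataset(:dyestuff2))
objective(m) ≈ -2 * loglikelihood(m)  # true: the objective is -2 times the profiled log likelihood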

via ForwardDiff.jl

The core functionality is provided by defining appropriate methods for ForwardDiff.gradient and ForwardDiff.hessian:

ForwardDiff.gradient — Method
ForwardDiff.gradient(model::LinearMixedModel)

Evaluate the gradient of the objective function at the currently fitted parameter values.

Large allocations

Most of MixedModels.jl relies heavily on in-place methods to minimize memory allocation. In addition to reducing the memory burden (especially for large models), this practice generally speeds up evaluation of the objective. In-place methods, however, generally do not play well with automatic differentiation, so the automatic differentiation support provided here is implemented with alternative, out-of-place methods. These are generally slower and much more memory-intensive, so this functionality is not recommended for large models.

ForwardDiff.jl support is experimental.

Compatibility with ForwardDiff.jl is experimental. The precise structure, including function names and method definitions, is subject to change without being considered a breaking change. In particular, the exact set of parameters included may change: the θ parameter is always included, but whether σ and/or the fixed effects should also be included has not yet been decided.

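To get a feel for the cost of the out-of-place code path, compare a plain objective evaluation against a gradient evaluation. A minimal sketch, assuming BenchmarkTools.jl is available and m is the fitted model from the sketch above:

using BenchmarkTools
@btime objective($m)             # in-place path: allocates very little
@btime ForwardDiff.gradient($m)  # out-of-place path: allocates substantially more
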
ForwardDiff.hessian — Method
ForwardDiff.hessian(model::LinearMixedModel)

Evaluate the Hessian of the objective function at the currently fitted parameter values.

Large allocations

Most of MixedModels.jl relies heavily on in-place methods to minimize memory allocation. In addition to reducing the memory burden (especially for large models), this practice generally speeds up evaluation of the objective. In-place methods, however, generally do not play well with automatic differentiation, so the automatic differentiation support provided here is implemented with alternative, out-of-place methods. These are generally slower and much more memory-intensive, so this functionality is not recommended for large models.

ForwardDiff.jl support is experimental.

Compatibility with ForwardDiff.jl is experimental. The precise structure, including function names and method definitions, is subject to change without being considered a breaking change. In particular, the exact set of parameters included may change: the θ parameter is always included, but whether σ and/or the fixed effects should also be included has not yet been decided.

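At a minimum in the interior of the parameter space, the Hessian of the objective should be positive semi-definite, which makes it a useful numerical check on a fit. A sketch, again assuming the fitted model m from above:

using LinearAlgebra
H = ForwardDiff.hessian(m)
all(>=(0), eigvals(Symmetric(H)))  # true at a local minimum, up to numerical error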

Exact zero at optimum for trivial models

using MixedModels, MixedModelsDatasets, ForwardDiff
fm1 = lmm(@formula(yield ~ 1 + (1|batch)), MixedModelsDatasets.dataset(:dyestuff2))
Linear mixed model fit by maximum likelihood
 yield ~ 1 + (1 | batch)
   logLik   -2 logLik     AIC       AICc        BIC    
   -81.4365   162.8730   168.8730   169.7961   173.0766

Variance components:
            Column   Variance Std.Dev.
batch    (Intercept)   0.00000 0.00000
Residual              13.34610 3.65323
 Number of obs: 30; levels of grouping factors: 6

  Fixed-effects parameters:
───────────────────────────────────────────────
              Coef.  Std. Error     z  Pr(>|z|)
───────────────────────────────────────────────
(Intercept)  5.6656    0.666986  8.49    <1e-16
───────────────────────────────────────────────
ForwardDiff.gradient(fm1)
1-element Vector{Float64}:
 0.0
ForwardDiff.hessian(fm1)
1×1 Matrix{Float64}:
 28.76868076413998
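
The gradient vanishes exactly here because the fit is degenerate: the batch variance component is estimated as exactly zero, so the single element of θ sits on the boundary of the parameter space. A quick check, assuming the fit above:

only(fm1.θ)  # 0.0: the lone relative covariance parameter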

Approximate zero at optimum for non-trivial models

fm2 = lmm(@formula(reaction ~ 1 + days + (1+days|subj)), MixedModelsDatasets.dataset(:sleepstudy))
Linear mixed model fit by maximum likelihood
 reaction ~ 1 + days + (1 + days | subj)
   logLik   -2 logLik     AIC       AICc        BIC    
  -875.9697  1751.9393  1763.9393  1764.4249  1783.0971

Variance components:
            Column    Variance Std.Dev.   Corr.
subj     (Intercept)  565.52071 23.78068
         days          32.68242  5.71685 +0.08
Residual              654.94015 25.59180
 Number of obs: 180; levels of grouping factors: 18

  Fixed-effects parameters:
──────────────────────────────────────────────────
                Coef.  Std. Error      z  Pr(>|z|)
──────────────────────────────────────────────────
(Intercept)  251.405      6.6323   37.91    <1e-99
days          10.4673     1.50224   6.97    <1e-11
──────────────────────────────────────────────────
ForwardDiff.gradient(fm2)
3-element Vector{Float64}:
  0.00014807478304845745
 -0.00027151479254072797
  0.0005646589082459741
ForwardDiff.hessian(fm2)
3×3 Matrix{Float64}:
 45.4118   35.9372    6.356
 35.9372  465.736   203.994
  6.356   203.994   963.95
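
The gradient has one element per element of fm2.θ (cf. the note above on which parameters are included), and its small norm confirms that the optimizer stopped close to a stationary point. A quick check, assuming the fit above:

using LinearAlgebra
g = ForwardDiff.gradient(fm2)
length(g) == length(fm2.θ)  # true: one partial derivative per covariance parameter
norm(g) < 1e-3              # true: approximately stationary at the reported optimum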

via FiniteDiff.jl

The core functionality is provided by defining appropriate methods for FiniteDiff.finite_difference_gradient and FiniteDiff.finite_difference_hessian:

FiniteDiff.finite_difference_gradient — Method
FiniteDiff.finite_difference_gradient(model::LinearMixedModel, args...; kwargs...)

Evaluate the gradient of the objective function at the currently fitted parameter values.

FiniteDiff.jl support is experimental.

Compatibility with FiniteDiff.jl is experimental. The precise structure, including function names and method definitions, is subject to change without being considered a breaking change. In particular, the exact set of parameters included may change: the θ parameter is always included, but whether σ and/or the fixed effects should also be included has not yet been decided.

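The trailing positional and keyword arguments are forwarded to FiniteDiff.jl. Assuming they land in the usual argument slots of FiniteDiff.finite_difference_gradient (an assumption about this extension, not something stated above), one could, for example, request forward rather than the default central differences:

FiniteDiff.finite_difference_gradient(fm2, Val(:forward))  # assumed: fdtype is forwarded to FiniteDiff
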
FiniteDiff.finite_difference_hessian — Method
FiniteDiff.finite_difference_hessian(model::LinearMixedModel, args...; kwargs...)

Evaluate the Hessian of the objective function at the currently fitted parameter values.

FiniteDiff.jl support is experimental.

Compatibility with FiniteDiff.jl is experimental. The precise structure, including function names and method definitions, is subject to change without being considered a breaking change. In particular, the exact set of parameters included may change: the θ parameter is always included, but whether σ and/or the fixed effects should also be included has not yet been decided.

using FiniteDiff
FiniteDiff.finite_difference_gradient(fm2)
3-element Vector{Float64}:
  0.00014822299314916167
 -0.0002720018650722045
  0.0005648056397592629
FiniteDiff.finite_difference_hessian(fm2)
3×3 LinearAlgebra.Symmetric{Float64, Matrix{Float64}}:
  40.8869   31.5168  -14.5287
  31.5168  461.419   183.595
 -14.5287  183.595   867.555
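
The finite-difference gradient agrees with the forward-mode result above to roughly six decimal places; the Hessian entries differ more noticeably, since second-order finite differences are considerably less accurate. A quick comparison, assuming both extensions are loaded and fm2 is the fit above:

isapprox(ForwardDiff.gradient(fm2),
         FiniteDiff.finite_difference_gradient(fm2); atol=1e-6)  # true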