Parametric bootstrap for linear mixed-effects models

Julia is well-suited to implementing bootstrapping and other simulation-based methods for statistical models. The parametricbootstrap function in the MixedModels package provides an efficient parametric bootstrap for linear mixed-effects models.

parametricbootstrap(rng::AbstractRNG, nsamp::Integer, m::LinearMixedModel,
    props=(:objective, :σ, :β, :θ); β = m.β, σ = m.σ, θ = m.θ)
parametricbootstrap(nsamp::Integer, m::LinearMixedModel,
    props=(:objective, :σ, :β, :θ); β = m.β, σ = m.σ, θ = m.θ)

Perform nsamp parametric bootstrap replication fits of m, returning a Tables.ColumnTable of parameter estimates from the refitted models.

The default random number generator is Random.GLOBAL_RNG.

Named Arguments

β, σ, and θ are the values of m's parameters for simulating the responses.


The parametric bootstrap

Bootstrapping is a family of procedures for generating sample values of a statistic, allowing for visualization of the distribution of the statistic or for inference from this sample of values.

A parametric bootstrap is used with a parametric model, m, that has been fit to data. The procedure is to simulate n response vectors from m using the estimated parameter values and refit m to these responses in turn, accumulating the statistics of interest at each iteration.
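The simulate/refit/accumulate cycle can be sketched with only the standard library. In this toy sketch the "fitted model" is just a normal distribution with the sample mean and standard deviation as parameters; it illustrates the idea of the procedure, not the MixedModels implementation.

```julia
using Random, Statistics

# Toy parametric bootstrap: the "fitted model" is a normal distribution
# whose parameters are the sample mean and standard deviation.
rng = MersenneTwister(1234321)
y = 1527.5 .+ 49.5 .* randn(rng, 30)   # stand-in for observed responses
μ̂, σ̂ = mean(y), std(y)                 # "fit" the model to the data

nsamp = 1_000
boot = Vector{Float64}(undef, nsamp)
for i in 1:nsamp
    ysim = μ̂ .+ σ̂ .* randn(rng, length(y))  # simulate from the fitted model
    boot[i] = mean(ysim)                     # refit; record the statistic
end
# boot now approximates the sampling distribution of the mean
```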

The parameters of a LinearMixedModel object are the fixed-effects parameters, β, the standard deviation, σ, of the per-observation noise, and the covariance parameter, θ, that defines the variance-covariance matrices of the random effects.

For example, a simple linear mixed-effects model for the Dyestuff data in the lme4 package for R is fit by

julia> using DataFrames, Gadfly, MixedModels, Random, RData

julia> datf = joinpath(dirname(pathof(MixedModels)), "..", "test", "dat.rda");

julia> const dat = Dict(Symbol(k)=>v for (k,v) in load(datf));

julia> ds = names!(dat[:Dyestuff], [:Batch, :Yield])  # the Dyestuff data
30×2 DataFrames.DataFrame
│ Row │ Batch        │ Yield   │
│     │ Categorical… │ Float64 │
├─────┼──────────────┼─────────┤
│ 1   │ A            │ 1545.0  │
│ 2   │ A            │ 1440.0  │
│ 3   │ A            │ 1440.0  │
│ 4   │ A            │ 1520.0  │
│ 5   │ A            │ 1580.0  │
│ 6   │ B            │ 1540.0  │
│ 7   │ B            │ 1555.0  │
⋮
│ 23  │ E            │ 1515.0  │
│ 24  │ E            │ 1635.0  │
│ 25  │ E            │ 1625.0  │
│ 26  │ F            │ 1520.0  │
│ 27  │ F            │ 1455.0  │
│ 28  │ F            │ 1450.0  │
│ 29  │ F            │ 1480.0  │
│ 30  │ F            │ 1445.0  │

julia> m1 = fit(MixedModel, @formula(Yield ~ 1 + (1 | Batch)), ds)
Linear mixed model fit by maximum likelihood
 Yield ~ 1 + (1 | Batch)
   logLik   -2 logLik     AIC        BIC    
 -163.66353  327.32706  333.32706  337.53065

Variance components:
            Column    Variance  Std.Dev. 
Batch    (Intercept)  1388.3333 37.260345
Residual              2451.2500 49.510100
 Number of obs: 30; levels of grouping factors: 6

  Fixed-effects parameters:
──────────────────────────────────────────────────
             Estimate  Std.Error  z value  P(>|z|)
──────────────────────────────────────────────────
(Intercept)    1527.5    17.6946   86.326   <1e-99
──────────────────────────────────────────────────
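Incidentally, for a model with a single scalar random-effects term θ has just one element: the ratio of the random-effect standard deviation to the residual standard deviation. A quick check with the values printed above (θ is recomputed here from the printed standard deviations, not taken from the model object):

```julia
# For a scalar random-intercept model, the single element of θ is the
# ratio σ_Batch / σ.  Values copied from the Dyestuff fit shown above.
σ = 49.510100            # residual standard deviation
σ_batch = 37.260345      # random-intercept standard deviation
θ = σ_batch / σ          # relative covariance parameter, ≈ 0.7526
σ * θ                    # recovers σ_batch
```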

To bootstrap the model parameters, first initialize a random number generator

julia> const rng = MersenneTwister(1234321);

then create a bootstrap sample

julia> samp = parametricbootstrap(rng, 10_000, m1);

julia> propertynames(samp)
7-element Array{Symbol,1}:
 :model               
 :objective           
 :σ                   
 :θ                   
 :σs                  
 :σρs                 
 Symbol("(Intercept)")

As shown above, the sample has several named properties, which allow for convenient extraction of information. For example, a density plot of the estimates of σ, the residual standard deviation, can be created as

plot(x=samp.σ, Geom.density, Guide.xlabel("Parametric bootstrap estimates of σ"))

For the estimates of the intercept parameter, the getproperty extractor must be used, because (Intercept) is not a valid Julia identifier.

plot(x = getproperty(samp, Symbol("(Intercept)")), Geom.density,
    Guide.xlabel("Parametric bootstrap estimates of β₁"))

The σs property contains the estimates of the standard deviation of the random effects in a hierarchical format.

julia> typeof(samp.σs)
NamedTuple{(:Batch,),Tuple{Array{NamedTuple{(Symbol("(Intercept)"),),Tuple{Float64}},1}}}

This is to allow for random effects associated with more than one grouping factor. If we only have one grouping factor for random effects, which is the case here, we can use the first extractor, as in

julia> first(samp.σs)
10000-element Array{NamedTuple{(Symbol("(Intercept)"),),Tuple{Float64}},1}:
 ((Intercept) = 14.31202440021031,) 
 ((Intercept) = 25.567322049075496,)
 ((Intercept) = 21.76223553106598,) 
 ((Intercept) = 41.05589492574884,) 
 ((Intercept) = 19.18021885057642,) 
 ((Intercept) = 49.18319132233348,) 
 ((Intercept) = 46.711965099114124,)
 ((Intercept) = 37.636680075523266,)
 ((Intercept) = 15.125683728901562,)
 ((Intercept) = 0.0,)               
 ⋮                                  
 ((Intercept) = 0.0,)               
 ((Intercept) = 35.47993724363591,) 
 ((Intercept) = 85.6889420619044,)  
 ((Intercept) = 52.453848318538654,)
 ((Intercept) = 13.977380127035849,)
 ((Intercept) = 56.19821452821661,) 
 ((Intercept) = 17.3300825173626,)  
 ((Intercept) = 23.634457239496463,)
 ((Intercept) = 23.999614688511368,)

or, to carry this one step further,

plot(x=first.(first(samp.σs)), Geom.density,
    Guide.xlabel("Parametric bootstrap estimates of σ₁"))

Notice that this density plot has a spike, or mode, at zero. Although this mode appears to be diffuse, that is an artifact of the way density plots are created. In fact, it is a pulse at zero, as can be seen from a histogram.

plot(x=first.(first(samp.σs)), Geom.histogram,
    Guide.xlabel("Parametric bootstrap estimates of σ₁"))

A value of zero for the standard deviation of the random effects is an example of a singular covariance. It is easy to detect the singularity in the case of a scalar random-effects term. However, it is not as straightforward to detect singularity in vector-valued random-effects terms.
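To see why a correlation of ±1 produces a singular covariance matrix even when both standard deviations are positive, consider a small numerical sketch (the values are illustrative, not estimates from a fitted model):

```julia
using LinearAlgebra

# With ρ = ±1 the 2×2 covariance matrix of a vector-valued random effect
# is singular even though both standard deviations are positive.
σ1, σ2, ρ = 23.78, 5.72, 1.0
Σ = [σ1^2      ρ*σ1*σ2
     ρ*σ1*σ2   σ2^2]
det(Σ)   # σ1²σ2²(1 - ρ²), which is exactly zero when ρ = ±1
```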

For example, if we bootstrap a model fit to the sleepstudy data

julia> m2 = fit(MixedModel, @formula(Y ~ 1+U+(1+U|G)), dat[:sleepstudy])
Linear mixed model fit by maximum likelihood
 Y ~ 1 + U + (1 + U | G)
   logLik   -2 logLik     AIC        BIC    
 -875.96967 1751.93934 1763.93934 1783.09709

Variance components:
            Column    Variance  Std.Dev.   Corr.
G        (Intercept)  565.51069 23.780469
         U             32.68212  5.716828  0.08
Residual              654.94145 25.591824
 Number of obs: 180; levels of grouping factors: 18

  Fixed-effects parameters:
───────────────────────────────────────────────────
             Estimate  Std.Error   z value  P(>|z|)
───────────────────────────────────────────────────
(Intercept)  251.405     6.63226  37.9064    <1e-99
U             10.4673    1.50224   6.96781   <1e-11
───────────────────────────────────────────────────

julia> samp2 = parametricbootstrap(rng, 10_000, m2);

the singularity can be exhibited as a standard deviation of zero or as a correlation of ±1. The σρs property of the sample is a vector of named tuples

julia> σρ = first(samp2.σρs);

julia> typeof(σρ)
Array{NamedTuple{(:σ, :ρ),Tuple{NamedTuple{(Symbol("(Intercept)"), :U),Tuple{Float64,Float64}},Tuple{Float64}}},1}

where the first element of each named tuple is itself a named tuple of standard deviations and the second (and last) element is a tuple of correlations.
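That nested structure can be mimicked with a hand-constructed named tuple (the numeric values here are invented for illustration), which shows how first and last pick out the pieces:

```julia
# One replicate shaped like an element of σρs for a model such as m2:
# the first field holds the standard deviations, the last field the
# correlation(s).  The values are invented for illustration.
rep = (σ = NamedTuple{(Symbol("(Intercept)"), :U)}((23.78, 5.72)), ρ = (0.08,))
first(last(rep))    # the correlation
first(first(rep))   # the first standard deviation
```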

A histogram of the estimated correlations from the bootstrap sample has a spike at +1.

ρs = first.(last.(σρ))
plot(x = ρs, Geom.histogram,
    Guide.xlabel("Parametric bootstrap samples of correlation of random effects"))

or, as a count,

julia> sum(ρs .≈ 1)
329

Close examination of the histogram shows a few values of -1.

julia> sum(ρs .≈ -1)
5

Furthermore there are even a few cases where the estimate of the standard deviation of the random effect for the intercept is zero.

julia> sum(first.(first.(first.(σρ))) .≈ 0)
5

There is a general condition to check for singularity of an estimated covariance matrix or matrices in a bootstrap sample. The parameter optimized in the estimation is θ, the relative covariance parameter. Some of the elements of this parameter vector must be non-negative and, when one of these components is approximately zero, one of the covariance matrices will be singular.

The boundary values are available as

julia> samp2.model.optsum.lowerbd
3-element Array{Float64,1}:
    0.0
 -Inf  
    0.0

so the check on singularity becomes

julia> sum(θ -> any(θ .≈ samp2.model.optsum.lowerbd), samp2.θ)
339
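The same boundary test can be sketched standalone with hypothetical θ vectors checked against these lower bounds:

```julia
# Hypothetical parameter vectors checked against the lower bounds shown
# above; a component at its (finite) bound signals a singular fit.
lowerbd = [0.0, -Inf, 0.0]
θ_singular = [0.0, 0.05, 0.21]    # first component at its bound
θ_regular  = [0.93, 0.02, 0.23]   # strictly inside the feasible region
any(θ_singular .≈ lowerbd)    # true
any(θ_regular  .≈ lowerbd)    # false
```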

The issingular method for a LinearMixedModel object tests whether a parameter vector θ corresponds to a boundary, or singular, fit. The default value of θ is the one from the model, but another value can be given as the second argument.

Using this function the number of singular fits in the bootstrap sample can be counted as

julia> sum(issingular.(Ref(samp2.model), samp2.θ))
339