Parametric bootstrap for linear mixed-effects models

Parametric bootstrap for linear mixed-effects models

Julia is well-suited to implementing bootstrapping and other simulation-based methods for statistical models. The parametricbootstrap function in the MixedModels package provides an efficient parametric bootstrap for linear mixed-effects models.

The parametric bootstrap

Bootstrapping is a family of procedures for generating sample values of a statistic, allowing for visualization of the distribution of the statistic or for inference from this sample of values.

A parametric bootstrap is used with a parametric model, m, that has been fitted to data. The procedure is to simulate n response vectors from m using the estimated parameter values and refit m to these responses in turn, accumulating the statistics of interest at each iteration.

The parameters of a LinearMixedModel object are the fixed-effects parameters, β, the standard deviation, σ, of the per-observation noise, and the covariance parameter, θ, that defines the variance-covariance matrices of the random effects.

For example, a simple linear mixed-effects model for the Dyestuff data in the lme4 package for R is fit by

using DataFrames, Gadfly, MixedModels, Random, RData
testdir = normpath(joinpath(dirname(pathof(MixedModels)), "..", "test"));
const dat = Dict(Symbol(k)=>v for (k,v) in load(joinpath(testdir, "dat.rda")));
ds = names!(dat[:Dyestuff], [:Batch, :Yield])  # the Dyestuff data
m1 = fit!(LinearMixedModel(@formula(Yield ~ 1 + (1 | Batch)), ds))
Linear mixed model fit by maximum likelihood
 Yield ~ 1 + (1 | Batch)
   logLik   -2 logLik     AIC        BIC    
 -163.66353  327.32706  333.32706  337.53065

Variance components:
              Column    Variance  Std.Dev. 
 Batch    (Intercept)  1388.3334 37.260347
 Residual              2451.2500 49.510100
 Number of obs: 30; levels of grouping factors: 6

  Fixed-effects parameters:
──────────────────────────────────────────────────
             Estimate  Std.Error  z value  P(>|z|)
──────────────────────────────────────────────────
(Intercept)    1527.5    17.6946   86.326   <1e-99
──────────────────────────────────────────────────

To bootstrap the model parameters, we first initialize a random number generator

rng = MersenneTwister(1234321);

then create a bootstrap sample

samp = parametricbootstrap(rng, 100_000, m1, (:σ, :β, :θ))
Table with 3 columns and 100000 rows:
      σ        β          θ
    ┌─────────────────────────────────
 1  │ 67.4315  [1509.13]  [0.212245]
 2  │ 47.9831  [1538.08]  [0.532841]
 3  │ 50.1346  [1508.02]  [0.434076]
 4  │ 53.2238  [1538.47]  [0.771382]
 5  │ 45.2975  [1520.62]  [0.423428]
 6  │ 36.7556  [1536.94]  [1.33812]
 7  │ 53.8161  [1519.88]  [0.867993]
 8  │ 47.8989  [1528.43]  [0.785752]
 9  │ 41.4     [1497.46]  [0.365355]
 10 │ 64.616   [1532.65]  [0.0]
 11 │ 57.2036  [1552.54]  [0.00848485]
 12 │ 49.355   [1519.28]  [0.493776]
 13 │ 59.6272  [1509.04]  [0.306747]
 14 │ 51.5431  [1531.7]   [0.630042]
 15 │ 64.0205  [1536.17]  [0.238096]
 16 │ 58.6856  [1526.42]  [0.0]
 17 │ 43.218   [1517.67]  [0.832206]
 ⋮  │    ⋮         ⋮           ⋮

The results from the sampling are returned as a Table, as defined in the TypedTables.jl package. Both $\beta$ and $\theta$ are vectors - in this case one-dimensional vectors. The first and last functions are useful for extracting individual elements from the sampled vectors.

Notice that, for some samples, the estimated value of $\theta$ is [0.0]. In fact, this is the case for about about 10% of all the samples.

sum(iszero, samp.θ)
10090

A density plot of the bootstrapped values of σ shows a slightly skewed but unimodal distribution

plot(x=samp.σ, Geom.density, Guide.xlabel("Parametric bootstrap estimates of σ"))

but a density plot of the bootstrap estimates of $\theta_1$, or of $\sigma_1=\theta_1 \cdot \sigma$

plot(x=first.(samp.θ), Geom.density, Guide.xlabel("Parametric bootstrap estimates of θ₁"))

plot(x=first.(samp.θ) .* samp.σ, Geom.density, Guide.xlabel("Parametric bootstrap estimates of σ₁"))

has a mode at zero. Although this mode shows up as being diffuse, this is an artifact of the way that density plots are created. In fact, it is a pulse, as can be seen from a histogram.

plot(x=first.(samp.θ).*samp.σ, Geom.histogram, Guide.xlabel("Parametric bootstrap estimates of σ₁"))