Parametric bootstrap for mixed-effects models

Julia is well-suited to implementing bootstrapping and other simulation-based methods for statistical models. The parametricbootstrap function in the MixedModels package provides an efficient parametric bootstrap for mixed-effects models.

MixedModels.parametricbootstrapFunction
parametricbootstrap([rng::AbstractRNG], nsamp::Integer, m::MixedModel{T}, ftype=T;
    β = fixef(m), σ = m.σ, θ = m.θ, progress=true, optsum_overrides=(;))

Perform nsamp parametric bootstrap replication fits of m, returning a MixedModelBootstrap.

The default random number generator is Random.GLOBAL_RNG.

ftype can be used to store the computed bootstrap values in a lower precision. ftype is not a named argument because named arguments are not used in method dispatch and thus specialization. In other words, having ftype as a positional argument has some potential performance benefits.

Keyword Arguments

  • β, σ, and θ are the values of m's parameters for simulating the responses.
  • σ is only valid for LinearMixedModel and GeneralizedLinearMixedModel for

families with a dispersion parameter.

  • progress controls whether the progress bar is shown. Note that the progress

bar is automatically disabled for non-interactive (i.e. logging) contexts.

  • optsum_overrides is used to override values of OptSummary in the models

fit during the bootstrapping process. For example, optsum_overrides=(;ftol_rel=1e-08) reduces the convergence criterion, which can greatly speed up the bootstrap fits. Taking advantage of this speed up to increase n can often lead to better estimates of coverage intervals.

Note

All coefficients are bootstrapped. In the rank deficient case, the inestimatable coefficients are treated as -0.0 in the simulations underlying the bootstrap, which will generally result in their estimate from the simulated data also being being inestimable and thus set to -0.0. However this behavior may change in future releases to explicitly drop the extraneous columns before simulation and thus not include their estimates in the bootstrap result.

source

The parametric bootstrap

Bootstrapping is a family of procedures for generating sample values of a statistic, allowing for visualization of the distribution of the statistic or for inference from this sample of values.

A parametric bootstrap is used with a parametric model, m, that has been fit to data. The procedure is to simulate n response vectors from m using the estimated parameter values and refit m to these responses in turn, accumulating the statistics of interest at each iteration.

The parameters of a LinearMixedModel object are the fixed-effects parameters, β, the standard deviation, σ, of the per-observation noise, and the covariance parameter, θ, that defines the variance-covariance matrices of the random effects.

For example, a simple linear mixed-effects model for the Dyestuff data in the lme4 package for R is fit by

using DataFrames
using Gadfly          # plotting package
using MixedModels
using Random
dyestuff = MixedModels.dataset(:dyestuff)
m1 = fit(MixedModel, @formula(yield ~ 1 + (1 | batch)), dyestuff)
Est.SEzpσ_batch
(Intercept)1527.500017.694686.33<1e-9937.2603
Residual49.5101

To bootstrap the model parameters, first initialize a random number generator then create a bootstrap sample and extract the tbl property, which is a Table - a lightweight dataframe-like object.

const rng = MersenneTwister(1234321);
samp = parametricbootstrap(rng, 10_000, m1);
tbl = samp.tbl
Table with 5 columns and 10000 rows:
      obj      β1       σ        σ1        θ1
    ┌────────────────────────────────────────────────
 1  │ 339.022  1509.13  67.4315  14.312    0.212245
 2  │ 322.689  1538.08  47.9831  25.5673   0.53284
 3  │ 324.002  1508.02  50.1346  21.7622   0.434076
 4  │ 331.887  1538.47  53.2238  41.0559   0.771383
 5  │ 317.771  1520.62  45.2975  19.1802   0.423428
 6  │ 315.181  1536.94  36.7556  49.1832   1.33812
 7  │ 333.641  1519.88  53.8161  46.712    0.867993
 8  │ 325.729  1528.43  47.8989  37.6367   0.785752
 9  │ 311.601  1497.46  41.4     15.1257   0.365355
 10 │ 335.244  1532.65  64.616   0.0       0.0
 11 │ 327.935  1552.54  57.2036  0.485275  0.00848329
 12 │ 323.861  1519.28  49.355   24.3703   0.493776
 13 │ 332.736  1509.04  59.6272  18.2905   0.306747
 14 │ 328.243  1531.7   51.5431  32.4743   0.630042
 15 │ 336.186  1536.17  64.0205  15.243    0.238096
 16 │ 329.468  1526.42  58.6856  0.0       0.0
 17 │ 320.086  1517.67  43.218   35.9663   0.832207
 ⋮  │    ⋮        ⋮        ⋮        ⋮          ⋮

A density plot of the estimates of σ, the residual standard deviation, can be created as

plot(x = tbl.σ, Geom.density, Guide.xlabel("Parametric bootstrap estimates of σ"))
Example block output

or, for the intercept parameter

plot(x = tbl.β1, Geom.density, Guide.xlabel("Parametric bootstrap estimates of β₁"))
Example block output

A density plot of the estimates of the standard deviation of the random effects is obtained as

plot(x = tbl.σ1, Geom.density,
    Guide.xlabel("Parametric bootstrap estimates of σ₁"))
Example block output

Notice that this density plot has a spike, or mode, at zero. Although this mode appears to be diffuse, this is an artifact of the way that density plots are created. In fact, it is a pulse, as can be seen from a histogram.

plot(x = tbl.σ1, Geom.histogram,
    Guide.xlabel("Parametric bootstrap estimates of σ₁"))
Example block output

The bootstrap sample can be used to generate intervals that cover a certain percentage of the bootstrapped values. We refer to these as "coverage intervals", similar to a confidence interval. The shortest such intervals, obtained with the shortestcovint extractor, correspond to a highest posterior density interval in Bayesian inference.

MixedModels.shortestcovintFunction
shortestcovint(v, level = 0.95)

Return the shortest interval containing level proportion of the values of v

source
shortestcovint(bsamp::MixedModelFitCollection, level = 0.95)

Return the shortest interval containing level proportion for each parameter from bsamp.allpars.

Warning

Currently, correlations that are systematically zero are included in the the result. This may change in a future release without being considered a breaking change.

source

We generate these directly from the original bootstrap object:

Table(shortestcovint(samp))
Table with 5 columns and 3 rows:
     type  group     names        lower    upper
   ┌──────────────────────────────────────────────
 1 │ β     missing   (Intercept)  1492.54  1561.34
 2 │ σ     batch     (Intercept)  0.0      54.7043
 3 │ σ     residual  missing      35.4909  63.0209

A value of zero for the standard deviation of the random effects is an example of a singular covariance. It is easy to detect the singularity in the case of a scalar random-effects term. However, it is not as straightforward to detect singularity in vector-valued random-effects terms.

For example, if we bootstrap a model fit to the sleepstudy data

sleepstudy = MixedModels.dataset(:sleepstudy)
contrasts = Dict(:subj => Grouping())
m2 = let f = @formula reaction ~ 1+days+(1+days|subj)
    fit(MixedModel, f, sleepstudy; contrasts)
end
Est.SEzpσ_subj
(Intercept)251.40516.632337.91<1e-9923.7805
days10.46731.50226.97<1e-115.7168
Residual25.5918
samp2 = parametricbootstrap(rng, 10_000, m2);
tbl2 = samp2.tbl
Table with 10 columns and 10000 rows:
      obj      β1       β2       σ        σ1       σ2       ρ1          ⋯
    ┌────────────────────────────────────────────────────────────────────
 1  │ 1762.97  262.397  10.5117  26.2691  24.8119  6.86013  -0.335778   ⋯
 2  │ 1761.29  252.753  10.9396  27.4071  15.4139  5.27725  0.191491    ⋯
 3  │ 1760.12  244.773  14.1153  27.7064  18.1545  3.77585  0.501708    ⋯
 4  │ 1728.02  256.885  9.90894  24.3406  22.8058  4.62388  -0.125449   ⋯
 5  │ 1739.12  241.968  12.9145  26.1862  8.27936  5.21775  0.596437    ⋯
 6  │ 1779.57  249.305  11.2124  27.2767  24.9131  7.15134  0.220329    ⋯
 7  │ 1765.37  240.264  10.7107  27.3333  25.2447  4.18552  0.613865    ⋯
 8  │ 1727.31  253.876  9.17989  24.8219  22.5107  3.72496  -0.167116   ⋯
 9  │ 1746.59  264.797  10.315   25.6272  21.3612  5.06129  0.310146    ⋯
 10 │ 1755.46  256.464  10.4106  25.4156  35.3511  4.95412  -0.195481   ⋯
 11 │ 1773.51  260.967  11.0385  28.4814  21.2083  4.39551  1.0         ⋯
 12 │ 1690.61  258.852  10.3472  21.0322  26.1479  4.99185  -0.0787605  ⋯
 13 │ 1742.15  251.951  8.57777  24.3526  28.4102  5.7392   0.139818    ⋯
 14 │ 1733.94  260.473  9.68852  24.1964  20.8341  6.17182  0.377655    ⋯
 15 │ 1781.65  249.654  11.0241  29.552   11.5514  5.28092  0.498551    ⋯
 16 │ 1790.19  247.356  9.46489  28.0847  31.4617  6.25443  0.0073141   ⋯
 17 │ 1740.19  245.537  11.243   25.5401  14.8452  5.53331  0.258929    ⋯
 ⋮  │    ⋮        ⋮        ⋮        ⋮        ⋮        ⋮         ⋮       ⋱

the singularity can be exhibited as a standard deviation of zero or as a correlation of $\pm1$.

shortestcovint(samp2)
6-element Vector{NamedTuple{(:type, :group, :names, :lower, :upper)}}:
 (type = "β", group = missing, names = "(Intercept)", lower = 238.5828600121617, upper = 263.89217662125895)
 (type = "β", group = missing, names = "days", lower = 7.539638299557727, upper = 13.368161276142704)
 (type = "σ", group = "subj", names = "(Intercept)", lower = 10.507876051388921, upper = 33.218443046181676)
 (type = "σ", group = "subj", names = "days", lower = 3.024936712829588, upper = 7.675241393755667)
 (type = "ρ", group = "subj", names = "(Intercept), days", lower = -0.40535765760216863, upper = 1.0)
 (type = "σ", group = "residual", names = missing, lower = 22.656763019290786, upper = 28.43122140376578)

A histogram of the estimated correlations from the bootstrap sample has a spike at +1.

plot(x = tbl2.ρ1, Geom.histogram,
    Guide.xlabel("Parametric bootstrap samples of correlation of random effects"))
Example block output

or, as a count,

count(tbl2.ρ1 .≈ 1)
306

Close examination of the histogram shows a few values of -1.

count(tbl2.ρ1 .≈ -1)
2

Furthermore there are even a few cases where the estimate of the standard deviation of the random effect for the intercept is zero.

count(tbl2.σ1 .≈ 0)
5

There is a general condition to check for singularity of an estimated covariance matrix or matrices in a bootstrap sample. The parameter optimized in the estimation is θ, the relative covariance parameter. Some of the elements of this parameter vector must be non-negative and, when one of these components is approximately zero, one of the covariance matrices will be singular.

The issingular method for a MixedModel object that tests if a parameter vector θ corresponds to a boundary or singular fit.

This operation is encapsulated in a method for the issingular function.

count(issingular(samp2))
313

Reduced Precision Bootstrap

parametricbootstrap accepts an optional keyword argument optsum_overrides, which can be used to override the convergence criteria for bootstrap replicates. One possibility is setting ftol_rel=1e-8, i.e., considering the model converged when the relative change in the objective between optimizer iterations is smaller than 0.00000001. This threshold corresponds approximately to the precision from treating the value of the objective as a single precision (Float32) number, while not changing the precision of the intermediate computations. The resultant loss in precision will generally be smaller than the variation that the bootstrap captures, but can greatly speed up the fitting process for each replicates, especially for large models. More directly, lowering the fit quality for each replicate will reduce the quality of each replicate, but this may be more than compensated for by the ability to fit a much larger number of replicates in the same time.

t = @timed parametricbootstrap(MersenneTwister(42), 1000, m2; progress=false)
t.time
0.781370117
optsum_overrides = (; ftol_rel=1e-8)
t = @timed parametricbootstrap(MersenneTwister(42), 1000, m2; optsum_overrides, progress=false)
t.time
0.651323657

Distributed Computing and the Bootstrap

Earlier versions of MixedModels.jl supported a multi-threaded bootstrap via the use_threads keyword argument. However, with improved BLAS multithreading, the Julia-level threads often wound up competing with the BLAS threads, leading to no improvement or even a worsening of performance when use_threads=true. Nonetheless, the bootstrap is a classic example of an embarrassingly parallel problem and so we provide a few convenience methods for combining results computed separately. In particular, there are vcat and an optimized reduce(::typeof(vcat)) methods for MixedModelBootstrap objects. For computers with many processors (as opposed to a single processor with several cores) or for computing clusters, these provide a convenient way to split the computation across nodes.

using Distributed
# you already have 1 proc by default, so add the number of additional cores with `addprocs`
# you need at least as many RNGs as cores you want to use in parallel
# but you shouldn't use all of your cores because nested within this
# is the multithreading of the linear algebra
# addprocs(1)
@info "Currently using $(nprocs()) processors total and $(nworkers()) for work"

# Load the necessary packages on all workers
# For clusters, you will also need to make sure that the Julia
# environment (Project.toml) is set up and activated on each worker.
@everywhere begin
    using ProgressMeter
    using MixedModels
end
# copy everything to workers
@showprogress for w in workers()
    remotecall_fetch(() -> coefnames(m2), w)
end

# split the replicates across the workers
# this rounds down, so if the number of workers doesn't divide the
# number of replicates, you'll be a few replicates short!
n_replicates = 1000
n_rep_per_worker = n_replicates ÷ nworkers()
# NB: You need a different seed/RNG for each worker otherwise you will
# have copies of the same replicates and not independent replicates!
pb_map = @showprogress pmap(MersenneTwister.(1:nworkers())) do rng
    parametricbootstrap(rng, n_rep_per_worker, m2; optsum_overrides)
end;

# get rid of all the workers
# rmprocs(workers())

confint(reduce(vcat, pb_map))
DictTable with 2 columns and 6 rows:
 par   lower      upper
 ────┬───────────────────
 β1  │ 236.59     264.035
 β2  │ 7.75279    13.3734
 ρ1  │ -0.424371  1.0
 σ   │ 22.4485    28.2745
 σ1  │ 10.6217    32.5576
 σ2  │ 3.18136    7.74161