Parametric bootstrap for mixed-effects models
Julia is well-suited to implementing bootstrapping and other simulation-based methods for statistical models. The parametricbootstrap
function in the MixedModels package provides an efficient parametric bootstrap for mixed-effects models.
MixedModels.parametricbootstrap
— Functionparametricbootstrap([rng::AbstractRNG], nsamp::Integer, m::MixedModel{T}, ftype=T;
β = fixef(m), σ = m.σ, θ = m.θ, progress=true, optsum_overrides=(;))
Perform nsamp
parametric bootstrap replication fits of m
, returning a MixedModelBootstrap
.
The default random number generator is Random.GLOBAL_RNG
.
ftype
can be used to store the computed bootstrap values in a lower precision. ftype
is not a named argument because named arguments are not used in method dispatch and thus specialization. In other words, having ftype
as a positional argument has some potential performance benefits.
Keyword Arguments
β
,σ
, andθ
are the values ofm
's parameters for simulating the responses.σ
is only valid forLinearMixedModel
andGeneralizedLinearMixedModel
for
families with a dispersion parameter.
progress
controls whether the progress bar is shown. Note that the progress
bar is automatically disabled for non-interactive (i.e. logging) contexts.
optsum_overrides
is used to override values of OptSummary in the models
fit during the bootstrapping process. For example, optsum_overrides=(;ftol_rel=1e-08)
reduces the convergence criterion, which can greatly speed up the bootstrap fits. Taking advantage of this speed up to increase n
can often lead to better estimates of coverage intervals.
All coefficients are bootstrapped. In the rank deficient case, the inestimatable coefficients are treated as -0.0 in the simulations underlying the bootstrap, which will generally result in their estimate from the simulated data also being being inestimable and thus set to -0.0. However this behavior may change in future releases to explicitly drop the extraneous columns before simulation and thus not include their estimates in the bootstrap result.
The parametric bootstrap
Bootstrapping is a family of procedures for generating sample values of a statistic, allowing for visualization of the distribution of the statistic or for inference from this sample of values.
A parametric bootstrap is used with a parametric model, m
, that has been fit to data. The procedure is to simulate n
response vectors from m
using the estimated parameter values and refit m
to these responses in turn, accumulating the statistics of interest at each iteration.
The parameters of a LinearMixedModel
object are the fixed-effects parameters, β
, the standard deviation, σ
, of the per-observation noise, and the covariance parameter, θ
, that defines the variance-covariance matrices of the random effects.
For example, a simple linear mixed-effects model for the Dyestuff
data in the lme4
package for R
is fit by
using DataFrames
using Gadfly # plotting package
using MixedModels
using Random
dyestuff = MixedModels.dataset(:dyestuff)
m1 = fit(MixedModel, @formula(yield ~ 1 + (1 | batch)), dyestuff)
Est. | SE | z | p | σ_batch | |
---|---|---|---|---|---|
(Intercept) | 1527.5000 | 17.6946 | 86.33 | <1e-99 | 37.2603 |
Residual | 49.5101 |
To bootstrap the model parameters, first initialize a random number generator then create a bootstrap sample and extract the tbl
property, which is a Table
- a lightweight dataframe-like object.
const rng = MersenneTwister(1234321);
samp = parametricbootstrap(rng, 10_000, m1);
tbl = samp.tbl
Table with 5 columns and 10000 rows:
obj β1 σ σ1 θ1
┌────────────────────────────────────────────────
1 │ 339.022 1509.13 67.4315 14.312 0.212245
2 │ 322.689 1538.08 47.9831 25.5673 0.53284
3 │ 324.002 1508.02 50.1346 21.7622 0.434076
4 │ 331.887 1538.47 53.2238 41.0559 0.771383
5 │ 317.771 1520.62 45.2975 19.1802 0.423428
6 │ 315.181 1536.94 36.7556 49.1832 1.33812
7 │ 333.641 1519.88 53.8161 46.712 0.867993
8 │ 325.729 1528.43 47.8989 37.6367 0.785752
9 │ 311.601 1497.46 41.4 15.1257 0.365355
10 │ 335.244 1532.65 64.616 0.0 0.0
11 │ 327.935 1552.54 57.2036 0.485275 0.00848329
12 │ 323.861 1519.28 49.355 24.3703 0.493776
13 │ 332.736 1509.04 59.6272 18.2905 0.306747
14 │ 328.243 1531.7 51.5431 32.4743 0.630042
15 │ 336.186 1536.17 64.0205 15.243 0.238096
16 │ 329.468 1526.42 58.6856 0.0 0.0
17 │ 320.086 1517.67 43.218 35.9663 0.832207
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
A density plot of the estimates of σ
, the residual standard deviation, can be created as
plot(x = tbl.σ, Geom.density, Guide.xlabel("Parametric bootstrap estimates of σ"))
or, for the intercept parameter
plot(x = tbl.β1, Geom.density, Guide.xlabel("Parametric bootstrap estimates of β₁"))
A density plot of the estimates of the standard deviation of the random effects is obtained as
plot(x = tbl.σ1, Geom.density,
Guide.xlabel("Parametric bootstrap estimates of σ₁"))
Notice that this density plot has a spike, or mode, at zero. Although this mode appears to be diffuse, this is an artifact of the way that density plots are created. In fact, it is a pulse, as can be seen from a histogram.
plot(x = tbl.σ1, Geom.histogram,
Guide.xlabel("Parametric bootstrap estimates of σ₁"))
The bootstrap sample can be used to generate intervals that cover a certain percentage of the bootstrapped values. We refer to these as "coverage intervals", similar to a confidence interval. The shortest such intervals, obtained with the shortestcovint
extractor, correspond to a highest posterior density interval in Bayesian inference.
MixedModels.shortestcovint
— Functionshortestcovint(v, level = 0.95)
Return the shortest interval containing level
proportion of the values of v
shortestcovint(bsamp::MixedModelFitCollection, level = 0.95)
Return the shortest interval containing level
proportion for each parameter from bsamp.allpars
.
Currently, correlations that are systematically zero are included in the the result. This may change in a future release without being considered a breaking change.
We generate these directly from the original bootstrap object:
Table(shortestcovint(samp))
Table with 5 columns and 3 rows:
type group names lower upper
┌──────────────────────────────────────────────
1 │ β missing (Intercept) 1492.54 1561.34
2 │ σ batch (Intercept) 0.0 54.7043
3 │ σ residual missing 35.4909 63.0209
A value of zero for the standard deviation of the random effects is an example of a singular covariance. It is easy to detect the singularity in the case of a scalar random-effects term. However, it is not as straightforward to detect singularity in vector-valued random-effects terms.
For example, if we bootstrap a model fit to the sleepstudy
data
sleepstudy = MixedModels.dataset(:sleepstudy)
contrasts = Dict(:subj => Grouping())
m2 = let f = @formula reaction ~ 1+days+(1+days|subj)
fit(MixedModel, f, sleepstudy; contrasts)
end
Est. | SE | z | p | σ_subj | |
---|---|---|---|---|---|
(Intercept) | 251.4051 | 6.6323 | 37.91 | <1e-99 | 23.7805 |
days | 10.4673 | 1.5022 | 6.97 | <1e-11 | 5.7168 |
Residual | 25.5918 |
samp2 = parametricbootstrap(rng, 10_000, m2);
tbl2 = samp2.tbl
Table with 10 columns and 10000 rows:
obj β1 β2 σ σ1 σ2 ρ1 ⋯
┌────────────────────────────────────────────────────────────────────
1 │ 1762.97 262.397 10.5117 26.2691 24.8119 6.86013 -0.335778 ⋯
2 │ 1761.29 252.753 10.9396 27.4071 15.4139 5.27725 0.191491 ⋯
3 │ 1760.12 244.773 14.1153 27.7064 18.1545 3.77585 0.501708 ⋯
4 │ 1728.02 256.885 9.90894 24.3406 22.8058 4.62388 -0.125449 ⋯
5 │ 1739.12 241.968 12.9145 26.1862 8.27936 5.21775 0.596437 ⋯
6 │ 1779.57 249.305 11.2124 27.2767 24.9131 7.15134 0.220329 ⋯
7 │ 1765.37 240.264 10.7107 27.3333 25.2447 4.18552 0.613865 ⋯
8 │ 1727.31 253.876 9.17989 24.8219 22.5107 3.72496 -0.167116 ⋯
9 │ 1746.59 264.797 10.315 25.6272 21.3612 5.06129 0.310146 ⋯
10 │ 1755.46 256.464 10.4106 25.4156 35.3511 4.95412 -0.195481 ⋯
11 │ 1773.51 260.967 11.0385 28.4814 21.2083 4.39551 1.0 ⋯
12 │ 1690.61 258.852 10.3472 21.0322 26.1479 4.99185 -0.0787605 ⋯
13 │ 1742.15 251.951 8.57777 24.3526 28.4102 5.7392 0.139818 ⋯
14 │ 1733.94 260.473 9.68852 24.1964 20.8341 6.17182 0.377655 ⋯
15 │ 1781.65 249.654 11.0241 29.552 11.5514 5.28092 0.498551 ⋯
16 │ 1790.19 247.356 9.46489 28.0847 31.4617 6.25443 0.0073141 ⋯
17 │ 1740.19 245.537 11.243 25.5401 14.8452 5.53331 0.258929 ⋯
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
the singularity can be exhibited as a standard deviation of zero or as a correlation of $\pm1$.
shortestcovint(samp2)
6-element Vector{NamedTuple{(:type, :group, :names, :lower, :upper)}}:
(type = "β", group = missing, names = "(Intercept)", lower = 238.5828600121617, upper = 263.89217662125895)
(type = "β", group = missing, names = "days", lower = 7.539638299557727, upper = 13.368161276142704)
(type = "σ", group = "subj", names = "(Intercept)", lower = 10.507876051388921, upper = 33.218443046181676)
(type = "σ", group = "subj", names = "days", lower = 3.024936712829588, upper = 7.675241393755667)
(type = "ρ", group = "subj", names = "(Intercept), days", lower = -0.40535765760216863, upper = 1.0)
(type = "σ", group = "residual", names = missing, lower = 22.656763019290786, upper = 28.43122140376578)
A histogram of the estimated correlations from the bootstrap sample has a spike at +1
.
plot(x = tbl2.ρ1, Geom.histogram,
Guide.xlabel("Parametric bootstrap samples of correlation of random effects"))
or, as a count,
count(tbl2.ρ1 .≈ 1)
306
Close examination of the histogram shows a few values of -1
.
count(tbl2.ρ1 .≈ -1)
2
Furthermore there are even a few cases where the estimate of the standard deviation of the random effect for the intercept is zero.
count(tbl2.σ1 .≈ 0)
5
There is a general condition to check for singularity of an estimated covariance matrix or matrices in a bootstrap sample. The parameter optimized in the estimation is θ
, the relative covariance parameter. Some of the elements of this parameter vector must be non-negative and, when one of these components is approximately zero, one of the covariance matrices will be singular.
The issingular
method for a MixedModel
object that tests if a parameter vector θ
corresponds to a boundary or singular fit.
This operation is encapsulated in a method for the issingular
function.
count(issingular(samp2))
313
Reduced Precision Bootstrap
parametricbootstrap
accepts an optional keyword argument optsum_overrides
, which can be used to override the convergence criteria for bootstrap replicates. One possibility is setting ftol_rel=1e-8
, i.e., considering the model converged when the relative change in the objective between optimizer iterations is smaller than 0.00000001. This threshold corresponds approximately to the precision from treating the value of the objective as a single precision (Float32
) number, while not changing the precision of the intermediate computations. The resultant loss in precision will generally be smaller than the variation that the bootstrap captures, but can greatly speed up the fitting process for each replicates, especially for large models. More directly, lowering the fit quality for each replicate will reduce the quality of each replicate, but this may be more than compensated for by the ability to fit a much larger number of replicates in the same time.
t = @timed parametricbootstrap(MersenneTwister(42), 1000, m2; progress=false)
t.time
0.727492265
optsum_overrides = (; ftol_rel=1e-8)
t = @timed parametricbootstrap(MersenneTwister(42), 1000, m2; optsum_overrides, progress=false)
t.time
0.656627258
Distributed Computing and the Bootstrap
Earlier versions of MixedModels.jl supported a multi-threaded bootstrap via the use_threads
keyword argument. However, with improved BLAS multithreading, the Julia-level threads often wound up competing with the BLAS threads, leading to no improvement or even a worsening of performance when use_threads=true
. Nonetheless, the bootstrap is a classic example of an embarrassingly parallel problem and so we provide a few convenience methods for combining results computed separately. In particular, there are vcat
and an optimized reduce(::typeof(vcat))
methods for MixedModelBootstrap
objects. For computers with many processors (as opposed to a single processor with several cores) or for computing clusters, these provide a convenient way to split the computation across nodes.
using Distributed
# you already have 1 proc by default, so add the number of additional cores with `addprocs`
# you need at least as many RNGs as cores you want to use in parallel
# but you shouldn't use all of your cores because nested within this
# is the multithreading of the linear algebra
# addprocs(1)
@info "Currently using $(nprocs()) processors total and $(nworkers()) for work"
# Load the necessary packages on all workers
# For clusters, you will also need to make sure that the Julia
# environment (Project.toml) is set up and activated on each worker.
@everywhere begin
using ProgressMeter
using MixedModels
end
# copy everything to workers
@showprogress for w in workers()
remotecall_fetch(() -> coefnames(m2), w)
end
# split the replicates across the workers
# this rounds down, so if the number of workers doesn't divide the
# number of replicates, you'll be a few replicates short!
n_replicates = 1000
n_rep_per_worker = n_replicates ÷ nworkers()
# NB: You need a different seed/RNG for each worker otherwise you will
# have copies of the same replicates and not independent replicates!
pb_map = @showprogress pmap(MersenneTwister.(1:nworkers())) do rng
parametricbootstrap(rng, n_rep_per_worker, m2; optsum_overrides)
end;
# get rid of all the workers
# rmprocs(workers())
confint(reduce(vcat, pb_map))
DictTable with 2 columns and 6 rows:
par lower upper
────┬───────────────────
β1 │ 236.59 264.035
β2 │ 7.75279 13.3734
ρ1 │ -0.424371 1.0
σ │ 22.4485 28.2745
σ1 │ 10.6217 32.5576
σ2 │ 3.18136 7.74161