Fitting linear mixed-effects models

The lmm function is similar to the lmer function in the lme4 package for R. The first two arguments in the R version are formula and data. The principal method for the Julia version takes these same arguments.
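
As a minimal sketch (assuming, as in the examples below, that the Dyestuff data are stored in a DataFrame named ds), a call takes the form

julia> fit!(lmm(Yield ~ 1 + (1 | Batch), ds))

where the formula is the first argument and the data frame is the second. Note that the Julia examples in this document always write the random-effects term in parentheses within a complete formula, rather than the abbreviated Yield ~ 1 | Batch form used with lmer below.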

A simple example

The simplest example of a mixed-effects model that we use in the lme4 package for R is a model fit to the Dyestuff data.

> str(Dyestuff)
'data.frame':   30 obs. of  2 variables:
 $ Batch: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 2 2 2 2 2 ...
 $ Yield: num  1545 1440 1440 1520 1580 ...
> (fm1 <- lmer(Yield ~ 1|Batch, Dyestuff, REML=FALSE))
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: Yield ~ 1 | Batch
   Data: Dyestuff

      AIC       BIC    logLik  deviance
 333.3271  337.5307 -163.6635  327.3271

Random effects:
 Groups   Name        Variance Std.Dev.
 Batch    (Intercept) 1388     37.26
 Residual             2451     49.51
 Number of obs: 30, groups: Batch, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)  1527.50      17.69   86.33

The Dyestuff data are available in Julia through the RCall package, but to run the doctests we use a stored copy of the data frame.
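
If an R installation with the lme4 package is available, one way to bring these data into a Julia session is through RCall, along these lines (a sketch only; the name dyestuff is illustrative, and the column order and types obtained from R may differ from the stored copy shown below):

julia> using RCall

julia> dyestuff = rcopy(R"lme4::Dyestuff");  # copy the R data.frame into a Julia DataFrame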

julia> using DataFrames, MixedModels

julia> ds
30×2 DataFrames.DataFrame
│ Row │ Yield  │ Batch │
├─────┼────────┼───────┤
│ 1   │ 1545.0 │ 'A'   │
│ 2   │ 1440.0 │ 'A'   │
│ 3   │ 1440.0 │ 'A'   │
│ 4   │ 1520.0 │ 'A'   │
│ 5   │ 1580.0 │ 'A'   │
│ 6   │ 1540.0 │ 'B'   │
│ 7   │ 1555.0 │ 'B'   │
│ 8   │ 1490.0 │ 'B'   │
⋮
│ 22  │ 1630.0 │ 'E'   │
│ 23  │ 1515.0 │ 'E'   │
│ 24  │ 1635.0 │ 'E'   │
│ 25  │ 1625.0 │ 'E'   │
│ 26  │ 1520.0 │ 'F'   │
│ 27  │ 1455.0 │ 'F'   │
│ 28  │ 1450.0 │ 'F'   │
│ 29  │ 1480.0 │ 'F'   │
│ 30  │ 1445.0 │ 'F'   │

lmm defaults to maximum likelihood estimation whereas lmer in R defaults to REML estimation.

julia> fit!(lmm(Yield ~ 1 + (1 | Batch), ds))
Linear mixed model fit by maximum likelihood
 Formula: Yield ~ 1 + (1 | Batch)
   logLik    -2 logLik     AIC        BIC    
  -163.66353  327.32706  333.32706  337.53065

Variance components:
              Column    Variance  Std.Dev.
 Batch    (Intercept)  1388.3332 37.260344
 Residual              2451.2500 49.510100
 Number of obs: 30; levels of grouping factors: 6

  Fixed-effects parameters:
             Estimate Std.Error z value
(Intercept)    1527.5   17.6946  86.326

In general the model should be fit through an explicit call to the fit! function, which may take a second argument indicating a verbose fit.

julia> fit!(lmm(Yield ~ 1 + (1 | Batch), ds), true);
f_1: 327.76702, [1.0]
f_2: 331.03619, [1.75]
f_3: 330.64583, [0.25]
f_4: 327.69511, [0.97619]
f_5: 327.56631, [0.928569]
f_6: 327.3826, [0.833327]
f_7: 327.35315, [0.807188]
f_8: 327.34663, [0.799688]
f_9: 327.341, [0.792188]
f_10: 327.33253, [0.777188]
f_11: 327.32733, [0.747188]
f_12: 327.32862, [0.739688]
f_13: 327.32706, [0.752777]
f_14: 327.32707, [0.753527]
f_15: 327.32706, [0.752584]
f_16: 327.32706, [0.752509]
f_17: 327.32706, [0.752591]
f_18: 327.32706, [0.752581]
FTOL_REACHED

The numeric representation of the model has type

julia> typeof(fit!(lmm(Yield ~ 1 + (1 | Batch), ds)))
MixedModels.LinearMixedModel{Float64}

Those familiar with the lme4 package for R will see the usual suspects.

julia> m = fit!(lmm(Yield ~ 1 + (1 | Batch), ds));

julia> fixef(m)  # estimates of the fixed-effects parameters
1-element Array{Float64,1}:
 1527.5

julia> coef(m)  # another name for fixef
1-element Array{Float64,1}:
 1527.5

julia> ranef(m)
1-element Array{Array{Float64,2},1}:
 1×6 Array{Float64,2}:
 -16.6282  0.369516  26.9747  -21.8014  53.5798  -42.4943

julia> ranef(m, true)  # on the u scale
1-element Array{Array{Float64,2},1}:
 1×6 Array{Float64,2}:
 -22.0949  0.490999  35.8429  -28.9689  71.1948  -56.4648

julia> deviance(m)
327.3270598811394

julia> objective(m)
327.3270598811394

We prefer objective to deviance because the value returned is simply -2 times the log-likelihood of the model, without any correction for the null deviance. It is not clear how the null deviance should be defined for these models.
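
As a quick check of this relationship (not part of the stored doctests, and assuming a loglikelihood method is available for the fitted model):

julia> objective(m) ≈ -2 * loglikelihood(m)  # the objective is -2 times the log-likelihood
true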

More substantial examples

Fitting a model to the Dyestuff data is trivial. The InstEval data in the lme4 package are more of a challenge in that there are nearly 75,000 evaluations, by 2972 students, of a total of 1128 instructors.

julia> head(inst)
6×7 DataFrames.DataFrame
│ Row │ s   │ d      │ studage │ lectage │ service │ dept │ y │
├─────┼─────┼────────┼─────────┼─────────┼─────────┼──────┼───┤
│ 1   │ "1" │ "1002" │ "2"     │ "2"     │ "0"     │ "2"  │ 5 │
│ 2   │ "1" │ "1050" │ "2"     │ "1"     │ "1"     │ "6"  │ 2 │
│ 3   │ "1" │ "1582" │ "2"     │ "2"     │ "0"     │ "2"  │ 5 │
│ 4   │ "1" │ "2050" │ "2"     │ "2"     │ "1"     │ "3"  │ 3 │
│ 5   │ "2" │ "115"  │ "2"     │ "1"     │ "0"     │ "5"  │ 2 │
│ 6   │ "2" │ "756"  │ "2"     │ "1"     │ "0"     │ "5"  │ 4 │

julia> m2 = fit!(lmm(y ~ 1 + dept*service + (1|s) + (1|d), inst))
Linear mixed model fit by maximum likelihood
 logLik: -118792.776708, deviance: 237585.553415, AIC: 237647.553415, BIC: 237932.876339

Variance components:
            Variance   Std.Dev.
 s        0.105417971 0.32468134
 d        0.258416394 0.50834673
 Residual 1.384727771 1.17674456
 Number of obs: 73421; levels of grouping factors: 2972, 1128

  Fixed-effects parameters:
                           Estimate Std.Error   z value
(Intercept)                 3.22961  0.064053   50.4209
dept - 5                   0.129536  0.101294   1.27882
dept - 10                 -0.176751 0.0881352  -2.00545
dept - 12                 0.0517102 0.0817524  0.632522
dept - 6                  0.0347319  0.085621  0.405647
dept - 7                    0.14594 0.0997984   1.46235
dept - 4                   0.151689 0.0816897   1.85689
dept - 8                   0.104206  0.118751  0.877517
dept - 9                  0.0440401 0.0962985  0.457329
dept - 14                 0.0517546 0.0986029  0.524879
dept - 1                  0.0466719  0.101942  0.457828
dept - 3                  0.0563461 0.0977925   0.57618
dept - 11                 0.0596536  0.100233   0.59515
dept - 2                 0.00556281  0.110867 0.0501757
service - 1                0.252025 0.0686507   3.67112
dept - 5 & service - 1    -0.180757  0.123179  -1.46744
dept - 10 & service - 1   0.0186492  0.110017  0.169512
dept - 12 & service - 1   -0.282269 0.0792937  -3.55979
dept - 6 & service - 1    -0.494464 0.0790278  -6.25683
dept - 7 & service - 1    -0.392054  0.110313  -3.55403
dept - 4 & service - 1    -0.278547 0.0823727  -3.38154
dept - 8 & service - 1    -0.189526  0.111449  -1.70056
dept - 9 & service - 1    -0.499868 0.0885423  -5.64553
dept - 14 & service - 1   -0.497162 0.0917162  -5.42065
dept - 1 & service - 1     -0.24042 0.0982071   -2.4481
dept - 3 & service - 1    -0.223013 0.0890548  -2.50422
dept - 11 & service - 1   -0.516997 0.0809077  -6.38997
dept - 2 & service - 1    -0.384773  0.091843  -4.18946

Models with vector-valued random effects can also be fit.

julia> slp
180×3 DataFrames.DataFrame
│ Row │ Reaction │ Days │ Subject │
├─────┼──────────┼──────┼─────────┤
│ 1   │ 249.56   │ 0    │ 1       │
│ 2   │ 258.705  │ 1    │ 1       │
│ 3   │ 250.801  │ 2    │ 1       │
│ 4   │ 321.44   │ 3    │ 1       │
│ 5   │ 356.852  │ 4    │ 1       │
│ 6   │ 414.69   │ 5    │ 1       │
│ 7   │ 382.204  │ 6    │ 1       │
│ 8   │ 290.149  │ 7    │ 1       │
⋮
│ 172 │ 273.474  │ 1    │ 18      │
│ 173 │ 297.597  │ 2    │ 18      │
│ 174 │ 310.632  │ 3    │ 18      │
│ 175 │ 287.173  │ 4    │ 18      │
│ 176 │ 329.608  │ 5    │ 18      │
│ 177 │ 334.482  │ 6    │ 18      │
│ 178 │ 343.22   │ 7    │ 18      │
│ 179 │ 369.142  │ 8    │ 18      │
│ 180 │ 364.124  │ 9    │ 18      │

julia> fm3 = fit!(lmm(Reaction ~ 1 + Days + (1+Days|Subject), slp))
Linear mixed model fit by maximum likelihood
 Formula: Reaction ~ 1 + Days + ((1 + Days) | Subject)
  logLik    -2 logLik     AIC        BIC
 -875.96967 1751.93934 1763.93934 1783.09709

Variance components:
              Column    Variance  Std.Dev.   Corr.
 Subject  (Intercept)  565.51066 23.780468
          Days          32.68212  5.716828  0.08
 Residual              654.94145 25.591824
 Number of obs: 180; levels of grouping factors: 18

  Fixed-effects parameters:
             Estimate Std.Error z value P(>|z|)
(Intercept)   251.405   6.63226 37.9064  <1e-99
Days          10.4673   1.50224 6.96781  <1e-11
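
The extractor functions shown earlier apply to this model as well; for example, fixef(fm3) returns the two fixed-effects estimates listed in the table above (a sketch, with the full-precision output elided):

julia> fixef(fm3)  # ≈ [251.405, 10.4673]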