Fitting linear mixed-effects models

The lmm function is similar to the lmer function in the lme4 package for R. The first two arguments in the R version are formula and data. The principal method for the Julia version takes these same arguments.
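
As a minimal sketch (assuming, as in the examples below, that the Dyestuff data are stored in a DataFrame named ds), a call takes the form

julia> fit!(lmm(Yield ~ 1 + (1 | Batch), ds))

where the formula is the first argument and the data frame is the second. Note that the Julia examples in this document always write the random-effects term in parentheses within a complete formula, rather than the abbreviated Yield ~ 1 | Batch form used with lmer below.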

A simple example

The simplest example of a mixed-effects model that we use in the lme4 package for R is a model fit to the Dyestuff data.

> str(Dyestuff)
'data.frame':   30 obs. of  2 variables:
 $ Batch: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 2 2 2 2 2 ...
 $ Yield: num  1545 1440 1440 1520 1580 ...
> (fm1 <- lmer(Yield ~ 1|Batch, Dyestuff, REML=FALSE))
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: Yield ~ 1 | Batch
   Data: Dyestuff

      AIC       BIC    logLik  deviance
 333.3271  337.5307 -163.6635  327.3271

Random effects:
 Groups   Name        Variance Std.Dev.
 Batch    (Intercept) 1388     37.26
 Residual             2451     49.51
 Number of obs: 30, groups: Batch, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)  1527.50      17.69   86.33

The Dyestuff data are available in Julia through the RCall package, but to run the doctests we use a stored copy of the data frame.
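
If an R installation with the lme4 package is available, one way to bring these data into a Julia session is through RCall, along these lines (a sketch only; the name dyestuff is illustrative, and the column order and types obtained from R may differ from the stored copy shown below):

julia> using RCall

julia> dyestuff = rcopy(R"lme4::Dyestuff");  # copy the R data.frame into a Julia DataFrame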

julia> using DataFrames, MixedModels

julia> ds
30×2 DataFrames.DataFrame
│ Row │ Yield  │ Batch │
├─────┼────────┼───────┤
│ 1   │ 1545.0 │ 'A'   │
│ 2   │ 1440.0 │ 'A'   │
│ 3   │ 1440.0 │ 'A'   │
│ 4   │ 1520.0 │ 'A'   │
│ 5   │ 1580.0 │ 'A'   │
│ 6   │ 1540.0 │ 'B'   │
│ 7   │ 1555.0 │ 'B'   │
│ 8   │ 1490.0 │ 'B'   │
⋮
│ 22  │ 1630.0 │ 'E'   │
│ 23  │ 1515.0 │ 'E'   │
│ 24  │ 1635.0 │ 'E'   │
│ 25  │ 1625.0 │ 'E'   │
│ 26  │ 1520.0 │ 'F'   │
│ 27  │ 1455.0 │ 'F'   │
│ 28  │ 1450.0 │ 'F'   │
│ 29  │ 1480.0 │ 'F'   │
│ 30  │ 1445.0 │ 'F'   │

lmm defaults to maximum likelihood estimation whereas lmer in R defaults to REML estimation.

julia> fit!(lmm(Yield ~ 1 + (1 | Batch), ds))
Linear mixed model fit by maximum likelihood
 Formula: Yield ~ 1 + (1 | Batch)
   logLik    -2 logLik     AIC        BIC    
  -163.66353  327.32706  333.32706  337.53065

Variance components:
              Column    Variance  Std.Dev.
 Batch    (Intercept)  1388.3332 37.260344
 Residual              2451.2500 49.510100
 Number of obs: 30; levels of grouping factors: 6

  Fixed-effects parameters:
             Estimate Std.Error z value
(Intercept)    1527.5   17.6946  86.326

In general the model should be fit through an explicit call to the fit! function, which may take a second argument indicating a verbose fit.

julia> fit!(lmm(Yield ~ 1 + (1 | Batch), ds), true);
f_1: 327.76702, [1.0]
f_2: 331.03619, [1.75]
f_3: 330.64583, [0.25]
f_4: 327.69511, [0.97619]
f_5: 327.56631, [0.928569]
f_6: 327.3826, [0.833327]
f_7: 327.35315, [0.807188]
f_8: 327.34663, [0.799688]
f_9: 327.341, [0.792188]
f_10: 327.33253, [0.777188]
f_11: 327.32733, [0.747188]
f_12: 327.32862, [0.739688]
f_13: 327.32706, [0.752777]
f_14: 327.32707, [0.753527]
f_15: 327.32706, [0.752584]
f_16: 327.32706, [0.752509]
f_17: 327.32706, [0.752591]
f_18: 327.32706, [0.752581]
FTOL_REACHED

The numeric representation of the model has type

julia> typeof(fit!(lmm(Yield ~ 1 + (1 | Batch), ds)))
MixedModels.LinearMixedModel{Float64}

Those familiar with the lme4 package for R will see the usual suspects.

julia> m = fit!(lmm(Yield ~ 1 + (1 | Batch), ds));

julia> fixef(m)  # estimates of the fixed-effects parameters
1-element Array{Float64,1}:
 1527.5

julia> coef(m)  # another name for fixef
1-element Array{Float64,1}:
 1527.5

julia> ranef(m)
1-element Array{Array{Float64,2},1}:
 1×6 Array{Float64,2}:
 -16.6282  0.369516  26.9747  -21.8014  53.5798  -42.4943

julia> ranef(m, true)  # on the u scale
1-element Array{Array{Float64,2},1}:
 1×6 Array{Float64,2}:
 -22.0949  0.490999  35.8429  -28.9689  71.1948  -56.4648

julia> deviance(m)
327.3270598811394

julia> objective(m)
327.3270598811394

We prefer objective to deviance because the value returned is simply -2 times the log-likelihood of the model, without any correction for the null deviance. It is not clear how the null deviance should be defined for these models.
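
As a quick check of this relationship (not part of the stored doctests, and assuming a loglikelihood method is available for the fitted model):

julia> objective(m) ≈ -2 * loglikelihood(m)  # the objective is -2 times the log-likelihood
true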

More substantial examples

Fitting a model to the Dyestuff data is trivial. The InstEval data in the lme4 package are more of a challenge in that there are nearly 75,000 evaluations, by 2972 students, of a total of 1128 instructors.

julia> head(inst)
6×7 DataFrames.DataFrame
│ Row │ s   │ d      │ studage │ lectage │ service │ dept │ y │
├─────┼─────┼────────┼─────────┼─────────┼─────────┼──────┼───┤
│ 1   │ "1" │ "1002" │ "2"     │ "2"     │ "0"     │ "2"  │ 5 │
│ 2   │ "1" │ "1050" │ "2"     │ "1"     │ "1"     │ "6"  │ 2 │
│ 3   │ "1" │ "1582" │ "2"     │ "2"     │ "0"     │ "2"  │ 5 │
│ 4   │ "1" │ "2050" │ "2"     │ "2"     │ "1"     │ "3"  │ 3 │
│ 5   │ "2" │ "115"  │ "2"     │ "1"     │ "0"     │ "5"  │ 2 │
│ 6   │ "2" │ "756"  │ "2"     │ "1"     │ "0"     │ "5"  │ 4 │

julia> m2 = fit!(lmm(y ~ 1 + dept*service + (1|s) + (1|d), inst))
Linear mixed model fit by maximum likelihood
 logLik: -118792.776708, deviance: 237585.553415, AIC: 237647.553415, BIC: 237932.876339

Variance components:
            Variance   Std.Dev.
 s        0.105417971 0.32468134
 d        0.258416394 0.50834673
 Residual 1.384727771 1.17674456
 Number of obs: 73421; levels of grouping factors: 2972, 1128

  Fixed-effects parameters:
                           Estimate Std.Error   z value
(Intercept)                 3.22961  0.064053   50.4209
dept - 5                   0.129536  0.101294   1.27882
dept - 10                 -0.176751 0.0881352  -2.00545
dept - 12                 0.0517102 0.0817524  0.632522
dept - 6                  0.0347319  0.085621  0.405647
dept - 7                    0.14594 0.0997984   1.46235
dept - 4                   0.151689 0.0816897   1.85689
dept - 8                   0.104206  0.118751  0.877517
dept - 9                  0.0440401 0.0962985  0.457329
dept - 14                 0.0517546 0.0986029  0.524879
dept - 1                  0.0466719  0.101942  0.457828
dept - 3                  0.0563461 0.0977925   0.57618
dept - 11                 0.0596536  0.100233   0.59515
dept - 2                 0.00556281  0.110867 0.0501757
service - 1                0.252025 0.0686507   3.67112
dept - 5 & service - 1    -0.180757  0.123179  -1.46744
dept - 10 & service - 1   0.0186492  0.110017  0.169512
dept - 12 & service - 1   -0.282269 0.0792937  -3.55979
dept - 6 & service - 1    -0.494464 0.0790278  -6.25683
dept - 7 & service - 1    -0.392054  0.110313  -3.55403
dept - 4 & service - 1    -0.278547 0.0823727  -3.38154
dept - 8 & service - 1    -0.189526  0.111449  -1.70056
dept - 9 & service - 1    -0.499868 0.0885423  -5.64553
dept - 14 & service - 1   -0.497162 0.0917162  -5.42065
dept - 1 & service - 1     -0.24042 0.0982071   -2.4481
dept - 3 & service - 1    -0.223013 0.0890548  -2.50422
dept - 11 & service - 1   -0.516997 0.0809077  -6.38997
dept - 2 & service - 1    -0.384773  0.091843  -4.18946

Models with vector-valued random effects can also be fit.

julia> slp
180×3 DataFrames.DataFrame
│ Row │ Reaction │ Days │ Subject │
├─────┼──────────┼──────┼─────────┤
│ 1   │ 249.56   │ 0    │ 1       │
│ 2   │ 258.705  │ 1    │ 1       │
│ 3   │ 250.801  │ 2    │ 1       │
│ 4   │ 321.44   │ 3    │ 1       │
│ 5   │ 356.852  │ 4    │ 1       │
│ 6   │ 414.69   │ 5    │ 1       │
│ 7   │ 382.204  │ 6    │ 1       │
│ 8   │ 290.149  │ 7    │ 1       │
⋮
│ 172 │ 273.474  │ 1    │ 18      │
│ 173 │ 297.597  │ 2    │ 18      │
│ 174 │ 310.632  │ 3    │ 18      │
│ 175 │ 287.173  │ 4    │ 18      │
│ 176 │ 329.608  │ 5    │ 18      │
│ 177 │ 334.482  │ 6    │ 18      │
│ 178 │ 343.22   │ 7    │ 18      │
│ 179 │ 369.142  │ 8    │ 18      │
│ 180 │ 364.124  │ 9    │ 18      │

julia> fm3 = fit!(lmm(Reaction ~ 1 + Days + (1+Days|Subject), slp))
Linear mixed model fit by maximum likelihood
 Formula: Reaction ~ 1 + Days + ((1 + Days) | Subject)
  logLik    -2 logLik     AIC        BIC
 -875.96967 1751.93934 1763.93934 1783.09709

Variance components:
              Column    Variance  Std.Dev.   Corr.
 Subject  (Intercept)  565.51066 23.780468
          Days          32.68212  5.716828  0.08
 Residual              654.94145 25.591824
 Number of obs: 180; levels of grouping factors: 18

  Fixed-effects parameters:
             Estimate Std.Error z value P(>|z|)
(Intercept)   251.405   6.63226 37.9064  <1e-99
Days          10.4673   1.50224 6.96781  <1e-11
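
The extractor functions shown earlier apply to this model as well; for example, fixef(fm3) returns the two fixed-effects estimates listed in the table above (a sketch, with the full-precision output elided):

julia> fixef(fm3)  # ≈ [251.405, 10.4673]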