Model constructors

The LinearMixedModel type represents a linear mixed-effects model. Typically, it is constructed from a Formula and an appropriate Table type, usually a DataFrame.

Examples of linear mixed-effects model fits

For illustration, several data sets from the lme4 package for R are made available in .arrow format in this package. For convenience, we will often convert these to DataFrames. They include the dyestuff and dyestuff2 data sets.

using DataFrames, MixedModels, StatsModels
dyestuff = MixedModels.dataset(:dyestuff)

Arrow.Table with 30 rows, 2 columns, and schema:
 :batch  String
 :yield  Int16

describe(DataFrame(dyestuff))

2×7 DataFrame

Row	variable	mean	min	median	max	nmissing	eltype
	Symbol	Union…	Any	Union…	Any	Int64	DataType
1	batch		A		F	0	String
2	yield	1527.5	1440	1530.0	1635	0	Int16
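
As a minimal sketch of the construction described above (a Formula plus a table), the following fits a linear mixed-effects model to the dyestuff data. The particular formula, a fixed intercept plus a random intercept for each batch, is an assumption chosen for illustration and is not prescribed by the text above.

```julia
using MixedModels, StatsModels

# Load the example data shipped with MixedModels (an Arrow.Table).
dyestuff = MixedModels.dataset(:dyestuff)

# Assumed model for illustration: overall mean yield plus a
# random intercept for each level of batch.
form = @formula(yield ~ 1 + (1 | batch))

# Construct and fit the LinearMixedModel from the formula and the table.
fm = fit(MixedModel, form, dyestuff)

println(fm)
```

The same table can be wrapped in a DataFrame first; the call to fit accepts either form.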

The `@formula` language in Julia

MixedModels.jl builds on the Julia formula language provided by StatsModels.jl, which is similar to the formula language in R and is also based on the notation from Wilkinson and Rogers (1973). There are two ways to construct a formula in Julia. The first way is to enclose the formula expression in the @formula macro:

StatsModels.@formula — Macro

@formula(ex)

Capture and parse a formula expression as a Formula struct.

A formula is an abstract specification of a dependence between left-hand and right-hand side variables as in, e.g., a regression model. Each side specifies at a high level how tabular data is to be converted to a numerical matrix suitable for modeling. This specification looks something like Julia code, is represented as a Julia Expr, but uses special syntax. The @formula macro takes an expression like y ~ 1 + a*b, transforms it according to the formula syntax rules into a lowered form (like y ~ 1 + a + b + a&b), and constructs a Formula struct which captures the original expression, the lowered expression, and the left- and right-hand-side.

Operators that have special interpretations in this syntax are

~ is the formula separator, a binary operator whose first argument is the left-hand side and whose second argument is the right-hand side.
+ concatenates variables as columns when generating a model matrix.
& represents an interaction between two or more variables, which corresponds to a row-wise Kronecker product of the individual terms (or element-wise product if all terms involved are continuous/scalar).
* expands to all main effects and interactions: a*b is equivalent to a+b+a&b, a*b*c to a+b+c+a&b+a&c+b&c+a&b&c, etc.
1, 0, and -1 indicate the presence (for 1) or absence (for 0 and -1) of an intercept column.

The rules that are applied are

The associative rule (un-nests nested calls to +, &, and *).
The distributive rule (interactions & distribute over concatenation +).
The * rule expands a*b to a+b+a&b (recursively).
Subtraction is converted to addition and negation, so x-1 becomes x + -1 (applies only to subtraction of literal 1).
Single-argument & calls are stripped, so &(x) becomes the main effect x.
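
The lowering rules above can be checked interactively. The sketch below builds the docstring's example formula with the * operator and compares it with the hand-expanded form; the variable names y, a, and b are placeholders and do not refer to any data set.

```julia
using StatsModels

# `a*b` is expanded by the * rule into main effects plus the interaction.
f = @formula(y ~ 1 + a*b)

# The same formula written in its already-lowered form.
g = @formula(y ~ 1 + a + b + a&b)

# Both formulas should print identically after lowering.
println(f)
println(g)

# The right-hand side is stored as a tuple of terms:
# the intercept, a, b, and the a & b interaction.
println(length(f.rhs))
```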

	Est.	SE	z	p	σ_subj	σ_item
(Intercept)	-0.1526	0.3852	-0.40	0.6920	1.3392	0.3423
anger	0.0574	0.0168	3.43	0.0006
gender: M	0.3206	0.1912	1.68	0.0936
btype: scold	-1.0599	0.1842	-5.76	<1e-08
btype: shout	-2.1039	0.1865	-11.28	<1e-28
situ: self	-1.0544	0.1512	-6.97	<1e-11
mode: want	0.7070	0.1510	4.68	<1e-05

Row	batch	(Intercept)
	String	Float64
1	A	-16.6282
2	B	0.369516
3	C	26.9747
4	D	-21.8014
5	E	53.5798
6	F	-42.4943