GLM Documentation
Installation
Pkg.add("GLM")
will install this package and its dependencies, which includes the Distributions package.
The RDatasets package is useful for fitting models on standard R datasets to compare the results with those from R.
Fitting GLM models
To fit a Generalized Linear Model (GLM), use the function, glm(formula, data, family, link)
, where,
formula
: uses column symbols from the DataFrame data, for example, ifnames(data)=[:Y,:X1,:X2]
, then a valid formula is@formula(Y ~ X1 + X2)
data
: a DataFrame which may contain NA values, any rows with NA values are ignoredfamily
: chosen fromBernoulli()
,Binomial()
,Gamma()
,Normal()
, orPoisson()
link
: chosen from the list below, for example,LogitLink()
is a valid link for theBinomial()
family
An intercept is included in any GLM by default.
Methods applied to fitted models
Many of the methods provided by this package have names similar to those in R.
coef
: extract the estimates of the coefficients in the modeldeviance
: measure of the model fit, weighted residual sum of squares for lm'sdof_residual
: degrees of freedom for residuals, when meaningfulglm
: fit a generalized linear model (an alias forfit(GeneralizedLinearModel, ...)
)lm
: fit a linear model (an alias forfit(LinearModel, ...)
)stderror
: standard errors of the coefficientsvcov
: estimated variance-covariance matrix of the coefficient estimatespredict
: obtain predicted values of the dependent variable from the fitted model
Minimal examples
Ordinary Least Squares Regression:
julia> using DataFrames, GLM
julia> data = DataFrame(X=[1,2,3], Y=[2,4,7])
3×2 DataFrames.DataFrame
│ Row │ X │ Y │
├─────┼───┼───┤
│ 1 │ 1 │ 2 │
│ 2 │ 2 │ 4 │
│ 3 │ 3 │ 7 │
julia> ols = lm(@formula(Y ~ X), data)
StatsModels.DataFrameRegressionModel{GLM.LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
Formula: Y ~ 1 + X
Coefficients:
Estimate Std.Error t value Pr(>|t|)
(Intercept) -0.666667 0.62361 -1.06904 0.4788
X 2.5 0.288675 8.66025 0.0732
julia> stderror(ols)
2-element Array{Float64,1}:
0.62361
0.288675
julia> predict(ols)
3-element Array{Float64,1}:
1.83333
4.33333
6.83333
<!– Andreas Noack: As of 9 May 2018 this example doesn't work so I've temporarily commented it out julia> newX = DataFrame(X=[2,3,4]);
julia> predict(ols, newX, :confint) 3×3 Array{Float64,2}: 4.33333 1.33845 7.32821 6.83333 2.09801 11.5687 9.33333 1.40962 17.257 The columns of the matrix are prediction, 95% lower and upper confidence bounds –>
Probit Regression:
julia> data = DataFrame(X=[1,2,3], Y=[1,0,1])
3×2 DataFrames.DataFrame
│ Row │ X │ Y │
├─────┼───┼───┤
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 0 │
│ 3 │ 3 │ 1 │
julia> probit = glm(@formula(Y ~ X), data, Binomial(), ProbitLink())
StatsModels.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Binomial{Float64},GLM.ProbitLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
Formula: Y ~ 1 + X
Coefficients:
Estimate Std.Error z value Pr(>|z|)
(Intercept) 0.430727 1.98019 0.217518 0.8278
X -3.64399e-19 0.91665 -3.97534e-19 1.0000
Other examples
An example of a simple linear model in R is
> coef(summary(lm(optden ~ carb, Formaldehyde)))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.005085714 0.007833679 0.6492115 5.515953e-01
carb 0.876285714 0.013534536 64.7444207 3.409192e-07
The corresponding model with the GLM
package is
julia> using GLM, RDatasets
julia> form = dataset("datasets", "Formaldehyde")
6×2 DataFrames.DataFrame
│ Row │ Carb │ OptDen │
├─────┼──────┼────────┤
│ 1 │ 0.1 │ 0.086 │
│ 2 │ 0.3 │ 0.269 │
│ 3 │ 0.5 │ 0.446 │
│ 4 │ 0.6 │ 0.538 │
│ 5 │ 0.7 │ 0.626 │
│ 6 │ 0.9 │ 0.782 │
julia> lm1 = fit(LinearModel, @formula(OptDen ~ Carb), form)
StatsModels.DataFrameRegressionModel{GLM.LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
Formula: OptDen ~ 1 + Carb
Coefficients:
Estimate Std.Error t value Pr(>|t|)
(Intercept) 0.00508571 0.00783368 0.649211 0.5516
Carb 0.876286 0.0135345 64.7444 <1e-6
julia> confint(lm1)
2×2 Array{Float64,2}:
-0.0166641 0.0268355
0.838708 0.913864
A more complex example in R is
> coef(summary(lm(sr ~ pop15 + pop75 + dpi + ddpi, LifeCycleSavings)))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.5660865407 7.3545161062 3.8841558 0.0003338249
pop15 -0.4611931471 0.1446422248 -3.1885098 0.0026030189
pop75 -1.6914976767 1.0835989307 -1.5609998 0.1255297940
dpi -0.0003369019 0.0009311072 -0.3618293 0.7191731554
ddpi 0.4096949279 0.1961971276 2.0881801 0.0424711387
with the corresponding Julia code
julia> LifeCycleSavings = dataset("datasets", "LifeCycleSavings")
50×6 DataFrames.DataFrame
│ Row │ Country │ SR │ Pop15 │ Pop75 │ DPI │ DDPI │
├─────┼────────────────┼───────┼───────┼───────┼─────────┼───────┤
│ 1 │ Australia │ 11.43 │ 29.35 │ 2.87 │ 2329.68 │ 2.87 │
│ 2 │ Austria │ 12.07 │ 23.32 │ 4.41 │ 1507.99 │ 3.93 │
│ 3 │ Belgium │ 13.17 │ 23.8 │ 4.43 │ 2108.47 │ 3.82 │
│ 4 │ Bolivia │ 5.75 │ 41.89 │ 1.67 │ 189.13 │ 0.22 │
│ 5 │ Brazil │ 12.88 │ 42.19 │ 0.83 │ 728.47 │ 4.56 │
│ 6 │ Canada │ 8.79 │ 31.72 │ 2.85 │ 2982.88 │ 2.43 │
│ 7 │ Chile │ 0.6 │ 39.74 │ 1.34 │ 662.86 │ 2.67 │
│ 8 │ China │ 11.9 │ 44.75 │ 0.67 │ 289.52 │ 6.51 │
⋮
│ 42 │ Tunisia │ 2.81 │ 46.12 │ 1.21 │ 249.87 │ 1.13 │
│ 43 │ United Kingdom │ 7.81 │ 23.27 │ 4.46 │ 1813.93 │ 2.01 │
│ 44 │ United States │ 7.56 │ 29.81 │ 3.43 │ 4001.89 │ 2.45 │
│ 45 │ Venezuela │ 9.22 │ 46.4 │ 0.9 │ 813.39 │ 0.53 │
│ 46 │ Zambia │ 18.56 │ 45.25 │ 0.56 │ 138.33 │ 5.14 │
│ 47 │ Jamaica │ 7.72 │ 41.12 │ 1.73 │ 380.47 │ 10.23 │
│ 48 │ Uruguay │ 9.24 │ 28.13 │ 2.72 │ 766.54 │ 1.88 │
│ 49 │ Libya │ 8.89 │ 43.69 │ 2.07 │ 123.58 │ 16.71 │
│ 50 │ Malaysia │ 4.71 │ 47.2 │ 0.66 │ 242.69 │ 5.08 │
julia> fm2 = fit(LinearModel, @formula(SR ~ Pop15 + Pop75 + DPI + DDPI), LifeCycleSavings)
StatsModels.DataFrameRegressionModel{GLM.LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
Formula: SR ~ 1 + Pop15 + Pop75 + DPI + DDPI
Coefficients:
Estimate Std.Error t value Pr(>|t|)
(Intercept) 28.5661 7.35452 3.88416 0.0003
Pop15 -0.461193 0.144642 -3.18851 0.0026
Pop75 -1.6915 1.0836 -1.561 0.1255
DPI -0.000336902 0.000931107 -0.361829 0.7192
DDPI 0.409695 0.196197 2.08818 0.0425
The glm
function (or equivalently, fit(GeneralizedLinearModel, ...)
) works similarly to the R glm
function except that the family
argument is replaced by a Distribution
type and, optionally, a Link
type. The first example from ?glm
in R is
glm> ## Dobson (1990) Page 93: Randomized Controlled Trial :
glm> counts <- c(18,17,15,20,10,20,25,13,12)
glm> outcome <- gl(3,1,9)
glm> treatment <- gl(3,3)
glm> print(d.AD <- data.frame(treatment, outcome, counts))
treatment outcome counts
1 1 1 18
2 1 2 17
3 1 3 15
4 2 1 20
5 2 2 10
6 2 3 20
7 3 1 25
8 3 2 13
9 3 3 12
glm> glm.D93 <- glm(counts ~ outcome + treatment, family=poisson())
glm> anova(glm.D93)
Analysis of Deviance Table
Model: poisson, link: log
Response: counts
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev
NULL 8 10.5814
outcome 2 5.4523 6 5.1291
treatment 2 0.0000 4 5.1291
glm> ## No test:
glm> summary(glm.D93)
Call:
glm(formula = counts ~ outcome + treatment, family = poisson())
Deviance Residuals:
1 2 3 4 5 6 7 8
-0.67125 0.96272 -0.16965 -0.21999 -0.95552 1.04939 0.84715 -0.09167
9
-0.96656
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.045e+00 1.709e-01 17.815 <2e-16 ***
outcome2 -4.543e-01 2.022e-01 -2.247 0.0246 *
outcome3 -2.930e-01 1.927e-01 -1.520 0.1285
treatment2 3.795e-16 2.000e-01 0.000 1.0000
treatment3 3.553e-16 2.000e-01 0.000 1.0000
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 10.5814 on 8 degrees of freedom
Residual deviance: 5.1291 on 4 degrees of freedom
AIC: 56.761
Number of Fisher Scoring iterations: 4
In Julia this becomes
julia> using DataFrames, CategoricalArrays, GLM
julia> dobson = DataFrame(Counts = [18.,17,15,20,10,20,25,13,12],
Outcome = categorical([1,2,3,1,2,3,1,2,3]),
Treatment = categorical([1,1,1,2,2,2,3,3,3]))
9×3 DataFrames.DataFrame
│ Row │ Counts │ Outcome │ Treatment │
├─────┼────────┼─────────┼───────────┤
│ 1 │ 18.0 │ 1 │ 1 │
│ 2 │ 17.0 │ 2 │ 1 │
│ 3 │ 15.0 │ 3 │ 1 │
│ 4 │ 20.0 │ 1 │ 2 │
│ 5 │ 10.0 │ 2 │ 2 │
│ 6 │ 20.0 │ 3 │ 2 │
│ 7 │ 25.0 │ 1 │ 3 │
│ 8 │ 13.0 │ 2 │ 3 │
│ 9 │ 12.0 │ 3 │ 3 │
julia> gm1 = fit(GeneralizedLinearModel, @formula(Counts ~ Outcome + Treatment), dobson, Poisson())
StatsModels.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Poisson{Float64},GLM.LogLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
Formula: Counts ~ 1 + Outcome + Treatment
Coefficients:
Estimate Std.Error z value Pr(>|z|)
(Intercept) 3.04452 0.170899 17.8148 <1e-70
Outcome: 2 -0.454255 0.202171 -2.24689 0.0246
Outcome: 3 -0.292987 0.192742 -1.5201 0.1285
Treatment: 2 4.61065e-16 0.2 2.30532e-15 1.0000
Treatment: 3 3.44687e-17 0.2 1.72344e-16 1.0000
julia> deviance(gm1)
5.129141077001145
Typical distributions for use with glm
and their canonical link functions are
Bernoulli (LogitLink)
Binomial (LogitLink)
Gamma (InverseLink)
Normal (IdentityLink)
Poisson (LogLink)
Currently the available Link types are
CauchitLink
CloglogLink
IdentityLink
InverseLink
LogitLink
LogLink
ProbitLink
SqrtLink
Separation of response object and predictor object
The general approach in this code is to separate functionality related to the response from that related to the linear predictor. This allows for greater generality by mixing and matching different subtypes of the abstract type LinPred
and the abstract type ModResp
.
A LinPred
type incorporates the parameter vector and the model matrix. The parameter vector is a dense numeric vector but the model matrix can be dense or sparse. A LinPred
type must incorporate some form of a decomposition of the weighted model matrix that allows for the solution of a system X'W * X * delta=X'wres
where W
is a diagonal matrix of "X weights", provided as a vector of the square roots of the diagonal elements, and wres
is a weighted residual vector.
Currently there are two dense predictor types, DensePredQR
and DensePredChol
, and the usual caveats apply. The Cholesky version is faster but somewhat less accurate than that QR version. The skeleton of a distributed predictor type is in the code but not yet fully fleshed out. Because Julia by default uses OpenBLAS, which is already multi-threaded on multicore machines, there may not be much advantage in using distributed predictor types.
A ModResp
type must provide methods for the wtres
and sqrtxwts
generics. Their values are the arguments to the updatebeta
methods of the LinPred
types. The Float64
value returned by updatedelta
is the value of the convergence criterion.
Similarly, LinPred
types must provide a method for the linpred
generic. In general linpred
takes an instance of a LinPred
type and a step factor. Methods that take only an instance of a LinPred
type use a default step factor of 1. The value of linpred
is the argument to the updatemu
method for ModResp
types. The updatemu
method returns the updated deviance.
API
Types defined in the package
GLM.LinearModel
— Type.GLM.LmResp
— Type.LmResp
Encapsulates the response for a linear model
Members
mu
: current value of the mean response vector or fitted valueoffset
: optional offset added to the linear predictor to formmu
wts
: optional vector of prior weightsy
: observed response vector
Either or both offset
and wts
may be of length 0
GLM.LinPred
— Type.LinPred
Abstract type representing a linear predictor
GLM.GlmResp
— Type.GlmResp
The response vector and various derived vectors in a generalized linear model.
GLM.DensePredQR
— Type.DensePredQR
A LinPred
type with a dense, unpivoted QR decomposition of X
Members
X
: Model matrix of sizen
×p
withn ≥ p
. Should be full column rank.beta0
: base coefficient vector of lengthp
delbeta
: increment to coefficient vector, also of lengthp
scratchbeta
: scratch vector of lengthp
, used inlinpred!
methodqr
: aQRCompactWY
object created fromX
, with optional row weights.
GLM.DensePredChol
— Type.DensePredChol{T}
A LinPred
type with a dense Cholesky factorization of X'X
Members
X
: model matrix of sizen
×p
withn ≥ p
. Should be full column rank.beta0
: base coefficient vector of lengthp
delbeta
: increment to coefficient vector, also of lengthp
scratchbeta
: scratch vector of lengthp
, used inlinpred!
methodchol
: aBase.LinAlg.Cholesky
object created fromX'X
, possibly using row weights.scratchm1
: scratch Matrix{T} of the same size asX
scratchm2
: scratch Matrix{T} os the same size asX'X
Constructors for models
The most general approach to fitting a model is with the fit
function, as in
julia> fit(LinearModel, hcat(ones(10), 1:10), randn(MersenneTwister(12321), 10))
GLM.LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}}:
Coefficients:
Estimate Std.Error t value Pr(>|t|)
x1 0.717436 0.775175 0.925515 0.3818
x2 -0.152062 0.124931 -1.21717 0.2582
This model can also be fit as
julia> lm(hcat(ones(10), 1:10), randn(MersenneTwister(12321), 10))
GLM.LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}}:
Coefficients:
Estimate Std.Error t value Pr(>|t|)
x1 0.717436 0.775175 0.925515 0.3818
x2 -0.152062 0.124931 -1.21717 0.2582
Methods for model updating
GLM.delbeta!
— Function.delbeta!(p::LinPred, r::Vector)
Evaluate and return p.delbeta
the increment to the coefficient vector from residual r
GLM.linpred!
— Function.linpred!(out, p::LinPred, f::Real=1.0)
Overwrite out
with the linear predictor from p
with factor f
The effective coefficient vector, p.scratchbeta
, is evaluated as p.beta0 .+ f * p.delbeta
, and out
is updated to p.X * p.scratchbeta
GLM.linpred
— Function.linpred(p::LinPred, f::Read=1.0)
Return the linear predictor p.X * (p.beta0 .+ f * p.delbeta)
GLM.installbeta!
— Function.installbeta!(p::LinPred, f::Real=1.0)
Install pbeta0 .+= f * p.delbeta
and zero out p.delbeta
. Return the updated p.beta0
.
GLM.cancancel
— Function.cancancel(r::GlmResp{V,D,L})
Returns true
if dμ/dη for link L
is the variance function for distribution D
When L
is the canonical link for D
the derivative of the inverse link is a multiple of the variance function for D
. If they are the same a numerator and denominator term in the expression for the working weights will cancel.
GLM.updateμ!
— Function.updateμ!{T<:FPVector}(r::GlmResp{T}, linPr::T)
Update the mean, working weights and working residuals, in r
given a value of the linear predictor, linPr
.
GLM.wrkresp
— Function.wrkresp(r::GlmResp)
The working response, r.eta + r.wrkresid - r.offset
.
GLM.wrkresp!
— Function.wrkresp!{T<:FPVector}(v::T, r::GlmResp{T})
Overwrite v
with the working response of r
GLM.dispersion
— Function.dispersion(m::AbstractGLM, sqr::Bool=false)
Return the estimated dispersion (or scale) parameter for a model's distribution, generally written σ² for linear models and ϕ for generalized linear models. It is, by definition, equal to 1 for the Bernoulli, Binomial, and Poisson families.
If sqr
is true
, the squared dispersion parameter is returned.
Links and methods applied to them
GLM.Link
— Type.Link
An abstract type whose subtypes determine methods for linkfun
, linkinv
, mueta
, and inverselink
.
GLM.Link01
— Type.Link01
An abstract subtype of Link
which are links defined on (0, 1)
GLM.CauchitLink
— Type.CauchitLink
A Link01
corresponding to the standard Cauchy distribution, Distributions.Cauchy
.
GLM.CloglogLink
— Type.CloglogLink
A Link01
corresponding to the extreme value (or log-Wiebull) distribution. The link is the complementary log-log transformation, log(1 - log(-μ))
.
GLM.IdentityLink
— Type.IdentityLink
The canonical Link
for the Normal
distribution, defined as η = μ
.
GLM.InverseLink
— Type.InverseLink
The canonical Link
for Distributions.Gamma
distribution, defined as η = inv(μ)
.
GLM.InverseSquareLink
— Type.InverseSquareLink
The canonical Link
for Distributions.InverseGaussian
distribution, defined as η = inv(abs2(μ))
.
GLM.LogitLink
— Type.LogitLink
The canonical Link01
for Distributions.Bernoulli
and Distributions.Binomial
. The inverse link, linkinv
, is the c.d.f. of the standard logistic distribution, Distributions.Logistic
.
GLM.LogLink
— Type.LogLink
The canonical Link
for Distributions.Poisson
, defined as η = log(μ)
.
GLM.ProbitLink
— Type.ProbitLink
A Link01
whose linkinv
is the c.d.f. of the standard normal distribution, Distributions.Normal()
.
GLM.SqrtLink
— Type.SqrtLink
A Link
defined as η = √μ
GLM.linkfun
— Function.linkfun(L::Link, μ)
Return η
, the value of the linear predictor for link L
at mean μ
.
Examples
julia> μ = inv(10):inv(5):1
0.1:0.2:0.9
julia> show(linkfun.(LogitLink(), μ))
[-2.19722, -0.847298, 0.0, 0.847298, 2.19722]
GLM.linkinv
— Function.linkinv(L::Link, η)
Return μ
, the mean value, for link L
at linear predictor value η
.
Examples
julia> μ = inv(10):inv(5):1; showcompact(collect(μ))
[0.1, 0.3, 0.5, 0.7, 0.9]
julia> η = logit.(μ); showcompact(η)
[-2.19722, -0.847298, 0.0, 0.847298, 2.19722]
julia> showcompact(linkinv.(LogitLink(), η))
[0.1, 0.3, 0.5, 0.7, 0.9]
GLM.mueta
— Function.mueta(L::Link, η)
Return the derivative of linkinv
, dμ/dη
, for link L
at linear predictor value η
.
Examples
julia> showcompact(mueta(LogitLink(), 0.0))
0.25
julia> showcompact(mueta(CloglogLink(), 0.0))
0.367879
julia> showcompact(mueta(LogLink(), 2.0))
7.38906
GLM.inverselink
— Function.inverselink(L::Link, η)
Return a 3-tuple of the inverse link, the derivative of the inverse link, and when appropriate, the variance function μ*(1 - μ)
.
The variance function is returned as NaN unless the range of μ is (0, 1)
Examples
julia> showcompact(inverselink(LogitLink(), 0.0))
(0.5, 0.25, 0.25)
julia> showcompact(inverselink(CloglogLink(), 0.0))
(0.632121, 0.367879, 0.232544)
julia> showcompact(inverselink(LogLink(), 2.0))
(7.38906, 7.38906, NaN)
GLM.canonicallink
— Function.canonicallink(D::Distribution)
Return the canonical link for distribution D
, which must be in the exponential family.
Examples
julia> canonicallink(Bernoulli())
GLM.LogitLink()
GLM.glmvar
— Function.glmvar(D::Distribution, μ)
Return the value of the variance function for D
at μ
The variance of D
at μ
is the product of the dispersion parameter, ϕ, which does not depend on μ
and the value of glmvar
. In other words glmvar
returns the factor of the variance that depends on μ
.
Examples
julia> μ = inv(6):inv(3):1; showcompact(collect(μ))
[0.166667, 0.5, 0.833333]
julia> showcompact(glmvar.(Normal(), μ)) # constant for Normal()
[1.0, 1.0, 1.0]
julia> showcompact(glmvar.(Bernoulli(), μ)) # μ * (1 - μ) for Bernoulli()
[0.138889, 0.25, 0.138889]
julia> showcompact(glmvar.(Poisson(), μ)) # μ for Poisson()
[0.166667, 0.5, 0.833333]
GLM.mustart
— Function.mustart(D::Distribution, y, wt)
Return a starting value for μ.
For some distributions it is appropriate to set μ = y
to initialize the IRLS algorithm but for others, notably the Bernoulli, the values of y
are not allowed as values of μ
and must be modified.
Examples
julia> showcompact(mustart(Bernoulli(), 0.0, 1))
0.25
julia> showcompact(mustart(Bernoulli(), 1.0, 1))
0.75
julia> showcompact(mustart(Binomial(), 0.0, 10))
0.0454545
julia> showcompact(mustart(Normal(), 0.0, 1))
0.0
GLM.devresid
— Function.devresid(D, y, μ)
Return the squared deviance residual of μ
from y
for distribution D
The deviance of a GLM can be evaluated as the sum of the squared deviance residuals. This is the principal use for these values. The actual deviance residual, say for plotting, is the signed square root of this value
sign(y - μ) * sqrt(devresid(D, y, μ))
Examples
julia> showcompact(devresid(Normal(), 0, 0.25)) # abs2(y - μ)
0.0625
julia> showcompact(devresid(Bernoulli(), 1, 0.75)) # -2log(μ) when y == 1
0.575364
julia> showcompact(devresid(Bernoulli(), 0, 0.25)) # -2log1p(-μ) = -2log(1-μ) when y == 0
0.575364
GLM.dispersion_parameter
— Function.dispersion_parameter(D) # not exported
Does distribution D
have a separate dispersion parameter, ϕ?
Returns false
for the Bernoulli
, Binomial
and Poisson
distributions, true
otherwise.
Examples
julia> show(GLM.dispersion_parameter(Normal()))
true
julia> show(GLM.dispersion_parameter(Bernoulli()))
false
GLM.loglik_obs
— Function.loglik_obs(D, y, μ, wt, ϕ) # not exported
Returns wt * logpdf(D(μ, ϕ), y)
where the parameters of D
are derived from μ
and ϕ
.
The wt
argument is a multiplier of the result except in the case of the Binomial
where wt
is the number of trials and μ
is the proportion of successes.
The loglikelihood of a fitted model is the sum of these values over all the observations.