Lasso paths
StatsAPI.fit — Method

fit(LassoPath, X, y, d=Normal(), l=canonicallink(d); ...)

fits a linear or generalized linear Lasso path given the design matrix X and response y:

$\underset{\beta,b_0}{\operatorname{argmin}} -\frac{1}{N} \mathcal{L}(y|X,\beta,b_0) + \lambda\left[(1-\alpha)\frac{1}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1\right]$

where $0 \le \alpha \le 1$ sets the balance between ridge ($\alpha = 0$) and lasso ($\alpha = 1$) regression, and $N$ is the number of rows of $X$. The optional argument d specifies the conditional distribution of the response, while l specifies the link function. Lasso.jl inherits supported distributions and link functions from GLM.jl. The default is to fit a linear Lasso path, i.e., d=Normal(), l=IdentityLink(), or $\mathcal{L}(y|X,\beta) = -\frac{1}{2}\|y - X\beta - b_0\|_2^2 + C$.
Examples
fit(LassoPath, X, y) # L1-regularized linear regression
fit(LassoPath, X, y, Binomial(), LogitLink();
    α=0.5)               # Binomial logit regression with an elastic net combination
                         # of 0.5 L1 and 0.5 L2 regularization penalties
Arguments

wts=ones(length(y)): Weights for each observation.
offset=zeros(length(y)): Offset for each observation.
λ: can be used to specify a specific set of λ values at which models are fit. You can pass an AbstractVector of explicit values for λ, or a function λfunc(λmax) returning such values, where λmax will be the smallest λ value yielding a null model. If λ is unspecified, Lasso.jl selects nλ logarithmically spaced λ values from λmax to λminratio * λmax (see the sketch after this list).
α=1: Value between 0 and 1 controlling the balance between ridge ($\alpha = 0$) and lasso ($\alpha = 1$) regression. α cannot be set to 0 if λ was not specified, though it may be set to 1.
nλ=100: Number of λ values to use.
λminratio=1e-4: Ratio defining the smallest λ value as λminratio * λmax; 1e-4 if there are more observations than predictors, otherwise 0.001.
stopearly=true: When true, if the proportion of deviance explained exceeds 0.999 or the difference between the deviance explained by successive λ values falls below 1e-5, the path stops early.
standardize=true: Whether to standardize predictors to unit standard deviation before fitting.
intercept=true: Whether to fit an (unpenalized) model intercept $b_0$. If false, $b_0 = 0$.
algorithm: Algorithm to use. NaiveCoordinateDescent iteratively computes the dot product of the predictors with the residuals, as opposed to the CovarianceCoordinateDescent algorithm, which uses a precomputed Gram matrix. NaiveCoordinateDescent is typically faster when there are many predictors that will not enter the model or when fitting generalized linear models. By default, uses NaiveCoordinateDescent if there are more than 5x as many predictors as observations or the model is a GLM, and CovarianceCoordinateDescent otherwise.
randomize=true: Whether to randomize the order in which coefficients are updated by coordinate descent. This can drastically speed convergence if coefficients are highly correlated.
rng=RNG_DEFAULT: Random number generator to be used for coefficient iteration.
maxncoef=min(size(X, 2), 2*size(X, 1)): Maximum number of coefficients allowed in the model. If exceeded, an error will be thrown.
dofit=true: Whether to fit the model upon construction. If false, the model can be fit later by calling fit!(model).
cd_maxiter=100_000: The maximum number of coordinate descent iterations.
cd_tol=1e-7: The tolerance for coordinate descent iterations in the inner loop.
irls_maxiter=30: Maximum number of iterations in the iteratively reweighted least squares loop. This is ignored unless the model is a generalized linear model.
irls_tol=1e-7: The tolerance for outer iteratively reweighted least squares iterations. This is ignored unless the model is a generalized linear model.
criterion=:coef: Convergence criterion. Controls how cd_tol and irls_tol are to be interpreted. Possible values are:
  :coef: The model is considered to have converged if the maximum absolute squared difference in coefficients between successive iterations drops below the specified tolerance. This is the criterion used by glmnet.
  :obj: The model is considered to have converged if the relative change in the Lasso/Elastic Net objective between successive iterations drops below the specified tolerance. This is the criterion used by GLM.jl.
minStepFac=0.001: The minimum step fraction for backtracking line search.
penalty_factor=ones(size(X, 2)): Separate penalty factor $\omega_j$ for each coefficient $j$, i.e. instead of $\lambda$ the penalties become $\lambda\omega_j$. Note the penalty factors are internally rescaled to sum to the number of variables (glmnet.R convention).
standardizeω=true: Whether to scale penalty factors to sum to the number of variables (glmnet.R convention).
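As referenced in the λ entry above, here is a minimal sketch of passing an explicit λ grid and per-coefficient penalty factors. X, y, and the specific values are hypothetical placeholders, not part of the API:

using Lasso

X = randn(100, 5)    # hypothetical design matrix
y = randn(100)       # hypothetical response

# Fit at a hand-chosen, decreasing λ grid instead of the automatic nλ-point sequence
path1 = fit(LassoPath, X, y; λ=[0.5, 0.1, 0.05, 0.01])

# Leave the first predictor unpenalized (ω₁ = 0) and penalize the rest equally
path2 = fit(LassoPath, X, y; penalty_factor=[0.0, 1.0, 1.0, 1.0, 1.0])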
Returned objects

fit returns a RegularizationPath object describing the fit coefficients and values of λ along the path. The following fields are intended for external use:

λ: Vector of λ values corresponding to each fit model along the path
coefs: SparseMatrixCSC of model coefficients. Columns correspond to fit models; rows correspond to predictors
b0: Vector of model intercepts for each fit model
pct_dev: Vector of proportion of deviance explained values for each fit model
nulldev: The deviance of the null model (including the intercept, if specified)
nullb0: The intercept of the null model, or 0 if no intercept was fit
niter: Total number of coordinate descent iterations required to fit all models
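For instance, after fitting a path (X and y hypothetical data), these fields can be inspected directly:

path = fit(LassoPath, X, y)

path.λ        # λ values along the path
path.coefs    # sparse coefficient matrix: rows are predictors, columns are fit models
path.pct_dev  # proportion of deviance explained at each λ
size(path.coefs, 2) == length(path.λ)   # one column of coefficients per λ value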
For details of the algorithm, see Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1.
Gamma paths
StatsAPI.fit — Method

fit(GammaLassoPath, X, y, d=Normal(), l=canonicallink(d); ...)

fits a linear or generalized linear (concave) gamma lasso path given the design matrix X and response y.

See also fit(LassoPath, ...) for a full list of arguments.
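A minimal sketch, assuming the same calling convention as fit(LassoPath, ...); X, y, and ybool (a binary response) are hypothetical data:

gpath = fit(GammaLassoPath, X, y)                                     # linear gamma lasso path
gpath_logit = fit(GammaLassoPath, X, ybool, Binomial(), LogitLink())  # logistic gamma lasso path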
Using the model
Lasso adheres to most of the StatsBase interface, so coef and predict work as expected, except that a particular segment of the path needs to be selected.
StatsAPI.coef — Function

coef(path::RegularizationPath; kwargs...)

Coefficient vector for a selected segment of a regularization path.

Examples

coef(path; select=MinBIC()) # BIC minimizing segment
coef(path; select=AllSeg()) # Array with the entire path's coefficients

coef(path::RegularizationPath, select::SegSelect)

Coefficient vector for a selected segment of a regularization path.

Examples

coef(path, MinBIC()) # BIC minimizing segment
coef(path, AllSeg()) # Array with the entire path's coefficients
StatsAPI.predict — Function

predict(path::RegularizationPath, newX::AbstractMatrix; kwargs...)

Predicted values for a selected segment of a regularization path.

Examples

predict(path, newX; select=MinBIC()) # predict using BIC minimizing segment

Predicted values for the data used to estimate the path.

predict(m::RegularizedModel, newX::AbstractMatrix; kwargs...)

Predicted values using a selected segment of a regularization path.

Examples

m = fit(LassoModel, X, y; select=MinBIC())
predict(m, newX) # predict using BIC minimizing segment
StatsAPI.deviance — Function

Deviance at each segment of the path for the fitted model and data.

Deviance at each segment of the path for (potentially new) data X and y; select=AllSeg() or MinAICc() may be passed, as in coef().

Deviance at each segment of the path for (potentially new) y and predicted values μ.
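A hedged usage sketch, with path a fitted RegularizationPath and Xnew/ynew hypothetical hold-out data; the argument order of the data-based method is assumed from the descriptions above:

deviance(path)                               # per-segment deviance on the fitting data
deviance(path, Xnew, ynew; select=AllSeg())  # per-segment deviance on new data (assumed argument order)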
StatsAPI.dof — Function

dof(path::RegularizationPath)

Approximates the degrees of freedom in each segment of the path as the number of nonzero coefficients plus a dispersion parameter, when appropriate. Note that for GammaLassoPath this may be a crude approximation, as gamlr does this differently.

Base.size — Function

size(path) returns (p, nλ), where p is the number of coefficients (including any intercept) and nλ is the number of path segments. If the model was only initialized but not fit, returns (p, 1).
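A small usage sketch (path is any fitted RegularizationPath):

p, nλ = size(path)   # number of coefficients (including any intercept) and number of segments
dof(path)            # approximate degrees of freedom for each segment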
Segment selectors
Lasso.SegSelect — Type
RegularizationPath segment selector supertype

Lasso.segselect — Function
Index of the selected RegularizationPath segment

Lasso.MinAIC — Type
Selects the RegularizationPath segment with the minimum AIC

Lasso.MinAICc — Type
Selects the RegularizationPath segment with the minimum corrected AIC

Lasso.MinBIC — Type
Selects the RegularizationPath segment with the minimum BIC

Lasso.CVSegSelect — Type
RegularizationPath segment selector supertype for cross-validation-based selectors

Lasso.MinCVmse — Type
Selects the RegularizationPath segment with the minimum cross-validation MSE

Lasso.MinCV1se — Type
Selects the RegularizationPath segment with the largest λ whose mean OOS deviance is no more than one standard error away from the minimum

Lasso.AllSeg — Type
A RegularizationPath segment selector that returns all segments
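As a sketch of how selectors are used (cross-validation selectors take a fold generator such as MLBase.Kfold, as in the example in the next section; n below is the hypothetical number of observations, and MinCV1se is assumed to accept the same constructors as MinCVmse):

coef(path, MinAICc())                 # segment minimizing corrected AIC
coef(path, MinCV1se(Kfold(n, 10)))    # most regularized segment within 1 SE of the 10-fold CV minimum
segselect(path, MinBIC())             # index of the BIC minimizing segment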
Lasso model fitting
Often one wishes to both fit the path and select a particular segment. This can be done with fit(RegularizedModel, ...), which returns a fitted RegularizedModel wrapping a GLM representation of the selected model.

For example, if we want to fit a LassoPath and select its segment that minimizes 2-fold cross-validation mean squared error, we can do it in one step as follows:
julia> using DataFrames, Lasso, MLBase, Random
julia> Random.seed!(124); # because CV folds are random
julia> data = DataFrame(X=[1,2,3], Y=[2,4,7])
3×2 DataFrames.DataFrame
Row │ X Y
│ Int64 Int64
─────┼──────────────
1 │ 1 2
2 │ 2 4
3 │ 3 7
julia> m = fit(LassoModel, @formula(Y ~ X), data; select=MinCVmse(Kfold(3,2)))
LassoModel using MinCVmse(Kfold([3, 1, 2], 2, 1.5)) segment of the regularization path.
Coefficients:
────────────
Estimate
────────────
x1 4.33333
x2 0.0
────────────
julia> coef(m)
2-element Array{Float64,1}:
4.333333333333333
0.0
Lasso.RegularizedModel — Type
A RegularizedModel represents a selected segment from a RegularizationPath

Lasso.LassoModel — Type
LassoModel represents a selected segment from a LassoPath

Lasso.GammaLassoModel — Type
GammaLassoModel represents a selected segment from a GammaLassoPath
Lasso.selectmodel — Function

selectmodel(path::RegularizationPath, select::SegSelect)

Returns a LinearModel or GeneralizedLinearModel representing the selected segment of a regularization path.

Examples

selectmodel(path, MinBIC())          # BIC minimizing model
selectmodel(path, MinCVmse(path, 5)) # 5-fold CV MSE minimizing model
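A short sketch of the path-then-select workflow (X and y are hypothetical data; per the description above, the returned object is a GLM-style model, so the usual accessors apply):

path = fit(LassoPath, X, y)
m = selectmodel(path, MinAICc())  # GLM representation of the AICc minimizing segment
coef(m)                           # coefficients of the selected segment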
StatsAPI.fit — Method

fit(RegularizedModel, X, y, dist, link; <kwargs>)

Returns a LinearModel or GeneralizedLinearModel representing the selected segment of a regularization path.

Examples

fit(LassoModel, X, y; select=MinBIC())   # BIC minimizing LinearModel
fit(LassoModel, X, y, Binomial(), LogitLink();
    select=MinCVmse(path, 5))            # 5-fold CV MSE minimizing model
Arguments

select::SegSelect=MinAICc(): Segment selector.
wts=ones(length(y)): Weights for each observation.
offset=zeros(length(y)): Offset for each observation.
λ: can be used to specify a specific set of λ values at which models are fit. If λ is unspecified, Lasso.jl selects nλ logarithmically spaced λ values from λmax, the smallest λ value yielding a null model, to λminratio * λmax.
nλ=100: Number of λ values to use.
λminratio=1e-4: Ratio defining the smallest λ value as λminratio * λmax; 1e-4 if there are more observations than predictors, otherwise 0.001.
stopearly=true: When true, if the proportion of deviance explained exceeds 0.999 or the difference between the deviance explained by successive λ values falls below 1e-5, the path stops early.
standardize=true: Whether to standardize predictors to unit standard deviation before fitting.
intercept=true: Whether to fit an (unpenalized) model intercept.
algorithm: Algorithm to use. NaiveCoordinateDescent iteratively computes the dot product of the predictors with the residuals, as opposed to the CovarianceCoordinateDescent algorithm, which uses a precomputed Gram matrix. NaiveCoordinateDescent is typically faster when there are many predictors that will not enter the model or when fitting generalized linear models. By default, uses NaiveCoordinateDescent if there are more than 5x as many predictors as observations or the model is a GLM, and CovarianceCoordinateDescent otherwise.
randomize=true: Whether to randomize the order in which coefficients are updated by coordinate descent. This can drastically speed convergence if coefficients are highly correlated.
maxncoef=min(size(X, 2), 2*size(X, 1)): Maximum number of coefficients allowed in the model. If exceeded, an error will be thrown.
dofit=true: Whether to fit the model upon construction. If false, the model can be fit later by calling fit!(model).
cd_tol=1e-7: The tolerance for coordinate descent iterations in the inner loop.
irls_tol=1e-7: The tolerance for outer iteratively reweighted least squares iterations. This is ignored unless the model is a generalized linear model.
criterion=:coef: Convergence criterion. Controls how cd_tol and irls_tol are to be interpreted. Possible values are:
  :coef: The model is considered to have converged if the maximum absolute squared difference in coefficients between successive iterations drops below the specified tolerance. This is the criterion used by glmnet.
  :obj: The model is considered to have converged if the relative change in the Lasso/Elastic Net objective between successive iterations drops below the specified tolerance. This is the criterion used by GLM.jl.
minStepFac=0.001: The minimum step fraction for backtracking line search.
penalty_factor=ones(size(X, 2)): Separate penalty factor $\omega_j$ for each coefficient $j$, i.e. instead of $\lambda$ the penalties become $\lambda\omega_j$. Note the penalty factors are internally rescaled to sum to the number of variables (glmnet.R convention).
standardizeω=true: Whether to scale penalty factors to sum to the number of variables (glmnet.R convention).
See also fit(::Type{LassoPath}, ::Matrix{Float64}, ::Vector{Float64}) for a more complete list of arguments.