Lasso paths
StatsAPI.fit — Method
fit(LassoPath, X, y, d=Normal(), l=canonicallink(d); ...)
fits a linear or generalized linear Lasso path given the design matrix X and response y:
$\underset{\beta,b_0}{\operatorname{argmin}} -\frac{1}{N} \mathcal{L}(y|X,\beta,b_0) + \lambda\left[(1-\alpha)\frac{1}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1\right]$
where $0 \le \alpha \le 1$ sets the balance between ridge ($\alpha = 0$) and lasso ($\alpha = 1$) regression, and $N$ is the number of rows of $X$. The optional argument d specifies the conditional distribution of the response, while l specifies the link function. Lasso.jl inherits supported distributions and link functions from GLM.jl. The default is to fit a linear Lasso path, i.e., d=Normal(), l=IdentityLink(), or $\mathcal{L}(y|X,\beta,b_0) = -\frac{1}{2}\|y - X\beta - b_0\|_2^2 + C$.
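To make the objective concrete, the following is a minimal sketch that evaluates it for the default Gaussian, identity-link case, dropping the constant $C$. The function names `enet_penalty` and `enet_objective` are hypothetical and are not part of Lasso.jl.

```julia
# Elastic net penalty: λ[(1-α)/2 ‖β‖₂² + α‖β‖₁]
enet_penalty(β, λ, α) = λ * ((1 - α) / 2 * sum(abs2, β) + α * sum(abs, β))

# Objective for d=Normal(), l=IdentityLink(), with the constant C dropped:
# (1/2N)‖y - Xβ - b0‖₂² + penalty
function enet_objective(X, y, β, b0, λ, α)
    N = size(X, 1)
    r = y .- X * β .- b0
    return sum(abs2, r) / (2N) + enet_penalty(β, λ, α)
end

β = [1.0, -2.0, 0.0]
enet_penalty(β, 0.5, 1.0)  # pure lasso: 0.5 * (1 + 2) = 1.5
enet_penalty(β, 0.5, 0.0)  # pure ridge: 0.5 * 0.5 * (1 + 4) = 1.25
```

Note that the intercept $b_0$ enters the loss but not the penalty, which matches the description of an unpenalized intercept.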
Examples
fit(LassoPath, X, y)    # L1-regularized linear regression
fit(LassoPath, X, y, Binomial(), Logit();
    α=0.5) # Binomial logit regression with an Elastic net combination of
           # 0.5 L1 and 0.5 L2 regularization penalties
Arguments
wts=ones(length(y)): Weights for each observation.
offset=zeros(length(y)): Offset of each observation.
λ: Can be used to specify a specific set of λ values at which models are fit. You can pass an AbstractVector of explicit values for λ, or a function λfunc(λmax) returning such values, where λmax will be the smallest λ value yielding a null model. If λ is unspecified, Lasso.jl selects nλ logarithmically spaced λ values from λmax to λminratio * λmax.
α=1: Value between 0 and 1 controlling the balance between ridge ($\alpha = 0$) and lasso ($\alpha = 1$) regression. α cannot be set to 0 if λ was not specified, though it may be set to 1.
nλ=100: Number of λ values to use.
λminratio=1e-4 if there are more observations than predictors, otherwise 0.001.
stopearly=true: When true, if the proportion of deviance explained exceeds 0.999 or the difference between the deviance explained by successive λ values falls below 1e-5, the path stops early.
standardize=true: Whether to standardize predictors to unit standard deviation before fitting.
intercept=true: Whether to fit an (unpenalized) model intercept $b_0$. If false, $b_0=0$.
algorithm: Algorithm to use. NaiveCoordinateDescent iteratively computes the dot product of the predictors with the residuals, as opposed to the CovarianceCoordinateDescent algorithm, which uses a precomputed Gram matrix. NaiveCoordinateDescent is typically faster when there are many predictors that will not enter the model or when fitting generalized linear models. By default, NaiveCoordinateDescent is used if there are more than 5x as many predictors as observations or the model is a GLM, and CovarianceCoordinateDescent otherwise.
randomize=true: Whether to randomize the order in which coefficients are updated by coordinate descent. This can drastically speed convergence if coefficients are highly correlated.
rng=RNG_DEFAULT: Random number generator to be used for coefficient iteration.
maxncoef=min(size(X, 2), 2*size(X, 1)): Maximum number of coefficients allowed in the model. If exceeded, an error will be thrown.
dofit=true: Whether to fit the model upon construction. If false, the model can be fit later by calling fit!(model).
cd_maxiter=100_000: The maximum number of coordinate descent iterations.
cd_tol=1e-7: The tolerance for coordinate descent iterations in the inner loop.
irls_maxiter=30: Maximum number of iterations in the iteratively reweighted least squares loop. This is ignored unless the model is a generalized linear model.
irls_tol=1e-7: The tolerance for outer iteratively reweighted least squares iterations. This is ignored unless the model is a generalized linear model.
criterion=:coef: Convergence criterion. Controls how cd_tol and irls_tol are to be interpreted. Possible values are:
    :coef: The model is considered to have converged if the maximum absolute squared difference in coefficients between successive iterations drops below the specified tolerance. This is the criterion used by glmnet.
    :obj: The model is considered to have converged if the relative change in the Lasso/Elastic Net objective between successive iterations drops below the specified tolerance. This is the criterion used by GLM.jl.
minStepFac=0.001: The minimum step fraction for backtracking line search.
penalty_factor=ones(size(X, 2)): Separate penalty factor $\omega_j$ for each coefficient $j$, i.e. instead of $\lambda$ penalties become $\lambda\omega_j$. Note the penalty factors are internally rescaled to sum to the number of variables (glmnet.R convention).
standardizeω=true: Whether to scale penalty factors to sum to the number of variables (glmnet.R convention).
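The default λ grid described above (nλ logarithmically spaced values from λmax down to λminratio * λmax) can be sketched as follows. This is an illustration of the spacing only, not Lasso.jl's internal code, and `lambda_grid` is a hypothetical helper name.

```julia
# Hypothetical helper (not part of Lasso.jl): nλ values spaced evenly on the
# log scale, starting at λmax and ending at λminratio * λmax.
function lambda_grid(λmax; nλ=100, λminratio=1e-4)
    return exp.(range(log(λmax), log(λminratio * λmax); length=nλ))
end

λs = lambda_grid(2.0; nλ=5, λminratio=1e-4)
# Successive values differ by a constant factor (here 0.1),
# running from about 2.0 down to about 2.0e-4.
ratios = λs[2:end] ./ λs[1:end-1]
```

Since the argument description says λ may also be a function of λmax, a helper with this shape could presumably be passed as the λ keyword, though that is not verified here.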
Returned objects
fit returns a RegularizationPath object describing the fit coefficients and values of λ along the path. The following fields are intended for external use:
λ: Vector of λ values corresponding to each fit model along the path
coefs: SparseMatrixCSC of model coefficients. Columns correspond to fit models; rows correspond to predictors
b0: Vector of model intercepts for each fit model
pct_dev: Vector of proportion of deviance explained values for each fit model
nulldev: The deviance of the null model (including the intercept, if specified)
nullb0: The intercept of the null model, or 0 if no intercept was fit
niter: Total number of coordinate descent iterations required to fit all models
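As a small usage sketch, the fields listed above can be inspected directly after fitting. The simulated data and variable names here are made up for illustration.

```julia
using Lasso, Random

Random.seed!(42)
X = randn(200, 5)                                   # simulated design matrix
y = 2.0 .* X[:, 1] .- X[:, 3] .+ 0.5 .* randn(200)  # only predictors 1 and 3 matter

path = fit(LassoPath, X, y)

nseg = length(path.λ)            # number of fitted segments (may be < nλ if the path stopped early)
coef_dims = size(path.coefs)     # (number of predictors, number of segments)

# Number of nonzero coefficients in each segment of the path
nnz_per_segment = [count(!iszero, path.coefs[:, k]) for k in 1:nseg]

# Intercept and proportion of deviance explained for the last segment
last_b0 = path.b0[end]
last_pct_dev = path.pct_dev[end]
```

The intercepts are stored separately in b0, so coefs has one row per predictor rather than one per model term.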
For details of the algorithm, see Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1.
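As a rough illustration of the coordinate descent idea in that reference, here is a minimal sketch of the naive variant for the Gaussian case at a single λ. It omits the intercept, standardization, weights, randomized update order, and all path logic, and `soft_threshold` and `naive_cd` are hypothetical names rather than Lasso.jl functions.

```julia
using LinearAlgebra, Random

# Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)
soft_threshold(z, t) = sign(z) * max(abs(z) - t, 0.0)

# Minimize (1/2N)‖y - Xβ‖₂² + λ[(1-α)/2 ‖β‖₂² + α‖β‖₁] by cyclic coordinate descent.
function naive_cd(X, y, λ; α=1.0, maxiter=1000, tol=1e-7)
    n, p = size(X)
    β = zeros(p)
    r = copy(y)                              # residual y - Xβ (β starts at zero)
    colsq = vec(sum(abs2, X; dims=1)) ./ n   # (1/N)‖x_j‖² for each column
    for _ in 1:maxiter
        maxchange = 0.0
        for j in 1:p
            βold = β[j]
            # "Naive" update: dot product of predictor j with the current residual,
            # plus the contribution of β_j itself (partial residual)
            ρ = dot(view(X, :, j), r) / n + colsq[j] * βold
            βnew = soft_threshold(ρ, λ * α) / (colsq[j] + λ * (1 - α))
            if βnew != βold
                r .-= view(X, :, j) .* (βnew - βold)   # keep the residual in sync
                maxchange = max(maxchange, abs2(βnew - βold))
                β[j] = βnew
            end
        end
        # Stop when the largest squared coefficient change falls below tol
        maxchange < tol && break
    end
    return β
end

Random.seed!(1)
X = randn(100, 3)
y = X * [1.0, 0.0, -2.0] .+ 0.1 .* randn(100)
β = naive_cd(X, y, 0.1)
```

The stopping rule mirrors the :coef criterion described in the arguments (maximum squared coefficient change), but the sketch should not be read as the package's actual implementation.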
Gamma paths
StatsAPI.fit — Method
fit(GammaLassoPath, X, y, d=Normal(), l=canonicallink(d); ...)
fits a linear or generalized linear (concave) gamma lasso path given the design matrix X and response y.
See also fit(LassoPath...) for a full list of arguments
Using the model
Lasso adheres to most of the StatsBase interface, so coef and predict should work as expected, except that a particular segment of the path would need to be selected.
StatsAPI.coef — Function
coef(path::RegularizationPath; kwargs...)
Coefficient vector for a selected segment of a regularization path.
Examples
coef(path; select=MinBIC())     # BIC minimizing segment
coef(path; select=AllSeg())     # Array with entire path's coefficients
coef(path::RegularizationPath, select::SegSelect)
Coefficient vector for a selected segment of a regularization path.
Examples
coef(path, MinBIC())     # BIC minimizing segment
coef(path, AllSeg())     # Array with entire path's coefficients
StatsAPI.predict — Function
predict(path::RegularizationPath, newX::AbstractMatrix; kwargs...)
Predicted values for a selected segment of a regularization path.
Examples
predict(path, newX; select=MinBIC())     # predict using BIC minimizing segment
Predicted values for the data used to estimate the path.
predict(m::RegularizedModel, newX::AbstractMatrix; kwargs...)
Predicted values using a selected segment of a regularization path.
Examples
m = fit(LassoModel, X, y; select=MinBIC())
predict(m, newX) # predict using BIC minimizing segment
StatsAPI.deviance — Function
Deviance at each segment of the path for the fitted model and data.
Deviance at each segment of the path for (potentially new) data X and y. A segment selector such as select=AllSeg() or select=MinAICc() can be given, as in coef().
Deviance at each segment of the path for (potentially new) y and predicted values μ.
StatsAPI.dof — Function
dof(path::RegularizationPath)
Approximates the degrees of freedom in each segment of the path as the number of nonzero coefficients plus a dispersion parameter when appropriate. Note that for GammaLassoPath this may be a crude approximation, as gamlr does this differently.
Base.size — Function
size(path) returns (p,nλ) where p is the number of coefficients (including any intercept) and nλ is the number of path segments. If the model was only initialized but not fit, returns (p,1).
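The following sketch strings these accessors together on simulated data. The data and variable names are invented for illustration, and the comments describe what the documentation above leads one to expect rather than verified output.

```julia
using Lasso, Random

Random.seed!(7)
X = randn(150, 4)                          # simulated predictors
y = 1.5 .* X[:, 2] .+ 0.3 .* randn(150)    # only predictor 2 matters

path = fit(LassoPath, X, y)

β = coef(path; select=MinBIC())            # coefficients of the BIC-minimizing segment
B = coef(path; select=AllSeg())            # coefficients for every segment of the path
ŷ = predict(path, X; select=MinBIC())      # predictions from the BIC-minimizing segment
df = dof(path)                             # approximate degrees of freedom per segment
p, nλ = size(path)                         # p counts the intercept, per the size docstring
```

Selecting a segment explicitly, as done here, is what distinguishes a path from an ordinary fitted model: without a selector there is no single coefficient vector to return.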
Segment selectors
Lasso.SegSelect — Type
RegularizationPath segment selector supertype
Lasso.segselect — Function
Index of the selected RegularizationPath segment
Lasso.MinAIC — Type
Selects the RegularizationPath segment with the minimum AIC
Lasso.MinAICc — Type
Selects the RegularizationPath segment with the minimum corrected AIC
Lasso.MinBIC — Type
Selects the RegularizationPath segment with the minimum BIC
Lasso.CVSegSelect — Type
RegularizationPath segment selector supertype
Lasso.MinCVmse — Type
Selects the RegularizationPath segment with the minimum cross-validation mse
Lasso.MinCV1se — Type
Selects the RegularizationPath segment with the largest λt with mean OOS deviance no more than one standard error away from minimum
Lasso.AllSeg — Type
A RegularizationPath segment selector that returns all segments
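The one-standard-error rule behind MinCV1se can be sketched independently of the package. The helper below is hypothetical (it is not how Lasso.jl implements the selector) and assumes an nλ × K matrix of out-of-sample deviances whose rows follow the path order, i.e. decreasing λ.

```julia
using Statistics

# Hypothetical helper: `oos` holds out-of-sample deviance, one row per λ
# (ordered from largest to smallest λ) and one column per CV fold.
function one_se_index(oos::AbstractMatrix)
    K = size(oos, 2)
    μ = vec(mean(oos; dims=2))               # mean OOS deviance per segment
    se = vec(std(oos; dims=2)) ./ sqrt(K)    # standard error of that mean
    imin = argmin(μ)                         # segment a minimum-CV rule would pick
    threshold = μ[imin] + se[imin]
    # Rows are ordered by decreasing λ, so the first match has the largest λ
    return findfirst(<=(threshold), μ)
end

λ = [1.0, 0.5, 0.25, 0.125]
oos = [10.0 12.0;
        4.8  6.8;
        4.0  6.0;
        4.5  6.5]

imin = argmin(vec(mean(oos; dims=2)))   # 3: the minimum mean deviance
i1se = one_se_index(oos)                # 2: a larger λ within one standard error
```

In this toy example the rule trades a slightly higher mean deviance for a larger λ, that is, a more heavily regularized segment.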
Lasso model fitting
Often one wishes to both fit the path and select a particular segment. This can be done with fit(RegularizedModel,...), which returns a fitted RegularizedModel wrapping a GLM representation of the selected model.
For example, if we want to fit a LassoPath and select its segment that minimizes 2-fold cross-validation mean squared error, we can do it in one step as follows:
julia> using DataFrames, Lasso, MLBase, Random
julia> Random.seed!(124); # because CV folds are random
julia> data = DataFrame(X=[1,2,3], Y=[2,4,7])
3×2 DataFrames.DataFrame
 Row │ X      Y
     │ Int64  Int64
─────┼──────────────
   1 │     1      2
   2 │     2      4
   3 │     3      7
julia> m = fit(LassoModel, @formula(Y ~ X), data; select=MinCVmse(Kfold(3,2)))
LassoModel using MinCVmse(Kfold([3, 1, 2], 2, 1.5)) segment of the regularization path.
Coefficients:
────────────
    Estimate
────────────
x1   4.33333
x2   0.0
────────────
julia> coef(m)
2-element Array{Float64,1}:
4.333333333333333
0.0
Lasso.RegularizedModel — Type
A RegularizedModel represents a selected segment from a RegularizationPath
Lasso.LassoModel — Type
LassoModel represents a selected segment from a LassoPath
Lasso.GammaLassoModel — Type
GammaLassoModel represents a selected segment from a GammaLassoPath
Lasso.selectmodel — Function
selectmodel(path::RegularizationPath, select::SegSelect)
Returns a LinearModel or GeneralizedLinearModel representing the selected segment of a regularization path.
Examples
selectmodel(path, MinBIC()) # BIC minimizing model
selectmodel(path, MinCVmse(path, 5)) # 5-fold CV mse minimizing model
StatsAPI.fit — Method
fit(RegularizedModel, X, y, dist, link; <kwargs>)
Returns a LinearModel or GeneralizedLinearModel representing the selected segment of a regularization path.
Examples
fit(LassoModel, X, y; select=MinBIC()) # BIC minimizing LinearModel
fit(LassoModel, X, y, Binomial(), Logit();
    select=MinCVmse(path, 5)) # 5-fold CV mse minimizing model
Arguments
select::SegSelect=MinAICc(): Segment selector.
wts=ones(length(y)): Weights for each observation.
offset=zeros(length(y)): Offset of each observation.
λ: Can be used to specify a specific set of λ values at which models are fit. If λ is unspecified, Lasso.jl selects nλ logarithmically spaced λ values from λmax, the smallest λ value yielding a null model, to λminratio * λmax.
nλ=100: Number of λ values to use.
λminratio=1e-4 if there are more observations than predictors, otherwise 0.001.
stopearly=true: When true, if the proportion of deviance explained exceeds 0.999 or the difference between the deviance explained by successive λ values falls below 1e-5, the path stops early.
standardize=true: Whether to standardize predictors to unit standard deviation before fitting.
intercept=true: Whether to fit an (unpenalized) model intercept.
algorithm: Algorithm to use. NaiveCoordinateDescent iteratively computes the dot product of the predictors with the residuals, as opposed to the CovarianceCoordinateDescent algorithm, which uses a precomputed Gram matrix. NaiveCoordinateDescent is typically faster when there are many predictors that will not enter the model or when fitting generalized linear models. By default, NaiveCoordinateDescent is used if there are more than 5x as many predictors as observations or the model is a GLM, and CovarianceCoordinateDescent otherwise.
randomize=true: Whether to randomize the order in which coefficients are updated by coordinate descent. This can drastically speed convergence if coefficients are highly correlated.
maxncoef=min(size(X, 2), 2*size(X, 1)): Maximum number of coefficients allowed in the model. If exceeded, an error will be thrown.
dofit=true: Whether to fit the model upon construction. If false, the model can be fit later by calling fit!(model).
cd_tol=1e-7: The tolerance for coordinate descent iterations in the inner loop.
irls_tol=1e-7: The tolerance for outer iteratively reweighted least squares iterations. This is ignored unless the model is a generalized linear model.
criterion=:coef: Convergence criterion. Controls how cd_tol and irls_tol are to be interpreted. Possible values are:
    :coef: The model is considered to have converged if the maximum absolute squared difference in coefficients between successive iterations drops below the specified tolerance. This is the criterion used by glmnet.
    :obj: The model is considered to have converged if the relative change in the Lasso/Elastic Net objective between successive iterations drops below the specified tolerance. This is the criterion used by GLM.jl.
minStepFac=0.001: The minimum step fraction for backtracking line search.
penalty_factor=ones(size(X, 2)): Separate penalty factor $\omega_j$ for each coefficient $j$, i.e. instead of $\lambda$ penalties become $\lambda\omega_j$. Note the penalty factors are internally rescaled to sum to the number of variables (glmnet.R convention).
standardizeω=true: Whether to scale penalty factors to sum to the number of variables (glmnet.R convention).
See also fit(::Type{LassoPath}, ::Matrix{Float64}, ::Vector{Float64}) for a more complete list of arguments
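For matrix inputs, a one-step fit and selection might look like the sketch below. The simulated data are made up for illustration, and the comments about shapes follow from the earlier formula example, where coef(m) lists the intercept first.

```julia
using Lasso, Random

Random.seed!(3)
X = randn(120, 5)                                       # simulated predictors
y = X[:, 1] .- 2.0 .* X[:, 4] .+ 0.2 .* randn(120)      # predictors 1 and 4 matter

# Fit the path and keep only the BIC-minimizing segment
m = fit(LassoModel, X, y; select=MinBIC())

β = coef(m)                    # intercept followed by one coefficient per predictor
ŷ = predict(m, X[1:3, :])      # predictions for the first three rows
```

Because BIC is computed from the fitted path itself, this selector avoids the randomness of cross-validation folds, so no fold seed is needed for the selection step.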