Lasso paths

StatsAPI.fit - Method
fit(LassoPath, X, y, d=Normal(), l=canonicallink(d); ...)

fits a linear or generalized linear Lasso path given the design matrix X and response y:

$\underset{\beta,b_0}{\operatorname{argmin}} -\frac{1}{N} \mathcal{L}(y|X,\beta,b_0) + \lambda\left[(1-\alpha)\frac{1}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1\right]$

where $0 \le \alpha \le 1$ sets the balance between ridge ($\alpha = 0$) and lasso ($\alpha = 1$) regression, and $N$ is the number of rows of $X$. The optional argument d specifies the conditional distribution of the response, while l specifies the link function. Lasso.jl inherits supported distributions and link functions from GLM.jl. The default is to fit a linear Lasso path, i.e., d=Normal(), l=IdentityLink(), or $\mathcal{L}(y|X,\beta,b_0) = -\frac{1}{2}\|y - X\beta - b_0\|_2^2 + C$.
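As a concrete reading of the objective above in the default Gaussian case, the following sketch evaluates it directly, dropping the constant $C$. The function name elastic_net_objective is hypothetical and is not part of Lasso.jl; the code uses only Base Julia.

```julia
# Hypothetical helper (not part of Lasso.jl): evaluates the elastic net
# objective for the Gaussian case with the constant C dropped.
function elastic_net_objective(X, y, β, b0; λ, α=1.0)
    N = size(X, 1)
    r = y .- X * β .- b0                    # residuals y - Xβ - b0
    loss = sum(abs2, r) / (2N)              # -(1/N) * L(y | X, β, b0)
    penalty = λ * ((1 - α) / 2 * sum(abs2, β) + α * sum(abs, β))
    return loss + penalty
end

X = [1.0 0.0; 0.0 1.0]
y = [1.0, 2.0]
β = [1.0, 1.0]
lasso_value = elastic_net_objective(X, y, β, 0.0; λ=0.5, α=1.0)  # pure L1 penalty
ridge_value = elastic_net_objective(X, y, β, 0.0; λ=0.5, α=0.0)  # pure L2 penalty
```

With α = 1 only the L1 term contributes to the penalty, and with α = 0 only the squared L2 term does.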

Examples

fit(LassoPath, X, y)    # L1-regularized linear regression
fit(LassoPath, X, y, Binomial(), Logit();
    α=0.5) # Binomial logit regression with an Elastic net combination of
           # 0.5 L1 and 0.5 L2 regularization penalties

Arguments

  • wts=ones(length(y)): Weights for each observation
  • offset=zeros(length(y)): Offset of each observation
  • λ: Specifies the set of λ values at which models are fit. You can pass an AbstractVector of explicit values for λ, or a function λfunc(λmax) returning such values, where λmax is the smallest λ value yielding a null model. If λ is unspecified, Lasso.jl selects nλ logarithmically spaced λ values from λmax to λminratio * λmax.
  • α=1: Value between 0 and 1 controlling the balance between ridge ($\alpha = 0$) and lasso ($\alpha = 1$) regression. α cannot be set to 0 if λ was not specified, though it may be set to 1.
  • nλ=100: Number of λ values to use.
  • λminratio: Ratio of the smallest to the largest λ value on the default path. Defaults to 1e-4 if there are more observations than predictors, and to 0.001 otherwise.
  • stopearly=true: When true, if the proportion of deviance explained exceeds 0.999 or the difference between the deviance explained by successive λ values falls below 1e-5, the path stops early.
  • standardize=true: Whether to standardize predictors to unit standard deviation before fitting.
  • intercept=true: Whether to fit an (unpenalized) model intercept $b_0$. If false, $b_0=0$.
  • algorithm: Algorithm to use. NaiveCoordinateDescent iteratively computes the dot product of the predictors with the residuals, whereas CovarianceCoordinateDescent uses a precomputed Gram matrix. NaiveCoordinateDescent is typically faster when there are many predictors that will not enter the model or when fitting generalized linear models. By default, NaiveCoordinateDescent is used if there are more than 5x as many predictors as observations or if the model is a GLM; otherwise CovarianceCoordinateDescent is used.
  • randomize=true: Whether to randomize the order in which coefficients are updated by coordinate descent. This can drastically speed convergence if coefficients are highly correlated.
  • rng=RNG_DEFAULT: Random number generator to be used for coefficient iteration.
  • maxncoef=min(size(X, 2), 2*size(X, 1)): maximum number of coefficients allowed in the model. If exceeded, an error will be thrown.
  • dofit=true: Whether to fit the model upon construction. If false, the model can be fit later by calling fit!(model).
  • cd_maxiter=100_000: The maximum number of coordinate descent iterations.
  • cd_tol=1e-7: The tolerance for coordinate descent iterations in the inner loop.
  • irls_maxiter=30: Maximum number of iterations in the iteratively reweighted least squares loop. This is ignored unless the model is a generalized linear model.
  • irls_tol=1e-7: The tolerance for outer iteratively reweighted least squares iterations. This is ignored unless the model is a generalized linear model.
  • criterion=:coef: Convergence criterion. Controls how cd_tol and irls_tol are to be interpreted. Possible values are:
    • :coef: The model is considered to have converged if the maximum absolute squared difference in coefficients between successive iterations drops below the specified tolerance. This is the criterion used by glmnet.
    • :obj: The model is considered to have converged if the relative change in the Lasso/Elastic Net objective between successive iterations drops below the specified tolerance. This is the criterion used by GLM.jl.
  • minStepFac=0.001: The minimum step fraction for backtracking line search.
  • penalty_factor=ones(size(X, 2)): Separate penalty factor $\omega_j$ for each coefficient $j$, i.e. instead of $\lambda$ penalties become $\lambda\omega_j$. Note the penalty factors are internally rescaled to sum to the number of variables (glmnet.R convention).
  • standardizeω=true: Whether to scale penalty factors to sum to the number of variables (glmnet.R convention).
source
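To illustrate two of the defaults described in the argument list, the sketch below builds a logarithmically spaced λ grid from λmax down to λminratio * λmax and rescales penalty factors so that they sum to the number of variables. Both helper names are hypothetical and the code is not taken from Lasso.jl; it only mirrors the behavior as documented above.

```julia
# Hypothetical helpers, not part of Lasso.jl's API.

# nλ values spaced evenly on a log scale, from λmax down to λminratio * λmax.
function default_λ_grid(λmax; nλ=100, λminratio=1e-4)
    return exp.(range(log(λmax); stop=log(λminratio * λmax), length=nλ))
end

# Rescale penalty factors ω so that they sum to the number of variables.
rescale_penalty_factors(ω) = ω .* (length(ω) / sum(ω))

grid = default_λ_grid(2.0; nλ=5, λminratio=1e-4)
ω = rescale_penalty_factors([1.0, 3.0])
```

Successive grid values differ by a constant ratio, so the grid is dense near the small λ values where coefficients change most quickly.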

Returned objects

fit returns a RegularizationPath object describing the fit coefficients and values of λ along the path. The following fields are intended for external use:

  • λ: Vector of λ values corresponding to each fit model along the path
  • coefs: SparseMatrixCSC of model coefficients. Columns correspond to fit models; rows correspond to predictors
  • b0: Vector of model intercepts for each fit model
  • pct_dev: Vector of proportion of deviance explained values for each fit model
  • nulldev: The deviance of the null model (including the intercept, if specified)
  • nullb0: The intercept of the null model, or 0 if no intercept was fit
  • niter: Total number of coordinate descent iterations required to fit all models

For details of the algorithm, see Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1.
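The core update of coordinate descent for the Gaussian elastic net can be sketched in a few lines. This is a deliberately simplified illustration and not Lasso.jl's implementation: it has no intercept, no standardization, no weights, and no active-set logic, and it assumes that no column of X is identically zero. The names soft_threshold and naive_cd are hypothetical.

```julia
using LinearAlgebra

# Soft-thresholding operator S(z, γ) = sign(z) * max(|z| - γ, 0).
soft_threshold(z, γ) = sign(z) * max(abs(z) - γ, 0)

# Simplified cyclic coordinate descent for
#   (1/2N)‖y - Xβ‖² + λ[(1-α)/2 ‖β‖² + α‖β‖₁]
# Assumes no intercept and no all-zero columns in X.
function naive_cd(X, y; λ, α=1.0, maxiter=1_000, tol=1e-7)
    N, p = size(X)
    β = zeros(p)
    r = float.(y)                       # residual y - Xβ, with β = 0 initially
    for _ in 1:maxiter
        maxchange = 0.0
        for j in 1:p
            xj = view(X, :, j)
            βold = β[j]
            # Correlation of x_j with the partial residual that excludes x_j.
            ρ = (dot(xj, r) + dot(xj, xj) * βold) / N
            βnew = soft_threshold(ρ, λ * α) / (dot(xj, xj) / N + λ * (1 - α))
            if βnew != βold
                r .-= xj .* (βnew - βold)   # keep the residual in sync
                β[j] = βnew
                maxchange = max(maxchange, abs2(βnew - βold))
            end
        end
        maxchange < tol && break        # stop when coefficients stabilize
    end
    return β
end

β_lasso = naive_cd([1.0 0.0; 0.0 1.0], [3.0, 0.5]; λ=0.5)
β_ridge = naive_cd([1.0 0.0; 0.0 1.0], [3.0, 0.5]; λ=0.5, α=0.0)
```

In the lasso case the second coefficient is set exactly to zero by the soft-thresholding step, while the ridge case only shrinks both coefficients.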

Gamma paths

StatsAPI.fit - Method
fit(GammaLassoPath, X, y, d=Normal(), l=canonicallink(d); ...)

fits a linear or generalized linear (concave) gamma lasso path given the design matrix X and response y.

See also fit(LassoPath...) for a full list of arguments.

source

Using the model

Lasso.jl adheres to most of the StatsBase interface, so coef and predict work as expected, provided that a particular segment of the path is selected.

StatsAPI.coef - Function
coef(path::RegularizationPath; kwargs...)

Coefficient vector for a selected segment of a regularization path.

Examples

coef(path; select=MinBIC())     # BIC minimizing segment
coef(path; select=AllSeg())     # Array with entire path's coefficients
source
coef(path::RegularizationPath, select::SegSelect)

Coefficient vector for a selected segment of a regularization path.

Examples

coef(path, MinBIC())     # BIC minimizing segment
coef(path, AllSeg())     # Array with entire path's coefficients
source
StatsAPI.predict - Function

predict(path::RegularizationPath, newX::AbstractMatrix; kwargs...)

Predicted values for a selected segment of a regularization path.

Examples

predict(path, newX; select=MinBIC())     # predict using BIC minimizing segment
source

Predicted values for the data used to estimate the path.

source

predict(m::RegularizedModel, newX::AbstractMatrix; kwargs...)

Predicted values using a selected segment of a regularization path.

Examples

m = fit(LassoModel, X, y; select=MinBIC())
predict(m, newX)     # predict using BIC minimizing segment
source
StatsAPI.deviance - Function

deviance at each segment of the path for the fitted model and data

source

deviance at each segment of the path for (potentially new) data X and y. Use select=AllSeg() or select=MinAICc(), as in coef().

source

deviance at each segment of the path for (potentially new) y and predicted values μ

source
StatsAPI.dof - Function
dof(path::RegularizationPath)

Approximates the degrees of freedom in each segment of the path as the number of nonzero coefficients plus a dispersion parameter when appropriate. Note that for GammaLassoPath this may be a crude approximation, as gamlr does this differently.

source
Base.size - Function

size(path) returns (p,nλ) where p is the number of coefficients (including any intercept) and nλ is the number of path segments. If the model was only initialized but not fit, returns (p,1).

source

Segment selectors

Lasso.segselect - Function

Index of the selected RegularizationPath segment

source

Index of the selected RegularizationPath segment

source
Lasso.MinAICc - Type

Selects the RegularizationPath segment with the minimum corrected AIC

source
Lasso.MinCVmse - Type

Selects the RegularizationPath segment with the minimum cross-validation mean squared error (MSE)

source
Lasso.MinCV1se - Type

Selects the RegularizationPath segment with the largest λt whose mean out-of-sample (OOS) deviance is no more than one standard error away from the minimum

source
Lasso.AllSeg - Type

A RegularizationPath segment selector that returns all segments

source
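The one-standard-error rule described for MinCV1se can be sketched independently of the package. The helper below is hypothetical and is not Lasso.jl's implementation; it assumes the per-segment mean and standard error of the out-of-sample deviance have already been computed, with segments ordered from the largest λ to the smallest.

```julia
# Hypothetical helper, not Lasso.jl's implementation.
# cvmean[i], cvse[i]: mean and standard error of the out-of-sample deviance
# for segment i, with segments ordered from the largest λ to the smallest.
function one_se_segment(cvmean, cvse)
    imin = argmin(cvmean)                    # segment with minimum mean deviance
    threshold = cvmean[imin] + cvse[imin]    # minimum plus one standard error
    # First segment (that is, largest λ) whose mean deviance is within the threshold.
    return findfirst(m -> m <= threshold, cvmean)
end

cvmean = [5.0, 3.0, 2.2, 2.0, 2.1]
cvse = [0.5, 0.4, 0.3, 0.3, 0.3]
selected = one_se_segment(cvmean, cvse)
```

Here the minimum is at segment 4, but segment 3 is selected because its mean deviance of 2.2 lies within one standard error (0.3) of the minimum of 2.0, giving a sparser model at a larger λ.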

Lasso model fitting

Often one wishes to both fit the path and select a particular segment. This can be done with fit(RegularizedModel,...), which returns a fitted RegularizedModel wrapping a GLM representation of the selected model.

For example, if we want to fit a LassoPath and select its segment that minimizes 2-fold cross-validation mean squared error, we can do it in one step as follows:

julia> using DataFrames, Lasso, MLBase, Random

julia> Random.seed!(124); # because CV folds are random

julia> data = DataFrame(X=[1,2,3], Y=[2,4,7])
3×2 DataFrames.DataFrame
 Row │ X      Y
     │ Int64  Int64
─────┼──────────────
   1 │     1      2
   2 │     2      4
   3 │     3      7

julia> m = fit(LassoModel, @formula(Y ~ X), data; select=MinCVmse(Kfold(3,2)))
LassoModel using MinCVmse(Kfold([3, 1, 2], 2, 1.5)) segment of the regularization path.

Coefficients:
────────────
    Estimate
────────────
x1   4.33333
x2   0.0    
────────────

julia> coef(m)
2-element Array{Float64,1}:
 4.333333333333333
 0.0              
Lasso.selectmodel - Function
selectmodel(path::RegularizationPath, select::SegSelect)

Returns a LinearModel or GeneralizedLinearModel representing the selected segment of a regularization path.

Examples

selectmodel(path, MinBIC())            # BIC minimizing model
selectmodel(path, MinCVmse(path, 5))   # 5-fold CV mse minimizing model
source
StatsAPI.fit - Method
fit(RegularizedModel, X, y, dist, link; <kwargs>)

Returns a LinearModel or GeneralizedLinearModel representing the selected segment of a regularization path.

Examples

fit(LassoModel, X, y; select=MinBIC()) # BIC minimizing LinearModel
fit(LassoModel, X, y, Binomial(), Logit();
    select=MinCVmse(path, 5)) # 5-fold CV mse minimizing model

Arguments

  • select::SegSelect=MinAICc(): segment selector.
  • wts=ones(length(y)): Weights for each observation
  • offset=zeros(length(y)): Offset of each observation
  • λ: Specifies the set of λ values at which models are fit. If λ is unspecified, Lasso.jl selects nλ logarithmically spaced λ values from λmax, the smallest λ value yielding a null model, to λminratio * λmax.
  • nλ=100: Number of λ values to use.
  • λminratio: Ratio of the smallest to the largest λ value on the default path. Defaults to 1e-4 if there are more observations than predictors, and to 0.001 otherwise.
  • stopearly=true: When true, if the proportion of deviance explained exceeds 0.999 or the difference between the deviance explained by successive λ values falls below 1e-5, the path stops early.
  • standardize=true: Whether to standardize predictors to unit standard deviation before fitting.
  • intercept=true: Whether to fit an (unpenalized) model intercept.
  • algorithm: Algorithm to use. NaiveCoordinateDescent iteratively computes the dot product of the predictors with the residuals, whereas CovarianceCoordinateDescent uses a precomputed Gram matrix. NaiveCoordinateDescent is typically faster when there are many predictors that will not enter the model or when fitting generalized linear models. By default, NaiveCoordinateDescent is used if there are more than 5x as many predictors as observations or if the model is a GLM; otherwise CovarianceCoordinateDescent is used.
  • randomize=true: Whether to randomize the order in which coefficients are updated by coordinate descent. This can drastically speed convergence if coefficients are highly correlated.
  • maxncoef=min(size(X, 2), 2*size(X, 1)): maximum number of coefficients allowed in the model. If exceeded, an error will be thrown.
  • dofit=true: Whether to fit the model upon construction. If false, the model can be fit later by calling fit!(model).
  • cd_tol=1e-7: The tolerance for coordinate descent iterations in the inner loop.
  • irls_tol=1e-7: The tolerance for outer iteratively reweighted least squares iterations. This is ignored unless the model is a generalized linear model.
  • criterion=:coef: Convergence criterion. Controls how cd_tol and irls_tol are to be interpreted. Possible values are:
    • :coef: The model is considered to have converged if the maximum absolute squared difference in coefficients between successive iterations drops below the specified tolerance. This is the criterion used by glmnet.
    • :obj: The model is considered to have converged if the relative change in the Lasso/Elastic Net objective between successive iterations drops below the specified tolerance. This is the criterion used by GLM.jl.
  • minStepFac=0.001: The minimum step fraction for backtracking line search.
  • penalty_factor=ones(size(X, 2)): Separate penalty factor $\omega_j$ for each coefficient $j$, i.e. instead of $\lambda$ penalties become $\lambda\omega_j$. Note the penalty factors are internally rescaled to sum to the number of variables (glmnet.R convention).
  • standardizeω=true: Whether to scale penalty factors to sum to the number of variables (glmnet.R convention).

See also fit(::Type{LassoPath}, ::Matrix{Float64}, ::Vector{Float64}) for a more complete list of arguments.

source
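The two convergence criteria listed under criterion can be contrasted with a small sketch. The function names are hypothetical, and the exact form of the relative change used inside Lasso.jl or GLM.jl may differ (for example in how the change is normalized); this only illustrates the distinction between a coefficient-based and an objective-based test.

```julia
# Hypothetical helpers illustrating the two criteria; not Lasso.jl's code.

# :coef style: maximum squared change in any coefficient is below tol.
converged_coef(βnew, βold, tol) = maximum(abs2, βnew .- βold) < tol

# :obj style: change in the objective, relative to its previous value, is below tol.
# Assumption: the change is normalized by abs(objold).
converged_obj(objnew, objold, tol) = abs(objold - objnew) < tol * abs(objold)

small_step = converged_coef([1.0, 2.0], [1.0, 2.0001], 1e-7)
large_step = converged_coef([1.0, 2.0], [1.0, 2.1], 1e-7)
small_drop = converged_obj(10.0, 10.0000001, 1e-7)
large_drop = converged_obj(10.0, 11.0, 1e-7)
```

Because the two tests measure different quantities, the same tolerance value does not imply the same stopping point under :coef and :obj.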