Regression
2023-12-28
Introduction
glmnet is a package that fits generalized linear and similar models via penalized maximum likelihood. The regularization path is computed for the lasso or elastic net penalty at a grid of values (on the log scale) for the regularization parameter λ. The algorithm is extremely fast, and can exploit sparsity in the input matrix x. It fits linear, logistic, multinomial, Poisson, and Cox regression models. It can also fit multi-response linear regression, generalized linear models for custom families, and relaxed lasso regression models. The package includes methods for prediction and plotting, and functions for cross-validation.
LM-Lasso
The Lasso estimate is defined as
$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$
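Note that glmnet itself minimizes a rescaled, elastic-net generalization of this objective (intercept omitted here; the mixing parameter α = 1 recovers the Lasso penalty), so the λ values it reports are on a different scale:

$$\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2n} \|y - X\beta\|_2^2 + \lambda \left[ \frac{1-\alpha}{2} \|\beta\|_2^2 + \alpha \|\beta\|_1 \right]$$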
Lasso solver
Since there is no closed-form solution, we need an iterative algorithm. Well-known Lasso solvers include:
• QP (JRSSB, 1996)
• Shooting (JCGS, 1998)
• LARS (AoS, 2004; connections to Boosting and Forward Stagewise)
• Coordinate Descent (2008 onward; many tricks: warm starts, screening rules)
• ADMM, PGD, BLasso, ...
As a toy illustration of coordinate-wise updating, suppose we choose (0, 0, 0, 0) as the initial approximation and repeat the following updates, using each new value immediately (Gauss-Seidel style):

$$\begin{aligned}
x_1 &= x_2/10 - x_3/5 + 3/5, \\
x_2 &= x_1/11 + x_3/11 - 3x_4/11 + 25/11, \\
x_3 &= -x_1/5 + x_2/10 + x_4/10 - 11/10, \\
x_4 &= -3x_2/8 + x_3/8 + 15/8.
\end{aligned}$$

After four iterations, we have (1.00086, 2.0003, -1.00031, 0.99985). The exact solution of the system is (1, 2, -1, 1).
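This update can be checked directly in R; a minimal sketch:

x <- c(0, 0, 0, 0)  # initial approximation
for (k in 1:4) {
  # each new coordinate value is used immediately in later updates
  x[1] <- x[2] / 10 - x[3] / 5 + 3 / 5
  x[2] <- x[1] / 11 + x[3] / 11 - 3 * x[4] / 11 + 25 / 11
  x[3] <- -x[1] / 5 + x[2] / 10 + x[4] / 10 - 11 / 10
  x[4] <- -3 * x[2] / 8 + x[3] / 8 + 15 / 8
}
round(x, 5)  # 1.00086 2.00030 -1.00031 0.99985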
Coordinate Descent reduces the Lasso problem to a sequence of one-dimensional subproblems, each of which has a closed-form soft-thresholding solution:

$$\hat{\beta}_j = \arg\min_{\beta_j} \frac{1}{2} a \beta_j^2 - b \beta_j + \lambda |\beta_j| = \frac{\operatorname{sign}(b)\,(|b| - \lambda)_+}{a}$$
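A minimal R sketch of cyclic coordinate descent built on this update (a toy illustration under the unscaled objective above, not glmnet's implementation; soft_threshold and lasso_cd are hypothetical helper names):

# closed-form minimizer of (1/2) a t^2 - b t + lambda |t|
soft_threshold <- function(b, lambda, a) {
  sign(b) * max(abs(b) - lambda, 0) / a
}

lasso_cd <- function(X, y, lambda, n_iter = 100) {
  p <- ncol(X)
  beta <- rep(0, p)
  for (it in 1:n_iter) {
    for (j in 1:p) {
      r_j <- y - X[, -j, drop = FALSE] %*% beta[-j]  # partial residual
      a <- 2 * sum(X[, j]^2)    # factors of 2 from expanding ||y - X beta||^2
      b <- 2 * sum(X[, j] * r_j)
      beta[j] <- soft_threshold(b, lambda, a)
    }
  }
  beta  # note: glmnet scales the loss by 1/(2n), so its lambda differs by a factor of 2n
}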
Example
library(glmnet)
data(QuickStartExample)
x <- QuickStartExample$x
y <- QuickStartExample$y
fit <- glmnet(x, y)
plot(fit)
[Figure: coefficient profile plot from plot(fit); x-axis: L1 Norm, y-axis: Coefficients, top axis: number of nonzero coefficients.]
print(fit)
##
## Call: glmnet(x = x, y = y)
##
## Df %Dev Lambda
## 1 0 0.00 1.63100
## 2 2 5.53 1.48600
## 3 2 14.59 1.35400
## 4 2 22.11 1.23400
## 5 2 28.36 1.12400
## 6 2 33.54 1.02400
## 7 4 39.04 0.93320
## 8 5 45.60 0.85030
## 9 5 51.54 0.77470
## 10 6 57.35 0.70590
## 11 6 62.55 0.64320
## 12 6 66.87 0.58610
## 13 6 70.46 0.53400
## 14 6 73.44 0.48660
## 15 7 76.21 0.44330
## 16 7 78.57 0.40400
## 17 7 80.53 0.36810
## 18 7 82.15 0.33540
## 19 7 83.50 0.30560
## 20 7 84.62 0.27840
## 21 7 85.55 0.25370
## 22 7 86.33 0.23120
## 23 8 87.06 0.21060
## 24 8 87.69 0.19190
## 25 8 88.21 0.17490
## 26 8 88.65 0.15930
## 27 8 89.01 0.14520
## 28 8 89.31 0.13230
## 29 8 89.56 0.12050
## 30 8 89.76 0.10980
## 31 9 89.94 0.10010
## 32 9 90.10 0.09117
## 33 9 90.23 0.08307
## 34 9 90.34 0.07569
## 35 10 90.43 0.06897
## 36 11 90.53 0.06284
## 37 11 90.62 0.05726
## 38 12 90.70 0.05217
## 39 15 90.78 0.04754
## 40 16 90.86 0.04331
## 41 16 90.93 0.03947
## 42 16 90.98 0.03596
## 43 17 91.03 0.03277
## 44 17 91.07 0.02985
## 45 18 91.11 0.02720
## 46 18 91.14 0.02479
## 47 19 91.17 0.02258
## 48 19 91.20 0.02058
## 49 19 91.22 0.01875
## 50 19 91.24 0.01708
## 51 19 91.25 0.01557
## 52 19 91.26 0.01418
## 53 19 91.27 0.01292
## 54 19 91.28 0.01178
## 55 19 91.29 0.01073
## 56 19 91.29 0.00978
## 57 19 91.30 0.00891
## 58 19 91.30 0.00812
## 59 19 91.31 0.00739
## 60 19 91.31 0.00674
## 61 19 91.31 0.00614
## 62 20 91.31 0.00559
## 63 20 91.31 0.00510
## 64 20 91.31 0.00464
## 65 20 91.32 0.00423
## 66 20 91.32 0.00386
## 67 20 91.32 0.00351
coef(fit, s = 0.1)
## V8 0.319415917
## V9 .
## V10 .
## V11 0.142498519
## V12 .
## V13 .
## V14 -1.059978702
## V15 .
## V16 .
## V17 .
## V18 .
## V19 .
## V20 -1.021873704
Predictions at specific values of λ can be made with predict(); the output below corresponds to a call of the form predict(fit, newx = nx, s = c(0.1, 0.05)) on a new 5 x 20 predictor matrix nx, with columns s1 and s2 for the two requested values of s:

##              s1         s2
## [1,] -4.3067990 -4.5979456
## [2,] -4.1244091 -4.3447727
## [3,] -0.1133939 -0.1859237
## [4,]  3.3458748  3.5270269
## [5,] -1.2366422 -1.2772955
Cross-validation over the λ path is performed with cv.glmnet(), whose fitted object can also be plotted:

cvfit <- cv.glmnet(x, y)
plot(cvfit)

[Figure: cross-validation curve from plot(cvfit); x-axis: Log(λ), y-axis: Mean-Squared Error, top axis: number of nonzero coefficients.]
cvfit$lambda.min
## [1] 0.06284188
coef(cvfit, s = "lambda.min")
predict(cvfit, newx = x[1:5,], s = "lambda.min")
## lambda.min
## [1,] -1.3574653
## [2,] 2.5776672
## [3,] 0.5846421
## [4,] 2.0280562
## [5,] 1.5780633
Logistic regression
data(BinomialExample)
x <- BinomialExample$x
y <- BinomialExample$y
fit <- glmnet(x, y, family = "binomial")
Predicted class labels at particular values of λ are obtained with type = "class"; the output below corresponds to a call of the form predict(fit, newx = x[1:5, ], type = "class", s = c(0.05, 0.01)):

##      s1  s2
## [1,] "0" "0"
## [2,] "1" "1"
## [3,] "1" "1"
## [4,] "0" "0"
## [5,] "1" "1"
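Cross-validation works the same way for logistic regression; a minimal sketch, assuming misclassification error as the criterion (one of several type.measure options of cv.glmnet):

cvfit <- cv.glmnet(x, y, family = "binomial", type.measure = "class")
plot(cvfit)
predict(cvfit, newx = x[1:5, ], type = "class", s = "lambda.min")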
Acknowledgement: This lecture note contains material adapted from https://glmnet.stanford.edu/articles/glmnet.html.