A quick tour of glmnet

2023-12-28

Introduction
glmnet is a package that fits generalized linear and similar models via penalized maximum likelihood. The
regularization path is computed for the lasso or elastic net penalty at a grid of values (on the log scale) for
the regularization parameter λ. The algorithm is extremely fast, and can exploit sparsity in the input matrix
x. It fits linear, logistic, multinomial, Poisson, and Cox regression models. It can also fit multi-response
linear regression, generalized linear models for custom families, and relaxed lasso regression models. The
package includes methods for prediction and plotting, and functions for cross-validation.

LM-Lasso
The Lasso estimate is defined as
$$
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta \in \mathbb{R}^p} \; \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} |\beta_j|
= \arg\min_{\beta \in \mathbb{R}^p} \; \underbrace{\|y - X\beta\|_2^2}_{\text{Loss}} + \lambda \underbrace{\|\beta\|_1}_{\text{Penalty}}
$$

• The difference between the Lasso and Ridge penalties: $\|\beta\|_1$ vs. $\|\beta\|_2^2$.
• Although they look similar, their solutions behave very differently (see the sketch below).
• “lasso”: Least Absolute Shrinkage and Selection Operator.
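To see the difference concretely, consider an orthonormal design ($X^\top X = I$), where both estimators have closed forms in terms of the OLS solution. A minimal R sketch follows; the coefficient values are made up for illustration, and the $\lambda/2$ threshold matches the squared-error loss written above (no 1/2 factor):

beta_ols <- c(3, 1.5, 0.4, -0.2)   # illustrative OLS coefficients (X'X = I)
lambda <- 1

# Ridge (penalty lambda * ||beta||_2^2): proportional shrinkage, never exactly zero
beta_ridge <- beta_ols / (1 + lambda)

# Lasso (penalty lambda * ||beta||_1): soft-thresholding at lambda / 2,
# which sets small coefficients exactly to zero
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)
beta_lasso <- soft(beta_ols, lambda / 2)

rbind(ols = beta_ols, ridge = beta_ridge, lasso = beta_lasso)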

Lasso solver
Since the Lasso has no closed-form solution, we need an algorithm to compute it:
• QP (JRSSB, 1996)
• Shooting (JCGS, 1998)
• LARS (AoS 2004, Boosting, FS)
• Coordinate Descent (2008~, many tricks: warm starts, screening rules)
• ADMM, PGD, BLasso, . . .

CD algorithm 1: key idea


The Gauss–Seidel method solves Ax = b iteratively, without inverting the matrix:
$$
\begin{aligned}
10x_1 - x_2 + 2x_3 &= 6,\\
-x_1 + 11x_2 - x_3 + 3x_4 &= 25,\\
2x_1 - x_2 + 10x_3 - x_4 &= -11,\\
3x_2 - x_3 + 8x_4 &= 15.
\end{aligned}
$$

Suppose we choose (0, 0, 0, 0) as the initial approximation and use the following iteration steps:

$$
\begin{aligned}
x_1 &= x_2/10 - x_3/5 + 3/5,\\
x_2 &= x_1/11 + x_3/11 - 3x_4/11 + 25/11,\\
x_3 &= -x_1/5 + x_2/10 + x_4/10 - 11/10,\\
x_4 &= -3x_2/8 + x_3/8 + 15/8.
\end{aligned}
$$

After four iterations, we have (1.00086, 2.0003, -1.00031, 0.99985). The exact solution of the system is (1, 2,
-1, 1).
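A minimal R sketch of these sweeps (each update uses the most recently computed components):

x <- c(0, 0, 0, 0)                    # initial approximation
for (iter in 1:4) {
  x[1] <-  x[2]/10 - x[3]/5 + 3/5
  x[2] <-  x[1]/11 + x[3]/11 - 3*x[4]/11 + 25/11
  x[3] <- -x[1]/5 + x[2]/10 + x[4]/10 - 11/10
  x[4] <- -3*x[2]/8 + x[3]/8 + 15/8
}
round(x, 5)   # approximately (1.00086, 2.0003, -1.00031, 0.99985)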

CD algorithm 2 (Many tricks!)


For a given λ, compute $\hat{\beta}^{\text{lasso}}$ by iteratively minimizing

$$
Q(\beta_1, \ldots, \beta_p; \lambda) = \sum_{i=1}^{n} \left(y_i - x_{i1}\beta_1 - \cdots - x_{ip}\beta_p\right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
$$

1. In each iteration, for j = 1, 2, ..., p,


• fix all $\tilde{\beta}_{-j} = \{\tilde{\beta}_{j'},\, j' \neq j\}$ and minimize $Q(\beta_j, \tilde{\beta}_{-j}; \lambda)$ over $\beta_j$;
• minimizing $Q(\beta_j, \tilde{\beta}_{-j}; \lambda)$ is equivalent to the following one-variable Lasso problem:

$$
\hat{\beta}_j = \arg\min_{\beta_j} \; \tfrac{1}{2} a\beta_j^2 - b\beta_j + \lambda|\beta_j|
= \frac{\operatorname{sign}(b)}{a}\,(|b| - \lambda)_+ .
$$

2. Iterate until $\|\hat{\beta}^{(t+1)} - \hat{\beta}^{(t)}\|_2 < \xi$, e.g. $\xi = 10^{-8}$.


Compute $\hat{\beta}^{\text{lasso}}(\lambda)$ for a sequence of 100 values of λ decreasing (on the log scale) from $\lambda_{\max}$ to $0.005\,\lambda_{\max}$.
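The following R sketch illustrates this coordinate descent for a single λ. It is a bare-bones illustration of the update above on simulated data, not the optimized routine used inside glmnet (no standardization, warm starts, or screening rules):

cd_lasso <- function(x, y, lambda, tol = 1e-8, maxit = 1000) {
  p <- ncol(x)
  beta <- rep(0, p)
  r <- as.vector(y)                          # residual y - x %*% beta (beta starts at 0)
  for (t in 1:maxit) {
    beta_old <- beta
    for (j in 1:p) {
      r_j <- r + x[, j] * beta[j]            # partial residual with coordinate j removed
      b <- 2 * sum(x[, j] * r_j)
      a <- 2 * sum(x[, j]^2)
      beta[j] <- sign(b) * max(abs(b) - lambda, 0) / a   # soft-threshold update
      r <- r_j - x[, j] * beta[j]
    }
    if (sqrt(sum((beta - beta_old)^2)) < tol) break      # convergence check
  }
  beta
}

# Illustrative use on simulated data
set.seed(1)
x <- matrix(rnorm(100 * 10), 100, 10)
y <- x[, 1] * 2 - x[, 3] + rnorm(100)
round(cd_lasso(x, y, lambda = 20), 3)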

Example
library(glmnet)

## Loading required package: Matrix


## Loaded glmnet 4.1-6

data(QuickStartExample)
x <- QuickStartExample$x
y <- QuickStartExample$y

fit <- glmnet(x, y)

plot(fit)

[Figure: coefficient profile plot from plot(fit); each curve is a coefficient path plotted against the L1 Norm of the coefficient vector, with the number of nonzero coefficients shown along the top axis.]
print(fit)

##
## Call: glmnet(x = x, y = y)
##
## Df %Dev Lambda
## 1 0 0.00 1.63100
## 2 2 5.53 1.48600
## 3 2 14.59 1.35400
## 4 2 22.11 1.23400
## 5 2 28.36 1.12400
## 6 2 33.54 1.02400
## 7 4 39.04 0.93320
## 8 5 45.60 0.85030
## 9 5 51.54 0.77470
## 10 6 57.35 0.70590
## 11 6 62.55 0.64320
## 12 6 66.87 0.58610
## 13 6 70.46 0.53400
## 14 6 73.44 0.48660
## 15 7 76.21 0.44330
## 16 7 78.57 0.40400
## 17 7 80.53 0.36810
## 18 7 82.15 0.33540
## 19 7 83.50 0.30560
## 20 7 84.62 0.27840
## 21 7 85.55 0.25370
## 22 7 86.33 0.23120
## 23 8 87.06 0.21060
## 24 8 87.69 0.19190
## 25 8 88.21 0.17490

## 26 8 88.65 0.15930
## 27 8 89.01 0.14520
## 28 8 89.31 0.13230
## 29 8 89.56 0.12050
## 30 8 89.76 0.10980
## 31 9 89.94 0.10010
## 32 9 90.10 0.09117
## 33 9 90.23 0.08307
## 34 9 90.34 0.07569
## 35 10 90.43 0.06897
## 36 11 90.53 0.06284
## 37 11 90.62 0.05726
## 38 12 90.70 0.05217
## 39 15 90.78 0.04754
## 40 16 90.86 0.04331
## 41 16 90.93 0.03947
## 42 16 90.98 0.03596
## 43 17 91.03 0.03277
## 44 17 91.07 0.02985
## 45 18 91.11 0.02720
## 46 18 91.14 0.02479
## 47 19 91.17 0.02258
## 48 19 91.20 0.02058
## 49 19 91.22 0.01875
## 50 19 91.24 0.01708
## 51 19 91.25 0.01557
## 52 19 91.26 0.01418
## 53 19 91.27 0.01292
## 54 19 91.28 0.01178
## 55 19 91.29 0.01073
## 56 19 91.29 0.00978
## 57 19 91.30 0.00891
## 58 19 91.30 0.00812
## 59 19 91.31 0.00739
## 60 19 91.31 0.00674
## 61 19 91.31 0.00614
## 62 20 91.31 0.00559
## 63 20 91.31 0.00510
## 64 20 91.31 0.00464
## 65 20 91.32 0.00423
## 66 20 91.32 0.00386
## 67 20 91.32 0.00351
coef(fit, s = 0.1)

## 21 x 1 sparse Matrix of class "dgCMatrix"


## s1
## (Intercept) 0.150928072
## V1 1.320597195
## V2 .
## V3 0.675110234
## V4 .
## V5 -0.817411518
## V6 0.521436671
## V7 0.004829335

## V8 0.319415917
## V9 .
## V10 .
## V11 0.142498519
## V12 .
## V13 .
## V14 -1.059978702
## V15 .
## V16 .
## V17 .
## V18 .
## V19 .
## V20 -1.021873704

Prediction at given values of λ


set.seed(29)
nx <- matrix(rnorm(5 * 20), 5, 20)
predict(fit, newx = nx, s = c(0.1, 0.05))

## s1 s2
## [1,] -4.3067990 -4.5979456
## [2,] -4.1244091 -4.3447727
## [3,] -0.1133939 -0.1859237
## [4,] 3.3458748 3.5270269
## [5,] -1.2366422 -1.2772955

Prediction with the optimal λ


cvfit <- cv.glmnet(x, y)
plot(cvfit)

[Figure: cross-validation plot from plot(cvfit); Mean-Squared Error against Log(λ), with the number of nonzero coefficients shown along the top axis.]
cvfit$lambda.min

## [1] 0.06284188
coef(cvfit, s = "lambda.min")

## 21 x 1 sparse Matrix of class "dgCMatrix"


## s1
## (Intercept) 0.145832036
## V1 1.340981414
## V2 .
## V3 0.708347140
## V4 .
## V5 -0.848087765
## V6 0.554823782
## V7 0.038519738
## V8 0.347221319
## V9 .
## V10 0.010034050
## V11 0.186365265
## V12 .
## V13 .
## V14 -1.086006902
## V15 -0.004444319
## V16 .
## V17 .
## V18 .
## V19 .
## V20 -1.069942845

predict(cvfit, newx = x[1:5,], s = "lambda.min")

## lambda.min
## [1,] -1.3574653
## [2,] 2.5776672
## [3,] 0.5846421
## [4,] 2.0280562
## [5,] 1.5780633

Logistic regression
data(BinomialExample)
x <- BinomialExample$x
y <- BinomialExample$y
fit <- glmnet(x, y, family = "binomial")

predict(fit, newx = x[1:5,], type = "class", s = c(0.05, 0.01))

## s1 s2
## [1,] "0" "0"
## [2,] "1" "1"
## [3,] "1" "1"
## [4,] "0" "0"
## [5,] "1" "1"
Acknowledgement: This lecture note contains material adapted from https://glmnet.stanford.edu/articles/glmnet.html.
