FIX/API Fix AIC/BIC in LassoLarsIC and introduce noise_variance #21481
Conversation
ping @agramfort
thx @glemaitre !
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Just one last comment.
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Good to go from my end. Does someone else want to have a look? Maybe @lorentzenchr, @rth, @ogrisel?
thx @glemaitre !
A first very partial review.
doc/modules/linear_model.rst (outdated):
is the predicted target using an ordinary least squares regression. In
scikit-learn, we use a ridge model with a very small regularization in the case
of an ill-conditioned design matrix. Note that this formula is valid only when
`n_samples > n_features`.
Are you sure that using Ridge is fine? Could you test it for one case?
Theorem 1 of https://arxiv.org/pdf/0712.0881.pdf states that the results are valid only if X is full rank (e.g., dummy vs. one-hot encoding).
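For illustration, a minimal sketch of such a rank check (the helper name is hypothetical, not part of this PR):

```python
import numpy as np

def design_is_full_rank(X):
    """Hypothetical helper: Theorem 1 of Zou et al. requires a full-rank X
    for df = number of nonzero coefficients to be a valid dof estimate."""
    return np.linalg.matrix_rank(X) == min(X.shape)
```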
So the safeguard for using OLS would be to check the rank of the matrix before computing the degrees of freedom?
Let's try plain LinearRegression. The coefficients will be numerically unstable, but the estimated variance based on its in-sample predictions should be fine.
We just need to make sure to include tests with n_samples ~= n_features and many collinear features, to check that either the results are OK without error messages or that the error messages (if any) make sense.
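A rough sketch of this estimate under the conditions described above (the `n - p - 1` dof correction is an assumption here, not necessarily what the PR implements):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Near-square, highly collinear design: the stress case mentioned above.
X, y = make_regression(n_samples=60, n_features=50, effective_rank=5,
                       noise=1.0, random_state=0)

ols = LinearRegression().fit(X, y)
residuals = y - ols.predict(X)
n, p = X.shape
# In-sample noise variance estimate; dof correction assumed to be n - p - 1.
noise_variance = np.sum(residuals**2) / (n - p - 1)
```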
I'm not saying that ridge is wrong. I'm just asking because the papers do not mention it and use plain OLS. I have no intuition as to whether it's fine in this context.
LGTM when all comments are addressed.
Thanks @glemaitre. This is a good improvement!
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Apparently I don't know how to merge.
Sorry for the "force-pushed".
I think that this is fine now. I will just check the rendering of the example to make sure that the table fits now.
The rendering seems OK on my side. @lorentzenchr do you want to have a look at the changes regarding the summary to see if this is fine with you?
LGTM. Some more nitpicks.
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
LGTM, thank you @glemaitre!
Here are some suggestions.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
No problem :) It might be ready to be merged then :)
Last comment...
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
And over the finish line ... as soon as CI is green.
@glemaitre Thanks for your patience with reviewers! :wink:
closes #14566
closes #17145
This fixes the way we compute the AIC and BIC. We are using the formulation from:
https://www.sciencedirect.com/science/article/abs/pii/S0893965917301623
Basically, it seems equivalent to Zou et al. up to a multiplicative factor (and with a correction for the bug reported in #17145, caused by the estimation of the noise variance).
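For reference, a sketch of the criteria as I understand the new formulation for a linear Gaussian model, where $\hat{\sigma}^2$ is the (estimated or user-provided) noise variance and $d$ the degrees of freedom:

$$
\mathrm{AIC} = n \log(2\pi \hat{\sigma}^2) + \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\hat{\sigma}^2} + 2 d,
\qquad
\mathrm{BIC} = n \log(2\pi \hat{\sigma}^2) + \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\hat{\sigma}^2} + \log(n)\, d
$$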
We reproduce the example from Zou et al. on the diabetes dataset.
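For completeness, a short usage sketch of the updated estimator (assuming the API introduced here; `noise_variance=1.0` is an arbitrary illustrative value):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoLarsIC

X, y = load_diabetes(return_X_y=True)

# Default: the noise variance is estimated internally from an OLS fit,
# which requires n_samples > n_features.
model = LassoLarsIC(criterion="bic").fit(X, y)
print(model.alpha_)  # regularization strength selected by minimizing BIC

# Alternatively, pass a pre-computed noise variance explicitly.
model_aic = LassoLarsIC(criterion="aic", noise_variance=1.0).fit(X, y)
```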