FIX/API Fix AIC/BIC in LassoLarsIC and introduce noise_variance #21481
Conversation
ping @agramfort
thx @glemaitre !
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Just one last comment.
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Good to go from my end. Does someone else want to have a look? Maybe @lorentzenchr, @rth, @ogrisel?
thx @glemaitre !
A first very partial review.
doc/modules/linear_model.rst (outdated):
is the predicted target using an ordinary least squares regression. In
scikit-learn, we use a ridge model with a very small regularization in the case
of an ill-conditioned design matrix. Note that this formula is valid only when
`n_samples > n_features`.
Are you sure that using Ridge is fine? Could you test it for one case?
Theorem 1 of https://arxiv.org/pdf/0712.0881.pdf states that the results are valid only if X is full rank (e.g., dummy vs. one-hot encoding).
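For illustration, a minimal sketch of such a rank check (the helper name is hypothetical, not part of this PR):

```python
import numpy as np

def design_is_full_rank(X):
    """Hypothetical helper: Theorem 1 of Zou et al. requires a full-rank X
    for df = number of nonzero coefficients to be a valid dof estimate."""
    return np.linalg.matrix_rank(X) == min(X.shape)
```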
So the safeguard for using OLS would be to check the rank of the matrix before computing the degrees of freedom?
Let's try plain LinearRegression. The coefficients will be numerically unstable, but the estimated variance based on its in-sample predictions should be fine.
We just need to make sure to include tests with n_samples ~= n_features and many collinear features, to check that either the results are OK without error messages or that the error messages (if any) make sense.
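A rough sketch of this estimate under the conditions described above (the `n - p - 1` dof correction is an assumption here, not necessarily what the PR implements):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Near-square, highly collinear design: the stress case mentioned above.
X, y = make_regression(n_samples=60, n_features=50, effective_rank=5,
                       noise=1.0, random_state=0)

ols = LinearRegression().fit(X, y)
residuals = y - ols.predict(X)
n, p = X.shape
# In-sample noise variance estimate; dof correction assumed to be n - p - 1.
noise_variance = np.sum(residuals**2) / (n - p - 1)
```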
I'm not saying that ridge is wrong. I'm just asking because the papers do not mention it and use plain OLS. I have no intuition as to whether it's fine in this context.
LGTM when all comments are addressed.
Thanks @glemaitre. This is a good improvement!
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Apparently I don't know how to merge.
Sorry for the "force-pushed".
I think that this is fine now. I will just check the rendering of the example to make sure that the table fits now.
The rendering seems OK on my side. @lorentzenchr do you want to have a look at the changes regarding the summary to see if this is fine with you?
LGTM. Some more nitpicks.
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
LGTM, thank you @glemaitre!
Here are some suggestions.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
No problem :) It might be ready to be merged then :)
Last comment...
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
And over the finish line ... as soon as CI is green.
@glemaitre Thanks for your patience with reviewers! :wink:
closes #14566
closes #17145
This fixes the way we compute the AIC and BIC. We are using the formulation from:
https://www.sciencedirect.com/science/article/abs/pii/S0893965917301623
Basically, it seems equivalent to Zou et al. up to a multiplicative factor (and with a correction for the bug reported in #17145, caused by the estimation of the noise variance).
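For reference, a sketch of the criteria as I understand the new formulation for a linear Gaussian model, where $\hat{\sigma}^2$ is the (estimated or user-provided) noise variance and $d$ the degrees of freedom:

$$
\mathrm{AIC} = n \log(2\pi \hat{\sigma}^2) + \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\hat{\sigma}^2} + 2 d,
\qquad
\mathrm{BIC} = n \log(2\pi \hat{\sigma}^2) + \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\hat{\sigma}^2} + \log(n)\, d
$$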
We reproduce the example from Zou et al. on the diabetes dataset.
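For completeness, a short usage sketch of the updated estimator (assuming the API introduced here; `noise_variance=1.0` is an arbitrary illustrative value):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoLarsIC

X, y = load_diabetes(return_X_y=True)

# Default: the noise variance is estimated internally from an OLS fit,
# which requires n_samples > n_features.
model = LassoLarsIC(criterion="bic").fit(X, y)
print(model.alpha_)  # regularization strength selected by minimizing BIC

# Alternatively, pass a pre-computed noise variance explicitly.
model_aic = LassoLarsIC(criterion="aic", noise_variance=1.0).fit(X, y)
```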