[MRG+2] Handle numerical instability in ElasticNetCV and LassoCV #4226
Conversation
Also @jamestwebber please take a look!

can you add a test so we understand what you're fixing? thx
Sure! btw this is what I am trying to fix: when the targets are uniform, the computed alpha_max is 0 and the alpha grid ends up full of NaNs. If my understanding is right so far, another question to be answered is whether we should support uniform values (n_classes = 1) for y or simply bail out with a nice error message.
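For reference, a minimal reproduction sketch of the setup being described (the data and sizes here are made up for illustration; before this patch, the degenerate alpha grid produced NaNs):

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.RandomState(0)
X = rng.randn(10, 5)
y = np.full(10, 5.0)          # uniform target: every entry identical

# With fit_intercept=True, centering turns y into the zero vector, so the
# computed alpha_max is 0 and (before this fix) the log-spaced alpha grid
# contained NaNs.
model = LassoCV(n_alphas=3).fit(X, y)
print(model.alphas_)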
just tell me when you added the test

done :p
I guess this fixes the issue, although I think I was a little off when I called out line 100 as the culprit; it was really line 101. Line 100 shouldn't generate any NaNs, so using nanmax there won't help. The problem is when zeros are fed into the log function on the next line, so that is still going to generate NaNs and cause problems. But your changes around line 1095 sidestep that issue, as they ignore all NaNs in the results, and there should always be one zero in the array to prevent an exception (taking nanargmax of an all-NaN array would fail). So that's why the test passes.

I didn't write a fix because I really just wasn't sure what the right behavior should be. It might make more sense to just bail out if the target vector is all zeros.
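A small sketch of the numerical behavior described above (the eps and n_alphas values are arbitrary): with alpha_max == 0, both logspace endpoints are log10(0) == -inf, which yields NaNs plus a single zero entry:

import numpy as np

# alpha_max == 0 makes both logspace endpoints log10(0) == -inf; the
# resulting grid is NaN everywhere except the endpoint, which becomes 0.
with np.errstate(divide="ignore", invalid="ignore"):
    grid = np.logspace(np.log10(0.0 * 1e-3), np.log10(0.0), num=3)[::-1]
print(grid)  # [ 0. nan nan]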
thanks for the comment.
True! Both the points you raise are valid. EDIT: have fixed the same!
@@ -389,7 +392,6 @@ def enet_path(X, y, l1_ratio=0.5, eps=1e-3, n_alphas=100, alphas=None,
     if selection not in ['random', 'cyclic']:
         raise ValueError("selection should be either random or cyclic.")
     random = (selection == 'random')
-    models = []
Cleaning up unused variable
force-pushed from cd8fd48 to d03aac9
@agramfort Thanks for the review! Have addressed your comments... Please take a look when you find time...
@@ -69,6 +69,11 @@ def _alpha_grid(X, y, Xy=None, l1_ratio=1.0, fit_intercept=True,
     copy_X : boolean, optional, default True
         If ``True``, X will be copied; else, it may be overwritten.
     """
+    if np.unique(y).size == 1:
alpha_max is zero only when fit_intercept is set to True (and when all n_targets are the same). This block of code should be inside an if fit_intercept condition, IMO.
See the hidden diff (#4226 (diff)).
Thanks for the comments! Should I revert it back to the old form?
If we hard-code this case, we could always just create a DummyRegressor and save a lot of work ;) I guess it would make the code ugly, though.
Maybe we should warn here.
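For context, a sketch of the shortcut being alluded to (not part of the patch; data made up):

import numpy as np
from sklearn.dummy import DummyRegressor

X = np.random.RandomState(0).randn(10, 5)
y = np.full(10, 5.0)

# For a constant target, predicting its mean is already optimal, so a
# DummyRegressor does trivially what the degenerate CV path works hard for.
dummy = DummyRegressor(strategy="mean").fit(X, y)
print(dummy.predict(X[:3]))  # [5. 5. 5.]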
> If we hard-code this case, we could always just create a DummyRegressor and save a lot of work ;)

But this particular use case is quite uncommon, right?

> Maybe we should warn here.

I'll add a warning. Also, since this is quite similar to n_classes = 1, do we support n_classes = 1 in classifiers? It was suggested as a smoke test in #406 (under "not so easy")...
@MechCoder @agramfort Have added the tests and reverted the checking to alpha_max == 0.
                 num=n_alphas)[::-1]
    return alphas

    if alpha_max == 0:
if alpha_max < np.finfo(float).min: would be cleaner.
Sure!
For my understanding, why is it preferable? :)
force-pushed from 34fac3b to 5c9c67f
@agramfort I feel that the checking of alpha_max should use np.finfo(float).resolution rather than np.finfo(float).min. What do you think?
ok makes sense. Fine with me.
@agramfort / @MechCoder done... could you please take a final look at this? @GaelVaroquaux have reduced n_alphas to 3, tests now run in 0.16s. @amueller Since now the check is for alpha_max <= np.finfo(float).resolution ...
                 num=n_alphas)[::-1]
    return alphas

    if alpha_max <= np.finfo(float).resolution:
@agramfort out of curiosity, why do we use resolution instead of 0 directly? (Is it for variations in the representation of floating point on different architectures, so that when 0 is stored as 0.0...1, that does not affect this check?)
To avoid passing a 0 regularization to the CD code; it raises warnings for nothing.
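For reference, the threshold in question (a float64 finfo value):

import numpy as np

# resolution is 10 ** -precision for float64, i.e. 1e-15; any alpha_max at
# or below it is treated as effectively zero, so the coordinate-descent
# solver never receives alpha == 0.
print(np.finfo(float).resolution)  # 1e-15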
> @GaelVaroquaux have reduced n_alphas to 3, tests now run in 0.16s.

That's much better. Thanks
The python3 test for the multitask case on travis keeps failing 😭 Could someone help me figure this out? This is the log file. (Search for "nan inf"; the log for the particular case extends up to 10 lines above the "nan inf" line, i.e. lines 2735..2745.) It shows that alpha_max is 0, however the check does not seem to catch it.
It's this line here: https://github.com/ragv/scikit-learn/blob/fix_4224/sklearn/linear_model/coordinate_descent.py#L109

Using np.finfo(float).min is not a good idea, because multiplying it by anything > 1 is going to result in an overflow (technically it's an underflow, but same diff):

np.finfo(float).min
Out[195]: -1.7976931348623157e+308
np.finfo(float).min * 1. * 10
Out[196]: -inf
yay!! the tests pass now! Thanks a lot @jamestwebber for clarifying that! :)
    for model in models_multi_task:
        for y_values in (0, -5, 5, 4.5, -4.5):
            y2[:, 0].fill(y_values)
            y2[:, 1].fill(2 * y_values)
Can you also check that the grids of alphas obtained are the same? Also, I think it is sufficient to just check for maybe two inputs for y, i.e. 0 or 5. After that you have my +1 for merge.
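A sketch of the requested assertion (the estimator choice, data shapes, and structure are assumptions, not the exact test in the patch):

import numpy as np
from sklearn.linear_model import MultiTaskElasticNetCV

rng = np.random.RandomState(0)
X = rng.randn(10, 5)
y2 = np.empty((10, 2))

grids = []
for y_values in (0, 5):
    y2[:, 0].fill(y_values)
    y2[:, 1].fill(2 * y_values)
    clf = MultiTaskElasticNetCV(n_alphas=3).fit(X, y2)
    grids.append(clf.alphas_)

# The degenerate alpha grid should not depend on the constant's value.
np.testing.assert_array_almost_equal(grids[0], grids[1])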
you'll have my +1 too
Done! Added the +2 to the title. Thanks for the review! :)

Also, now that we test only for two inputs, 0 and 5, the tests take even less time.
[MRG+2] Handle numerical instability in ElasticNetCV
Thanks!
Fixes #4224. Return a grid of n_alphas identical, strictly positive alphas when the maximum alpha is 0. @GaelVaroquaux does this seem like a reasonable fix?
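In summary, a sketch of the behavior the patch converges on, reconstructed from the diffs quoted above (the actual scikit-learn source may differ):

import numpy as np

def _alpha_grid_sketch(X, y, n_alphas=100, eps=1e-3, l1_ratio=1.0):
    # alpha_max as in the Lasso/ElasticNet path (assuming centered data).
    alpha_max = np.abs(X.T.dot(y)).max() / (X.shape[0] * l1_ratio)
    if alpha_max <= np.finfo(float).resolution:
        # Degenerate case (e.g. uniform y): return a constant grid of tiny,
        # strictly positive alphas so np.log10(0) is never evaluated and the
        # solver never sees a zero regularization.
        return np.full(n_alphas, np.finfo(float).resolution)
    return np.logspace(np.log10(alpha_max * eps), np.log10(alpha_max),
                       num=n_alphas)[::-1]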