[MRG+2] Handle numerical instability in ElasticNetCV and LassoCV #4226

Merged
merged 4 commits from ragv:fix_4224 into scikit-learn:master on Feb 12, 2015

Conversation

@raghavrv (Member) commented Feb 9, 2015

Fixes #4224

  • Return a 0 array of size n_alphas when the maximum alpha is 0 (sketched below)
  • Add a non-regression test (NRT)
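
For concreteness, a rough sketch of the idea (hypothetical helper name and simplified signature; the check in this PR later moved to an np.finfo(float).resolution threshold):

import numpy as np

def _alpha_grid_sketch(alpha_max, n_alphas=100, eps=1e-3):
    """Hypothetical, simplified sketch of the proposed guard (not the PR's exact code)."""
    if alpha_max == 0:
        # Degenerate case (e.g. constant targets): return a flat grid
        # instead of letting logspace produce NaNs.
        return np.zeros(n_alphas)
    # Normal case: a log-spaced grid, in decreasing order, from alpha_max down to alpha_max * eps.
    return np.logspace(np.log10(alpha_max * eps), np.log10(alpha_max),
                       num=n_alphas)[::-1]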

@GaelVaroquaux does this seem like a reasonable fix?

@raghavrv (Member Author) commented Feb 9, 2015

Also @jamestwebber please take a look!

@agramfort (Member)

can you add a test so we understand what you're fixing? thx

@raghavrv (Member Author) commented Feb 9, 2015

Sure! By the way, this is what I am trying to fix:

When the targets are uniform, Xy (after normalizing etc.) becomes 0, so the alpha_max computed from the preprocessed Xy is also zero. The grid of alphas computed via log10 then contains NaNs, which breaks the selection of the best alpha at this line. I assume this is the cause of the referenced issue.
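
For illustration only (the exact layout of the result varies with the NumPy version), a minimal reproduction of the NaN grid:

import numpy as np

eps, n_alphas = 1e-3, 3
alpha_max = 0.0  # what you get when the preprocessed Xy is all zeros

with np.errstate(divide='ignore', invalid='ignore'):
    # np.log10(0.0) is -inf, and spacing points between two -inf endpoints
    # yields NaNs, so the alpha grid is unusable downstream.
    alphas = np.logspace(np.log10(alpha_max * eps), np.log10(alpha_max),
                         num=n_alphas)[::-1]

print(alphas)  # contains NaNs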

If my understanding so far is right, another question to answer is whether we should support uniform values (n_classes = 1) for y or simply bail out with a nice error message.

@agramfort (Member) commented Feb 9, 2015 via email

@raghavrv (Member Author) commented Feb 9, 2015

done :p

@coveralls

Coverage Status

Coverage increased (+0.0%) to 95.02% when pulling dd8890b on ragv:fix_4224 into a3283c6 on scikit-learn:master.

@raghavrv raghavrv changed the title Handle numerical instability in ElasticNetCV [MRG] Handle numerical instability in ElasticNetCV Feb 9, 2015
@jamestwebber (Contributor)

I guess this fixes the issue, although I think I was a little off when I called out line 100 as the culprit; it was really line 101. Line 100 shouldn't generate any NaNs, so using nanmax there won't help. The problem is when zeros are fed into the log function on the next line, which will still generate NaNs and cause problems.

But your changes around line 1095 sidestep that issue: they ignore all NaNs in the results, and there should always be one zero in the array to prevent an exception (taking nanargmax of an all-NaN array would fail). That's why the test passes.
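
A quick standalone illustration of that last point (not code from the PR):

import numpy as np

print(np.nanargmax(np.array([np.nan, 0.0, np.nan])))  # 1: the lone zero keeps this well defined
try:
    np.nanargmax(np.array([np.nan, np.nan]))
except ValueError as exc:
    print(exc)  # "All-NaN slice encountered"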

I didn't write a fix because I really just wasn't sure what the right behavior should be. It might make more sense to just bail out if the target vector is all zeros.

@raghavrv (Member Author) commented Feb 9, 2015

thanks for the comment.

But your changes around line 1095 sidestep that issue, as they ignore all NaNs in the results

True! Both the nanmax and nanargmin are unnecessary and ought to be removed. I had initially tested using them and forgot to remove them. Thanks for pointing it out :)

EDIT : have fixed the same!

@coveralls

Coverage Status

Coverage increased (+0.0%) to 95.02% when pulling ff35bcd on ragv:fix_4224 into a3283c6 on scikit-learn:master.

@@ -389,7 +392,6 @@ def enet_path(X, y, l1_ratio=0.5, eps=1e-3, n_alphas=100, alphas=None,
    if selection not in ['random', 'cyclic']:
        raise ValueError("selection should be either random or cyclic.")
    random = (selection == 'random')
    models = []
Member Author

Cleaning up unused variable

@raghavrv raghavrv force-pushed the fix_4224 branch 4 times, most recently from cd8fd48 to d03aac9 Compare February 9, 2015 22:44
@raghavrv (Member Author) commented Feb 9, 2015

@agramfort Thanks for the review! I have addressed your comments... Please take a look when you find time...

@@ -69,6 +69,11 @@ def _alpha_grid(X, y, Xy=None, l1_ratio=1.0, fit_intercept=True,
    copy_X : boolean, optional, default True
        If ``True``, X will be copied; else, it may be overwritten.
    """
    if np.unique(y).size == 1:
Member

alpha_max is zero only when fit_intercept is set to True (and when all the targets are the same). This block of code should be inside an if fit_intercept condition, IMO.
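
A standalone check of that claim (illustrative only, with an arbitrary random X; not part of the PR):

import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(20, 5)
y = np.full(20, 3.0)  # all targets identical

# If X is centered (which fit_intercept=True effectively does during preprocessing),
# X.T @ y is zero for a constant y, so alpha_max = max|X.T y| / (n_samples * l1_ratio) is 0.
Xc = X - X.mean(axis=0)
print(np.abs(Xc.T @ y).max())  # ~0, up to floating-point rounding

# Without centering, alpha_max need not be zero even though y is constant.
print(np.abs(X.T @ y).max())   # clearly nonzero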

Member

See the hidden diff (#4226 (diff)).

Member Author

Thanks for the comments! Should I revert it to the old form?

Member

If we hard-code this case, we could always just create a DummyRegressor and save a lot of work ;) I guess it would make the code ugly, though.
Maybe we should warn here.

Member Author

If we hard-code this case, we could always just create a DummyRegressor and save a lot of work ;)

But this particular use case is quite uncommon, right?

Maybe we should warn here.

I'll add a warning. Also, since this is quite similar to n_classes = 1, do we support n_classes = 1 in classifiers? It was suggested as a smoke test in #406 (under "not so easy")...

@raghavrv (Member Author)

@MechCoder @agramfort I have added the tests and reverted to checking alpha_max. Please take a look...

                         num=n_alphas)[::-1]
    return alphas

    if alpha_max == 0:
Member

if alpha_max < np.finfo(float).min:

would be cleaner.

Member Author

Sure!

For my understanding, why is it preferable? :)

@raghavrv raghavrv force-pushed the fix_4224 branch 2 times, most recently from 34fac3b to 5c9c67f Compare February 11, 2015 13:51
@amueller (Member)

@MechCoder (Member)

@agramfort I feel that checking np.unique(y) to set the grid of alphas is not the right approach, for the reasons I mentioned in the comments:

  1. It is valid only for the fit_intercept case, or when X is centered. If X is not centered, then even if all the targets are the same, alpha_max need not be zero.
  2. The np.unique(y) check would have to be done on every call, which does not outweigh the benefit of skipping these calculations in a really rare case.

What do you think?

@agramfort (Member) commented Feb 11, 2015 via email

@raghavrv (Member Author)

@agramfort / @MechCoder done... could you please take a final look at this?

@GaelVaroquaux I have reduced n_alphas to 3; the tests now run in 0.16s.

@amueller Since the check is now alpha_max <= np.finfo(float).resolution, I haven't added any warnings... Do you think one should still be added?

                         num=n_alphas)[::-1]
    return alphas

    if alpha_max <= np.finfo(float).resolution:
Member Author

@agramfort out of curiosity, why do we use resolution instead of 0 directly?
(Is it to account for variations in floating-point representation across architectures, so that a value stored as 0.0...01 instead of exactly 0 does not affect this check?)

Member

To avoid passing a 0 regularization to the CD code; it raises warnings for nothing.
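
For reference, a tiny standalone check of that threshold (not part of the PR's diff):

import numpy as np

# float64 "resolution" is a tiny positive number, so using it as the floor keeps the
# alpha grid strictly positive and the coordinate-descent solver never receives an
# exactly-zero (unregularized) penalty.
print(np.finfo(float).resolution)  # 1e-15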

@GaelVaroquaux (Member) commented Feb 11, 2015 via email

@raghavrv (Member Author)

The python3 test for the multitask case on Travis keeps failing 😭
What's confusing to me is that this had passed on my box under python3.4 / python2.7... 😕

Could someone help me figure this out? This is the log file. (Search for "nan inf"; the log for the particular case extends up to 10 lines above the "nan inf" line, i.e. lines 2735..2745.) It shows that alpha_max is 0 but this_best_mse is nan, which causes the trouble... Also, from the size of Xy, it is the first test for the multitask case.
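
A minimal sketch of why a NaN MSE leads to the original "referenced before assignment" error (illustrative names, not the sklearn code):

import numpy as np

def pick_best_alpha(mse_per_alpha, alphas):
    best_mse = np.inf
    for this_best_mse, alpha in zip(mse_per_alpha, alphas):
        # Every comparison involving NaN is False, so best_alpha is never bound.
        if this_best_mse < best_mse:
            best_mse = this_best_mse
            best_alpha = alpha
    return best_alpha

try:
    pick_best_alpha([np.nan, np.nan, np.nan], [0.1, 0.01, 0.001])
except UnboundLocalError as exc:
    print(exc)  # e.g. local variable 'best_alpha' referenced before assignment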

@jamestwebber (Contributor)

It's this line here: https://github.com/ragv/scikit-learn/blob/fix_4224/sklearn/linear_model/coordinate_descent.py#L109

Using np.finfo(float).min is not a good idea: it is the most negative finite float, so multiplying it by anything greater than 1 overflows to -inf.

>>> np.finfo(float).min
-1.7976931348623157e+308

>>> np.finfo(float).min * 1. * 10
-inf

@raghavrv
Copy link
Member Author

yay!! the tests pass now! Thanks a lot @jamestwebber for clarifying that! :)

    for model in models_multi_task:
        for y_values in (0, -5, 5, 4.5, -4.5):
            y2[:, 0].fill(y_values)
            y2[:, 1].fill(2 * y_values)
Member

Can you also check that the grids of alphas obtained are the same?
Also, I think it is sufficient to check just two inputs for y, i.e. 0 and 5.
After that you have my +1 for merge.

Member

you'll have my +1 too

Member Author

Done! Added +2. Thanks for the review! :)

@raghavrv (Member Author)

also, now that we test only the two inputs 0 and 5, the tests take only 90ms...

@raghavrv raghavrv changed the title [MRG] Handle numerical instability in ElasticNetCV [MRG+2] Handle numerical instability in ElasticNetCV Feb 12, 2015
MechCoder added a commit that referenced this pull request Feb 12, 2015
[MRG+2] Handle numerical instability in ElasticNetCV
@MechCoder MechCoder merged commit 2a946dd into scikit-learn:master Feb 12, 2015
@MechCoder (Member)

Thanks!

@raghavrv raghavrv deleted the fix_4224 branch February 12, 2015 17:17
@raghavrv raghavrv changed the title [MRG+2] Handle numerical instability in ElasticNetCV [MRG+2] Handle numerical instability in ElasticNetCV and LassoCV Mar 23, 2015
Successfully merging this pull request may close these issues.

local variable referenced before assignment in LassoCV and ElasticNetCV