[MRG] Does not store all cv values nor all dual coef in _RidgeGCV fit #15183


Closed
wants to merge 13 commits

Conversation

jeromedockes
Contributor

fixes #15182

In master, _RidgeGCV stores the LOO predictions and the dual coefficients for all hyperparameters during fit, which can take a lot of memory. This PR stores only the best score and coefficients when store_cv_values == False.
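The change can be sketched in a few lines (an illustrative sketch, not the actual fit code; pick_best_alpha and score_for_alpha are hypothetical names):

```python
# Illustrative sketch: rather than filling per-alpha arrays of
# predictions and dual coefficients, keep only the running best
# score and the alpha (and coefficients) that produced it.
def pick_best_alpha(alphas, score_for_alpha):
    best_alpha, best_score = None, None
    for alpha in alphas:
        score = score_for_alpha(alpha)  # e.g. negated mean LOO squared error
        if best_score is None or score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score

# Toy scoring function peaking at alpha=1.0:
best_alpha, best_score = pick_best_alpha(
    [0.1, 1.0, 10.0], lambda a: -(a - 1.0) ** 2)
```

Memory usage is then independent of the number of candidate alphas, which is the point of the PR.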

Member

@thomasjpfan thomasjpfan left a comment


LGTM. Please add a what's new entry with the Efficiency tag.

@@ -1048,6 +1048,16 @@ def _matmat(self, v):
return res


class _IdentityEstimator:
Member


Slightly less hacky...now that it is a class. :)

jeromedockes and others added 2 commits October 11, 2019 16:53
Co-Authored-By: Thomas J Fan <thomasjpfan@gmail.com>
@jeromedockes
Contributor Author

thanks @thomasjpfan

Member

@rth rth left a comment


Thanks @jeromedockes! Some of the added changes are not covered by tests; it would be good to add more tests for those (unless codecov has issues again).

"""Hack to call a scorer when we already have the predictions."""

def decision_function(self, y_predict):
return y_predict
Member


This is never called by tests.

alpha_score = scorer(
_IdentityEstimator(), predictions.ravel(), y.ravel())
if self.store_cv_values:
self.cv_values_[:, i] = predictions.ravel()
Member


not covered by tests either.
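For context, the identity-estimator trick can be exercised on its own (a sketch; neg_mse_scorer is a hand-written stand-in for a scikit-learn scorer, not part of this PR):

```python
import numpy as np

class _IdentityEstimator:
    """Hack to call a scorer when we already have the predictions."""
    def decision_function(self, y_predict):
        return y_predict
    def predict(self, y_predict):
        return y_predict

def neg_mse_scorer(estimator, X, y):
    # Standard (estimator, X, y) scorer signature; here X is in fact
    # the precomputed LOO predictions, which the identity estimator
    # returns unchanged.
    return -np.mean((estimator.predict(X) - y) ** 2)

y_true = np.array([1.0, 2.0, 3.0])
loo_predictions = np.array([1.1, 1.9, 3.2])
score = neg_mse_scorer(_IdentityEstimator(), loo_predictions, y_true)
```

This lets any scorer with the usual (estimator, X, y) signature be applied to predictions that were already computed, without refitting anything.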

Member


This code path is not reached because

if self.store_cv_values:
raise ValueError("cv!=None and store_cv_values=True "
" are incompatible")

which means we can assume that self.store_cv_values is always false when scorer is defined?

Contributor Author


No, it is always false when cv != None, which means GCV is not used; that case uses a different estimator than the one affected here. The tests I added should now cover the missing lines.
But it is true that the meaning of the stored "cv values" is different when a scorer is defined, see #13998

best_coef, best_score, best_alpha = c, alpha_score, alpha

self.alpha_ = best_alpha
self.best_score_ = best_score
Member


If we want to include this, we should document it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a note in the docstring. But note that this is a private class -- users don't have access to this best_score_.
Adding a best_score_ attribute to the public RidgeCV and RidgeClassifierCV estimators is discussed in #4667

@thomasjpfan thomasjpfan changed the title do not store all cv values nor all dual coef in _RidgeGCV fit [MRG] Does not store all cv values nor all dual coef in _RidgeGCV fit Oct 15, 2019
@@ -372,6 +373,12 @@ Changelog
and `fit_intercept=True`.
:pr:`15086` by :user:`Alex Gramfort <agramfort>`.

- |Efficiency| :class:`linear_model.RidgeCV` now does not allocate a potentially
Member


By convention, this should appear before the linear_model Fix entries.

@thomasjpfan thomasjpfan added this to the 0.22 milestone Oct 26, 2019
@@ -1048,6 +1048,16 @@ def _matmat(self, v):
return res


class _IdentityEstimator:
Member


We are starting to need these hacks everywhere... We talked about this with @ogrisel but I can't remember where.

Member

@NicolasHug NicolasHug left a comment


Thanks for the PR @jeromedockes , a few comments

Comment on lines 568 to 570
def scorer(estimator, X, Y):
pred = estimator.decision_function(X)
return np.sum((pred - Y)**2)
Member


why do you need this scorer?

assert_allclose(loo_pred, ridge_cv.cv_values_[:, 1])


def test_ridge_gcv_decision_function_scoring():
Member


What is this test testing?

Member


I think that this test was intended to check the equivalence between scoring=None and scoring=scorer, where the scorer computes the mean squared error.
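That equivalence boils down to the fact that minimising the MSE and maximising its negation select the same alpha; a toy sketch with made-up per-alpha errors:

```python
# Toy sketch with made-up per-alpha LOO errors: scoring=None picks the
# alpha with the smallest mean squared error, while a scorer returning
# the negated MSE picks the alpha with the largest score -- the same alpha.
loo_mse = {0.1: 0.5, 1.0: 0.2, 10.0: 0.9}

alpha_by_error = min(loo_mse, key=loo_mse.get)
alpha_by_score = max(loo_mse, key=lambda a: -loo_mse[a])
```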

@glemaitre glemaitre self-requested a review November 15, 2019 14:35
@glemaitre
Member

@jeromedockes I pushed what I thought your tests were meant to do. Could you have a look and check whether it is what you intended?

Member

@qinhanmin2014 qinhanmin2014 left a comment


We need to mention RidgeClassifierCV in what's new; otherwise this LGTM.

@@ -495,6 +496,16 @@ Changelog
requires less memory.
:pr:`14108`, :pr:`14170`, :pr:`14296` by :user:`Alex Henrie <alexhenrie>`.

- |Efficiency| :class:`linear_model.RidgeCV` now does not allocate a
potentially large array to store dual coefficients for all hyperparameters
during its `fit`, nor an array to store all LOO predictions unless
Member


LOO predictions or mean squared errors

Member


Actually it depends on scoring: LOO predictions if scoring is not None, otherwise mean squared errors.
But agreed that it should be added.

# equivalent to `scoring=None`

def scorer(estimator, X, Y):
pred = estimator.decision_function(X)
Member


Perhaps predict would be friendlier here :)

for i, alpha in enumerate(self.alphas):
G_inverse_diag, c = solve(
float(alpha), y, sqrt_sw, X_mean, *decomposition)
if error:
squared_errors = (c / G_inverse_diag) ** 2
cv_values[:, i] = squared_errors.ravel()
alpha_score = -squared_errors.mean()
Member


I think there's another bug: we need to divide by sample_weight here, otherwise we get a weighted error.

Contributor Author


Why shouldn't sample weights be taken into account when computing the error? Note that this has always been the behaviour of the GCV estimator and is made explicit here

or another form of cross-validation, because only generalized

Member


It's embarrassing that such an important thing is documented only in private functions.
I think users will expect RidgeCV() to be equivalent to GridSearchCV(Ridge(), cv=LeaveOneOut())
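For reference, the exact leave-one-out shortcut that the GCV-style computation relies on can be checked numerically (a sketch for ridge without intercept, not scikit-learn's implementation):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = rng.randn(20)
alpha = 1.0
n, p = X.shape

# Shortcut: the LOO residual is e_i = (y_i - yhat_i) / (1 - H_ii),
# with H = X (X^T X + alpha I)^{-1} X^T the ridge "hat" matrix.
A = X.T @ X + alpha * np.eye(p)
H = X @ np.linalg.solve(A, X.T)
loo_resid = (y - H @ y) / (1.0 - np.diag(H))

# Brute force: refit the ridge model with each sample left out.
brute = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    Ai = X[mask].T @ X[mask] + alpha * np.eye(p)
    w = np.linalg.solve(Ai, X[mask].T @ y[mask])
    brute[i] = y[i] - X[i] @ w

match = np.allclose(loo_resid, brute)
```

This identity is what lets _RidgeGCV get exact LOO errors from a single fit per alpha instead of n refits, which is why RidgeCV can be so much cheaper than GridSearchCV with LeaveOneOut.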

Member

@qinhanmin2014 qinhanmin2014 left a comment


e.g., if we add sample_weight to test_ridge_gcv_equivalence_prediction_metric, this test will fail
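A tiny numeric sketch (with made-up errors and weights, not taken from the test) shows why: the weighted and plain means of the squared errors disagree as soon as the weights are non-uniform, so the two criteria can rank alphas differently.

```python
import numpy as np

# Made-up per-sample squared LOO errors and sample weights.
squared_errors = np.array([1.0, 4.0, 9.0])
sample_weight = np.array([1.0, 2.0, 3.0])

# The fit computes a weighted mean of the errors...
weighted_mean = np.average(squared_errors, weights=sample_weight)
# ...while an unweighted selection criterion would use the plain mean.
plain_mean = squared_errors.mean()
```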

@glemaitre
Member

Good catch @qinhanmin2014

Playing with it a bit, RidgeClassifierCV is currently broken with a shape problem. I will quickly check where the issues are coming from.

@qinhanmin2014
Member

Playing with it a bit, RidgeClassifierCV is currently broken with a shape problem. I will quickly check where the issues are coming from.

I have a draft locally, I'll upload it so you can play with it.

@qinhanmin2014
Member

Actually there's another issue regarding RidgeClassifierCV; let's fix RidgeCV first.

@jeromedockes
Contributor Author

@jeromedockes I pushed what I thought your tests were meant to do. Could you have a look and check whether it is what you intended?

yes it is. Thanks a lot!

@jeromedockes
Contributor Author

Should this PR be closed in favour of #15648?


- |Fix| In :class:`linear_model.RidgeCV`, the predictions reported by
`cv_values_` are now rescaled in the original space when `scoring` is not
`None`. :pr:`13995` by :user:`Jérôme Dockès <jeromedockes>`.
Contributor Author


#13995 is an issue, not the PR

@qinhanmin2014
Member

should this pr be closed in favour of #15648 ?

We'll close when needed and your name will still be noted. Thanks for contributing.

@qinhanmin2014
Member

And if you have time, please review #15648, thanks a lot :)

@glemaitre
Member

@qinhanmin2014, @jeromedockes also works at Inria with me, so we can continue working directly in this PR. It would avoid duplicated comments and keep the history.

@qinhanmin2014
Member

@qinhanmin2014, @jeromedockes also works at Inria with me, so we can continue working directly in this PR. It would avoid duplicated comments and keep the history.

That's fine. I provided that PR for you to play with.

@glemaitre
Member

Can you guys redirect discussion there: https://gitter.im/glemaitre/ridgecv

@qinhanmin2014
Member

Please ping me when you want me to review.

@qinhanmin2014 qinhanmin2014 modified the milestones: 0.22, 0.23 Nov 26, 2019
@thomasjpfan thomasjpfan modified the milestones: 0.23, 0.24 Apr 20, 2020
@cmarmo
Contributor

cmarmo commented Aug 6, 2020

@glemaitre, the issue this PR was meant to close has been closed in the meantime by #15182 (you are the author). Should this one be closed? What about #15648, which was also meant to close the same issue? Thanks!

@glemaitre
Member

We can close this PR. There are still some unsolved issues with Ridge:

- Denormalize predictions
- Unscaled MSE
- Deprecate CV other than GCV (which is not GCV :))

@glemaitre glemaitre closed this Aug 18, 2020

Successfully merging this pull request may close these issues.

_RidgeGCV stores LOO predictions even when not required
8 participants