[MRG] Does not store all cv values nor all dual coef in _RidgeGCV fit #15183
Changes from all commits
@@ -99,6 +99,7 @@ random sampling procedures.
 - :class:`impute.IterativeImputer` when `X` has features with no missing
   values. |Feature|
 - :class:`linear_model.Ridge` when `X` is sparse. |Fix|
+- :class:`linear_model.RidgeCV` when `cv` is `None`. |Efficiency| |Fix|
 - :class:`model_selection.StratifiedKFold` and any use of `cv=int` with a
   classifier. |Fix|
@@ -504,6 +505,22 @@ Changelog
   requires less memory.
   :pr:`14108`, :pr:`14170`, :pr:`14296` by :user:`Alex Henrie <alexhenrie>`.
 
+- |Efficiency| :class:`linear_model.RidgeCV` and
+  :class:`linear_model.RidgeClassifierCV` now do not allocate a
+  potentially large array to store dual coefficients for all hyperparameters
+  during `fit`, nor an array to store all LOO predictions unless
+  `store_cv_values` is `True`.
+  :pr:`15183` by :user:`Jérôme Dockès <jeromedockes>`.
+
+- |Fix| :class:`linear_model.RidgeCV` returns predictions in the original space
+  when `scoring is not None`. In addition, the reported mean squared errors are
+  normalized by the sample weights for consistency.
+  :pr:`15183` by :user:`Jérôme Dockès <jeromedockes>`.
+
+- |Fix| In :class:`linear_model.RidgeCV`, the predictions reported by
+  `cv_values_` are now rescaled to the original space when `scoring` is not
+  `None`. :pr:`13995` by :user:`Jérôme Dockès <jeromedockes>`.

Review comment: #13995 is an issue, not the PR.

 
 - |Fix| :class:`linear_model.Ridge` now correctly fits an intercept when `X` is
   sparse, `solver="auto"` and `fit_intercept=True`, because the default solver
   in this configuration has changed to `sparse_cg`, which can fit an intercept
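The |Efficiency| entry above can be made concrete. Before this change, `fit` always allocated a `(n_samples * n_targets, n_alphas)` array of LOO values plus one dual-coefficient array per candidate alpha; after it, neither survives the loop unless `store_cv_values=True`. A rough sketch of the avoided allocation, with purely hypothetical problem sizes (the variable names below are illustrative, not scikit-learn API):

```python
import numpy as np

# Hypothetical sizes; the shapes mirror the arrays described in the
# changelog entry (the LOO-value array and one dual-coefficient
# array per alpha, each of n_samples * n_targets float64 entries).
n_samples, n_targets, n_alphas = 100_000, 10, 100
itemsize = np.dtype(np.float64).itemsize

cv_values_bytes = n_samples * n_targets * n_alphas * itemsize
dual_coef_bytes = n_alphas * n_samples * n_targets * itemsize

# With store_cv_values=False, neither array is kept after this change.
avoided_gb = (cv_values_bytes + dual_coef_bytes) / 1e9
print(round(avoided_gb, 1))  # 1.6 (GB)
```

At these sizes the old behaviour pinned roughly 1.6 GB for the duration of `fit` even when the user never asked for the per-alpha values.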
@@ -807,7 +824,6 @@ Changelog
 - |Fix| The liblinear solver now supports ``sample_weight``.
   :pr:`15038` by :user:`Guillaume Lemaitre <glemaitre>`.
 
 :mod:`sklearn.tree`
 ...................
@@ -1054,6 +1054,16 @@ def _matmat(self, v):
         return res
 
 
+class _IdentityEstimator:
+    """Hack to call a scorer when we already have the predictions."""
+
+    def decision_function(self, y_predict):
+        return y_predict
+
+    def predict(self, y_predict):
+        return y_predict
+
+
 class _RidgeGCV(LinearModel):
     """Ridge regression with built-in Generalized Cross-Validation
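The class added above exists because scikit-learn scorers use the signature `scorer(estimator, X, y)`, while `_RidgeGCV` already has the predictions in hand. Passing an identity estimator and the precomputed predictions as `X` makes the scorer score them directly. A minimal standalone sketch of the same trick (`neg_mse_scorer` is a hypothetical stand-in for a real scorer):

```python
import numpy as np

class IdentityEstimator:
    """Pass already-computed predictions straight through to a scorer."""
    def decision_function(self, y_predict):
        return y_predict

    def predict(self, y_predict):
        return y_predict

# Stand-in scorer with the usual (estimator, X, y) signature.  Because the
# estimator is an identity, handing it the precomputed predictions as `X`
# makes estimator.predict(X) return them unchanged.
def neg_mse_scorer(estimator, X, y):
    return -np.mean((y - estimator.predict(X)) ** 2)

y_true = np.array([1.0, 2.0, 3.0])
precomputed = np.array([1.1, 1.9, 3.2])
score = neg_mse_scorer(IdentityEstimator(), precomputed, y_true)
# score is approximately -0.02
```

Replacing the old closure-based `identity_estimator` with a small class keeps the trick picklable and reusable across the loop.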
@@ -1087,6 +1097,10 @@ class _RidgeGCV(LinearModel):
 
     looe = y - loov = c / diag(G^-1)
 
+    The best score (negative mean squared error or user-provided scoring) is
+    stored in the `best_score_` attribute, and the selected hyperparameter in
+    `alpha_`.
+
     References
     ----------
     http://cbcl.mit.edu/publications/ps/MIT-CSAIL-TR-2007-025.pdf
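The identity `looe = c / diag(G^-1)` quoted in the docstring above is what lets `_RidgeGCV` get all leave-one-out residuals from a single decomposition. It can be checked numerically against brute-force LOO refits; a sketch assuming no intercept and no sample weights, with hypothetical data sizes:

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features, alpha = 20, 5, 1.0
X = rng.randn(n_samples, n_features)
y = rng.randn(n_samples)

# Dual formulation: G = X X^T + alpha * I, dual coefficients c = G^{-1} y.
G = X @ X.T + alpha * np.eye(n_samples)
G_inv = np.linalg.inv(G)
c = G_inv @ y

# Closed-form LOO residuals, as in the docstring: looe = c / diag(G^{-1}).
looe = c / np.diag(G_inv)

# Brute force: refit ridge with each sample left out and predict it.
brute = np.empty(n_samples)
for i in range(n_samples):
    mask = np.arange(n_samples) != i
    w = np.linalg.solve(
        X[mask].T @ X[mask] + alpha * np.eye(n_features),
        X[mask].T @ y[mask])
    brute[i] = y[i] - X[i] @ w

assert np.allclose(looe, brute)
```

The identity follows from `I - H = alpha * G^{-1}` for the ridge hat matrix `H`, so the usual LOO shortcut `residual / (1 - h_ii)` reduces to `c_i / (G^{-1})_{ii}`.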
@@ -1462,43 +1476,41 @@ def fit(self, X, y, sample_weight=None):
         else:
             sqrt_sw = np.ones(X.shape[0], dtype=X.dtype)
 
+        X_mean, *decomposition = decompose(X, y, sqrt_sw)
+
         scorer = check_scoring(self, scoring=self.scoring, allow_none=True)
         error = scorer is None
 
         n_y = 1 if len(y.shape) == 1 else y.shape[1]
-        cv_values = np.zeros((n_samples * n_y, len(self.alphas)),
-                             dtype=X.dtype)
-        C = []
-        X_mean, *decomposition = decompose(X, y, sqrt_sw)
 
+        if self.store_cv_values:
+            self.cv_values_ = np.empty(
+                (n_samples * n_y, len(self.alphas)), dtype=X.dtype)
+
+        best_coef, best_score, best_alpha = None, None, None
+
         for i, alpha in enumerate(self.alphas):
             G_inverse_diag, c = solve(
                 float(alpha), y, sqrt_sw, X_mean, *decomposition)
             if error:
                 squared_errors = (c / G_inverse_diag) ** 2
-                cv_values[:, i] = squared_errors.ravel()
+                alpha_score = -squared_errors.mean()
Review comment: I think there's another bug: we need to divide by sample_weight here, otherwise we get a weighted error.

Reply: Why shouldn't sample weights be taken into account when computing the error? Note that this has always been the behaviour of the GCV estimator, and it is made explicit here: scikit-learn/sklearn/linear_model/_ridge.py, line 1554 in 98cb91b.

Reply: It's embarrassing that such an important thing is documented in private functions.
+                if self.store_cv_values:
+                    self.cv_values_[:, i] = squared_errors.ravel()
             else:
                 predictions = y - (c / G_inverse_diag)
-                cv_values[:, i] = predictions.ravel()
-            C.append(c)
-
-        if error:
-            best = cv_values.mean(axis=0).argmin()
-        else:
-            # The scorer want an object that will make the predictions but
-            # they are already computed efficiently by _RidgeGCV. This
-            # identity_estimator will just return them
-            def identity_estimator():
-                pass
-            identity_estimator.decision_function = lambda y_predict: y_predict
-            identity_estimator.predict = lambda y_predict: y_predict
-
-            # signature of scorer is (estimator, X, y)
-            out = [scorer(identity_estimator, cv_values[:, i], y.ravel())
-                   for i in range(len(self.alphas))]
-            best = np.argmax(out)
-
-        self.alpha_ = self.alphas[best]
-        self.dual_coef_ = C[best]
+                alpha_score = scorer(
+                    _IdentityEstimator(), predictions.ravel(), y.ravel())
+                if self.store_cv_values:
+                    self.cv_values_[:, i] = \
+                        (predictions.ravel() / np.repeat(sqrt_sw, n_y)
+                         + y_offset)
+            if (best_score is None) or (alpha_score > best_score):
+                best_coef, best_score, best_alpha = c, alpha_score, alpha
+
+        self.alpha_ = best_alpha
+        self.best_score_ = best_score
+        self.dual_coef_ = best_coef
         self.coef_ = safe_sparse_dot(self.dual_coef_.T, X)
 
         X_offset += X_mean * X_scale
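In the rewritten loop above, the stored `cv_values_` entries for the scoring case are divided by `np.repeat(sqrt_sw, n_y)` and shifted by `y_offset` because `_RidgeGCV` operates on data rescaled by the square root of the sample weights, with the target offset removed. A toy numpy sketch of that undo step, with hypothetical shapes and names (not the sklearn internals themselves):

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_y = 4, 2
sample_weight = rng.uniform(0.5, 2.0, n_samples)
sqrt_sw = np.sqrt(sample_weight)
y_offset = rng.randn(n_y)

# Predictions in the original space, then mapped into the centered,
# sqrt-weighted space that the GCV computation works in.
pred_original = rng.randn(n_samples, n_y)
pred_rescaled = (pred_original - y_offset) * sqrt_sw[:, None]

# Undo the rescaling the way the new code does for cv_values_: each
# sample's sqrt-weight is repeated once per target to match the ravel.
recovered = (pred_rescaled.ravel() / np.repeat(sqrt_sw, n_y)
             + np.tile(y_offset, n_samples))
assert np.allclose(recovered, pred_original.ravel())
```

`np.repeat(sqrt_sw, n_y)` matches the row-major `ravel()` ordering, where the `n_y` targets of each sample are contiguous.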
@@ -1509,7 +1521,7 @@ def identity_estimator():
             cv_values_shape = n_samples, len(self.alphas)
         else:
             cv_values_shape = n_samples, n_y, len(self.alphas)
-        self.cv_values_ = cv_values.reshape(cv_values_shape)
+        self.cv_values_ = self.cv_values_.reshape(cv_values_shape)
 
         return self
Review comment: LOO predictions or mean squared errors?

Reply: actually, it depends on `scoring`: LOO predictions if `scoring is not None`, otherwise mean squared errors. But agreed that it should be added.