FIX switch to 'sparse_cg' solver in Ridge when X is sparse and fitting intercept #13995
Conversation
sklearn/linear_model/ridge.py
Outdated
            solver = 'sparse_cg'
            if self.solver not in ['auto', 'sparse_cg']:
                warnings.warn(
                    'setting solver to "sparse_cg" because X is sparse')
Better:

"solver={} does not support fitting the intercept on sparse data, "
"falling back to solver='sparse_cg'. To avoid this warning either change the solver "
"to 'sparse_cg' explicitly or set `fit_intercept=False`."
sklearn/linear_model/ridge.py
Outdated
@@ -545,6 +545,13 @@ def fit(self, X, y, sample_weight=None):
                                     accept_sparse=_accept_sparse,
                                     dtype=_dtype,
                                     multi_output=True, y_numeric=True)
        if sparse.issparse(X) and self.fit_intercept:
            solver = 'sparse_cg'
            if self.solver not in ['auto', 'sparse_cg']:
The solver resolution (e.g. for `auto`) is normally done in `_ridge_regression`; maybe it is better to move this there as well.
The difficulty is that depending on `fit_intercept` and whether `X` is dense, we
need to provide `_ridge_regression` with `X_offset` and `X_scale` (computed in
preprocessing) or not:

scikit-learn/sklearn/linear_model/ridge.py, line 571 in e871a56:
params = {'X_offset': X_offset, 'X_scale': X_scale}
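A rough sketch of the coupling being described; the helper and its exact branching are illustrative assumptions, not the real ridge.py code:

from scipy import sparse

def _params_for_ridge_regression(X, fit_intercept, X_offset, X_scale):
    # illustrative only: the sparse-with-intercept path needs the offsets
    # computed during preprocessing, the other paths do not
    if sparse.issparse(X) and fit_intercept:
        return {'X_offset': X_offset, 'X_scale': X_scale}
    return {}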
I would not have prevented someone from using 'sag' with fit_intercept=True. It's not broken per se, it is just that it needs a lot more iterations than the default value allows.
In this case should there be a warning? Users may be surprised by the number of
iterations they need to set:
>>> x, y = _make_sparse_offset_regression(n_samples=20, n_features=5, random_state=0)
>>> sp_ridge = Ridge(solver='sag', max_iter=10000000, tol=1e-8).fit(sparse.csr_matrix(x), y)
>>> ridge = Ridge(solver='sag', max_iter=10000000, tol=1e-8).fit(x, y)
>>> sp_ridge.n_iter_[0]
566250
>>> ridge.n_iter_[0]
100
>>> np.allclose(sp_ridge.intercept_, ridge.intercept_, rtol=1e-3)
False
@agramfort I restored the possibility of using 'sag' and added a warning, let me know if I should remove the warning
    assert_raises_regex(ValueError, "In Ridge,", sparse.fit, X_csr, y)
    for solver in ['saga', 'lsqr', 'sag']:
        sparse = Ridge(alpha=1., solver=solver, fit_intercept=True)
        assert_warns(UserWarning, sparse.fit, X_csr, y)
We do need to match the warning message (a lot of unrelated things can raise a
`UserWarning`), even if it means splitting checks for `sag` and the other solvers.
Also, I wonder if it isn't better to raise an exception on an unsupported solver
rather than switch solver with a warning.
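For example, matching the message could look roughly like this; a sketch only, written while the solver switch still emitted a warning, and the match string is illustrative:

import numpy as np
import pytest
from scipy import sparse
from sklearn.linear_model import Ridge


def test_sag_sparse_intercept_warning_message():
    rng = np.random.RandomState(0)
    X, y = rng.randn(20, 5), rng.randn(20)
    ridge = Ridge(solver='sag', fit_intercept=True)
    # match part of the message so unrelated UserWarnings do not make it pass
    with pytest.warns(UserWarning, match='sag'):
        ridge.fit(sparse.csr_matrix(X), y)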
thanks! I changed the warning to a `ValueError`.
IMO, we can consider it as a bug fix rather than a change of default.
So we will need an entry in what's new as a bug fix and document it in model changes as well.
    assert_raises_regex(ValueError, "In Ridge,", sparse.fit, X_csr, y)
    for solver in ['saga', 'lsqr', 'sag']:
        sparse = Ridge(alpha=1., solver=solver, fit_intercept=True)
        assert_raises_regex(
let's use pytest for this one
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@@ -371,7 +371,7 @@ def test_sag_regressor_computed_correctly():
     n_samples = 40
     max_iter = 50
     tol = .000001
-    fit_intercept = True
+    fit_intercept = False
Why do you need this @jeromedockes? SAG fails to get a good intercept?
Can you try initializing the intercept to np.mean(y_train) instead of 0 (if that is how it is done now)?
At the moment it does fail to fit a good intercept (with the default `tol` and
`n_iter`). The intercept is indeed initialized with zeros:

scikit-learn/sklearn/linear_model/ridge.py, line 486 in c315bf9:
init = {'coef': np.zeros((n_features + int(return_intercept), 1),

but initializing it with the mean of `y` probably won't change much, since this
mean is 0: `y` is always assumed to be dense and is centered in preprocessing.
The mean of `X` is what causes the intercept to be nonzero.
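A small numerical illustration of that last point (the data and coefficients are arbitrary assumptions): once y is centered its mean is ~0, so initializing the intercept at mean(y) changes little, while the intercept the solver must reach with respect to the uncentered X is driven by the column means of X.

import numpy as np

rng = np.random.RandomState(0)
coef = np.array([1., 0.5, -1.])
X = rng.randn(200, 3) + np.array([5., -2., 10.])   # nonzero column means
y = X @ coef + 0.01 * rng.randn(200)

y_centered = y - y.mean()      # preprocessing centers y
print(y_centered.mean())       # ~0: starting the intercept at mean(y) barely helps
print(-X.mean(axis=0) @ coef)  # ~6 here: the intercept to reach w.r.t. uncentered X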
I was thinking that we could revert to only allowing `sparse_cg` for sparse data
in this PR to quickly fix the bug and unlock #13246, and then see if support
for sparse data can be added to the `sag` solver in a separate PR. WDYT?
I am +1 for this path. I think that we should ensure that we give proper results. We can investigate later how to fix SAG for this case. @agramfort WDYT?
BTW, one of the reasons the intercept takes many iterations to converge can be
this decay:

scikit-learn/sklearn/linear_model/base.py, line 43 in 4a6264d:
SPARSE_INTERCEPT_DECAY = 0.01

scikit-learn/sklearn/linear_model/base.py, line 97 in 4a6264d:
return dataset, intercept_decay

scikit-learn/sklearn/linear_model/sag.py, line 299 in 4a6264d:
dataset, intercept_decay = make_dataset(X, y, sample_weight, random_state)
Couple of comments.
    dense = Ridge(alpha=1., tol=1.e-15, solver=solver, fit_intercept=True)
    sparse = Ridge(alpha=1., tol=1.e-15, solver=solver, fit_intercept=True)
    # for now only sparse_cg can fit an intercept with sparse X
    for solver in ['sparse_cg']:
You can remove the `for` loop. We will use pytest parametrization when we reintroduce `sag`, along the lines of the sketch below.
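Something like this, presumably (data generation and tolerances are placeholders, not the final test):

import numpy as np
import pytest
from scipy import sparse
from sklearn.linear_model import Ridge


# add 'sag' to the list once it is reintroduced
@pytest.mark.parametrize('solver', ['sparse_cg'])
def test_ridge_fit_intercept_sparse(solver):
    rng = np.random.RandomState(0)
    X = rng.randn(30, 4) + 5.
    y = X @ rng.randn(4) + 2.
    dense_ridge = Ridge(alpha=1., tol=1e-15, solver=solver,
                        fit_intercept=True).fit(X, y)
    sparse_ridge = Ridge(alpha=1., tol=1e-15, solver=solver,
                         fit_intercept=True).fit(sparse.csr_matrix(X), y)
    assert np.allclose(dense_ridge.coef_, sparse_ridge.coef_)
    assert np.allclose(dense_ridge.intercept_, sparse_ridge.intercept_)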
    assert_raises_regex(ValueError, "In Ridge,", sparse.fit, X_csr, y)
    for solver in ['saga', 'lsqr', 'sag']:
        sparse = Ridge(alpha=1., solver=solver, fit_intercept=True)
        with pytest.raises(
This will be a bit more compact
err_msg = "solver='{}' does not support".format(solver)
with pytest.raises(ValueError, match=err_msg):
sparse.fit(X_csr, y)
    sparse = Ridge(alpha=1., tol=1.e-15, solver=solver, fit_intercept=True)
    assert_raises_regex(ValueError, "In Ridge,", sparse.fit, X_csr, y)
    for solver in ['saga', 'lsqr', 'sag']:
        sparse = Ridge(alpha=1., solver=solver, fit_intercept=True)
Avoid calling it `sparse`. I think that we sometimes have the following import: `from scipy import sparse`.
+1
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
    sparse = Ridge(alpha=1., tol=1.e-15, solver=solver, fit_intercept=True)
    assert_raises_regex(ValueError, "In Ridge,", sparse.fit, X_csr, y)
    for solver in ['saga', 'lsqr', 'sag']:
        sparse = Ridge(alpha=1., solver=solver, fit_intercept=True)
+1
doc/whats_new/v0.22.rst
Outdated
@@ -109,6 +111,10 @@ Changelog
   of the maximization procedure in :term:`fit`.
   :pr:`13618` by :user:`Yoshihiro Uchida <c56pony>`.

 - |Fix| :class:`linear_model.Ridge` now correctly fits an intercept when
   `X` is sparse, `solver="auto"` and `fit_intercept=True`. Setting the solver to
explain that it is because the default solver is now sparse_cg
sklearn/linear_model/ridge.py
Outdated
                    '"sag" solver requires many iterations to fit '
                    'an intercept with sparse inputs. Either set the '
                    'solver to "auto" or "sparse_cg", or set a low '
                    '"tol" and a high "max_iter".')
What bothers me is that you will get the warning whatever you use for tol or max_iter. Maybe only warn if the parameters used are the defaults?
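A sketch of that idea; the default values shown (tol=1e-3, max_iter=None, Ridge's defaults at the time) and the helper name are assumptions, not part of the diff:

import warnings


def _maybe_warn_default_sag_params(tol, max_iter):
    # warn only when the user kept Ridge's defaults, since they probably
    # have not tuned tol/max_iter for fitting an intercept on sparse data
    if tol == 1e-3 and max_iter is None:
        warnings.warn(
            '"sag" solver requires many iterations to fit an intercept '
            'with sparse inputs. Either set the solver to "auto" or '
            '"sparse_cg", or set a low "tol" and a high "max_iter".')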
    # tol and max_iter, sag should raise a warning and is handled in
    # test_ridge_fit_intercept_sparse_sag
    # "auto" should switch to "sparse_cg"
    dense_ridge = Ridge(alpha=1., solver='sparse_cg', fit_intercept=True)
sparse_cg for dense_ridge?
sparse_cg can fit both sparse and dense data. Since both "auto" and "sparse_cg" should result in "sparse_cg" being used when X is sparse, the reference is "sparse_cg" fitted on dense data, and Ridge(solver="auto") and Ridge(solver="sparse_cg"), fitted on sparse data, are compared to it.
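As a quick illustration of the first point (toy data, purely for demonstration):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X, y = rng.randn(10, 3), rng.randn(10)
# 'sparse_cg' accepts dense input too, so it can serve as the dense reference
print(Ridge(solver='sparse_cg').fit(X, y).intercept_)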
@@ -464,6 +464,7 @@ def test_sag_regressor():
     y = 0.5 * X.ravel()

     clf1 = Ridge(tol=tol, solver='sag', max_iter=max_iter,
                  fit_intercept=False,
I would revert the changes to the tests if they are not necessary. Just catch the warning if need be.
I reverted them; there is no warning because in those tests the tol and max_iter used are not the default ones.
Co-Authored-By: Alexandre Gramfort <alexandre.gramfort@m4x.org>
…edockes/scikit-learn into fix_ridge_solver_selection
    # for now only sparse_cg can fit an intercept with sparse X with default
    # tol and max_iter, sag should raise a warning and is handled in
    # test_ridge_fit_intercept_sparse_sag
    # "auto" should switch to "sparse_cg"
Is this comment still relevant? I don't see sag warnings caught here.
If you update the comment, can you write what you answered to me just below? Thanks.
Thanks! I updated the comment. You don't see warnings here because sag's behaviour in this configuration is tested separately in test_ridge_fit_intercept_sparse_sag.
Let's wait for another approval before merging. thx @jeromedockes
A single nitpick and then I will merge. LGTM.
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Thanks @jeromedockes
thanks a lot for the help and advice @glemaitre @agramfort and @rth
`Ridge` is failing the check introduced in #13246 to verify that estimators
produce the same results for sparse and dense data. This PR enforces selecting
the `sparse_cg` solver when X is sparse and `fit_intercept=True`, since this
solver is at the moment the only one that correctly fits an intercept with the
Ridge default `tol` and `max_iter` when X is sparse.

@agramfort @glemaitre @ogrisel
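A minimal usage sketch of the behaviour being fixed (toy data and values are illustrative assumptions): with the default solver='auto', fitting on a sparse matrix should now give an intercept that matches the dense fit.

import numpy as np
from scipy import sparse
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(50, 4) + 10.                    # large column means -> nonzero intercept
y = X @ np.array([1., 2., -1., .5]) + 7.

dense_model = Ridge().fit(X, y)                       # solver='auto'
sparse_model = Ridge().fit(sparse.csr_matrix(X), y)   # 'auto' now resolves to 'sparse_cg'
print(dense_model.intercept_, sparse_model.intercept_)  # should agree closely after the fix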