TST tight and clean tests for Ridge #22910
Conversation
Force-pushed from 4e1f189 to 6ed5de0.
This is tested in test_ridge_fit_intercept_sparse_error.
Force-pushed from 6ed5de0 to 77268b9.
It's indeed a much better way of testing and the fact that we cannot recover the minimum norm solution is troublesome...
```python
# But it is not the minimum norm solution. (This should be equal.)
assert np.linalg.norm(np.r_[model.intercept_, model.coef_]) > np.linalg.norm(
    np.r_[intercept, coef]
)
```
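For context, the benchmark this assertion compares against is the minimum norm solution: for an underdetermined system, `np.linalg.lstsq` returns exactly that solution, and any other exact solution has a strictly larger norm. A small standalone sketch (illustrative, not code from this PR):

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features = 4, 8  # wide X: more features than samples
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)

# lstsq returns the minimum norm solution for underdetermined systems.
coef_min, *_ = np.linalg.lstsq(X, y, rcond=None)

# Any other exact solution differs by a null-space vector and has larger norm.
_, _, Vt = np.linalg.svd(X)
null_vec = Vt[-1]  # right singular vector with zero singular value
assert np.allclose(X @ null_vec, 0)

coef_other = coef_min + 0.5 * null_vec
assert np.allclose(X @ coef_other, X @ coef_min)  # still solves X w = y
assert np.linalg.norm(coef_other) > np.linalg.norm(coef_min)
```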
The fact that this is not the minimum norm solution for any solver is surprising, no? Do you have any idea what could cause this discrepancy?
Is the difference in norms the same for all solvers? Or are some solvers significantly worse in that respect than others?
I tried to change alpha to a small positive value but this does not fully fix the problem.
> The fact that this is not the minimum norm solution for any solver is surprising, no? Do you have any idea what could cause this discrepancy?
I edited the PR text to explain this failure. It is connected to mean centering.
> Is the difference in norms the same for all solvers? Or are some solvers significantly worse in that respect than others?
Yes, it is the same norm for all solvers.
> I edited the PR text to explain this failure. It is connected to mean centering.
Thanks. Do you see an easy way out of this?
I think we can merge this PR that improves the test without fixing this bug. But if we do so, we should open a dedicated issue to document this problem and reference the issue URL where appropriate in the tests.
I opened #22947.
> I think we can merge this PR
I would like that very much 😉
""" | ||
X, y, coef, _ = ols_ridge_dataset | ||
n_samples, n_features = X.shape | ||
alpha = 0 # OLS |
It seems that doing something like:

```diff
- alpha = 0  # OLS
+ alpha = 1e-30  # near OLS
```
moves the solution closer to the minimum norm solution for most (all?) solvers, at least for some seeds. But the difference in norms is still far from negligible (between 0.001 and 0.2) and very dependent on `global_random_seed`.
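As a side note on why a tiny positive `alpha` moves things toward the minimum norm solution: without an intercept, the kernel form `w = X'(XX' + alpha*I)^-1 y` converges to the minimum norm solution `X'(XX')^-1 y` as `alpha → 0`. A rough standalone check (illustrative, not scikit-learn code):

```python
import numpy as np

rng = np.random.default_rng(7)
n_samples, n_features = 4, 8  # underdetermined problem
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)

# Minimum norm OLS solution.
coef_min, *_ = np.linalg.lstsq(X, y, rcond=None)

def ridge_coef(alpha):
    """Ridge without intercept in kernel form: w = X'(XX' + alpha*I)^-1 y."""
    return X.T @ np.linalg.solve(X @ X.T + alpha * np.eye(n_samples), y)

# Shrinking alpha moves the ridge solution toward the minimum norm solution.
gap_large = np.linalg.norm(ridge_coef(1e-2) - coef_min)
gap_small = np.linalg.norm(ridge_coef(1e-10) - coef_min)
assert gap_small < gap_large
```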
I consider these two separate issues/tests:

- Test that exact OLS, i.e. `alpha=0`, gives the minimum norm solution.
- Test that as `alpha` gets close to 0, one approaches the minimum norm solution.
New tests pass with
@jeremiedbb You seem to be interested in linear models. Maybe you like this PR :wink:
LGTM. +1 for merging this test-suite refactoring without waiting for the resolution of #22947.
/cc @agramfort
thx @lorentzenchr
now this thing about mean centering affecting minimum norm solution puzzles me. I'll follow up in #22947
* MNT replace pinvh by solve
* DOC more info for svd solver
* TST rewrite test_ridge
* MNT remove test_ridge_singular
* MNT restructure into several tests
* MNT remove test_toy_ridge_object
* MNT remove test_ridge_sparse_svd (this is tested in test_ridge_fit_intercept_sparse_error)
* TST exclude cholesky from singular problem
* CLN two fixes
* MNT parametrize test_ridge_sample_weights
* MNT restructure test_ridge_sample_weights
* CLN tighten tolerance for sag solver
* CLN try to fix saga tolerance
* CLN make test_ridge_sample_weights nicer
* MNT remove test_ridge_regression_sample_weights
* MNT rename to test_ridge_regression_sample_weights
* CLN make test_ridge_regression_unpenalized pass for all random seeds
* CLN make tests pass for all random seeds
* DOC fix typos
* TST skip cholesky for singular problems
* MNT move up test_ridge_regression_sample_weights
* CLN set skip reason as comment
Reference Issues/PRs
None...yet.
What does this implement/fix? Explain your changes.
This PR restructures tests for `Ridge`, removes duplicate ones, and adds very tight, general tests for correct solutions.

Any other comments?
Found Bug

The new tests uncover that for wide/fat `X`, i.e. `n_features > n_samples`, `Ridge(fit_intercept=True)` does not yield the minimum norm solution. This is now reported in #22947.

Reason
For wide `X`, the least squares problem reads a bit differently: `min ||w||_2 subject to Xw = y`, with solution `w = X'(XX')^-1 y`, see e.g. http://ee263.stanford.edu/lectures/min-norm.pdf.

With explicit intercept `w0`, this reads `w = X'(XX' + 1 1')^-1 y` and `w0 = 1'(XX' + 1 1')^-1 y`, where `1` is a column vector of ones.

This is incompatible with our mean centering approach.