TST tight and clean tests for Ridge #22910
Conversation
Force-pushed from 4e1f189 to 6ed5de0.
This is tested in test_ridge_fit_intercept_sparse_error.
Force-pushed from 6ed5de0 to 77268b9.
It's indeed a much better way of testing and the fact that we cannot recover the minimum norm solution is troublesome...
```python
# But it is not the minimum norm solution. (This should be equal.)
assert np.linalg.norm(np.r_[model.intercept_, model.coef_]) > np.linalg.norm(
    np.r_[intercept, coef]
)
```
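For context, the benchmark this assertion compares against is the minimum norm solution: for an underdetermined system, `np.linalg.lstsq` returns exactly that solution, and any other exact solution has a strictly larger norm. A small standalone sketch (illustrative, not code from this PR):

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features = 4, 8  # wide X: more features than samples
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)

# lstsq returns the minimum norm solution for underdetermined systems.
coef_min, *_ = np.linalg.lstsq(X, y, rcond=None)

# Any other exact solution differs by a null-space vector and has larger norm.
_, _, Vt = np.linalg.svd(X)
null_vec = Vt[-1]  # right singular vector with zero singular value
assert np.allclose(X @ null_vec, 0)

coef_other = coef_min + 0.5 * null_vec
assert np.allclose(X @ coef_other, X @ coef_min)  # still solves X w = y
assert np.linalg.norm(coef_other) > np.linalg.norm(coef_min)
```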
The fact that this is not the minimum norm solution for any solver is surprising, no? Do you have any idea what could cause this discrepancy?
Is the difference in norms the same for all solvers? Or are some solvers significantly worse in that respect than others?
I tried to change alpha to a small positive value but this does not fully fix the problem.
> The fact that this is not the minimum norm solution for any solver is surprising, no? Do you have any idea what could cause this discrepancy?
I edited the PR text to explain this failure. It is connected to mean centering.
> Is the difference in norms the same for all solvers? Or are some solvers significantly worse in that respect than others?
Yes, it is the same norm for all solvers.
> I edited the PR text to explain this failure. It is connected to mean centering.
Thanks. Do you see an easy way out of this?
I think we can merge this PR that improves the test without fixing this bug. But if we do so, we should open a dedicated issue to document this problem and reference the issue URL where appropriate in the tests.
I opened #22947.
> I think we can merge this PR
I would like that very much 😉
""" | ||
X, y, coef, _ = ols_ridge_dataset | ||
n_samples, n_features = X.shape | ||
alpha = 0 # OLS |
It seems that doing something like:

```diff
- alpha = 0  # OLS
+ alpha = 1e-30  # near OLS
```
moves the solution closer to the minimum norm solution for most (all?) solvers, at least for some seeds. But the difference in norms is still far from negligible (between 0.001 and 0.2) and very dependent on `global_random_seed`.
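As a side note on why a tiny positive `alpha` moves things toward the minimum norm solution: without an intercept, the kernel form `w = X'(XX' + alpha*I)^-1 y` converges to the minimum norm solution `X'(XX')^-1 y` as `alpha → 0`. A rough standalone check (illustrative, not scikit-learn code):

```python
import numpy as np

rng = np.random.default_rng(7)
n_samples, n_features = 4, 8  # underdetermined problem
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)

# Minimum norm OLS solution.
coef_min, *_ = np.linalg.lstsq(X, y, rcond=None)

def ridge_coef(alpha):
    """Ridge without intercept in kernel form: w = X'(XX' + alpha*I)^-1 y."""
    return X.T @ np.linalg.solve(X @ X.T + alpha * np.eye(n_samples), y)

# Shrinking alpha moves the ridge solution toward the minimum norm solution.
gap_large = np.linalg.norm(ridge_coef(1e-2) - coef_min)
gap_small = np.linalg.norm(ridge_coef(1e-10) - coef_min)
assert gap_small < gap_large
```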
I consider these two separate issues/tests:

- Test that exact OLS, i.e. `alpha=0`, gives the minimum norm solution.
- Test that as `alpha` gets close to 0, one approaches the minimum norm solution.
New tests pass with
@jeremiedbb You seem to be interested in linear models. Maybe you like this PR :wink:
LGTM. +1 for merging this test-suite refactoring without waiting for the resolution of #22947.
/cc @agramfort
thx @lorentzenchr
now this thing about mean centering affecting minimum norm solution puzzles me. I'll follow up in #22947
* MNT replace pinvh by solve
* DOC more info for svd solver
* TST rewrite test_ridge
* MNT remove test_ridge_singular
* MNT restructure into several tests
* MNT remove test_toy_ridge_object
* MNT remove test_ridge_sparse_svd (this is tested in test_ridge_fit_intercept_sparse_error)
* TST exclude cholesky from singular problem
* CLN two fixes
* MNT parametrize test_ridge_sample_weights
* MNT restructure test_ridge_sample_weights
* CLN tighten tolerance for sag solver
* CLN try to fix saga tolerance
* CLN make test_ridge_sample_weights nicer
* MNT remove test_ridge_regression_sample_weights
* MNT rename to test_ridge_regression_sample_weights
* CLN make test_ridge_regression_unpenalized pass for all random seeds
* CLN make tests pass for all random seeds
* DOC fix typos
* TST skip cholesky for singular problems
* MNT move up test_ridge_regression_sample_weights
* CLN set skip reason as comment
Reference Issues/PRs
None...yet.
What does this implement/fix? Explain your changes.
This PR restructures tests for `Ridge`, removes duplicate ones, and adds very tight, general tests for correct solutions.

Any other comments?
Found Bug

The new tests uncover that for wide/fat `X`, i.e. `n_features > n_samples`, `Ridge(fit_intercept=True)` does not yield the minimum norm solution. This is now reported in #22947.

Reason
For wide `X`, the least squares problem reads a bit differently: `min ||w||_2 subject to Xw = y`, with solution `w = X'(XX')^-1 y`, see e.g. http://ee263.stanford.edu/lectures/min-norm.pdf.

With explicit intercept `w0`, this reads `w = X'(XX' + 1 1')^-1 y` and `w0 = 1'(XX' + 1 1')^-1 y`, where `1` is a column vector of ones.

This is incompatible with our mean centering approach.