FIX the tests for convergence to the minimum norm solution of unpenalized ridge / OLS #25948
Conversation
I did not mean to approve this PR, only to comment on it.
…tivation in the intro of the section
```diff
@@ -122,7 +128,7 @@ its ``coef_`` member::
     >>> reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
     Ridge(alpha=0.5)
     >>> reg.coef_
-    array([0.34545455, 0.34545455])
+    array([0.34545..., 0.34545...])
```
Note: this was not actually needed to make the doctest pass: this PR does not change the Ridge implementation. But I think it makes the docstring more readable and consistent with the 5 digits we display for the intercept.
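(For anyone curious why the ellipsized output still passes: a minimal standalone sketch, assuming doctest's `ELLIPSIS` option flag, which scikit-learn's doctest configuration enables.)

```python
import doctest

def demo():
    """
    >>> from sklearn.linear_model import Ridge
    >>> reg = Ridge(alpha=0.5).fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
    >>> reg.coef_
    array([0.34545..., 0.34545...])
    """

# With ELLIPSIS enabled, "0.34545..." matches the full repr, so the example
# no longer depends on how many trailing digits NumPy chooses to print.
doctest.run_docstring_examples(demo, globals(), optionflags=doctest.ELLIPSIS)
```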
@agramfort @lorentzenchr @jjerphan I think the clarification of the motivations for the centering in I still need to think about how to update the ridge tests to be consistent with this for all cases. I probably won't have time to do it today though.
…ble values of global_random_seed
In my recently terse style, but with much implicit appreciation, just a few points:
Can you be more precise? EDIT: indeed there is a problem where I wrote "we recover:". There could be other values of w_0 that cause the sample-wise sum equality to hold.
I agree. We can move them to another place (once fixed). But I wanted to benefit from the LaTeX rendering. An ASCII version would be hard to read in my opinion. Maybe we could move that to a maintainer-oriented section of the doc.
I have a proof sketch for the under-determined case using the method of Lagrange multipliers (for the minimization problem that does not include the intercept in the norm).
OK, I pushed my proof for the under-determined case here: https://github.com/ogrisel/minimum-norm-ols/blob/main/minimum-norm-ols-intercept.pdf /cc @lorentzenchr @agramfort @jjerphan I will update the reStructuredText versions of the notes based on your feedback.
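For quick reference, a condensed sketch of the constrained formulation behind the linked PDF (assuming the interpolation constraint is feasible, i.e. the under-determined case; here \(\mathbf{1}\) is the all-ones vector):

```latex
% Minimum norm OLS with the intercept excluded from the norm:
\min_{w, w_0} \tfrac{1}{2} \|w\|_2^2
\quad \text{subject to} \quad X w + w_0 \mathbf{1} = y

% Lagrangian and stationarity conditions:
\mathcal{L}(w, w_0, \lambda)
  = \tfrac{1}{2} \|w\|_2^2 + \lambda^\top (y - X w - w_0 \mathbf{1}),
\qquad
\partial_w \mathcal{L} = 0 \Rightarrow w = X^\top \lambda,
\quad
\partial_{w_0} \mathcal{L} = 0 \Rightarrow \mathbf{1}^\top \lambda = 0

% Since 1^T lambda = 0, we have w = (X - 1 \bar{X}^\top)^\top \lambda:
% the solution lies in the row space of the *centered* data, which is
% the link to the centering trick used in the implementation.
```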
…external proof for now
Thank you for writing this down, @ogrisel. To me, the derivation is correct, and we just need an argument showing that
The only user-impacting thing is that we would now guarantee that
Well, standard lectures do not cover the case where we do not penalize the intercept or include it in the computation of the minimum norm. Making it explicit as I did in my linked PDF is boring but not that easy to check, and we got the initial version of the fixture wrong because of this.
Indeed. The solution would still be valid with the pseudo-inverse instead of the inverse (this is what I have to do in the fixture to make it work with any value of
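To illustrate that point, a minimal NumPy sketch (a hypothetical helper, not the actual fixture code) of the minimum norm OLS solution with an unpenalized intercept; the pseudo-inverse makes it work in both the over- and under-determined cases:

```python
import numpy as np

def min_norm_ols(X, y):
    """Minimum norm OLS coef and unpenalized intercept (sketch)."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    # Least squares on centered data; pinv selects the minimum norm
    # solution when the problem is under-determined, whereas inverting
    # X^T X would fail on a rank-deficient design.
    coef = np.linalg.pinv(X - X_mean) @ (y - y_mean)
    # Recover the intercept analytically: w_0 = y_bar - x_bar^T w.
    intercept = y_mean - X_mean @ coef
    return coef, intercept
```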
The minimizer of the objective function is a critical point, but the converse is not generally true. Here we proceed with the converse to derive the solution, so we still need an argument to show that there is an equivalence here.
I think the equivalence comes from the Lagrange multiplier theorem:
```rst
zero by setting :math:`\hat{w_0} = \bar{y} - \bar{X}^{T} \hat{w}`.

Note that the same argument holds for any other penalized linear regression
estimator (as long as the penalty is such that the solution is unique).
```
This explanation is nice and easy to follow. When I do this in a class, I say that at the optimum the gradient w.r.t. w and w_0 should be zero, and when you write the gradient w.r.t. w_0 you directly get \hat{w_0} = \bar{y} - \bar{X}^{T} \hat{w}.
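Spelling that derivation out in the thread's notation (with \(\mathbf{1}\) the all-ones vector and \(n\) the number of samples):

```latex
\frac{\partial}{\partial w_0} \|X w + w_0 \mathbf{1} - y\|_2^2
  = 2\, \mathbf{1}^\top (X w + w_0 \mathbf{1} - y) = 0
\;\Longrightarrow\;
\hat{w}_0 = \bar{y} - \bar{X}^\top \hat{w},
\qquad
\bar{y} = \tfrac{1}{n} \mathbf{1}^\top y,
\quad
\bar{X} = \tfrac{1}{n} X^\top \mathbf{1}
```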
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Just for the record, I still plan to follow up on this. Thanks @agramfort for the review. Do you have any opinion on how to document the under-determined case? Do you think I should port the content of https://github.com/ogrisel/minimum-norm-ols/blob/main/minimum-norm-ols-intercept.pdf into our Sphinx doc (assuming you agree with the content)?
@ogrisel I also plan to review this PR. From the glance I took, my main concern is the proposal to not include the intercept in the norm.
```diff
-.. math:: \min_{w} || X w - y||_2^2
+.. math:: \min_{w, w_0} || X w + w_0 - y||_2^2
```
We should then make it consistent throughout this document.
I agree. Do we introduce the 1_s vector everywhere for consistency?
I would not do it because it is harder to read and most people, I guess, will understand without it.
It could be used in a "mathematical details" section.
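For instance, such a section could make the ones vector explicit (a sketch, writing \(\mathbf{1}_n\) for the all-ones vector of length \(n\)):

```latex
\min_{w, w_0} \| X w + w_0 \mathbf{1}_n - y \|_2^2,
\qquad \mathbf{1}_n = (1, \dots, 1)^\top \in \mathbb{R}^n
```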
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
@lorentzenchr @agramfort I finally took the time to conduct a numerical study of the impact of the problem formulation: https://gist.github.com/ogrisel/18bbf02128a3890b03534d52b7fc8bd0#file-minimum_norm_ridge_limit-ipynb The current formulation (ridge on centered data followed by analytical intercept computation) seems to be beneficial for several reasons detailed in the notebook. The main problems with what I named "type b" are:
We could improve the notebook to check that those conclusions hold when changing the solver. EDIT: I updated the linked notebook to add a ridge solver based on
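As a rough standalone illustration of the two formulations compared in the notebook (naming assumed from the thread: "type a" = solving on centered data plus analytical intercept, "type b" = intercept as an extra feature included in the minimized norm):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))  # under-determined: n_samples < n_features
y = rng.normal(size=5)

# "type a": minimum norm solution on centered data + analytical intercept.
X_mean = X.mean(axis=0)
coef_a = np.linalg.pinv(X - X_mean) @ (y - y.mean())
intercept_a = y.mean() - X_mean @ coef_a

# "type b": intercept as an extra constant column, included in the norm.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
w_b = np.linalg.pinv(X_aug) @ y
coef_b, intercept_b = w_b[:-1], w_b[-1]

# Both interpolate the training data, but the coefficients differ:
print(np.allclose(X @ coef_a + intercept_a, y))  # True
print(np.allclose(X @ coef_b + intercept_b, y))  # True
print(np.allclose(coef_a, coef_b))               # False in general
```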
@ogrisel I'll have a deeper look at your great analysis, but I need some time. You could even consider writing a paper about it, or a blog post.
Thanks. I won't have the time to dig deeper next week, but ideally we could try to understand better why/how the different solvers break on extreme values of the regularization parameter. We could also probably find out analytically why type b ridge (without intercept penalization) cannot reach the type b OLS (with intercept in the norm). We could do that either generally (for any X, y training set) or for some degenerate cases, such as a dataset with a single data point. In the shorter term, I would like to first finalize the updates of this PR to make sure that all solvers in scikit-learn behave correctly using the centered ridge formulation.
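For the single data point case, a closed-form check is easy (a worked example for a sample \((x_1, y_1)\), not taken from the notebook):

```latex
% Centered formulation ("type a"): the centered design is zero, so the
% minimum norm coef vanishes and the intercept absorbs the target:
\hat{w} = 0, \qquad \hat{w}_0 = y_1

% Intercept-in-norm formulation ("type b"): the minimum norm solution of
% the single constraint x_1^\top w + w_0 = y_1 is
(\hat{w}, \hat{w}_0) = \frac{y_1}{\|x_1\|_2^2 + 1} \, (x_1, 1)
```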
@ogrisel https://gist.github.com/ogrisel/18bbf02128a3890b03534d52b7fc8bd0#file-minimum_norm_ridge_limit-ipynb has a bug. For I additionally added Fix + additional solver in https://github.com/lorentzenchr/notebooks/blob/master/minimum_norm_ridge_limit.ipynb. My summary
@lorentzenchr I updated my copy of the notebook to include your fix + the For the latter, the results are very similar to what we observe with I plan to resume the work on updating the tests in this PR and making them pass for all solvers and all data shapes using the type "a" formulation as reference solution for the minimum norm OLS problem. Some related parallel fixes for
Fixes: #22947.
Related to: #22910.
The existing test assumes that we should recover the minimum norm solution to the OLS problem where the intercept fitting is implemented by adding an extra constant feature and its matching coef.
However, this does not hold. In particular, the intercept component should not contribute to the coef norm in the least norm solution, as explained in the new note added to the documentation. Furthermore, this justifies our pre-centering strategy in `_preprocess_data` and the generic `_set_intercept` in all linear regression models with a least squares data-fit term, whatever the regularization term.

TODO:

- update the `ols_ridge_dataset` fixture accordingly and cross-link to the doc;
- check all combinations of data shape and intercept fitting:
  - `wide` and `fit_intercept=True`
  - `wide` and `fit_intercept=False`
  - `long` and `fit_intercept=True`
  - `long` and `fit_intercept=False`
- run the tests with all admissible values of `global_random_seed` on the CI;
- fix the `"cholesky"` failures observed when running the tests with all admissible values for `global_random_seed`;
- document `_preprocess_data` and `_set_intercept`, because this trick was not trivial to me and it took me a while to fully grasp all its implications;
- check that `coef_` converges to the minimum norm solution for OLS when alpha -> 0 for all solvers (see the sketch below).
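For reference, a minimal sketch of the convergence being checked (my own illustration, not the actual test code; the achievable tolerance depends on the solver and its `tol`):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(10, 50))  # under-determined problem
y = rng.normal(size=10)

# Reference: minimum norm OLS solution with unpenalized intercept,
# computed via centering and the pseudo-inverse.
coef_ref = np.linalg.pinv(X - X.mean(axis=0)) @ (y - y.mean())

# Ridge with alpha -> 0 should approach the reference solution.
ridge = Ridge(alpha=1e-10, fit_intercept=True, solver="lsqr", tol=1e-12)
ridge.fit(X, y)
print(np.max(np.abs(ridge.coef_ - coef_ref)))  # expected to be small
```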