ENH change Ridge tol to 1e-4 #24465

Merged — 6 commits merged into scikit-learn:main on Oct 6, 2022

Conversation

@lorentzenchr (Member) commented Sep 18, 2022

Reference Issues/PRs

Closes #19615.

What does this implement/fix? Explain your changes.

This PR changes the default tol of Ridge from 1e-3 to 1e-4, which is the default of many other linear models such as ElasticNet and LogisticRegression.
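
A minimal sketch of the change from a user's perspective (the dataset and alpha are made up for illustration): after this PR, Ridge() fits with tol=1e-4 by default, and the previous behavior remains available by passing tol explicitly.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=1_000, n_features=10, random_state=0)

new_default = Ridge(alpha=1.0).fit(X, y)             # tol=1e-4 after this PR
old_behavior = Ridge(alpha=1.0, tol=1e-3).fit(X, y)  # pre-PR default, set explicitly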

Any other comments?

Does this warrant a deprecation cycle? The change can alter model results obtained with default parameters, but only in a beneficial way. The downside is a potentially longer fit time.

@lorentzenchr lorentzenchr changed the title Ridge tol Change Ridge tol to 1e-4 Sep 18, 2022
@lorentzenchr lorentzenchr changed the title Change Ridge tol to 1e-4 ENH change Ridge tol to 1e-4 Sep 18, 2022
@ogrisel (Member) left a comment

I have the feeling that introducing a FutureWarning will be more annoying than beneficial to most users.

I am +0 for this PR in its current state (that is, with the behavior change explicitly documented in both the changelog and the docstrings, but without a FutureWarning).

@lorentzenchr (Member, Author) commented Sep 19, 2022

How to interpret an approval in combination with a "+0"? 😄
My motivation is to close #19615 and I think this PR does it.

@ogrisel (Member) commented Sep 23, 2022

> How to interpret an approval in combination with a "+0"? 😄
> My motivation is to close #19615 and I think this PR does it.

Ok +1 then ;)

@ogrisel (Member) commented Sep 23, 2022

@agramfort any opinion on this and #19615?

@agramfort (Member) commented

tol can mean many different things depending on the stopping criterion employed (gradient norm, gain in the training objective from one update, etc.), so the consistency-among-estimators argument is not very strong to me. That said, 1e-3 seems high. @lorentzenchr, did you benchmark the impact of this on a few datasets in terms of cross-val score and fit time?
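
To illustrate the point, here is a toy sketch (not scikit-learn code; gd_ridge, its parameters, and the data are all made up): the same numeric tol stops very different criteria depending on the rule a solver uses.

import numpy as np

def gd_ridge(X, y, alpha, tol, criterion, lr=1e-4, max_iter=10_000):
    """Toy gradient descent on the ridge objective with two stopping rules."""
    w = np.zeros(X.shape[1])

    def obj(w):
        return np.sum((X @ w - y) ** 2) + alpha * (w @ w)

    prev = obj(w)
    for _ in range(max_iter):
        grad = 2 * (X.T @ (X @ w - y)) + 2 * alpha * w
        w -= lr * grad
        cur = obj(w)
        # Same numeric tol, two very different meanings:
        if criterion == "grad" and np.linalg.norm(grad) < tol:
            break
        if criterion == "gain" and prev - cur < tol:
            break
        prev = cur
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ np.ones(5)
# With the same tol=1e-4, the "gain" rule stops far earlier than the "grad" rule.
w_grad = gd_ridge(X, y, alpha=1.0, tol=1e-4, criterion="grad")
w_gain = gd_ridge(X, y, alpha=1.0, tol=1e-4, criterion="gain")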

@lorentzenchr (Member, Author) commented Sep 24, 2022

The impact is as follows:

  • auto: see lbfgs, sag, cholesky, sparse_cg
  • svd: tol has no impact
  • cholesky: tol has no impact
  • lsqr: tol is set as atol and btol of lsmr, which specify stopping conditions involving the residual and the norms of the matrix and of the coefficients (see the sketch after this list)
  • sparse_cg: stops once the relative or absolute residual is smaller than tol
  • sag and saga: stop once the change of coef is smaller than tol
  • lbfgs: stops once the (projected) gradient, i.e. the residual, is smaller than tol
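
A minimal sketch of what the lsqr stopping rule means, assuming tol is forwarded unchanged as atol/btol and the penalty enters as damp = sqrt(alpha) (both assumptions about internals, stated here for illustration; the data is made up):

import numpy as np
from scipy.sparse.linalg import lsmr

rng = np.random.default_rng(0)
X = rng.standard_normal((1_000, 20))
y = X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(1_000)

alpha, tol = 1.0, 1e-4  # assumed penalty strength; tol as in Ridge
# lsmr solves min ||X w - y||^2 + damp^2 ||w||^2, so damp = sqrt(alpha)
# yields the ridge objective; atol/btol bound quantities built from the
# residual and the norms of the matrix and of the coefficients.
coef = lsmr(X, y, damp=np.sqrt(alpha), atol=tol, btol=tol)[0]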

It may be worth adding that to the docstring 😏

Here is a simple benchmark:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge


X, y = make_regression(
    n_samples=100_000,
    n_features=100,
    n_informative=100,
    bias=10,
    random_state=42,
)

def max_abs_diff(a, b):
    # Maximum relative difference of the entries of a w.r.t. b.
    return np.max(np.abs(a / b - 1))

%time svd = Ridge(solver="svd").fit(X, y)

Wall time: 692 ms

%time lsqr_3 = Ridge(solver="lsqr", tol=1e-3).fit(X, y)

Wall time: 125 ms

%time lsqr_4 = Ridge(solver="lsqr", tol=1e-4).fit(X, y)

Wall time: 114 ms

max_abs_diff(lsqr_3.coef_, svd.coef_), max_abs_diff(lsqr_4.coef_, svd.coef_)

(8.287694934638878e-05, 2.798488532462784e-06)

Same game for all solvers:

Solver      time 1e-3   time 1e-4   accuracy 1e-3   accuracy 1e-4
lsmr        125 ms      125 ms      8.29e-05        2.80e-06
sparse_cg   126 ms      116 ms      0.00153         0.00153
sag         5.54 s      7.92 s      0.0341          0.00374
saga        1.57 s      2.11 s      0.0109          0.000807
lbfgs       115 ms      125 ms

I don't understand the results for sparse_cg.

@agramfort (Member) commented

thanks @lorentzenchr

a few remarks:

  • what you call accuracy here is R², and if so the numbers are super low, suggesting that despite n_samples >> n_features it does not learn much
  • the fact that the R² are quite different suggests the solvers do not fully converge with tol=1e-4. The story may be simpler if the problem is easier and R² is closer to 1.
  • testing with n_samples >> n_features is necessary for SAG and SAGA to be decent solvers, but it makes the problem very well posed, with excellent conditioning. The picture could be different with n_features > n_samples.

wdyt?

@lorentzenchr (Member, Author) commented

With accuracy, I meant max_abs_diff, i.e. the maximum relative difference of the coefficients w.r.t. the solution found by svd. It shows that tol=1e-4 comes much closer to the correct values; only sparse_cg seems odd. A sketch of the full comparison follows below.
These results are good evidence that decreasing the default tol from 1e-3 to 1e-4 brings the found solutions closer to the correct values at a comparatively small runtime cost.
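
A minimal sketch of that comparison, reusing X, y, Ridge, and max_abs_diff from the benchmark above (the solver list is illustrative; lbfgs is left out since scikit-learn only enables it together with positive=True):

# Coefficients from the direct svd solver serve as the reference solution.
svd_coef = Ridge(solver="svd").fit(X, y).coef_

for solver in ["lsqr", "sparse_cg", "sag", "saga"]:
    for tol in [1e-3, 1e-4]:
        coef = Ridge(solver=solver, tol=tol).fit(X, y).coef_
        print(f"{solver:<10} tol={tol:g}  max_abs_diff={max_abs_diff(coef, svd_coef):.3g}")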

@ogrisel (Member) commented Sep 26, 2022

> It may be worth adding that to the docstring 😏

+1

@ogrisel (Member) commented Sep 26, 2022

Thanks for the experiments @lorentzenchr. I agree with your conclusions.

@haiatn (Contributor) commented Oct 4, 2022

Good job @lorentzenchr

@agramfort agramfort merged commit ab5ba84 into scikit-learn:main Oct 6, 2022
@agramfort (Member) commented

thanks @lorentzenchr

@lorentzenchr lorentzenchr deleted the ridge_tol branch October 6, 2022 15:45
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Oct 9, 2022
* ENH set default tol of Ridge to 1e-4 instead of 1e-3

* MNT rename internal least squares method in discriminant_analysis

* DOC add versionchanged

* DOC add whats_new entry

* DOC add pr to whatsnew

* DOC better versionchanged message
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Oct 31, 2022
Successfully merging this pull request may close these issues.

[RFC] Improve the Ridge solver speed / convergence tradeoff with better solver defaults