ENH change Ridge tol to 1e-4 #24465
Conversation
I have the feeling that introducing a FutureWarning will be more annoying than beneficial to most users.
I am +0 for this PR in its current state (that is, with explicit documentation of the behavior change both in the changelog and the docstrings, but without a FutureWarning).
How to interpret an approval in combination with a "+0"? 😄
Ok +1 then ;)
@agramfort any opinion on this and #19615?
tol can mean many different things depending on the stopping criterion employed (gradient norm, gain in the training objective per update, etc.), so the argument of consistency among estimators is not very strong to me. Now, this being said, 1e-3 seems high. @lorentzenchr did you benchmark on a few datasets the impact of this in terms of cross-val score and fit time?
The impact is as follows:
It may be worth adding that to the docstring :smirk: Here is a simple benchmark:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(
    n_samples=100_000,
    n_features=100,
    n_informative=100,
    bias=10,
    random_state=42,
)

def max_abs_diff(a, b):
    return np.max(np.abs(a / b - 1))

%time svd = Ridge(solver="svd").fit(X, y)
# Wall time: 692 ms
%time lsqr_3 = Ridge(solver="lsqr", tol=1e-3).fit(X, y)
# Wall time: 125 ms
%time lsqr_4 = Ridge(solver="lsqr", tol=1e-4).fit(X, y)
# Wall time: 114 ms

max_abs_diff(lsqr_3.coef_, svd.coef_), max_abs_diff(lsqr_4.coef_, svd.coef_)
# (8.287694934638878e-05, 2.798488532462784e-06)
```

Same game for all solvers.
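The "same game for all solvers" comparison can be sketched as a loop over the iterative solvers, using the exact `svd` solution as reference. This is a minimal sketch, not the exact benchmark run in the PR; the smaller data size is chosen only to keep it quick:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Smaller problem than the benchmark above, purely for speed.
X, y = make_regression(
    n_samples=10_000, n_features=50, n_informative=50, bias=10, random_state=42
)

def max_abs_diff(a, b):
    # Maximum relative deviation between two coefficient vectors.
    return np.max(np.abs(a / b - 1))

# svd solves the problem directly (no tol) and serves as the reference.
ref = Ridge(solver="svd").fit(X, y)

for solver in ("lsqr", "sparse_cg"):
    for tol in (1e-3, 1e-4):
        est = Ridge(solver=solver, tol=tol).fit(X, y)
        print(f"{solver:10s} tol={tol:.0e}  max rel diff vs svd: "
              f"{max_abs_diff(est.coef_, ref.coef_):.2e}")
```

Tightening `tol` from 1e-3 to 1e-4 should shrink the deviation from the `svd` reference, mirroring the table above.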
I don't understand the results for sparse_cg.
Thanks @lorentzenchr, a few remarks:
wdyt?
With accuracy, I meant
+1
Thanks for the experiments @lorentzenchr. I agree with your conclusions. |
Good job @lorentzenchr |
thanks @lorentzenchr |
* ENH set default tol of Ridge to 1e-4 instead of 1e-3
* MNT rename internal least squares method in discriminant_analysis
* DOC add versionchanged
* DOC add whats_new entry
* DOC add PR to whatsnew
* DOC better versionchanged message
Reference Issues/PRs
Closes #19615.
What does this implement/fix? Explain your changes.
This PR changes the default `tol` of `Ridge` from `1e-3` to `1e-4`, which is the default of many other linear models like `ElasticNet` and `LogisticRegression`.
Any other comments?
Does this warrant a deprecation cycle? This change might change model results with default values, but only in a beneficial way. The downside is a potentially longer fit time.
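For users who depend on the exact pre-change results, a sketch of the migration point discussed above: pinning `tol` explicitly keeps behavior stable across versions regardless of the default. The data here is illustrative, not from the PR:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=1_000, n_features=20, random_state=0)

# Relying on the default picks up the new tol=1e-4 after this change.
default_ridge = Ridge().fit(X, y)

# Passing tol explicitly reproduces the old default on any version.
legacy_ridge = Ridge(tol=1e-3).fit(X, y)
```

Since the change only tightens the convergence criterion, results with the default should only become closer to the exact solution, at the cost of potentially longer fit times.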