BUG: Fix SGD models(SGDRegressor etc.) convergence criteria #31856
Conversation
Had to fix a test that was not passing: large loss/objective spikes during convergence, due to the tiny sample size.
At the end (epoch 8) it did not converge. The same kind of issue shows up in the doctests. Verbose output of the doctest from `_stochastic_gradient.py` for `SGDOneClassSVM`, first without the fix:
With the PR:
It takes more epochs, but because there are too few samples per epoch, the iteration after which the optimization stops is fairly random. With `tol=None`, it converges:
Maybe @antoinebaker or @lorentzenchr could have a look.
This also needs a FIX changelog
@@ -2220,9 +2220,9 @@ class SGDOneClassSVM(OutlierMixin, BaseSGD):
     >>> import numpy as np
     >>> from sklearn import linear_model
     >>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
-    >>> clf = linear_model.SGDOneClassSVM(random_state=42)
+    >>> clf = linear_model.SGDOneClassSVM(random_state=42, tol=None)
I'm not sure why these changes are necessary
It's to make the doctest pass. The tiny sample size of 4 with SGD leads to large loss fluctuations each epoch (each epoch makes only 4 updates), so the default stopping criterion of no improvement in 5 epochs is meaningless: there are too few samples/updates per epoch to get a meaningful estimate of the loss. The model "converges" by luck, so setting `tol=None` fixes that. See the comment above, #31856 (comment).
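A minimal sketch of the effect (my illustration, not code from the PR): on this 4-sample data the default tolerance-based stopping can halt after an essentially arbitrary number of epochs, while `tol=None` runs the full `max_iter`.

```python
# Minimal sketch (not part of the PR): with only 4 samples, the
# per-epoch average loss is noisy, so the default tolerance-based
# stopping (tol=1e-3, no improvement for 5 epochs) triggers almost
# arbitrarily. tol=None disables the convergence check entirely.
import numpy as np
from sklearn.linear_model import SGDOneClassSVM

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])

# Default stopping: the reported number of epochs is essentially luck.
clf_default = SGDOneClassSVM(random_state=42).fit(X)
print("epochs with default tol:", clf_default.n_iter_)

# tol=None: no convergence check, runs all max_iter epochs.
clf_no_tol = SGDOneClassSVM(random_state=42, tol=None).fit(X)
print("epochs with tol=None:", clf_no_tol.n_iter_)
```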
Reference Issues/PRs
Based on draft PR #30031. Closes #30027.
What does this implement/fix? Explain your changes.
Changes the SGD optimization loop in `sklearn/linear_model/_sgd_fast.pyx.tp` to use a correct stopping criterion. Instead of the raw error (loss), it now uses the full objective value: the full objective includes the regularization term for regression/classification, and the intercept term for the one-class SVM model. This change prevents incorrect premature stopping of the optimization, often after 6 epochs. The effect is especially pronounced with `SGDOneClassSVM`, but it also affects `SGDRegressor` and `SGDClassifier`.

To implement this, the PR modifies the `WeightVector` class to also accumulate the L1 norm, and calculates the objective value in the optimization loop. It also adds a test comparing `SGDOneClassSVM` to the liblinear one-class SVM.
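As an illustration only, here is a hedged Python sketch of the stopping rule described above. The real implementation is Cython in `_sgd_fast.pyx.tp`; the helper names, the L2-only penalty, and the sign of the intercept term are my assumptions, not the PR's code.

```python
# Hedged sketch (assumptions, not the PR's Cython code): check
# convergence on the full objective per epoch, not the raw data loss.
import numpy as np

def full_objective(avg_loss, w, intercept, alpha, one_class=False):
    """Hypothetical helper. Regression/classification add the
    regularization penalty (only L2 shown here); for the one-class SVM
    the intercept (offset) also enters the objective (sign assumed)."""
    obj = avg_loss + 0.5 * alpha * np.dot(w, w)
    if one_class:
        obj -= intercept
    return obj

def run_epochs(epoch_objectives, tol=1e-3, n_iter_no_change=5):
    """No-improvement rule applied to the objective: count epochs whose
    objective is not at least tol better than the best seen so far,
    and stop after n_iter_no_change such epochs in a row."""
    best = np.inf
    no_improvement = 0
    for epoch, obj in enumerate(epoch_objectives, start=1):
        if tol is not None and obj > best - tol:
            no_improvement += 1
        else:
            no_improvement = 0
        best = min(best, obj)
        if no_improvement >= n_iter_no_change:
            return epoch  # stopped: objective plateaued
    return len(epoch_objectives)  # hit max_iter
```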
Before the fix (example from the linked issue):
After the fix, the model converges:
See the linked issue for the full code.
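One way to sanity-check the converged model, in the spirit of the comparison test mentioned above (a hedged sketch; the PR's actual test may differ, and I use `sklearn.svm.OneClassSVM` with a linear kernel as the reference solver rather than liblinear):

```python
# Hedged verification sketch (not the PR's test): a fully converged
# linear SGD one-class SVM should roughly agree with an exact
# one-class SVM restricted to a linear kernel on the same data.
import numpy as np
from sklearn.linear_model import SGDOneClassSVM
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.randn(500, 2)

nu = 0.5
sgd = SGDOneClassSVM(nu=nu, tol=None, max_iter=1000, random_state=0).fit(X)
exact = OneClassSVM(kernel="linear", nu=nu).fit(X)

# Fraction of identical inlier/outlier decisions (+1/-1 labels).
agreement = np.mean(sgd.predict(X) == exact.predict(X))
print(f"prediction agreement: {agreement:.2%}")
```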
Any other comments?
This PR probably needs a changelog entry, since the output of SGD models (regressor, classifier, one-class) can change for `tol != None`.