FIX use objective instead of loss for convergence in SGD #30031
Conversation
@ogrisel @jeremiedbb @antoinebaker I'm not myself super familiar with SGD, so I would be happy to have some eyes on this PR to tell me whether it goes in the right direction or doesn't make sense.
To me it makes sense to set the stopping criterion on the objective function instead of the loss, at least because of the way convergence is checked here, i.e. a small enough decrease. Indeed, the objective function is guaranteed to decrease at each epoch, which is not the case for the loss. So there are two ways to fix this: use the objective function, or check for a small enough change instead of a decrease. I think the first option, chosen in this PR, is fine. Looking at the diff, it seems that the tol is compared with the absolute difference of the loss/objective. That is not user friendly, because scaling the dataset by some factor will lead to a different result if you keep the same tol. I really think that we should compare the tol against a relative difference of the objective function.
This is a good point and we should fix it.
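For concreteness, a minimal sketch of the relative-difference criterion discussed above; the function and variable names (`objective_converged`, `previous_objective`, `tol`) are illustrative, not the actual scikit-learn code:

```python
def objective_converged(objective, previous_objective, tol):
    """Relative-decrease stopping check: declare convergence when the
    per-epoch decrease of the objective falls below tol, relative to the
    previous objective value (so the check is invariant to rescaling the
    data by a constant factor).
    """
    eps = 1e-12  # guard against division by zero when the objective is ~0
    decrease = previous_objective - objective
    return decrease < tol * max(abs(previous_objective), eps)
```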
@glemaitre May I take this over? I think I can finish this off, moving the objective to the loop for proper averaging, and other stuff. |
closes #30027
When no early stopping is used in an SGD estimator, the stopping criterion should be based on the value of the objective function (loss function + regularization terms). However, currently only the loss function is used to decide whether or not the estimator has converged.
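For reference, a minimal sketch of evaluating such an objective (average data loss plus penalty), assuming the elastic-net penalty form documented in the SGD user guide; the function name and signature are illustrative:

```python
import numpy as np

def sgd_objective(sum_loss, n_samples, w, alpha, l1_ratio):
    """Objective = average data loss + alpha * R(w), with the elastic-net
    penalty R(w) = l1_ratio * ||w||_1 + (1 - l1_ratio) * 0.5 * ||w||_2^2.
    `sum_loss` is the loss accumulated over one epoch.
    """
    penalty = l1_ratio * np.abs(w).sum() + (1.0 - l1_ratio) * 0.5 * np.dot(w, w)
    return sum_loss / n_samples + alpha * penalty
```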
As a solution, I propose to modify the `WeightVector` Cython class so that it also computes the L1-norm on the fly; then, when early stopping is not activated, we compute the objective function instead of only the loss function and use it as the stopping criterion (a rough sketch of the idea follows).
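A toy Python stand-in for the on-the-fly norm tracking (the real class is in Cython and also maintains a `wscale` scaling factor, which this sketch leaves out):

```python
import numpy as np

class WeightVectorSketch:
    """Track the squared L2 norm and the L1 norm of w incrementally,
    so the penalty term can be read off without an O(n_features) pass."""

    def __init__(self, w):
        self.w = w
        self.sq_norm = float(np.dot(w, w))      # squared L2 norm, kept current
        self.l1_norm = float(np.abs(w).sum())   # L1 norm, kept current

    def add(self, x_indices, x_values, c):
        """Update w[j] += c * x[j] over the nonzero entries of a sample x."""
        for j, xj in zip(x_indices, x_values):
            old = self.w[j]
            new = old + c * xj
            # incremental norm updates instead of recomputing from scratch
            self.sq_norm += new * new - old * old
            self.l1_norm += abs(new) - abs(old)
            self.w[j] = new
```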
Things I'm not sure about at this stage: