Liblinear convergence failure everywhere? #11536
Comments
Also, it's pretty weird that SVC has
good question. Looking at the liblinear code, it appears we don't expose the different stopping criteria they have, and we added a max_iter parameter they don't seem to have. I have no idea why it was set to 1000. Was there any benchmark done?
Not that I can remember...
No strong opinion. Too many warnings means that users don't look at warnings. The only thing that I can suggest is adding an option to control this behavior. I don't really like adding options (the danger is having too many), but it seems here that there is no one-size-fits-all choice.
we could also increase the tol? @ogrisel asked if we have this warning for logistic regression as well or if we ignore it there.
Does this issue also happen with the LogisticRegression class?
I am -1 on increasing the tol: it will mean that many users will wait longer. I think that there should be an option to control convergence warnings.
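There is no dedicated estimator option for this yet, but the warning can already be controlled per-fit with the standard `warnings` machinery. A minimal sketch, assuming the current `ConvergenceWarning` behavior:

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)

# Silence only convergence warnings for this fit; other warnings
# (deprecations, etc.) remain visible.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    clf = LinearSVC(max_iter=100).fit(X, y)
```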
Increasing the tol means a larger tol. So if anything people will wait less.

> Increasing the tol means a larger tol. So if anything people will wait less.

OK, I had understood you the wrong way. +1 for that option.
Working on this.
@ogrisel indeed
As discussed with @agramfort I am a bit skeptical regarding bumping the
This is only about liblinear, so where the tol is 0.0001 right now. So it would be making it more consistent. We should probably run some benchmarks, though?
Ah indeed, so maybe this is not as complex as I first thought.
@samronsin yeah that would be good I think. This seems to be one of the last release blockers?
btw the change that prompted all this is #10881, which was basically just a change in verbosity :-/
btw with the default solver, liblinear's own default tol is 0.1 (!). https://github.com/cjlin1/liblinear/blob/master/README
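For comparison, a quick check of the wrapper's own defaults (an assumption based on current scikit-learn releases; worth re-checking against your installed version):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# scikit-learn's wrapper defaults, which differ from the liblinear
# CLI's solver-dependent eps defaults (0.1 for its default solver).
print(LinearSVC().tol)            # 0.0001
print(LogisticRegression().tol)   # 0.0001
```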
Wow. So many nice surprises in Liblinear...
@jnothman this is mostly our wrapper that has the surprises, I think?
tbh I've not looked into it...
The liblinear command line actually has various tolerance defaults, depending on the sub-solver that is used. Do we want to use those? That would probably require switching the default to
I want us to think about this after release :)
@jnothman I think we need to benchmark but possibly?
Everywhere in the sklearn docs you specifically warn users that they need to scale data before use with many classifiers, etc. If one sets

```python
from sklearn.datasets import load_digits
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

digits = load_digits()
p = Pipeline([('s', StandardScaler()),
              ('c', LinearSVC(tol=1e-1, max_iter=1000))])
p.fit(digits.data, digits.target)
```
@hermidalc just to be sure, are you running Windows or a Unix-like? Indeed there is a known issue with Windows (#13511), but it happens only when the number of features or samples is very large, so I guess this is not the issue you're facing.
Linux. The only issue I've faced is the
Don't want to add more to the pot... but is the convergence warning also OS-specific, i.e. should it behave differently on each OS? I assumed not, but based on my findings it seems to be. I've tested on macOS 10.15.2 (Catalina) vs Linux Fedora 30. I ran the snippet from #11536 (comment) by @amueller and, as you can see below, on macOS that error does not show, but on Linux it does (same code!!!). I am not sure as to why. Is it because there might be different versions of

Tested in both Python major versions with old and recent libs and the results were the same.
mac result
fedora result
Any thoughts?
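One thing worth noting when comparing platforms: whether a warning prints can depend on Python's warning filters (e.g. the "show once" default), not only on the solver. A sketch that records warnings deterministically instead of relying on console output — names here are just illustrative:

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
pipe = Pipeline([('s', StandardScaler()),
                 ('c', LinearSVC(tol=1e-1, max_iter=1000))])

# Record every warning raised during fit, bypassing "once"-style
# filters that could hide it on one platform but not another.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pipe.fit(X, y)

conv = [w for w in caught if issubclass(w.category, ConvergenceWarning)]
print(f"{len(conv)} convergence warning(s)")
```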
It depends a bit on what you mean by "difficult". You could probably do something like #15583 and solve the original optimization problem quite well. I'm not saying it's a good idea to not scale your data, I'm just saying it's totally possible to solve the optimization problem well despite the user giving you badly scaled data if your optimization algorithm is robust enough.
Sorry, what I was implying by "difficult" is relevant to this thread's topic: solving the optimization problem below a specific tolerance at or before a maximum number of iterations. Features that aren't scaled make this harder to do with SVM unless, as you said, you use a very robust algorithm to solve the optimization problem. I thought LIBLINEAR uses coordinate descent; isn't this pretty robust?
yes coordinate descent is pretty robust to data scaling.
Liblinear has several solvers. I think they use their own TRON (trust region Newton) by default.
Also: we just changed our default away from liblinear... The question of which kinds of problems are "hard" likely depends on the solver, I think, or on how you formulate the problem.
@amueller could you please point me to the corresponding issue/PR? I did not see it in the master codebase. Thanks!
@smarie I was referring to LogisticRegression:
Ah ok, I mistakenly thought that this was about SVC. Thanks!
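For context on the default-solver change discussed above: in recent scikit-learn releases (since 0.22, to my understanding) LogisticRegression no longer defaults to liblinear, which has to be requested explicitly. A quick check:

```python
from sklearn.linear_model import LogisticRegression

# The default solver is now 'lbfgs'; liblinear is opt-in.
print(LogisticRegression().solver)                    # 'lbfgs'
print(LogisticRegression(solver="liblinear").solver)  # 'liblinear'
```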
To come back to this, here's some additional evidence challenging this belief when it comes to practical usage, from the creators of LIBSVM and LIBLINEAR: Section 2.2, "Scaling".
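The LIBSVM practical guide's scaling advice (Section 2.2) recommends scaling each feature to a fixed range such as [-1, 1]. A minimal sketch of that recipe in scikit-learn, using `MinMaxScaler` (my choice of dataset and `max_iter` here is illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)

# Scale each feature to [-1, 1] before fitting, as the guide
# suggests; this typically helps the solver reach tol in time.
clf = make_pipeline(MinMaxScaler(feature_range=(-1, 1)),
                    LinearSVC(max_iter=10000))
clf.fit(X, y)
print(clf.score(X, y))
```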
@hermidalc I observed it to be a bit more stable than lbfgs in some settings I tried; see the preconditioning issue & PR. I'm not entirely sure how we can make the user experience better here :-/ I've seen plenty of convergence issues even after scaling, but I haven't had the time to write them up.
I'm trying to remove issues that have been around for more than 2 releases from the milestones. But this one seems pressing and you really care about it @amueller. Leaving it in the milestone for 0.24, but we really should be better at following up on these.
I have to say @amueller I do agree with you more now. With various high-dimensional datasets I've been working with these last few months, I've been seeing frequent convergence issues with

The exact same workflows with

The problem is that only

Maybe the latest LIBLINEAR code has updates/fixes that have corrected the underlying problem? Looks like the main liblinear code in sklearn is from back in 2014.
Recently we made liblinear report convergence failures.
Now this is reported in lots of places. I expect our users will start to see it everywhere. Should we change something? It's weird if most uses of LinearSVC result in a "failure" now.
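For users hitting the warning, one way to tell whether liblinear stopped on tol or simply ran out of iterations is the fitted estimator's `n_iter_` attribute. A sketch on unscaled digits, where the default iteration budget is often exhausted (this fit may itself emit a ConvergenceWarning):

```python
from sklearn.datasets import load_digits
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
clf = LinearSVC(max_iter=1000).fit(X, y)

# n_iter_ reports the iterations liblinear actually ran; reaching
# max_iter means tol was never met and the warning fired.
print(clf.n_iter_, clf.max_iter)
```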