
Multi-core training using liblinear and libsvm #6245


Open
mannby opened this issue Jan 28, 2016 · 7 comments

Comments

@mannby

mannby commented Jan 28, 2016

Has anyone looked into supporting the multi-core training extension to liblinear and libsvm (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/multicore-liblinear/)?

There seems to be a patch available to do this:

https://github.com/fidlr/sklearn-openmp
http://fidlr.org/post/137303264732/scikit-learn-017-with-libsvm-openmp-support

@hlin117
Contributor

hlin117 commented Feb 8, 2016

I don't think it's plausible that we would add an openmp dependency to the project. But if the algorithm is "embarrassingly parallel" using joblib, that's another story.
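For illustration, here is a minimal sketch of the joblib-style, per-class parallelism that scikit-learn already exposes. It assumes a one-vs-rest decomposition via `OneVsRestClassifier`, whose `n_jobs` parameter fits the per-class liblinear models in parallel; the synthetic dataset and its sizes are made up for the example. Note that this parallelizes across classes only; each individual liblinear solve still runs on a single core.

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic 3-class problem (sizes are arbitrary, chosen for illustration).
X, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=5,
    n_classes=3,
    random_state=0,
)

# LinearSVC is backed by liblinear. With n_jobs=2, the one-vs-rest
# wrapper fits the independent binary problems in parallel via joblib.
clf = OneVsRestClassifier(LinearSVC(), n_jobs=2)
clf.fit(X, y)

# One fitted binary model per class.
n_models = len(clf.estimators_)
```

With a binary problem there is only one model to fit, so this approach gives no speed-up there.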

@amueller
Member

amueller commented Oct 7, 2016

Actually, I think it's very plausible. It's just a question of when and how ;)

@mannby
Author

mannby commented Oct 7, 2016

Note that #6448 not only provides parallelization of multi-class linear regression, but also manages the memory better than a naive parallelization would, since liblinear creates unnecessary copies, which become highly problematic on large datasets.

I've been using the patch (#6448) for a long time now, and it's robust.

@amueller
Member

amueller commented Oct 8, 2016

@mannby I think there are issues if people also try to use joblib, because it currently doesn't support nested parallelism. I'm not sure which issue is the right one to track that. Maybe joblib/joblib#256 or #3754? @ogrisel and @GaelVaroquaux know more ;)

@rth
Member

rth commented Jun 20, 2019

@alexhenrie In case you might be interested in this. We can now use OpenMP in the code base (although there are still some open questions about how to control the number of threads). A PR would be welcome.

Their patch is very large, though; I was hoping it could be reduced a bit.

Also cc @jeremiedbb

@jilljenn
Contributor

Hi, I would like to revisit this one.
I see that sklearn has its own liblinear fork. Is it following upstream changes or not?
I ask because there seems to be a not-so-new multicore liblinear: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/multicore-liblinear/
So n_jobs could simply be passed through to that one, instead of being used only in the multiclass case. Is that right?

Context: we were having memory problems on a large dataset (99M samples) when using LogisticRegression (with any solver, liblinear or saga). So we used sklearn's dump_svmlight_file and then multicore liblinear, and that worked fine.
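The export step of that workaround might look roughly like the sketch below. It assumes a sparse feature matrix and binary labels generated at random purely for illustration, and a temporary file path that is hypothetical. `zero_based=False` is used because the LIBSVM/liblinear text format indexes features from 1. The resulting file would then be passed to the external multicore liblinear `train` program, which is not shown here.

```python
import os
import tempfile

import numpy as np
from scipy import sparse
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Small random sparse dataset standing in for the real (much larger) one.
rng = np.random.RandomState(0)
X = sparse.random(100, 10, density=0.2, format="csr", random_state=rng)
y = rng.randint(0, 2, size=100)

# Hypothetical output location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "train.svm")

# Write the data in svmlight/libsvm format with 1-based feature indices.
dump_svmlight_file(X, y, path, zero_based=False)

# Optional sanity check: read the file back and compare with the original.
X_back, y_back = load_svmlight_file(path, n_features=10, zero_based=False)
```

For a dataset that does not fit in memory, the file would have to be written in chunks by other means; this sketch only covers the in-memory case.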

@avramdjo

avramdjo commented Jun 3, 2022

Hi, is there still no way to parallelize liblinear?


7 participants