[MRG] Support multi-threading of LibLinear L1 one-vs-rest LogisticRegression for # classes > 2 #6448

mannby · 2016-02-25T03:32:46Z

This branch subsumes the changes made to support larger sparse feature arrays. I don't know of a clean, simple way of separating the two without creating a new fork.
If this change is of interest, I suspect the maintainers may want to change how the functionality is exposed in the API, which may require a new branch anyway.
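
As a rough illustration of why larger sparse arrays need wider indices (the commit messages below mention values above 2^31), here is a minimal sketch. It assumes a CSR-style layout in which the row-pointer array ends at the total number of stored values and the column-index array is bounded by the number of features. The helper `index_width_bytes` and the example sizes are hypothetical, not part of this patch.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def index_width_bytes(max_value):
    """Return the smallest signed width (4 or 8 bytes) that holds max_value."""
    if max_value < 0:
        raise ValueError("max_value must be non-negative")
    if max_value <= INT32_MAX:
        return 4
    if max_value <= INT64_MAX:
        return 8
    raise OverflowError("value does not fit in 64 bits")


# Hypothetical sizes: 3 billion stored values, 500,000 distinct features.
n_nonzero = 3_000_000_000
n_features = 500_000

# The row pointers must count up to n_nonzero, which exceeds 2^31 - 1.
indptr_width = index_width_bytes(n_nonzero)
# The column indices only go up to n_features - 1.
indices_width = index_width_bytes(n_features - 1)
```

With these assumed sizes, the row pointers need 8 bytes per value while the column indices still fit in 4.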

agramfort · 2016-02-25T18:50:27Z

are all the changes to liblinear coming from upstream?

mannby · 2016-02-25T19:23:51Z

No, they're local modifications. There are liblinear forks on github, but I didn't find a convenient place to make changes, and actually wasn't 100% sure of the authoritative source.

agramfort · 2016-02-26T10:30:36Z

I am not sure what our policy currently is. @ogrisel ? @amueller ?

TomDLT · 2016-02-26T14:43:45Z

Liblinear was recently patched to handle sample weights (#5274), but that patch was smaller than this one.

mannby · 2016-02-26T18:21:37Z

To clarify what this patch does, it:

Adds wrapper functions to start and join threads using pthreads, with support for Windows, OS X and other Unixes, as well as wrappers for mutexes and semaphores.
Targets only two liblinear solvers, and only cases where there are more than two classes.
Saves memory by performing the transposition that these two solvers need once, instead of once per invocation of train_one.
Starts up n threads and has them chew away at the independent one-vs-rest training for each of the classes until they're all done.
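
The last step can be sketched in Python as follows. This is only an illustration of the scheduling pattern, not the patch itself: the patch uses native threads in C, whereas Python threads would not speed up CPU-bound training because of the interpreter lock. The function `train_one` here is a hypothetical stand-in that counts positive samples instead of fitting a model.

```python
import queue
import threading


def train_one(class_label, labels):
    """Hypothetical stand-in for one binary (class vs. rest) training run.

    Counts the positive samples for the class rather than fitting a model,
    so the sketch has no dependencies.
    """
    return sum(1 for y in labels if y == class_label)


def train_one_vs_rest(labels, n_threads):
    """Run one independent binary problem per class on n_threads workers."""
    pending = queue.Queue()
    for c in sorted(set(labels)):
        pending.put(c)

    results = {}
    results_lock = threading.Lock()

    def worker():
        # Each worker keeps taking the next untrained class until none remain.
        while True:
            try:
                c = pending.get_nowait()
            except queue.Empty:
                return
            model = train_one(c, labels)
            with results_lock:
                results[c] = model

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


labels = [0, 1, 2, 1, 0, 2, 2]
models = train_one_vs_rest(labels, n_threads=3)
```

Pulling classes from a shared queue, rather than assigning a fixed slice of classes to each thread, keeps all workers busy even when some classes take much longer to train than others.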

Since the LogisticRegression class already has an n_jobs parameter, I introduced a separate parameter on the class, n_threads, to give the API control over whether threads are used and how many. This parallelization may or may not be desirable in combination with the grid search parallelization provided by n_jobs. If scikit-learn classes other than LogisticRegression use liblinear, n_threads presumably defaults to 1 for them, because everything compiles fine.
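
One way to reason about combining the two parameters is to budget threads so that the number of processes times the threads per process does not exceed the core count. The helper below is a hypothetical sketch of that arithmetic, not something the patch provides.

```python
import os


def threads_per_job(n_cores, n_jobs):
    """Split n_cores across n_jobs processes, with at least one thread each."""
    if n_cores < 1 or n_jobs < 1:
        raise ValueError("n_cores and n_jobs must be positive")
    return max(1, n_cores // n_jobs)


n_cores = os.cpu_count() or 1  # os.cpu_count() may return None
suggested_n_threads = threads_per_job(n_cores, n_jobs=4)
```

On a 48-core machine with 4 parallel jobs, this would suggest 12 threads per job; with a single job, all 48 cores could go to the one-vs-rest threads.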

It's been tested many times on OS X and Ubuntu 15.10, on datasets that used to take over 7 days to train and now take about 11 hours using 48 cores.

I added a compile switch to turn off the pthreads dependency completely, in case there are compilation issues under some circumstances.

There are also approaches for parallelizing a single run of train_one, e.g. when one only has 2 target classes, but these are still somewhat experimental. The two best contenders in that space that I've found are Shotgun and Bundle CDN. In my case, though, I'm happy with just parallelizing across different classes.

adrinjalali · 2024-04-19T13:53:27Z

Since, as our docstring now states, liblinear is only good for small datasets, I don't think we need to add this; other solvers already handle parallelism.

Claes-Fredrik Mannby added 8 commits January 19, 2016 13:08

Support new scipy sparse array indices, which can now be > 2^31 (< 2^…

ddcb64d

…63). This is needed for very large training sets. Feature indices (based on the number of distinct features) are unlikely to need 4 bytes per value, however.

Also increase size of integer values in indptr in the next step.

c75c0b8

Use long for both arrays if scipy >= 0.14.

3ec2503

Tweak comments

Support hard-wired number of threads while processing one-versus-rest…

2bd203b

… multi-class liblinear classification.

Reduce memory consumption for L1 solvers, especially for parallel mul…

35806f2

…ti-class one-vs-rest invocations

Add previously ignored source file

dc8e3b8

Add parameter n_threads to LogisticRegression to control multi-threading

459816b

Isolate threading code for easy compilation without it

93c2967

Claes-Fredrik Mannby added 17 commits March 1, 2016 13:41

Define thread return properly for Windows

345cbfd

Fix n_threads=1 case

484334c

TODO: n_threads=-1 case not implemented as documented

Make n_threads optional, to fix regression tests

72914f4

Fix doctest to accommodate new parameter

68a964b

Fix doctest to accommodate new parameter

e7b57d8

Fix Windows build issue

5467e94

Fix Windows build issue

e0e3cb7

Fix flake8 complaints related to branch changes

1427198

Fix flake8 complaints in general, to pass regression tests

907ba05

Revert "Fix flake8 complaints in general, to pass regression tests"

02c8637

This reverts commit 907ba05.

Smaller flake8 fixes

fdbb65c

Fix only some of the flake8 complaints

49980f7

Fix only some of the flake8 complaints

f98498c

Revert "Fix only some of the flake8 complaints"

db062d4

This reverts commit f98498c.

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn …

7eef8d4

…into experiment-2016-02

Fix remaining flake8 issues

6433761

Address doctest failure?

0234ae7

Address doctest failure? Arguments in alphabetic order

c29224d

mannby changed the title ~~Support multi-threading of LibLinear L1 one-vs-rest LogisticRegression for # classes > 2~~ [MRG] Support multi-threading of LibLinear L1 one-vs-rest LogisticRegression for # classes > 2 Sep 1, 2016

mannby mentioned this pull request Oct 7, 2016

Multi-core training using liblinear and libsvm #6245

Open

amueller added Waiting for Reviewer Needs Decision Requires decision labels Aug 5, 2019

github-actions bot added module:feature_extraction module:linear_model module:svm labels Mar 2, 2020

Base automatically changed from master to main January 22, 2021 10:49

thomasjpfan added the cython label Apr 13, 2021

cmarmo removed the Waiting for Reviewer label Feb 14, 2022

adrinjalali closed this Apr 19, 2024
