[WIP] Use Joblib backend hints rather than hardcoding #11345

TomAugspurger · 2018-06-22T13:10:02Z

Reference Issues/PRs

Closes #8804
See also #9486
See also joblib/joblib#602

This includes the changes from joblib/joblib#602, since they are required for this PR. I wanted to get this up early in the hope that it could be reviewed and included in the release (if possible). Currently, my changes are in 28260d9.

What does this implement/fix? Explain your changes.

Uses joblib's new backend hints in every place where we currently hardcode the backend.

Currently (I think) only ForestClassifier / ForestRegressor rely on shared memory for correctness, and only in their predict methods. Everywhere else has been changed to prefer='threads'.
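A minimal sketch of how the two kinds of hint behave, assuming joblib >= 0.12 (the release that added the `prefer=` and `require=` keywords to `Parallel`). `prefer="threads"` is a soft hint that a backend activated by the user with `parallel_backend(...)` overrides, while `require="sharedmem"` is a hard constraint that falls back to a thread-based backend when the active one cannot share memory. The functions `square` and `append` are hypothetical stand-ins, not scikit-learn code.

```python
from joblib import Parallel, delayed, parallel_backend

# Assumes joblib >= 0.12, where Parallel accepts the prefer= and require= hints.


def square(x):
    # Hypothetical stand-in for a pure, side-effect-free task.
    return x * x


# Soft hint: threads are preferred when no backend is active, but a backend
# the user selects with parallel_backend(...) (loky, dask, ...) wins instead.
squares = Parallel(n_jobs=2, prefer="threads")(
    delayed(square)(i) for i in range(5)
)

# Hard constraint: the tasks mutate a shared Python list, so they must run in
# the same process as the caller.
collected = []


def append(x):
    # Hypothetical stand-in for a task (like forest predict) that writes
    # into shared state.
    collected.append(x)


# Even with a process-based backend active, require="sharedmem" makes joblib
# fall back to a thread-based backend, so the list is actually filled.
with parallel_backend("loky"):
    Parallel(n_jobs=2, require="sharedmem")(delayed(append)(i) for i in range(5))

print(squares)            # [0, 1, 4, 9, 16]
print(sorted(collected))  # [0, 1, 2, 3, 4]
```

The split mirrors the PR: code that only benefits from threads states a preference and stays open to other backends such as dask, and only code whose correctness depends on shared memory uses `require`.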

Any other comments?

Open questions:

Testing: I've added a test for random forest. I think it's a useful test, but I'm not sure whether I should write similar tests for other backends.
Documentation: I think the class docstring of every estimator that uses a non-default backend should indicate that in the n_jobs parameter description. Would that be OK?
Examples: I would be more than happy to include a Dask example :) It would just use Dask's local cluster. If you all are OK with accepting Dask & distributed as dependencies for the doc build, then I'll get to work on that.
Linting: Should I (try to) write a check for uses of backend=, so that future PRs don't accidentally hardcode a backend?
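One possible shape for the linting check raised in the last question: a small `ast`-based scan (Python 3.9+ for `ast.unparse`) that flags `Parallel(...)` calls passing an explicit `backend=` keyword. `find_hardcoded_backends` is a hypothetical helper, not an existing scikit-learn or joblib utility, and it only catches the keyword written out directly (a backend smuggled in through `**kwargs` is not detected).

```python
import ast


def find_hardcoded_backends(source, filename="<string>"):
    """Return sorted (lineno, backend) pairs for Parallel(...) calls passing backend=.

    Hypothetical lint helper: matches calls spelled ``Parallel(...)`` or
    ``<module>.Parallel(...)``; backends passed via **kwargs are not seen.
    """
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            name = func.id
        elif isinstance(func, ast.Attribute):
            name = func.attr
        else:
            # e.g. the outer call in Parallel(...)(tasks), whose func is a Call
            continue
        if name != "Parallel":
            continue
        for kw in node.keywords:
            if kw.arg == "backend":
                if isinstance(kw.value, ast.Constant):
                    value = kw.value.value
                else:
                    # Non-literal backend (a variable, attribute, ...):
                    # report its source text instead.
                    value = ast.unparse(kw.value)
                hits.append((node.lineno, value))
    return sorted(hits)


# The sample is only parsed, never executed, so joblib need not be installed.
SAMPLE = '''
from joblib import Parallel, delayed
import joblib

a = Parallel(n_jobs=2, backend="threading")(delayed(abs)(i) for i in range(3))
b = Parallel(n_jobs=2, prefer="threads")(delayed(abs)(i) for i in range(3))
c = joblib.Parallel(backend="loky")(delayed(abs)(i) for i in range(3))
'''

for lineno, backend in find_hardcoded_backends(SAMPLE):
    print(f"line {lineno}: hardcoded backend={backend!r}")
```

Calls that use `prefer=` or `require=` are deliberately not flagged, since those are the replacement the check would steer contributors toward. Such a script could be run over the `.py` files of the package in CI.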

…tests

Uses the new prefer / require keywords from joblib/joblib#602. This gives users control over how jobs are parallelized in more situations; for example, they can train a RandomForest on a cluster of machines with the dask backend. Closes scikit-learn#8804

sklearn-lgtm · 2018-06-22T13:35:06Z

This pull request introduces 52 alerts and fixes 3 when merging 28260d9 into 93382cc - view on LGTM.com

new alerts:

20 for Unused import
8 for Except block handles 'BaseException'
8 for Module is imported with 'import' and 'import from'
4 for Missing call to __init__ during object initialization
3 for 'import *' may pollute namespace
1 for Module is imported more than once
1 for Conflicting attributes in base classes
1 for Unused local variable
1 for Unreachable code
1 for Module-level cyclic import
1 for Variable defined multiple times
1 for Redundant assignment
1 for Wrong name for an argument in a class instantiation
1 for __init__ method calls overridden method

fixed alerts:

1 for Non-exception in 'except' clause
1 for Except block handles 'BaseException'
1 for Unnecessary 'else' clause in loop

Comment posted by LGTM.com

tomMoral · 2018-06-22T16:37:49Z

We just cherry-picked your commit into the joblib 0.12 PR #9486 to try to land all the changes in one go.

TomAugspurger · 2018-06-22T18:21:11Z

Great, thanks.

I'll close this PR then. If there are any comments on those changes I can make PRs against the #9486 branch.

ogrisel and others added 5 commits June 22, 2018 11:59

WIP vendor loky + joblib dev branches to perform sklearn integration …

ff94b71

…tests

Include the vendored subpackages in the setup.py

74b333c

Test coverage thread-safety fix

addf05e

MTN vendor joblib 0.11.0dev0

e2ca63b

TomAugspurger closed this Jun 22, 2018
