Skip to content

[WIP] Use Joblib backend hints rather than hardcoding #11345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Jun 22, 2018

Reference Issues/PRs

Closes #8804
See also #9486
See also joblib/joblib#602

This includes the changes from #602, since that's required for this. I wanted to get this up early, in the hopes it could be reviewed and included in the release (if possible). Currently, my changes are in 28260d9

What does this implement/fix? Explain your changes.

Uses joblib's new backend hints in every place where we currently hardcode the backend.

Currently (I think) only ForestClassifier / ForestRegressor rely on shared memory for correctness, and only in their predict methods. Everywhere else has been changed to prefer='threads'.

Any other comments?

Open questions:

  1. Testing: I've added a test for random forest. I think it's a useful test, but
    I'm not sure whether I should write similar tests for other backends.
  2. Documentation: I think the class docstring of every estimator that uses a
    non-default backend should indicate that in the n_jobs parameter
    description. Would that be OK?
  3. Examples: I would be more than happy to include a Dask example :) It'd just
    be using Dask's local cluster. If you all are OK with accepting Dask &
    distributed as dependencies for the doc build then I'll get to work on that.
  4. Linting: Should I (try to) write a check for uses of backend=, so that future PRs don't accidentally hardcode a backend?

ogrisel and others added 5 commits June 22, 2018 11:59
Uses the new prefer / require keywords from joblib/joblib#602.

This allows users to control how jobs are parallelized in more situations.
For example, training a RandomForest on a cluster of machines with the dask backend.

Closes scikit-learn#8804
@sklearn-lgtm
Copy link

This pull request introduces 52 alerts and fixes 3 when merging 28260d9 into 93382cc - view on LGTM.com

new alerts:

  • 20 for Unused import
  • 8 for Except block handles 'BaseException'
  • 8 for Module is imported with 'import' and 'import from'
  • 4 for Missing call to __init__ during object initialization
  • 3 for 'import *' may pollute namespace
  • 1 for Module is imported more than once
  • 1 for Conflicting attributes in base classes
  • 1 for Unused local variable
  • 1 for Unreachable code
  • 1 for Module-level cyclic import
  • 1 for Variable defined multiple times
  • 1 for Redundant assignment
  • 1 for Wrong name for an argument in a class instantiation
  • 1 for __init__ method calls overridden method

fixed alerts:

  • 1 for Non-exception in 'except' clause
  • 1 for Except block handles 'BaseException'
  • 1 for Unnecessary 'else' clause in loop

Comment posted by LGTM.com

@tomMoral
Copy link
Contributor

We just cherry-picked your commit in the joblib0.12 PR #9486 to try to land all the changes in one go.

@TomAugspurger
Copy link
Contributor Author

Great, thanks.

I'll close this PR then. If there are any comments on those changes I can make PRs against the #9486 branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to override joblib backend for scikit-learn estimators
4 participants