
Add ability to override joblib backend for scikit-learn estimators #8804

Closed
@jcrist

Some calls to Parallel in scikit-learn hardcode which joblib backend to use. It would be nice if these could be overridden with the parallel_backend context manager so that a different backend is used instead. This would allow the dask.distributed backend to be used in more places, which may provide speedups (see comment here).

One way to do this would be to check whether a backend is currently active (as set by the context manager) and use it, falling back to the specified default otherwise. This might look like:

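```python
from sklearn.externals.joblib import Parallel, delayed
# _backend is a private thread-local in joblib.parallel that the
# parallel_backend context manager populates; relying on it is fragile.
from sklearn.externals.joblib.parallel import _backend

def active_backend_or(default):
    """If there is an active joblib backend use that, otherwise use the default"""
    return getattr(_backend, 'backend_and_jobs', (default, None))[0]

# Use the active backend if set, otherwise use "threading"
# (delayed(abs)(i) ... is just a placeholder workload)
Parallel(backend=active_backend_or("threading"))(delayed(abs)(i) for i in range(10))
```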
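To check the fallback logic without depending on joblib internals, here is a minimal self-contained sketch of the same pattern using only the standard library. The module-level _backend, parallel_backend, hardcoded_backend and overridable_backend below are hypothetical stand-ins that model the mechanism; they are not joblib's actual implementation, and the backend names are just string labels.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in for a thread-local "active backend" slot.
# This models the pattern only; it is NOT joblib's API.
_backend = threading.local()


@contextmanager
def parallel_backend(backend, n_jobs=-1):
    """Set the active backend for this thread, restoring the previous one on exit."""
    previous = getattr(_backend, "backend_and_jobs", None)
    _backend.backend_and_jobs = (backend, n_jobs)
    try:
        yield
    finally:
        if previous is None:
            del _backend.backend_and_jobs
        else:
            _backend.backend_and_jobs = previous


def active_backend_or(default):
    """If there is an active backend use that, otherwise use the default."""
    return getattr(_backend, "backend_and_jobs", (default, None))[0]


def hardcoded_backend():
    # Models a call like Parallel(backend="threading"): the context is ignored.
    return "threading"


def overridable_backend():
    # Models the proposal: honour the context manager, else fall back.
    return active_backend_or("threading")


if __name__ == "__main__":
    print(hardcoded_backend(), overridable_backend())
    with parallel_backend("dask"):
        print(hardcoded_backend(), overridable_backend())
    print(overridable_backend())
```

Because the active backend lives in a threading.local, a backend set in one thread is not visible to other threads in this sketch, and restoring the previous value in the finally block means leaving a nested context returns to the outer backend rather than clearing it.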

I'm not sure whether this should be fixed in scikit-learn or in joblib. Opening this for discussion.
