BLD we should ensure continued support for joblib 0.11 #12350

jnothman · 2018-10-11T01:06:19Z

As joblib 0.12 may introduce issues for some users, we should continue to support 0.11, at least in 0.20.X. (Let me know if you think I should base this on that branch rather than master.)

This adds a travis run with older joblib.

rth · 2018-10-11T20:20:17Z

I also think it's an important use case. @ogrisel mentioned that the main technical difficulty is that the following options (currently used) are not supported in joblib 0.11:

Parallel(..., n_jobs=None) is used extensively; it could fall back to n_jobs=1 on 0.11. Backporting n_jobs=None to joblib 0.11.1 (joblib/joblib#786) could make supporting this less painful from the scikit-learn perspective.
Parallel(..., prefer='threads') is currently only used in a few places and should be fairly straightforward to make conditional on the joblib version.
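A minimal sketch of the n_jobs=None fallback from the first item, assuming the installed joblib version is available as a (major, minor) tuple. `normalize_n_jobs` is a hypothetical helper, not part of scikit-learn or joblib, and the thread later found that 0.11 already tolerates n_jobs=None, so this may turn out to be unnecessary.

```python
def normalize_n_jobs(n_jobs, joblib_version):
    """Map n_jobs=None to an explicit value for joblib releases that predate it.

    Hypothetical helper: joblib_version is a (major, minor) tuple.
    """
    # joblib 0.12+ understands n_jobs=None (interpreted as 1 unless a
    # parallel_backend context sets another value), so forward it unchanged.
    if joblib_version >= (0, 12):
        return n_jobs
    # Older releases: fall back to sequential execution, as proposed above.
    return 1 if n_jobs is None else n_jobs
```

Explicit values such as n_jobs=-1 pass through untouched on every version; only the "unset" marker is translated.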

I think it would make sense to start treating joblib as the dependency that it is. Also related: conda-forge/scikit-learn-feedstock#75

rth · 2018-10-13T20:52:48Z

@jnothman I'll push a few commits to at least have an environment working so we can see what is the situation with joblib 0.11 support. Please don't force push on top :)

There was an install issue because joblib 0.11 is not built on conda for Python 3.4. I added a Python 3.5 build to the matrix, since it looks like we are not testing 3.5 in any of the CI anyway. Not sure if we want to keep it, but it can serve as a temporary solution to see where things stand in this PR.

jnothman · 2018-10-16T12:38:25Z

Thanks for that @rth. (And thanks for pushing on 0.20.1 issues.)

require='sharedmem' is currently breaking this.

rth · 2018-10-16T12:50:19Z

require='sharedmem' is currently breaking this.

Hmm, it's strange that I wasn't seeing that locally. We can probably generalize to:

Parallel(...,
         **_joblib_parallel_params(prefer='threads', require='sharedmem'))

and map those for joblib 0.11 as needed. Another thing I don't understand is why we are not seeing failures due to the new default n_jobs=None (I haven't looked at it properly yet).

amueller · 2018-10-16T15:32:43Z

We still support 3.4 in 0.20.1.

rth · 2018-10-16T15:35:00Z

We still support 3.4 in 0.20.1.

We do -- this adds a 3.5 build, doesn't affect the existing 3.4 one.

amueller · 2018-10-16T15:37:58Z

ah, right, sorry.
I have a couple of deadlines this week and haven't really been able to catch up with all the joblib stuff that's happening.

sklearn-lgtm · 2018-10-16T20:17:33Z

This pull request introduces 1 alert when merging 1621987 into a1d0e96 - view on LGTM.com

new alerts:

1 for Unused import

Comment posted by LGTM.com

rth · 2018-10-17T10:08:55Z

So CI is green, and n_jobs=None might not need anything special, since it appears to work with joblib 0.11 (joblib/joblib#786 (comment)).

rth · 2018-10-17T10:13:03Z

@lesteve, would you be able to have a look at the diff (particularly in utils.fixes) to see if I got the mapping of parameters between joblib 0.11 and 0.12+ right? Thanks!

jnothman

utils.fixes looks right as long as prefer != 'processes' when require='sharedmem'... I don't think we need to handle this case, but an assert wouldn't hurt.

But I'm hardly the joblib expert around here.
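The assertion suggested here could look like the following sketch. `check_prefer_require` is a hypothetical name, and raising `ValueError` rather than using a bare `assert` is an assumption made so the check still runs under `python -O`.

```python
def check_prefer_require(prefer=None, require=None):
    """Reject the combination the joblib 0.11 mapping can only resolve by
    overriding prefer.

    Hypothetical guard: a process-based preference cannot be honoured when
    shared memory is required, so fail loudly instead of silently switching.
    """
    if require == 'sharedmem' and prefer == 'processes':
        raise ValueError("prefer='processes' is incompatible with "
                         "require='sharedmem'")
```

The alternative, which the reply below describes, is to let require win and select the threading backend without raising.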

rth · 2018-10-17T11:26:08Z

utils.fixes looks right as long as prefer != 'processes' when require='sharedmem'... I don't think we need to handle this case, but an assert wouldn't hurt.

So currently, if prefer='processes' and require='sharedmem' with joblib 0.11, it will select backend='threading', as that's how I understood the require docstring:

require: ‘sharedmem’ or None, default None
Hard constraint to select the backend. If set to ‘sharedmem’, the selected backend will be single-host and thread-based even if the user asked for a non-thread based backend with parallel_backend.

But a confirmation from someone more familiar with joblib would be good to have, maybe Loic or @tomMoral.

jnothman

Thanks for working on this @rth. There's no approve button for me to press... It needs a what's new entry, and perhaps we should state the supported versions where we talk about the SITE_JOBLIB setting.

lesteve

The mapping from prefer+require to backend looks right to me.

lesteve · 2018-10-17T15:40:33Z

sklearn/utils/fixes.py

+def _joblib_parallel_args(prefer=None, require=None):
+    """Set joblib.Parallel arguments in a compatible way for 0.11 and 0.12+"""
+    from . import _joblib
+    if prefer not in ['threads', 'processes', None]:


I would move these argument checks to the joblib <= 0.11 case (in the 0.12+ case, just forward the arguments without checks). You may also want to check backend.

lesteve · 2018-10-17T15:42:36Z

sklearn/utils/fixes.py

+        raise NotImplementedError('prefer=%s is not supported'
+                                  % prefer)
+    args = {}
+    if _joblib.__version__ >= LooseVersion('0.12.0'):


I would recommend using LooseVersion('0.12') (i.e. only major.minor). There are a few caveats with using LooseVersion with PEP 440 versions. In my experience, the fewer numbers you use, the better off you are, for example:

In [1]: LooseVersion('0.12.dev') < LooseVersion('0.12.0')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-f2e012a14c0d> in <module>()
----> 1 LooseVersion('0.12.dev') < LooseVersion('0.12.0')

~/miniconda3/lib/python3.6/distutils/version.py in __lt__(self, other)
     50
     51     def __lt__(self, other):
---> 52         c = self._cmp(other)
     53         if c is NotImplemented:
     54             return c

~/miniconda3/lib/python3.6/distutils/version.py in _cmp(self, other)
    335         if self.version == other.version:
    336             return 0
--> 337         if self.version < other.version:
    338             return -1
    339         if self.version > other.version:

TypeError: '<' not supported between instances of 'str' and 'int'
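A sketch of the "compare only major.minor" idea that avoids `distutils` altogether (it was removed from the standard library in Python 3.12). `major_minor` and `joblib_is_recent` are hypothetical helpers: they keep only the two leading numeric components, so a pre-release such as '0.12.dev0' counts as 0.12, which matches what `LooseVersion('0.12.dev') >= LooseVersion('0.12')` returns.

```python
import re


def major_minor(version):
    """Return the leading (major, minor) integers of a version string.

    Hypothetical helper: trailing components such as '.0', '.dev0' or 'rc1'
    are ignored, so a str is never compared against an int.
    """
    match = re.match(r'(\d+)\.(\d+)', version)
    if match is None:
        raise ValueError('Unparseable version: %r' % version)
    return int(match.group(1)), int(match.group(2))


def joblib_is_recent(version):
    # Same threshold as the PR: joblib 0.12 introduced prefer= and require=.
    return major_minor(version) >= (0, 12)
```

Tuples of ints compare element by element, so the str/int mix that breaks `LooseVersion` above cannot occur.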

lesteve · 2018-10-18T06:11:25Z

I haven't followed the situation very closely regarding the joblib problems that motivate this change, but I have to admit that in my mind SKLEARN_SITE_JOBLIB is a developer thing: basically you are on your own if you use it, and you have no guarantee that it will work. In my mind it was meant more for use with joblib development versions than to allow going back to older joblib versions.

jnothman · 2018-10-18T09:51:15Z

Yes, it was to allow use of the development joblib versions, when we thought we would release 0.20 with joblib 0.11.
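For readers unfamiliar with the setting, a hypothetical sketch of how an opt-in environment variable can switch between a vendored copy and a site-installed joblib. This is not scikit-learn's actual `sklearn.utils._joblib` code: `select_joblib_source` and its return values are made up for illustration, and the environment mapping is passed in so the logic can be tested without touching `os.environ`.

```python
import os


def select_joblib_source(environ=None):
    """Return 'site' if the SKLEARN_SITE_JOBLIB opt-in is set, else 'vendored'.

    Hypothetical helper: any non-empty value counts as opting in.
    """
    if environ is None:
        environ = os.environ
    return 'site' if environ.get('SKLEARN_SITE_JOBLIB') else 'vendored'
```

Because the site copy can be any version the user happens to have installed, this is exactly where a shim for older joblib releases becomes necessary.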

tomMoral · 2018-10-18T10:14:24Z

sklearn/utils/fixes.py

+        if prefer is not None:
+            args['prefer'] = prefer
+        if require is not None:
+            args['require'] = require


Why not simply args = {'prefer': prefer, 'require': require}?
The default value for both arguments is None, so this check is not useful.

To avoid having to rewrite this part if we add some arguments to joblib.Parallel, I would suggest a function such as:

from distutils.version import LooseVersion

def _joblib_parallel_args(**kwargs):
    from . import _joblib  # sklearn's joblib wrapper, as in the diff above
    if _joblib.__version__ >= LooseVersion('0.12'):
        return kwargs
    # If version is earlier than 0.12, mock the prefer and require arguments
    args = {'backend': None}
    prefer = kwargs.get('prefer')
    if prefer not in ['threads', 'processes', None]:
        raise NotImplementedError('prefer=%s is not supported' % prefer)
    args['backend'] = {'threads': 'threading',
                       'processes': 'multiprocessing',
                       None: None}[prefer]
    # require='sharedmem' is a hard constraint: force a thread-based backend
    if kwargs.get('require') == 'sharedmem':
        args['backend'] = 'threading'
    return args

rth · 2018-10-22T13:02:38Z

Thanks for the reviews!

I adapted @tomMoral's version a bit so that it does not make assumptions about the default values of parameters in joblib and explicitly fails on unhandled parameters.

Also added tests that should check all cases in sklearn.utils.fixes._joblib_parallel_args, independently of the joblib version installed.

This should address remaining comments, I think.
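A self-contained sketch of the behaviour described in this comment: no assumptions about joblib's defaults (unset arguments are simply not translated) and an explicit failure on parameters the shim does not handle. Unlike the real `sklearn.utils.fixes._joblib_parallel_args`, this hypothetical `joblib_parallel_args` receives the joblib version as a (major, minor) tuple instead of importing joblib, which is one way to exercise every branch whatever version is installed.

```python
def joblib_parallel_args(joblib_version, **kwargs):
    """Translate joblib 0.12+ Parallel keyword arguments for older joblib.

    Sketch only: joblib_version is a (major, minor) tuple passed in
    explicitly so every branch can be tested regardless of installed joblib.
    """
    if joblib_version >= (0, 12):
        # Recent joblib understands prefer/require natively: forward as-is.
        return dict(kwargs)

    # Fail explicitly on anything this shim does not know how to translate.
    unsupported = set(kwargs) - {'prefer', 'require'}
    if unsupported:
        raise NotImplementedError(
            'unhandled Parallel parameters: %s' % sorted(unsupported))

    prefer = kwargs.get('prefer')
    require = kwargs.get('require')
    backends = {'threads': 'threading', 'processes': 'multiprocessing'}
    if prefer is not None and prefer not in backends:
        raise NotImplementedError('prefer=%s is not supported' % prefer)
    if require not in (None, 'sharedmem'):
        raise NotImplementedError('require=%s is not supported' % require)

    args = {}
    if require == 'sharedmem':
        # Hard constraint: shared memory forces a thread-based backend,
        # even over prefer='processes', as the require docstring describes.
        args['backend'] = 'threading'
    elif prefer is not None:
        args['backend'] = backends[prefer]
    # With neither set, 'backend' is left out so joblib keeps its own default.
    return args
```

Usage follows the pattern discussed earlier in the thread, e.g. `Parallel(n_jobs=2, **joblib_parallel_args((0, 11), prefer='threads'))`, where the version tuple would come from the installed joblib in real code.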

jnothman · 2018-10-23T08:48:03Z

We should add something to what's new...

rth · 2018-10-23T09:56:23Z

Added a what's new entry.

rth · 2018-11-06T16:53:43Z

Any other comments on this? I'll count Joel as +1, but we need a second review.

Maybe @amueller ? @lesteve I think I addressed all your comments. Thanks!

amueller · 2018-11-06T16:54:32Z

I'll try today or tomorrow maybe?

ogrisel

I did a pass and it looks great. +1 for merge when CI is green.

tomMoral

LGTM

Just a small nitpick on the name of the function.

tomMoral · 2018-11-06T17:11:40Z

sklearn/utils/fixes.py

@@ -332,3 +334,51 @@ def _object_dtype_isnan(X):
    from collections import Iterable as _Iterable  # noqa
    from collections import Mapping as _Mapping  # noqa
    from collections import Sized as _Sized  # noqa
+
+
+def _joblib_parallel_args(**kwargs):


Maybe just change it to _joblib_parallel_kwargs as you only set kwargs in it?

It would be clearer.

rth · 2018-11-06T21:09:33Z

Thanks for the reviews @ogrisel and @tomMoral !

Pushed another commit that fixes some issues with the merge conflict (flake8 and a new test that was added meanwhile to fix CI). CI is green now.

Maybe just change it to _joblib_parallel_kwargs as you only set kwargs in it?

I kind of prefer the name _joblib_parallel_args as it's more general: it maps the arguments for joblib parallel, the input is indeed kwargs only and the output is a dict. Currently the output is passed to kwargs, but it doesn't strictly have to be.
Also I find the signature _joblib_parallel_kwargs(**kwargs) a bit confusing as it's unclear if the kwargs in the name have something to do with the kwargs parameters.

Merging as it is, it's a private function, we can always change it later.


BLD we should ensure continued support for joblib 0.11

971da15

jnothman added this to the 0.20.1 milestone Oct 11, 2018

Fix joblib 0.11 vs python version compat on conda

3dfc5bf

Fix joblib 0.11 / 0.12+ compatibility for Parallel(.., prefer=..)

5220301

rth force-pushed the legacy-joblib branch from db6d3fc to 5220301 on October 16, 2018 09:53

Support require='sharedmem' in joblib compat fixes

1621987

rth added 3 commits October 16, 2018 22:31

Flake8 and skip test_backend_respected with joblib 0.11

59dfe19

Fix typo

ea4dedf

Fix distutils.version import

ed6a72b

Fix string equality testing

723d1da

jnothman commented Oct 17, 2018

lesteve reviewed Oct 18, 2018

tomMoral reviewed Oct 18, 2018

rth force-pushed the legacy-joblib branch from 7e77d2b to 6720e0f on October 22, 2018 09:29

Address review comments and add tests

269937e

rth force-pushed the legacy-joblib branch from 6720e0f to 269937e on October 22, 2018 12:48

rth added 2 commits October 23, 2018 11:33

Add what's new

ab87342

Merge branch 'master' into legacy-joblib

a197de9

This was referenced Oct 23, 2018

Package joblib pyodide/pyodide#212

Merged

[RFC] Define future dependence on joblib #12447

Closed

Merge branch 'master' into legacy-joblib

3f246c7

ogrisel approved these changes Nov 6, 2018

tomMoral approved these changes Nov 6, 2018

ogrisel mentioned this pull request Nov 6, 2018

isolation forest fit function uses way too much memory when n_jobs != 1 #12469

Closed

Fix flake8, and address new tests in the code base

daf046e

rth force-pushed the legacy-joblib branch from 6b97fc4 to daf046e on November 6, 2018 18:49

rth merged commit 1128094 into scikit-learn:master Nov 6, 2018

rth mentioned this pull request Nov 6, 2018

Package scikit-learn pyodide/pyodide#139

Closed


thoo pushed a commit to thoo/scikit-learn that referenced this pull request Nov 14, 2018

BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350)

e874bb4

thoo pushed a commit to thoo/scikit-learn that referenced this pull request Nov 14, 2018

BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350)

6ee634b

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Nov 14, 2018

BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350)

67d8c98

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Nov 14, 2018

BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350)

8f7b362

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350)

1e62eae

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350)"
This reverts commit 1e62eae.

05f512c

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350)"
This reverts commit 1e62eae.

da529b6

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350)

c2eac5f
