-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
BLD we should ensure continued support for joblib 0.11 #12350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I also think it's an important use case. @ogrisel mentioned that the main technical difficulty is that the following options (currently used) are not supported in joblib 0.11,
I think it would make sense to start treating joblib as dependency that it is. Also related conda-forge/scikit-learn-feedstock#75 |
@jnothman I'll push a few commits to at least have an environment working so we can see what is the situation with joblib 0.11 support. Please don't force push on top :) There was an install issue because joblib 0.11 is not built on conda for python 3.4. I added a python 3.5 build to the matrix, since anyway it looks like we are not testing it in any of the CI. Not sure if we want to keep it, but it can be a temporary solution to see where things are in this PR. |
Thanks for that @rth. (And thanks for pushing on 0.20.1 issues.) require='sharedmem' is currently breaking this. |
hmm it's strange that I wasn't seeing that locally. We can probably generalize, Parallel(...,
**_joblib_parallel_params(prefer='threads', require='sharedmem')) and map those for joblib 0.11 as needed. Another thing I don't understand is why we are not seeing failures due to the new default |
We still support 3.4 in 0.20.1. |
We do -- this adds a 3.5 build, doesn't affect the existing 3.4 one. |
ah, right, sorry. |
This pull request introduces 1 alert when merging 1621987 into a1d0e96 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
So CI is green, and |
@lesteve Would you be able to have a look at the diff (particularly in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utils.fixes looks right as long as prefer!='processes' when requires='sharedmem'... I don't think we need to handle this case but an assert wouldn't hurt.
But I'm hardly the joblib expert around here.
So currently, if
but a confirmation from someone more familiar with joblib would be good to have maybe Loic or @tomMoral |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @rth. There's no approve button for me to press... It needs a what's new, and perhaps we should state supported versions where we talk about the SITE_JOBLIB setting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mapping from prefer+require to backend looks right to me.
sklearn/utils/fixes.py
Outdated
def _joblib_parallel_args(prefer=None, require=None): | ||
"""Set joblib.Parallel arguments in a compatible way for 0.11 and 0.12+""" | ||
from . import _joblib | ||
if prefer not in ['threads', 'processes', None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this argument checks to the case joblib <= 0.11 (in the 0.12+ case just forward the arguments without checks). Also you may want to check backend
too.
sklearn/utils/fixes.py
Outdated
raise NotImplementedError('prefer=%s is not supported' | ||
% prefer) | ||
args = {} | ||
if _joblib.__version__ >= LooseVersion('0.12.0'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend using LooseVersion('0.12')
(i.e. only major.minor). There are a few caveats with using LooseVersion with PEP0444 versions. In my experience, the fewer numbers you use the better off you are (for example):
In [1]: LooseVersion('0.12.dev') < LooseVersion('0.12.0')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-f2e012a14c0d> in <module>()
----> 1 LooseVersion('0.12.dev') < LooseVersion('0.12.0')
~/miniconda3/lib/python3.6/distutils/version.py in __lt__(self, other)
50
51 def __lt__(self, other):
---> 52 c = self._cmp(other)
53 if c is NotImplemented:
54 return c
~/miniconda3/lib/python3.6/distutils/version.py in _cmp(self, other)
335 if self.version == other.version:
336 return 0
--> 337 if self.version < other.version:
338 return -1
339 if self.version > other.version:
TypeError: '<' not supported between instances of 'str' and 'int'
In [1]: LooseVersion('0.12.dev') < LooseVersion('0.12.0')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-f2e012a14c0d> in <module>()
----> 1 LooseVersion('0.12.dev') < LooseVersion('0.12.0')
~/miniconda3/lib/python3.6/distutils/version.py in __lt__(self, other)
50
51 def __lt__(self, other):
---> 52 c = self._cmp(other)
53 if c is NotImplemented:
54 return c
~/miniconda3/lib/python3.6/distutils/version.py in _cmp(self, other)
335 if self.version == other.version:
336 return 0
--> 337 if self.version < other.version:
338 return -1
339 if self.version > other.version:
TypeError: '<' not supported between instances of 'str' and 'int'
I haven't followed the situation very closely about the joblib problems that motivate this change, but I have to admit that in my mind SKLEARN_SITE_JOBLIB is a developer thing: basically you are on your own if you use it and you have no guarantee that it will work. In my mind it was meant more to use with joblib development versions rather than allow to go back to older joblib versions. |
Yes, it was to allow use of the development joblib versions, when we
thought we would release 0.20 with joblib 0.11.
|
sklearn/utils/fixes.py
Outdated
if prefer is not None: | ||
args['prefer'] = prefer | ||
if require is not None: | ||
args['require'] = require |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simply args = {'prefer': prefer, 'require': require'}
?
The default value for both argument is None
so this check is not useful.
To avoid having to re-writte this part if we add some arguments to joblib.Parallel
, I would suggest a function such as:
def _joblib_parallel_args(**kwargs):
if _joblib.__version__ >= LooseVersion('0.12'):
return kwargs
# If version is earlier than 0.12, mock the prefer and request arguments
args = {'backend': None}
prefer = kwargs.get('prefer')
if prefer not in ['threads', 'processes', None]:
raise NotImplementedError('prefer=%s is not supported'
% prefer)
args['backend'] = {'threads': 'threading',
'processes': 'multiprocessing'
None: None}[prefer]
if kwargs.get('sharedmem', False):
args['backend'] = 'threading'
return args
Thanks for the reviews! I adapted a bit @tomMoral 's version to not make assumptions about default values of parameters in joblib and explicitly fail on unhanded parameters. Also added tests that should check all cases in This should address remaining comments, I think. |
We should add something to what's new... |
Added a what's new.. |
I'll try today or tomorrow maybe? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a pass and it looks great. +1 for merge when CI is green.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Just a small nitpick on the name of the function.
@@ -332,3 +334,51 @@ def _object_dtype_isnan(X): | |||
from collections import Iterable as _Iterable # noqa | |||
from collections import Mapping as _Mapping # noqa | |||
from collections import Sized as _Sized # noqa | |||
|
|||
|
|||
def _joblib_parallel_args(**kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just change it to _joblib_parallel_kwargs
as you only set kwargs
in it?
It would be clearer.
Thanks for the reviews @ogrisel and @tomMoral ! Pushed another commit that fixes some issues with the merge conflict (flake8 and a new test that was added meanwhile to fix CI). CI is green now.
I kind of prefer the name Merging as it is, it's a private function, we can always change it later. |
* upstream/master: joblib 0.13.0 (scikit-learn#12531) DOC tweak KMeans regarding cluster_centers_ convergence (scikit-learn#12537) DOC (0.21) Make sure plot_tree docs are generated and fix link in whatsnew (scikit-learn#12533) ALL Add HashingVectorizer to __all__ (scikit-learn#12534) BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350) fix typo in whatsnew Fix dead link to numpydoc (scikit-learn#12532) [MRG] Fix segfault in AgglomerativeClustering with read-only mmaps (scikit-learn#12485) MNT (0.21) OPTiCS change the default `algorithm` to `auto` (scikit-learn#12529) FIX SkLearn `.score()` method generating error with Dask DataFrames (scikit-learn#12462) MNT KBinsDiscretizer.transform should not mutate _encoder (scikit-learn#12514)
…ybutton * upstream/master: FIX YeoJohnson transform lambda bounds (scikit-learn#12522) [MRG] Additional Warnings in case OpenML auto-detected a problem with dataset (scikit-learn#12541) ENH Prefer threads for IsolationForest (scikit-learn#12543) joblib 0.13.0 (scikit-learn#12531) DOC tweak KMeans regarding cluster_centers_ convergence (scikit-learn#12537) DOC (0.21) Make sure plot_tree docs are generated and fix link in whatsnew (scikit-learn#12533) ALL Add HashingVectorizer to __all__ (scikit-learn#12534) BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350) fix typo in whatsnew Fix dead link to numpydoc (scikit-learn#12532) [MRG] Fix segfault in AgglomerativeClustering with read-only mmaps (scikit-learn#12485) MNT (0.21) OPTiCS change the default `algorithm` to `auto` (scikit-learn#12529) FIX SkLearn `.score()` method generating error with Dask DataFrames (scikit-learn#12462) MNT KBinsDiscretizer.transform should not mutate _encoder (scikit-learn#12514)
…t-learn#12350)" This reverts commit 1e62eae.
…t-learn#12350)" This reverts commit 1e62eae.
as joblib 0.12 may introduce issues for some users, we should continue to support 0.11, at least in 0.20.X. (Let me know if you think I should base this on that branch rather than master.)
This adds a travis run with older joblib.