Skip to content

[MRG+1] Restructure access to vendored/site Joblib #11471

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jul 17, 2018

Conversation

jnothman
Copy link
Member

In order to fix #11408, this swaps joblib and _joblib. It however, allows users to access joblib's Memory or Parallel functionality without accessing sklearn.externals._joblib by importing Memory, Parallel, etc. into sklearn.utils.

@jnothman jnothman changed the title [MRG] Restructure access to vendored/site Joblib [WIP] Restructure access to vendored/site Joblib Jul 10, 2018
@sklearn-lgtm
Copy link

This pull request introduces 2 alerts and fixes 2 when merging 7093fc1 into a2b5645 - view on LGTM.com

new alerts:

  • 2 for Unused import

fixed alerts:

  • 2 for Unused import

Comment posted by LGTM.com

@sklearn-lgtm
Copy link

This pull request introduces 2 alerts and fixes 2 when merging 314daa1 into a2b5645 - view on LGTM.com

new alerts:

  • 2 for Unused import

fixed alerts:

  • 2 for Unused import

Comment posted by LGTM.com

@sklearn-lgtm
Copy link

This pull request fixes 2 alerts when merging e567b8f into a2b5645 - view on LGTM.com

fixed alerts:

  • 2 for Unused import

Comment posted by LGTM.com

@jnothman jnothman changed the title [WIP] Restructure access to vendored/site Joblib [MRG] Restructure access to vendored/site Joblib Jul 11, 2018
@jnothman
Copy link
Member Author

This is looking green. Ready for review.

@sklearn-lgtm
Copy link

This pull request fixes 2 alerts when merging 7785ab6 into a2b5645 - view on LGTM.com

fixed alerts:

  • 2 for Unused import

Comment posted by LGTM.com

@glemaitre glemaitre added this to the 0.20 milestone Jul 14, 2018
Copy link
Member

@GaelVaroquaux GaelVaroquaux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

+1 for merge.

Should we add in the travis matrix a job that tests with SKLEARN_SITE_JOBLIB=1 (this should probably be done in a separate PR).

@GaelVaroquaux GaelVaroquaux changed the title [MRG] Restructure access to vendored/site Joblib [MRG+1] Restructure access to vendored/site Joblib Jul 16, 2018
@ogrisel
Copy link
Member

ogrisel commented Jul 16, 2018

+1 as well, it sounds like a safe way to handle the switch.

Should we add in the travis matrix a job that tests with SKLEARN_SITE_JOBLIB=1 (this should probably be done in a separate PR).

It's already tested in the cron job that checks the master branch of everything.

We could also update one on the existing entries to use the site joblib. No need to extend the size of the build matrix for this.

@massich
Copy link
Contributor

massich commented Jul 16, 2018

LGTM

it works.

@ogrisel
Copy link
Member

ogrisel commented Jul 16, 2018

I will update the PR.

@@ -5,11 +5,12 @@

# An environment variable to use the site joblib
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe sklearn/externals/_joblib.py can be moved to sklearn/utils/_joblib.py.

My reasoning is that _joblib.py is not really external but our own code to do the switching between sklearn.external.joblib and joblib.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I will do that as well.

@GaelVaroquaux
Copy link
Member

GaelVaroquaux commented Jul 16, 2018 via email

@ogrisel
Copy link
Member

ogrisel commented Jul 16, 2018

I added another test and found a bug: cab9e0a

Tests now pass both with and without SKLEARN_SITE_JOBLIB on my laptop. Let's wait for travis. I plan to merge without waiting for appveyor.

@lesteve
Copy link
Member

lesteve commented Jul 16, 2018

A few things we may want to think about:

  • is sklearn.utils the right place for joblib stuff?
  • we are making only a few joblib things available in sklearn.utils (Memory, Parallel, delayed, paralell_backend, cpu_count). Does that cover most of what people do with sklearn.externals.joblib?

@ogrisel
Copy link
Member

ogrisel commented Jul 16, 2018

we are making only a few joblib things available in sklearn.utils (Memory, Parallel, delayed, paralell_backend, cpu_count). Does that cover most of what people do with sklearn.externals.joblib?

Indeed. If we accept one more level of nesting we could have:

sklearn/utils/joblib.py where we have:

# An environment variable to use the site joblib
if _os.environ.get('SKLEARN_SITE_JOBLIB', False):
    from joblib import __all__
    from joblib import *
    from joblib import __version__
    from joblib import logger
else:
    from ..externals.joblib import __all__
    from ..externals.joblib import *
    from ..externals.joblib import __version__
    from ..externals.joblib import logger

to expose the full joblib namespace.

@rth
Copy link
Member

rth commented Jul 16, 2018

+1 for having this in sklearn.utils.joblib

@ogrisel
Copy link
Member

ogrisel commented Jul 16, 2018

@jnothman my last commit put the "public" joblib symbols under sklearn.utils.joblib instead of flattening them into the sklearn.utils level.

@jnothman
Copy link
Member Author

My reasoning for using sklearn.utils rather than sklearn.utils.joblib went something like:

  • In practice we only use a few things from joblib and only need to expose those to sklearn and users' libraries.
  • Indeed, site and vendored joblib might expose different names. We need to make sure we're importing get_cpu_count, parallel_backend, etc for scikit-learn to work. And we need to make sure that developers never import sklearn.utils.joblib.newfangled_thing
  • Importing from sklearn.utils.joblib looks a lot like importing from sklearn.externals.joblib which may confuse users, and may be easily missed by reviewers. from sklearn.utils import Memory gives a clean break in the paradigm of joblib usage.

@rth
Copy link
Member

rth commented Jul 16, 2018

from sklearn.utils import Memory gives a clean break in the paradigm of joblib usage.

I understand that we want to fix the parent issue but I was under impression that in the long run, we also wanted people to stop importing Parallel etc from from scikit-learn altogether and make them use joblib directly if they do require parallel functionality for their code? Otherwise issues get reported to the wrong repo etc.

sklearn.utils is semi-public, so does it mean that technically we have to write an API changes entry in scikit-learn if something changes in joblib? Why not use e.g.sklearn.utils._joblib internally and tell users to use joblib for user applications? Or am I missing something?

@ogrisel
Copy link
Member

ogrisel commented Jul 16, 2018

I agree with you @jnothman. However, I also have the feeling that if in the future joblib introduces a new set of toplevel symbols that we want to use in scikit-learn, we will start to crowd the sklearn.utils namespace quite heavily with possible (however unlikely) name conflicts.

Also I am wondering if it might not be useful to do:

>>> from sklearn.utils import joblib

to get a natural hold on the full joblib namespace as sklearn sees it in test scripts / interactive sessions.

This will still be possible with the later patterns though:

>>> from sklearn.utils import _joblib as joblib

so maybe this later point is not a big problem in the end. I am still worried about the first point though.

@jnothman
Copy link
Member Author

I understand that we want to fix the parent issue but I was under impression that in the long run, we also wanted people to stop importing Parallel etc from from scikit-learn altogether and make them use joblib directly if they do require parallel functionality for their code? Otherwise issues get reported to the wrong repo etc.

Parallel is different to Memory here. We ask users to sometimes pass a Memory object as a parameter. Should they have to separately install joblib to do so?

Also, does with parallel_backend... work if parallel_backend is site joblib.parallel_backend, but Parallel is the vendored sklearn.externals.joblib.parallel_backend? Surely the user code needs to be using the same joblib instance as we are internally.

@jnothman
Copy link
Member Author

we will start to crowd the sklearn.utils namespace quite heavily with possible (however unlikely) name conflicts.

The namespace is crowded already... I am not persuaded by this as a sufficient counter-argument. But shrug. It's not that important.

@ogrisel
Copy link
Member

ogrisel commented Jul 17, 2018

Also, does with parallel_backend... work if parallel_backend is site joblib.parallel_backend, but Parallel is the vendored sklearn.externals.joblib.parallel_backend? Surely the user code needs to be using the same joblib instance as we are internally.

Indeed, the user needs to use the parallel_backend that matches the active implementation of joblib.

@jnothman
Copy link
Member Author

jnothman commented Jul 17, 2018 via email

@GaelVaroquaux
Copy link
Member

GaelVaroquaux commented Jul 17, 2018 via email

@ogrisel
Copy link
Member

ogrisel commented Jul 17, 2018

Alright, I'll revert my last commit.

@ogrisel
Copy link
Member

ogrisel commented Jul 17, 2018

I plan to merge this without waiting for appveyor if travis is green.

@GaelVaroquaux
Copy link
Member

PEP8 failure :(. You'll need "# noqa" on some lines.

@ogrisel
Copy link
Member

ogrisel commented Jul 17, 2018

We don't care no? Let me wait for the real tests to pass.

@GaelVaroquaux
Copy link
Member

OK. I tend not to like pyflakes warnings.

@ogrisel
Copy link
Member

ogrisel commented Jul 17, 2018

Alright pushed.

@ogrisel ogrisel merged commit 14e7c32 into scikit-learn:master Jul 17, 2018
@ogrisel
Copy link
Member

ogrisel commented Jul 17, 2018

Merged.

@jnothman
Copy link
Member Author

jnothman commented Jul 18, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression for unpickling due to #11166
9 participants