Skip to content

Make all non-canonical modules private? #9250

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
amueller opened this issue Jun 29, 2017 · 30 comments
Closed

Make all non-canonical modules private? #9250

amueller opened this issue Jun 29, 2017 · 30 comments
Milestone

Comments

@amueller
Copy link
Member

This is maybe for 1.0. Recently we started marking files like model_selection._split private, so that everything has a single canonical import:

from sklearn.model_selection import cross_val_score

For many (older?) models that's not the case, we have

from sklearn.linear_model.logistic import LogisticRegression
from sklearn.linear_model import LogisticRegression

etc.
I think it would be nice to make all the files that are not the canonical import (according to the API documentation) private (with deprecation obviously).
That might be a bit annoying for existing users that used long import paths, but it makes auto-complete much more helpful and the module structure much less confusing for newcomers.

For example sklearn.linear_model.ridge is a module, while sklearn.linear_model.ridge_regression is a function that implements ridge regression and sklearn.linear_model.Ridge is a class that implements ridge regression. From the names this is totally unclear.

@amueller amueller added the API label Jun 29, 2017
@amueller amueller added this to the 1.0 milestone Jun 29, 2017
@jmschrei
Copy link
Member

I would support this as it seems like it could clear up a lot of confusion. Do you believe there are any people directly trying to use the functions that would be made private?

@jnothman
Copy link
Member

jnothman commented Jun 29, 2017 via email

@adrinjalali
Copy link
Member

Since #14939 is merged, we seem to be deprecating them already for 0.22. This now seems like an easy one for some of the upcoming sprints. Although if we want it done before the end of october, we should do it ourselves maybe.

@NicolasHug
Copy link
Member

NicolasHug commented Sep 10, 2019

Below is the list of the folders whose files need to be preceded by an underscore. The procedure is pretty straightforward:

  • rename file.py into _file.py and create a new file.py with a similar content that raises a deprecation warning (like neural_network/rbm.py)
  • update __init__.py
  • update test parametrization of sklearn/tests/test_import_deprecations.py

Folders

(Note that this only addresses the folders. Files like sklearn/pipeline.py may still contain non-deprecated things.)

@NicolasHug
Copy link
Member

NicolasHug commented Sep 10, 2019

ping @thomasjpfan @glemaitre @adrinjalali let's pick one each and review each-other? this should go relatively fast

I'll pick cluster

@adrinjalali
Copy link
Member

I'm taking the tree and ensemble, since they're interrelated.

@glemaitre
Copy link
Member

I'll take preprocessing and inspection at first

@amueller
Copy link
Member Author

pretty sure folks have not been using git move correctly. Try doing a single commit that does the move and commit that, before adding anything else. Your PRs don't show files as moved, which means that something went wrong.

@jnothman
Copy link
Member

jnothman commented Sep 18, 2019 via email

@adrinjalali
Copy link
Member

I did move, as you see in this commit df7f54d but once you create a file with the same name, github gets confused.

@NicolasHug
Copy link
Member

I'm working on mixture and svm

@NicolasHug
Copy link
Member

Working on covariance

@NicolasHug
Copy link
Member

working on cross_decomposition

@NicolasHug
Copy link
Member

working on metrics and metrics/cluster

@thomasjpfan
Copy link
Member

I'll work on datasets

@thomasjpfan
Copy link
Member

Working on gaussian_process

@thomasjpfan
Copy link
Member

Working on feature_extraction

@thomasjpfan
Copy link
Member

Working on inspection

@thomasjpfan
Copy link
Member

Working on preprocessing

@thomasjpfan
Copy link
Member

Working on linear_models

@glemaitre
Copy link
Member

@thomasjpfan From the list, I only did not review the PR for the decomposition. I did not open a PR because you seem to have work on it. Could you open a PR?

@qinhanmin2014
Copy link
Member

ping @NicolasHug @thomasjpfan what about sklearn.feature_selection?

Is #15367 a blocker?

@thomasjpfan
Copy link
Member

Working on feature_selection

@NicolasHug
Copy link
Member

what about sklearn.feature_selection?

Thanks I missed this one

Is #15367 a blocker?

Yes but it's tracked by another issue

@lucasdavid
Copy link
Contributor

I had a ton of private datasets, such as names, text and geographical data. I used to use the following utils to download and manage these:

from sklearn.datasets.base import (get_data_home, RemoteFileMetadata, _fetch_remote)

What's the recommended approach now?

@glemaitre
Copy link
Member

you can import from sklearn.datasets._base import ...

However, now you are aware that _base will be private and you might have to change your code at a new scikit-learn release since we might not provide backward-compatible code.

This said, be aware that we are going to define a developer API which will define some backward support for these types of utilities used in third-party.

sebp added a commit to sebp/scikit-survival that referenced this issue Apr 10, 2020
- Deprecate presort (scikit-learn/scikit-learn#14907)
- Add Minimal Cost-Complexity Pruning to Decision Trees (scikit-learn/scikit-learn#12887)
- Add bootstrap sample size limit to forest ensembles (scikit-learn/scikit-learn#14682)
- Fix deprecated imports (scikit-learn/scikit-learn#9250)
sebp added a commit to sebp/scikit-survival that referenced this issue Apr 10, 2020
- Deprecate presort (scikit-learn/scikit-learn#14907)
- Add Minimal Cost-Complexity Pruning to Decision Trees (scikit-learn/scikit-learn#12887)
- Add bootstrap sample size limit to forest ensembles (scikit-learn/scikit-learn#14682)
- Fix deprecated imports (scikit-learn/scikit-learn#9250)

Do not add ccp_alpha to SurvivalTree, because
it relies node_impurity, which is not set for SurvivalTree.
sebp added a commit to sebp/scikit-survival that referenced this issue Apr 10, 2020
- Deprecate presort (scikit-learn/scikit-learn#14907)
- Add Minimal Cost-Complexity Pruning to Decision Trees (scikit-learn/scikit-learn#12887)
- Add bootstrap sample size limit to forest ensembles (scikit-learn/scikit-learn#14682)
- Fix deprecated imports (scikit-learn/scikit-learn#9250)

Do not add ccp_alpha to SurvivalTree, because
it relies node_impurity, which is not set for SurvivalTree.
sebp added a commit to sebp/scikit-survival that referenced this issue Apr 10, 2020
- Deprecate presort (scikit-learn/scikit-learn#14907)
- Add Minimal Cost-Complexity Pruning to Decision Trees (scikit-learn/scikit-learn#12887)
- Add bootstrap sample size limit to forest ensembles (scikit-learn/scikit-learn#14682)
- Fix deprecated imports (scikit-learn/scikit-learn#9250)

Do not add ccp_alpha to SurvivalTree, because
it relies node_impurity, which is not set for SurvivalTree.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants