[MRG] FIX Modify the API of Pipeline and FeatureUnion to match common scikit-learn estimators conventions #8350
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -50,16 +50,13 @@ it takes a variable number of estimators and returns a pipeline, | |
filling in the names automatically:: | ||
|
||
>>> from sklearn.pipeline import make_pipeline | ||
>>> from sklearn.naive_bayes import MultinomialNB | ||
>>> from sklearn.preprocessing import Binarizer | ||
>>> make_pipeline(Binarizer(), MultinomialNB()) # doctest: +NORMALIZE_WHITESPACE | ||
>>> make_pipeline(PCA(), SVC()) # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS | ||
Pipeline(memory=None, | ||
steps=[('binarizer', Binarizer(copy=True, threshold=0.0)), | ||
('multinomialnb', MultinomialNB(alpha=1.0, | ||
class_prior=None, | ||
fit_prior=True))]) | ||
steps=[('pca', PCA(copy=True,...)), | ||
('svc', SVC(C=1.0,...))]) | ||
|
||
The estimators of a pipeline are stored as a list in the ``steps`` attribute:: | ||
The original estimators of a pipeline are stored as a list in the ``steps`` | ||
attribute:: | ||
|
||
>>> pipe.steps[0] | ||
('reduce_dim', PCA(copy=True, iterated_power='auto', n_components=None, random_state=None, | ||
|
@@ -71,6 +68,23 @@ and as a ``dict`` in ``named_steps``:: | |
PCA(copy=True, iterated_power='auto', n_components=None, random_state=None, | ||
svd_solver='auto', tol=0.0, whiten=False) | ||
|
||
Once the pipeline has been fitted, ``steps_`` and ``named_steps_`` have to be | ||
used. | ||
|
||
>>> from sklearn.datasets import load_iris | ||
>>> iris = load_iris() | ||
>>> pipe.fit(iris.data, iris.target) | ||
... # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS | ||
Pipeline(memory=None, | ||
steps=[('reduce_dim', PCA(copy=True,...)), | ||
('clf', SVC(C=1.0,...))]) | ||
>>> pipe.named_steps_['reduce_dim'] # doctest: +NORMALIZE_WHITESPACE | ||
PCA(copy=True, iterated_power='auto', n_components=None, random_state=None, | ||
svd_solver='auto', tol=0.0, whiten=False) | ||
>>> pipe.steps_[0] # doctest: +NORMALIZE_WHITESPACE | ||
('reduce_dim', PCA(copy=True, iterated_power='auto', n_components=None, | ||
random_state=None, svd_solver='auto', tol=0.0, whiten=False)) | ||
|
||
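The distinction between ``steps`` and the proposed ``steps_`` can be illustrated with a small, self-contained sketch. This is not scikit-learn code: ``ToyPipeline`` and ``MeanCenterer`` are hypothetical stand-ins, and copying each step with ``copy.deepcopy`` before fitting is an assumption about how the behaviour described above could be achieved.

```python
import copy


class MeanCenterer:
    """Toy transformer (hypothetical): learns the mean of the data in fit."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class ToyPipeline:
    """Hypothetical stand-in for Pipeline; not the scikit-learn implementation."""

    def __init__(self, steps):
        # Original estimators, as passed by the user; never fitted here.
        self.steps = steps

    @property
    def named_steps(self):
        return dict(self.steps)

    def fit(self, X):
        self.steps_ = []
        for name, est in self.steps:
            # Assumption: fit a copy so the original instance stays untouched.
            fitted = copy.deepcopy(est).fit(X)
            self.steps_.append((name, fitted))
            X = fitted.transform(X)
        self.named_steps_ = dict(self.steps_)
        return self


center = MeanCenterer()
pipe = ToyPipeline([("center", center)]).fit([1.0, 2.0, 3.0])

# Fitted state lives on the copy stored in named_steps_ ...
fitted_mean = pipe.named_steps_["center"].mean_
# ... while the instance handed to the pipeline remains unfitted.
original_is_fitted = hasattr(center, "mean_")
```

Under these assumptions, learned attributes would be read from ``named_steps_`` (or ``steps_``), and a reference kept to the original estimator would not see them.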
Parameters of the estimators in the pipeline can be accessed using the | ||
``<estimator>__<parameter>`` syntax:: | ||
|
||
|
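As a rough illustration of the ``<estimator>__<parameter>`` convention, the sketch below groups such keys by step name. ``split_nested_params`` is a hypothetical helper written for this note, not a scikit-learn function; in scikit-learn itself the equivalent routing happens inside calls such as ``pipe.set_params(clf__C=10)``.

```python
def split_nested_params(params):
    """Group ``step__param`` keys by step name (illustrative helper only)."""
    grouped = {}
    for key, value in params.items():
        # Split on the first double underscore: left is the step name,
        # right is the parameter (which may itself be nested).
        step, sep, param = key.partition("__")
        if not sep:
            raise ValueError(f"{key!r} has no '__' separator")
        grouped.setdefault(step, {})[param] = value
    return grouped


routed = split_nested_params({"clf__C": 10, "reduce_dim__n_components": 2})
```

Here ``routed`` maps each step name to the parameters addressed to it, which is the same naming scheme used for parameter grids in grid search.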
@@ -152,48 +166,6 @@ object:: | |
>>> # Clear the cache directory when you don't need it anymore | ||
>>> rmtree(cachedir) | ||
|
||
.. warning:: **Side effect of caching transfomers** | ||
Review comment: merge issue?
||
|
||
Using a :class:`Pipeline` without cache enabled, it is possible to | ||
inspect the original instance such as:: | ||
|
||
>>> from sklearn.datasets import load_digits | ||
>>> digits = load_digits() | ||
>>> pca1 = PCA() | ||
>>> svm1 = SVC() | ||
>>> pipe = Pipeline([('reduce_dim', pca1), ('clf', svm1)]) | ||
>>> pipe.fit(digits.data, digits.target) | ||
... # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS | ||
Pipeline(memory=None, | ||
steps=[('reduce_dim', PCA(...)), ('clf', SVC(...))]) | ||
>>> # The pca instance can be inspected directly | ||
>>> print(pca1.components_) # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS | ||
[[ -1.77484909e-19 ... 4.07058917e-18]] | ||
|
||
Enabling caching triggers a clone of the transformers before fitting. | ||
Therefore, the transformer instance given to the pipeline cannot be | ||
inspected directly. | ||
In following example, accessing the :class:`PCA` instance ``pca2`` | ||
will raise an ``AttributeError`` since ``pca2`` will be an unfitted | ||
transformer. | ||
Instead, use the attribute ``named_steps`` to inspect estimators within | ||
the pipeline:: | ||
|
||
>>> cachedir = mkdtemp() | ||
>>> pca2 = PCA() | ||
>>> svm2 = SVC() | ||
>>> cached_pipe = Pipeline([('reduce_dim', pca2), ('clf', svm2)], | ||
... memory=cachedir) | ||
>>> cached_pipe.fit(digits.data, digits.target) | ||
... # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS | ||
Pipeline(memory=..., | ||
steps=[('reduce_dim', PCA(...)), ('clf', SVC(...))]) | ||
>>> print(cached_pipe.named_steps['reduce_dim'].components_) | ||
... # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS | ||
[[ -1.77484909e-19 ... 4.07058917e-18]] | ||
>>> # Remove the cache directory | ||
>>> rmtree(cachedir) | ||
|
||
.. topic:: Examples: | ||
|
||
* :ref:`sphx_glr_auto_examples_plot_compare_reduction.py` | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -56,6 +56,7 @@ Enhancements | |
- :class:`multioutput.MultiOutputRegressor` and :class:`multioutput.MultiOutputClassifier` | ||
now support online learning using `partial_fit`. | ||
issue: `8053` by :user:`Peng Yu <yupbank>`. | ||
|
||
- :class:`pipeline.Pipeline` allows to cache transformers | ||
within a pipeline by using the ``memory`` constructor parameter. | ||
By :issue:`7990` by :user:`Guillaume Lemaitre <glemaitre>`. | ||
|
@@ -251,6 +252,13 @@ API changes summary | |
:func:`sklearn.model_selection.cross_val_predict`. | ||
:issue:`2879` by :user:`Stephen Hoover <stephen-hoover>`. | ||
|
||
- In the future, the estimators in the ``steps`` and ``named_steps`` | ||
attributes will no longer have their ``fit()`` methods called directly. | ||
Users will have to access fitted Pipeline steps in ``steps_`` | ||
and ``named_steps_`. The warning was introduced in 0.19 and will take | ||
Review comment: missing backtick after named_steps_
||
effect in 0.22. | ||
:issue:`8350` by :user:`Guillaume Lemaitre <glemaitre>`. | ||
|
||
.. _changes_0_18_1: | ||
|
||
Version 0.18.1 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -129,7 +129,7 @@ def nudge_dataset(X, Y): | |
# Plotting | ||
|
||
plt.figure(figsize=(4.2, 4)) | ||
for i, comp in enumerate(rbm.components_): | ||
for i, comp in enumerate(classifier.named_steps_['rbm'].components_): | ||
Review comment: We want to remove this use-case, right? (the way it was previously).
Review comment: There is no way to warn a user doing this, right?
||
plt.subplot(10, 10, i + 1) | ||
plt.imshow(comp.reshape((8, 8)), cmap=plt.cm.gray_r, | ||
interpolation='nearest') | ||
|
Review comment: have to be used? Or contain the fitted models?