25 changes: 17 additions & 8 deletions doc/modules/compose.rst
@@ -5,14 +5,23 @@
Pipelines and composite estimators
==================================

Transformers are usually combined with classifiers, regressors or other
estimators to build a composite estimator. The most common tool is a
:ref:`Pipeline <pipeline>`. Pipeline is often used in combination with
:ref:`FeatureUnion <feature_union>` which concatenates the output of
transformers into a composite feature space. :ref:`TransformedTargetRegressor
<transformed_target_regressor>` deals with transforming the :term:`target`
(i.e. log-transform :term:`y`). In contrast, Pipelines only transform the
observed data (:term:`X`).
To build a composite estimator, transformers are usually combined with other
transformers or with :term:`predictors` (such as classifiers or regressors).
The most common tool used for composing estimators is a :ref:`Pipeline
<pipeline>`. Pipelines require all steps except the last to be a
:term:`transformer`. The last step can be anything: a transformer, a
:term:`predictor`, or a clustering estimator which may or may not have a
`.predict(...)` method. A pipeline exposes all methods provided by the last
estimator: if the last step provides a `transform` method, then the pipeline
has a `transform` method and behaves like a transformer. If the last step
provides a `predict` method, then the pipeline exposes that method and,
given data :term:`X`, uses all steps except the last to transform the data
and then passes the transformed data to the `predict` method of the last
step. `Pipeline` is often used in combination with :ref:`ColumnTransformer
<column_transformer>` or :ref:`FeatureUnion <feature_union>`, which
concatenate the output of transformers into a composite feature space.
:ref:`TransformedTargetRegressor <transformed_target_regressor>` deals with
transforming the :term:`target` (e.g. log-transform :term:`y`).
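
The delegation described above can be illustrated with a short sketch. It is not part of the change itself; the dataset, the step names (``scale``, ``clf``, ``pca``) and the choice of estimators are assumptions made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative toy data; sizes and seed are arbitrary.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Last step is a predictor, so the pipeline exposes `predict`.
clf_pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
clf_pipe.fit(X, y)

# `predict` is equivalent to transforming with every step except the last
# and handing the result to the last step's `predict`.
Xt = clf_pipe.named_steps["scale"].transform(X)
manual_pred = clf_pipe.named_steps["clf"].predict(Xt)
pipeline_pred = clf_pipe.predict(X)

# Last step is a transformer, so the pipeline behaves like a transformer.
tr_pipe = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=2))])
X_reduced = tr_pipe.fit_transform(X)
```

Because ``PCA`` has no ``predict`` method, ``tr_pipe`` does not expose one either, while ``clf_pipe`` does.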

.. _pipeline:

26 changes: 15 additions & 11 deletions sklearn/pipeline.py
@@ -53,12 +53,15 @@ def check(self):

class Pipeline(_BaseComposition):
"""
Pipeline of transforms with a final estimator.
A sequence of data transformers with an optional final predictor.

`Pipeline` allows you to sequentially apply a list of transformers to
preprocess the data and, if desired, conclude the sequence with a final
:term:`predictor` for predictive modeling.

Sequentially apply a list of transforms and a final estimator.
Intermediate steps of the pipeline must be 'transforms', that is, they
must implement `fit` and `transform` methods.
The final estimator only needs to implement `fit`.
The final :term:`estimator` only needs to implement `fit`.
The transformers in the pipeline can be cached using the ``memory`` argument.
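
As a rough illustration of the requirement on intermediate steps, the sketch below puts a classifier (which has no `transform`) in a non-final position. The data and step names are assumptions for illustration only. Depending on the scikit-learn version, the check may run at construction or at fit time, so both are wrapped in the same `try` block.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative toy data; sizes and seed are arbitrary.
X, y = make_classification(n_samples=60, n_features=4, random_state=0)

# Valid: the intermediate step implements `fit` and `transform`,
# the final step only needs `fit`.
valid = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
valid.fit(X, y)

# Invalid: a classifier without `transform` is used as an intermediate step.
rejected = False
try:
    invalid = Pipeline([("clf", LogisticRegression()), ("scale", StandardScaler())])
    invalid.fit(X, y)
except TypeError:
    # Pipeline rejects intermediate steps that are not transformers.
    rejected = True
```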

The purpose of the pipeline is to assemble several steps that can be
@@ -81,10 +84,11 @@ class Pipeline(_BaseComposition):

Parameters
----------
steps : list of tuple
List of (name, transform) tuples (implementing `fit`/`transform`) that
are chained in sequential order. The last transform must be an
estimator.
steps : list of tuples
List of (name of step, estimator) tuples that are to be chained in
sequential order. To be compatible with the scikit-learn API, all steps
must define `fit`. All non-last steps must also define `transform`. See
:ref:`Combining Estimators <combining_estimators>` for more details.

memory : str or object with the joblib.Memory interface, default=None
Used to cache the fitted transformers of the pipeline. The last step
@@ -414,7 +418,7 @@ def _fit(self, X, y=None, routed_params=None):
def fit(self, X, y=None, **params):
"""Fit the model.

Fit all the transformers one after the other and transform the
Fit all the transformers one after the other and sequentially transform the
data. Finally, fit the transformed data using the final estimator.

Parameters
@@ -478,9 +482,9 @@ def _can_fit_transform(self):
def fit_transform(self, X, y=None, **params):
"""Fit the model and transform with the final estimator.

Fits all the transformers one after the other and transform the
data. Then uses `fit_transform` on transformed data with the final
estimator.
Fit all the transformers one after the other and sequentially transform
the data. Only valid if the final estimator either implements
`fit_transform` or `fit` and `transform`.
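
A minimal sketch of this condition, assuming arbitrary toy data and step names: a pipeline ending in a transformer supports `fit_transform`, while one ending in a regressor does not. The failure is caught as `AttributeError`, which is an observation about how the method is guarded rather than something stated in this docstring.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative toy data; shape, seed and coefficients are arbitrary.
rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])

# Final step implements `fit` and `transform`: `fit_transform` is valid.
transformer_pipe = Pipeline([("impute", SimpleImputer()), ("scale", StandardScaler())])
Xt = transformer_pipe.fit_transform(X)

# Final step is a regressor without `transform`: `fit_transform` is not valid.
predictor_pipe = Pipeline([("scale", StandardScaler()), ("reg", LinearRegression())])
try:
    predictor_pipe.fit_transform(X, y)
    supports_fit_transform = True
except AttributeError:
    supports_fit_transform = False
```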

Parameters
----------