[MRG+2] TransformedTargetRegressor #9041


Merged: 83 commits, Dec 13, 2017

Commits (83)
21a2ff3
implement target transformer
amueller Jun 5, 2017
7b1f7e8
make example use log and ex
amueller Jun 5, 2017
b306fa5
some docstrings
amueller Jun 5, 2017
7d9badf
EHN/TST advance TTR
glemaitre Jun 6, 2017
97da7a3
FIX call fit of the transformer at validation time
glemaitre Jun 6, 2017
61a543a
EHN/TST ravel y when needed
glemaitre Jun 7, 2017
de8dbb4
FIX address comment Andy
glemaitre Jun 7, 2017
254fac2
EHN add LinearRegression
glemaitre Jun 7, 2017
53c7c81
Merge remote-tracking branch 'origin/master' into targettransformer
glemaitre Jun 7, 2017
693de84
EHN move to target file
glemaitre Jun 7, 2017
3dafc8f
FIX/EXA fix example in the docstring
glemaitre Jun 7, 2017
27f1c43
Merge remote-tracking branch 'origin/master' into targettransformer
glemaitre Jun 8, 2017
73bbcaf
ENH address comments
glemaitre Jun 8, 2017
e6a4e7d
Merge remote-tracking branch 'origin/master' into targettransformer
glemaitre Jun 8, 2017
503a985
DOC narrative doc for ttr
glemaitre Jun 8, 2017
63dbe9a
DOC update whats new and docstring
glemaitre Jun 9, 2017
9feafda
Update whats new
glemaitre Jun 9, 2017
32a85a6
Remove useless changes
glemaitre Jun 9, 2017
af51cf8
Update whats new
glemaitre Jun 9, 2017
dcae366
address comments
glemaitre Jun 9, 2017
d8310ad
Merge branch 'targettransformer' of github.com:glemaitre/scikit-learn…
glemaitre Jun 9, 2017
4c3ab11
DOC change to bostong dataset
glemaitre Jun 9, 2017
49ea3c4
Remove score
glemaitre Jun 9, 2017
f1a7289
add the estimator to multioutput
glemaitre Jun 9, 2017
ffe6892
Rename the class
glemaitre Jun 9, 2017
2a868ee
gael comments
glemaitre Jun 9, 2017
18c66c6
revert example
glemaitre Jun 9, 2017
85a8865
FIX docstring and commont test
glemaitre Jun 10, 2017
7a10796
FIX solve issue circular import
glemaitre Jun 10, 2017
086fba0
FIX circular import
glemaitre Jun 10, 2017
44ea999
Merge remote-tracking branch 'origin/master' into targettransformer
glemaitre Jun 10, 2017
6c4734e
DOC/FIX/TST test 1d/2d y support transformer and vlad comments
glemaitre Jun 10, 2017
3ecde9f
TST apply change of manoj
glemaitre Jun 14, 2017
5e7d6c9
TST apply change of manoj
glemaitre Jun 14, 2017
db4bf57
TST factorize single- multi-output test
glemaitre Jun 14, 2017
0fe1622
FIX ensure at least 1d array
glemaitre Jun 14, 2017
8b94056
TST add test for support of sample weight
glemaitre Jun 14, 2017
01d94e2
EXA add example
glemaitre Jun 14, 2017
a0b84c4
DOC fix
glemaitre Jun 14, 2017
437dfaa
DOC revert author
glemaitre Jun 14, 2017
36968ba
FIX minor fix in example and doc
glemaitre Jun 14, 2017
d253fcd
DOC fixes
glemaitre Jun 15, 2017
451dfd3
FIX remove useless import
glemaitre Jun 15, 2017
19a6f94
Remove absolute tolerance
glemaitre Jul 26, 2017
9e07197
Merge branch 'master' into targettransformer
glemaitre Jul 26, 2017
0ddfee0
TST single to multi and regressor checking
glemaitre Aug 1, 2017
8392cc5
pass sample_weight directly
glemaitre Aug 2, 2017
a0bf0b0
PEP8
glemaitre Aug 2, 2017
18bcec0
use is_regressor instead of tag
glemaitre Aug 4, 2017
f3e151f
Merge branch 'master' into targettransformer
glemaitre Aug 17, 2017
85cc14c
TST split tests
glemaitre Aug 18, 2017
075bf92
Merge remote-tracking branch 'glemaitre/targettransformer' into targe…
glemaitre Aug 18, 2017
51583c2
TST fix multi to single test
glemaitre Aug 21, 2017
9853552
solve the issue if the function are not 2d
glemaitre Aug 21, 2017
a1998fa
DOC update docstring
glemaitre Aug 21, 2017
7c8c0ca
DOC fix docstring
glemaitre Aug 22, 2017
97330b0
Merge remote-tracking branch 'origin/master' into targettransformer
glemaitre Sep 3, 2017
7f13b9a
TST check compatibility 1D 2D fuction even if supposidely not supported
glemaitre Sep 3, 2017
ae973f8
TST relax equality in prediction
glemaitre Sep 3, 2017
129373d
TST remove single to multi case
glemaitre Sep 3, 2017
9064f24
Address olivier and johel comments
glemaitre Sep 4, 2017
3d80728
not enforcing regressor
glemaitre Sep 4, 2017
5f9db73
Renamed to TransformedTargetRegressor
glemaitre Sep 4, 2017
58c5506
DOC reformat plot titles
ogrisel Sep 5, 2017
d0f83fa
change naming functions
glemaitre Sep 7, 2017
4e61395
DOC fixing title
glemaitre Sep 7, 2017
687703b
Merge remote-tracking branch 'origin/master' into targettransformer
glemaitre Oct 25, 2017
500a77c
DOC fix merge git mess
glemaitre Oct 25, 2017
35cb75d
TST/EHN only squeeze when ndim == 1
glemaitre Oct 30, 2017
9a939f3
TST forgot to call fit
glemaitre Oct 30, 2017
00e6d78
Merge remote-tracking branch 'origin/master' into targettransformer
glemaitre Oct 31, 2017
04dc4a7
FIX pass check_inverse to FunctionTransformer
glemaitre Oct 31, 2017
f757c10
DOC remove blank lines
glemaitre Nov 6, 2017
214fde6
Add comments and lift constraint upon X
glemaitre Nov 10, 2017
68c5b7e
avoid type conversion since this is in check_array
glemaitre Nov 10, 2017
9976ace
Merge branch 'master' into targettransformer
glemaitre Nov 28, 2017
3c99cde
TST check that y is always converted to array before transformer call
glemaitre Nov 28, 2017
0b364f6
reverse right of plot_ols
glemaitre Nov 28, 2017
64f5d52
address joels comments
glemaitre Nov 29, 2017
5929f81
Merge branch 'master' into targettransformer
glemaitre Dec 13, 2017
d637038
MAINT rename module name
glemaitre Dec 13, 2017
790c86a
DOC fix indent
glemaitre Dec 13, 2017
bbee2be
FIX add the new module
glemaitre Dec 13, 2017
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -1206,6 +1206,7 @@ Model validation
preprocessing.QuantileTransformer
preprocessing.RobustScaler
preprocessing.StandardScaler
preprocessing.TransformedTargetRegressor

.. autosummary::
:toctree: generated/
71 changes: 70 additions & 1 deletion doc/modules/preprocessing_targets.rst
@@ -1,4 +1,3 @@

.. currentmodule:: sklearn.preprocessing

.. _preprocessing_targets:
@@ -7,6 +6,76 @@
Transforming the prediction target (``y``)
==========================================

Transforming target in regression
---------------------------------

:class:`TransformedTargetRegressor` transforms the targets ``y`` before fitting a
regression model. The predictions are mapped back to the original space via an
inverse transform. It takes as an argument the regressor that will be used for
prediction, and the transformer that will be applied to the target variable::

>>> import numpy as np
>>> from sklearn.datasets import load_boston
>>> from sklearn.preprocessing import (TransformedTargetRegressor,
... QuantileTransformer)
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.model_selection import train_test_split
>>> boston = load_boston()
>>> X = boston.data
>>> y = boston.target
>>> transformer = QuantileTransformer(output_distribution='normal')
>>> regressor = LinearRegression()
>>> regr = TransformedTargetRegressor(regressor=regressor,
... transformer=transformer)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
>>> regr.fit(X_train, y_train) # doctest: +ELLIPSIS
TransformedTargetRegressor(...)
>>> print('R2 score: {0:.2f}'.format(regr.score(X_test, y_test)))
R2 score: 0.67
>>> raw_target_regr = LinearRegression().fit(X_train, y_train)
>>> print('R2 score: {0:.2f}'.format(raw_target_regr.score(X_test, y_test)))
R2 score: 0.64
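The mechanism can be illustrated with a minimal, self-contained sketch. The class below is a hypothetical toy, not the sklearn implementation: the underlying "regressor" simply predicts the mean of the transformed targets, fitting on ``func(y)`` and mapping predictions back with ``inverse_func``.

```python
import math

class TinyTransformedTargetRegressor:
    """Toy sketch of the transformed-target idea (not the sklearn class)."""

    def __init__(self, func, inverse_func):
        self.func = func
        self.inverse_func = inverse_func

    def fit(self, X, y):
        # Fit the underlying model on the transformed targets func(y);
        # here the "model" is just the mean of the transformed values.
        y_trans = [self.func(v) for v in y]
        self.mean_ = sum(y_trans) / len(y_trans)
        return self

    def predict(self, X):
        # Map the internal prediction back to the original target space.
        return [self.inverse_func(self.mean_) for _ in X]

reg = TinyTransformedTargetRegressor(math.log, math.exp)
reg.fit([[0]] * 3, [1.0, math.e, math.e ** 2])
pred = reg.predict([[0]])[0]  # mean of logs is 1.0, so this is exp(1.0)
```

The real estimator delegates to an arbitrary regressor and transformer, but the fit/predict round trip through the target space is the same.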

For simple transformations, instead of a Transformer object, a pair of
functions can be passed, defining the transformation and its inverse mapping::

>>> from __future__ import division
>>> def func(x):
... return np.log(x)
>>> def inverse_func(x):
... return np.exp(x)

Subsequently, the object is created as::

>>> regr = TransformedTargetRegressor(regressor=regressor,
... func=func,
... inverse_func=inverse_func)
>>> regr.fit(X_train, y_train) # doctest: +ELLIPSIS
TransformedTargetRegressor(...)
>>> print('R2 score: {0:.2f}'.format(regr.score(X_test, y_test)))
R2 score: 0.65

By default, the provided functions are checked at fit time to be the inverse of
each other. However, this check can be bypassed by setting ``check_inverse``
to ``False``::

>>> def inverse_func(x):
... return x
>>> regr = TransformedTargetRegressor(regressor=regressor,
... func=func,
... inverse_func=inverse_func,
... check_inverse=False)
>>> regr.fit(X_train, y_train) # doctest: +ELLIPSIS
TransformedTargetRegressor(...)
>>> print('R2 score: {0:.2f}'.format(regr.score(X_test, y_test)))
R2 score: -4.50

.. note::

The transformation can be triggered by setting either ``transformer`` or the
pair of functions ``func`` and ``inverse_func``. However, setting both
options will raise an error.

Label binarization
------------------

5 changes: 5 additions & 0 deletions doc/whats_new/v0.20.rst
@@ -77,6 +77,11 @@ Model evaluation
- Added :class:`multioutput.RegressorChain` for multi-target
regression. :issue:`9257` by :user:`Kumar Ashutosh <thechargedneutron>`.

- Added the :class:`preprocessing.TransformedTargetRegressor` which transforms
  the target y before fitting a regression model. The predictions are mapped
  back to the original space via an inverse transform. :issue:`9041` by
  `Andreas Müller`_ and :user:`Guillaume Lemaitre <glemaitre>`.

Review comment (member): For some reason this had disappeared from what's new and I've just reinserted it :\

Enhancements
............

205 changes: 205 additions & 0 deletions examples/preprocessing/plot_transformed_target.py
@@ -0,0 +1,205 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
======================================================
Effect of transforming the targets in regression model
======================================================

In this example, we give an overview of the
:class:`sklearn.preprocessing.TransformedTargetRegressor`. Two examples
illustrate the benefit of transforming the targets before learning a linear
regression model. The first example uses synthetic data while the second
example is based on the Boston housing data set.

"""

# Author: Guillaume Lemaitre <guillaume.lemaitre@inria.fr>
# License: BSD 3 clause

from __future__ import print_function, division

import numpy as np
import matplotlib.pyplot as plt

print(__doc__)

###############################################################################
# Synthetic example
###############################################################################

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import TransformedTargetRegressor
from sklearn.metrics import median_absolute_error, r2_score

###############################################################################
# A synthetic random regression problem is generated. The targets ``y`` are
# modified by: (i) translating all targets such that all entries are
# non-negative and (ii) applying an exponential function to obtain non-linear
# targets which cannot be fitted using a simple linear model.
#
# Therefore, a logarithmic and an exponential function will be used to
# transform the targets before training a linear regression model and using it
# for prediction.


def log_transform(x):
    return np.log(x + 1)


def exp_transform(x):
    return np.exp(x) - 1


X, y = make_regression(n_samples=10000, noise=100, random_state=0)
y = np.exp((y + abs(y.min())) / 200)
y_trans = log_transform(y)

###############################################################################
# The following illustrates the probability density functions of the target
# before and after applying the logarithmic function.

f, (ax0, ax1) = plt.subplots(1, 2)

ax0.hist(y, bins='auto', normed=True)
ax0.set_xlim([0, 2000])
ax0.set_ylabel('Probability')
ax0.set_xlabel('Target')
ax0.set_title('Target distribution')

ax1.hist(y_trans, bins='auto', normed=True)
ax1.set_ylabel('Probability')
ax1.set_xlabel('Target')
ax1.set_title('Transformed target distribution')

f.suptitle("Synthetic data", y=0.035)
f.tight_layout(rect=[0.05, 0.05, 0.95, 0.95])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

###############################################################################
# First, a linear model is applied to the original targets. Because of the
# non-linearity, the trained model will not be accurate at prediction time.
# Subsequently, a logarithmic function is used to linearize the targets,
# allowing better predictions with the same linear model, as reported by the
# median absolute error (MAE).

f, (ax0, ax1) = plt.subplots(1, 2, sharey=True)

regr = RidgeCV()
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)

ax0.scatter(y_test, y_pred)
ax0.plot([0, 2000], [0, 2000], '--k')
ax0.set_ylabel('Target predicted')
ax0.set_xlabel('True Target')
ax0.set_title('Ridge regression \n without target transformation')
ax0.text(100, 1750, r'$R^2$=%.2f, MAE=%.2f' % (
    r2_score(y_test, y_pred), median_absolute_error(y_test, y_pred)))
ax0.set_xlim([0, 2000])
ax0.set_ylim([0, 2000])

regr_trans = TransformedTargetRegressor(regressor=RidgeCV(),
                                        func=log_transform,
                                        inverse_func=exp_transform)
regr_trans.fit(X_train, y_train)
y_pred = regr_trans.predict(X_test)

ax1.scatter(y_test, y_pred)
ax1.plot([0, 2000], [0, 2000], '--k')
ax1.set_ylabel('Target predicted')
ax1.set_xlabel('True Target')
ax1.set_title('Ridge regression \n with target transformation')
ax1.text(100, 1750, r'$R^2$=%.2f, MAE=%.2f' % (
    r2_score(y_test, y_pred), median_absolute_error(y_test, y_pred)))
ax1.set_xlim([0, 2000])
ax1.set_ylim([0, 2000])

f.suptitle("Synthetic data", y=0.035)
f.tight_layout(rect=[0.05, 0.05, 0.95, 0.95])

###############################################################################
# Real-world data set
###############################################################################

###############################################################################
# In a similar manner, the Boston housing data set is used to show the impact
# of transforming the targets before learning a model. In this example, the
# target to be predicted corresponds to the weighted distances to the five
# Boston employment centers.

from sklearn.datasets import load_boston
from sklearn.preprocessing import QuantileTransformer, quantile_transform

dataset = load_boston()
target = np.array(dataset.feature_names) == "DIS"
X = dataset.data[:, np.logical_not(target)]
y = dataset.data[:, target].squeeze()
y_trans = quantile_transform(dataset.data[:, target],
                             output_distribution='normal').squeeze()

###############################################################################
# A :class:`sklearn.preprocessing.QuantileTransformer` is used such that the
# targets follow a normal distribution before applying a
# :class:`sklearn.linear_model.RidgeCV` model.

f, (ax0, ax1) = plt.subplots(1, 2)

ax0.hist(y, bins='auto', normed=True)
ax0.set_ylabel('Probability')
ax0.set_xlabel('Target')
ax0.set_title('Target distribution')

ax1.hist(y_trans, bins='auto', normed=True)
ax1.set_ylabel('Probability')
ax1.set_xlabel('Target')
ax1.set_title('Transformed target distribution')

f.suptitle("Boston housing data: distance to employment centers", y=0.035)
f.tight_layout(rect=[0.05, 0.05, 0.95, 0.95])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

###############################################################################
# The effect of the transformer is weaker than on the synthetic data. However,
# the transformation still results in a lower MAE.

f, (ax0, ax1) = plt.subplots(1, 2, sharey=True)

regr = RidgeCV()
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)

ax0.scatter(y_test, y_pred)
ax0.plot([0, 10], [0, 10], '--k')
ax0.set_ylabel('Target predicted')
ax0.set_xlabel('True Target')
ax0.set_title('Ridge regression \n without target transformation')
ax0.text(1, 9, r'$R^2$=%.2f, MAE=%.2f' % (
    r2_score(y_test, y_pred), median_absolute_error(y_test, y_pred)))
ax0.set_xlim([0, 10])
ax0.set_ylim([0, 10])

regr_trans = TransformedTargetRegressor(
    regressor=RidgeCV(),
    transformer=QuantileTransformer(output_distribution='normal'))
Review comment (member): @amueller, maybe this is somewhere we can illustrate PowerTransformer rather than changing #10210
regr_trans.fit(X_train, y_train)
y_pred = regr_trans.predict(X_test)

ax1.scatter(y_test, y_pred)
ax1.plot([0, 10], [0, 10], '--k')
ax1.set_ylabel('Target predicted')
ax1.set_xlabel('True Target')
ax1.set_title('Ridge regression \n with target transformation')
ax1.text(1, 9, r'$R^2$=%.2f, MAE=%.2f' % (
    r2_score(y_test, y_pred), median_absolute_error(y_test, y_pred)))
ax1.set_xlim([0, 10])
ax1.set_ylim([0, 10])

f.suptitle("Boston housing data: distance to employment centers", y=0.035)
f.tight_layout(rect=[0.05, 0.05, 0.95, 0.95])

plt.show()
3 changes: 2 additions & 1 deletion sklearn/preprocessing/__init__.py
@@ -25,14 +25,14 @@
from .data import OneHotEncoder
from .data import PowerTransformer
from .data import CategoricalEncoder

from .data import PolynomialFeatures

from .label import label_binarize
from .label import LabelBinarizer
from .label import LabelEncoder
from .label import MultiLabelBinarizer

from ._target import TransformedTargetRegressor
from .imputation import Imputer


@@ -53,6 +53,7 @@
'PowerTransformer',
'RobustScaler',
'StandardScaler',
'TransformedTargetRegressor',
'add_dummy_feature',
'PolynomialFeatures',
'binarize',