[MRG+2] TransformedTargetRegressor #9041
Conversation
@jnothman @amueller @dengemann @agramfort Can you have a look before I write some narrative doc accordingly?
Despite the multitude of comments, overall I think this is what we want. Good work
sklearn/preprocessing/target.py
Outdated
class TransformedTargetRegressor(BaseEstimator, RegressorMixin):
    """Meta-estimator to apply a transformation to the target before fitting.
"regress on a transformed target" might be simpler than "apply a ..."
sklearn/preprocessing/target.py
Outdated
    Parameters
    ----------
    estimator : object, (default=LinearRegression())
maybe should call this `regressor`, because `transformer` is also an estimator.
sklearn/preprocessing/target.py
Outdated
    Parameters
    ----------
    estimator : object, (default=LinearRegression())
        Estimator object derived from ``RegressorMixin``.
we don't usually require inheritance, as long as appropriate methods are available. You could say "such as derived from" ...
Perhaps mention that it will be cloned.
sklearn/preprocessing/target.py
Outdated
        Estimator object derived from ``RegressorMixin``.

    transformer : object, (default=None)
        Estimator object derived from ``TransformerMixin``. Cannot be set at
We don't usually require inheritance.
Perhaps mention that it will be cloned.
sklearn/preprocessing/target.py
Outdated
        ``func`` and ``inverse_func`` are ``None`` as well, the transformer
        will be an identity transformer.

    func : function, (default=None)
I'd prefer "optional" to "default=None" which has no clear semantics.
Though in this case the semantics of "optional" are not obvious either; the reader needs to look 2 lines below anyway.
I hardly see that as a problem, given the context.
Just sayin', if this says "optional" instead of "default=None", the docstring below should say "if not passed" instead of "If None". (Or the reader needs to scroll and check that the default value is None.)
sklearn/preprocessing/target.py
Outdated
        self._validate_transformer(y_2d)
        self.estimator_ = clone(self.estimator)
        self.estimator_.fit(X, self.transformer_.transform(y_2d),
                            sample_weight=sample_weight)
I think the current convention is to pass `sample_weight` only when `sample_weight` is not None.
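A sketch of that convention (a hypothetical helper, not the PR's code): only forward sample_weight when the caller supplied one, so regressors that do not accept the keyword still work.

from sklearn.base import clone

def _fit_regressor(regressor, X, y_trans, sample_weight=None):
    est = clone(regressor)
    if sample_weight is None:
        # omit the kwarg entirely: not all regressors accept sample_weight
        est.fit(X, y_trans)
    else:
        est.fit(X, y_trans, sample_weight=sample_weight)
    return est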
sklearn/preprocessing/target.py
Outdated
        return pred

    def score(self, X, y, sample_weight=None):
        """Returns the coefficient of determination R^2 of the prediction.
Should state here that scoring is performed in the original, not the transformed, space.
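To make the requested doc statement concrete, here is a simplified stand-in for what the method does (not the PR's exact body): R^2 is computed on predictions mapped back to the original space.

from sklearn.metrics import r2_score

def score(self, X, y, sample_weight=None):
    # predict() already applies transformer_.inverse_transform, so the
    # comparison with y happens in the original (untransformed) space
    return r2_score(y, self.predict(X), sample_weight=sample_weight)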
from sklearn.preprocessing.label import _inverse_binarize_thresholding
from sklearn.preprocessing.label import _inverse_binarize_multiclass

from sklearn import datasets

iris = datasets.load_iris()
friedman = datasets.make_friedman1(random_state=0)
revert?
    assert_array_almost_equal((y - y_mean) / y_std, y_tran)
    assert_array_almost_equal(y, np.ravel(clf.transformer_.inverse_transform(
        y_tran.reshape(-1, 1))))
    assert_equal(y.shape, pred.shape)
You've failed to test that `clf.estimator_` was passed the transformed y. A better test would just check the equivalence between `clf.estimator_.coef_` and `LinearRegression().fit(X, StandardScaler().fit_transform(y[:, None])[:, 0]).coef_`. You've also not tested the handling of sample_weight.
yeah this would be a good test, too.
I think testing coef_ and also testing pred would be good.
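A sketch of the suggested test, written against the released API (the class ended up in sklearn.compose, and later in this thread the estimator parameter was renamed to regressor):

import numpy as np
from numpy.testing import assert_allclose
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

X, y = make_friedman1(random_state=0)
clf = TransformedTargetRegressor(regressor=LinearRegression(),
                                 transformer=StandardScaler())
clf.fit(X, y)
# reference model fitted directly on the standardized target
lr = LinearRegression().fit(
    X, StandardScaler().fit_transform(y[:, None])[:, 0])
assert_allclose(clf.regressor_.coef_, lr.coef_)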
sklearn/preprocessing/target.py
Outdated
        # memorize if y should be a multi-output
        self.y_ndim_ = y.ndim
        if y.ndim == 1:
            y_2d = y.reshape(-1, 1)
I suspect we don't want to do this when `func` and `inverse_func` are provided?
I come back to this point. I am not sure this is a great idea, since it changes the behaviour depending on whether you pass a function or a transformer. We could still reshape to 2d and build the FunctionTransformer with `validate=True`, and the behaviour would always be the same. I don't see a case in which a user would define a function that works on a 1D array but fails on a 2D array.
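A small sketch of that point: with validate=True, FunctionTransformer coerces its input to a 2D array before calling func, so the user-supplied function sees the same shape whether y arrived as 1D or 2D.

import numpy as np
from sklearn.preprocessing import FunctionTransformer

ft = FunctionTransformer(func=np.log, inverse_func=np.exp, validate=True)
y_2d = np.array([1.0, 2.0, 3.0]).reshape(-1, 1)
y_trans = ft.fit_transform(y_2d)  # func receives a (3, 1) array
assert np.allclose(ft.inverse_transform(y_trans), y_2d)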
As long as the value of having inverse_func is clear in the docs, I don't mind.
On 8 Jun 2017, Guillaume Lemaitre commented on sklearn/preprocessing/target.py:
> + ----------
+ estimator : object, (default=LinearRegression())
+ Estimator object derived from ``RegressorMixin``.
+
+ transformer : object, (default=None)
+ Estimator object derived from ``TransformerMixin``. Cannot be set at
+ the same time as ``func`` and ``inverse_func``. If ``None`` and
+ ``func`` and ``inverse_func`` are ``None`` as well, the transformer
+ will be an identity transformer.
+
+ func : function, (default=None)
+ Function to apply to ``y`` before passing to ``fit``. Cannot be set at
+ the same time than ``transformer``. If ``None`` and ``transformer`` is
+ ``None`` as well, the function used will be the identity function.
+
+ inverse_func : function, (default=None)
Since None will lead to the identity function, and we don't enforce func and inverse_func to actually be the inverse of each other, I am not sure that inverse_func should be required.
@jnothman I missed the what's new, but I added some doc and addressed almost all comments. I am just unsure about fit + transform vs fit_transform and their implications.
sklearn/preprocessing/target.py
Outdated
        self._fit_transformer(y_2d, sample_weight)
        self.regressor_ = clone(self.regressor)
        if sample_weight is not None:
            self.regressor_.fit(X, self.transformer_.transform(y_2d),
I mean that we should really be using fit_transform to produce the downstream y here. fit_transform is mostly used for efficiency.
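A minimal sketch of that suggestion, using attribute names from this PR (regressor, transformer_); this is an illustrative rewrite, not the PR's actual code:

from sklearn.base import clone

def fit(self, X, y_2d, sample_weight=None):
    # fit_transform may use an optimized combined path in the transformer
    self.transformer_ = clone(self.transformer)
    y_trans = self.transformer_.fit_transform(y_2d)
    self.regressor_ = clone(self.regressor)
    if sample_weight is None:
        self.regressor_.fit(X, y_trans)
    else:
        self.regressor_.fit(X, y_trans, sample_weight=sample_weight)
    return self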
doc/whats_new.rst
Outdated
@@ -31,6 +31,10 @@ Changelog
New features
............

- Added the :class:`sklearn.preprocessing.TransformedTargetRegressor` which
  is a meta-estimator to regress on a modified ``y``. :issue:`9041` by
just to make this more relatable, describe a small use case, maybe "for example, to perform regression in log-space" or something?
ok
How about "andydoesntcryhimselftosleepatnightanymorebecausehefinallyhasthetoolshewants"? Or too generic?
Then this PR will be invaded by a pink unicorn :)
If glue is too generic, how about sklearn.adhesive? haha...
You nailed my problem with "glue". Glue is not something that I use to connect tubes. How about "tubes", "connectors".
A couple of small things to test. And the naming/placement questions stand. Apart from which LGTM.
>>> def inverse_func(x):
...     return x
>>> regr = TransformedTargetRegressor(regressor=regressor,
...                                    func=func,
indentation
This is not fixed
# non-negative and (ii) applying an exponential function to obtain non-linear
# targets which cannot be fitted using a simple linear model.
#
# Therefore, a logarithmic and an exponential functions will be used to
functions -> function
regr_trans = TransformedTargetRegressor(
    regressor=RidgeCV(),
    transformer=QuantileTransformer(output_distribution='normal'))
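For readers skimming the diff, here is a self-contained version of this example on synthetic data (the data and the score call are made up here; the PR's example uses a real dataset):

import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import QuantileTransformer

rng = np.random.RandomState(0)
X = rng.randn(1000, 3)
y = np.exp(X @ rng.randn(3) + 0.1 * rng.randn(1000))  # skewed target

regr_trans = TransformedTargetRegressor(
    regressor=RidgeCV(),
    transformer=QuantileTransformer(output_distribution='normal'))
regr_trans.fit(X, y)
print(regr_trans.score(X, y))  # R^2 in the original target space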
sklearn/preprocessing/target.py
Outdated
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.preprocessing import TransformedTargetRegressor
>>> tt = TransformedTargetRegressor(regressor=LinearRegression(),
...                                 func=np.log, inverse_func=np.exp)
indentation
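For context, a runnable completion of the docstring snippet above (the data is made up; the final doctest in the PR may differ):

import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

tt = TransformedTargetRegressor(regressor=LinearRegression(),
                                func=np.log, inverse_func=np.exp)
X = np.arange(1, 5, dtype=float).reshape(-1, 1)
y = np.exp(2 * X).ravel()  # log(y) is exactly linear in X
tt.fit(X, y)
print(tt.predict(X))  # recovers y almost exactly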
sklearn/preprocessing/target.py
Outdated
    -----
    Internally, the target ``y`` is always converted into a 2-dimensional array
    to be used by scikit-learn transformers. At the time of prediction, the
    output will be reshaped to a have the same number of dimension as ``y``.
*dimensions
    regr = TransformedTargetRegressor(regressor=LinearRegression(),
                                      func=np.sqrt, inverse_func=np.log,
                                      check_inverse=False)
    # the transformer/functions are not checked to be invertible the fitting
I'd drop this comment, but would make it clearer by replacing the previous statement with regr.set_params(check_inverse=False)
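A sketch of that suggestion (synthetic data; names mirror the test above):

import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(50, 2)
y = rng.rand(50) + 1.0  # positive, so sqrt and log are both defined
regr = TransformedTargetRegressor(regressor=LinearRegression(),
                                  func=np.sqrt, inverse_func=np.log)
regr.set_params(check_inverse=False)
regr.fit(X, y)  # no warning: the non-inverse pair goes unchecked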
    y_tran = regr.transformer_.transform(y)
    assert_allclose(np.log(y), y_tran)
    assert_allclose(y, regr.transformer_.inverse_transform(y_tran))
    assert_equal(y.shape, y_pred.shape)
with pytest, we can just use a bare `assert y.shape == y_pred.shape`, which I find much more legible.
    assert_allclose(regr.regressor_.coef_, lr.coef_)


def test_transform_target_regressor_1d_transformer_multioutput():
It is hard to see how similar or different the code is here from the previous test. Perhaps use a loop or a check function or `pytest.mark.parametrize`.
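A sketch of the parametrize route (test name and data are hypothetical):

import numpy as np
import pytest
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

@pytest.mark.parametrize("y_shape", [(100,), (100, 2)])
def test_transform_target_regressor_shapes(y_shape):
    rng = np.random.RandomState(0)
    X = rng.randn(100, 5)
    y = rng.randn(*y_shape)
    regr = TransformedTargetRegressor(regressor=LinearRegression(),
                                      transformer=StandardScaler())
    y_pred = regr.fit(X, y).predict(X)
    assert y.shape == y_pred.shape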
    assert_allclose(regr.regressor_.coef_, lr.coef_)


def test_transform_target_regressor_2d_transformer_multioutput():
same here that it's hard to see how similar or different the tests are.
    # check that the target ``y`` passed to the transformer will always be a
    # numpy array
    X, y = friedman
    tt = TransformedTargetRegressor(transformer=DummyTransformer(),
Can you please check similarly that the predictor receives X as a list? Thanks.
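A sketch of the requested check (DummyCheckerListRegressor is a hypothetical helper that asserts the wrapped regressor really receives X as a list, i.e. the meta-estimator does not convert it):

from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression

class DummyCheckerListRegressor(LinearRegression):
    def fit(self, X, y, sample_weight=None):
        assert isinstance(X, list)
        return super().fit(X, y, sample_weight)

    def predict(self, X):
        assert isinstance(X, list)
        return super().predict(X)

X, y = make_friedman1(random_state=0)
tt = TransformedTargetRegressor(regressor=DummyCheckerListRegressor(),
                                check_inverse=False)
tt.fit(X.tolist(), y.tolist())
tt.predict(X.tolist())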
Done
the real world example is not very convincing... though I'm not sure there's a better one with the data we have....
It's your +1 in the title, I think, or at least it's not mine...
I think this is also just waiting on where to put it...
The +1 is from Andy during the scikit-learn sprint :D
I think it's fine where it is ;)
We can always move before the release, I think delaying features for module naming bike-shedding will get us in trouble...
I am ok with having it in preprocessing in the interest of moving forward.
Let's do it! Thanks, @glemaitre.
One less hack for my class! This is moving forward quite nicely lol. (PowerTransformer was another). Can we do KNN imputation, missing value features and ColumnTransformer next? Oh and balanced random forests (though actually imblearn has it now :)? I think then I'm good... just need to implement a decent time series library in python, or something...
Add a new meta-regressor which transforms y for training.
@@ -77,6 +77,11 @@ Model evaluation
- Added :class:`multioutput.RegressorChain` for multi-target
  regression. :issue:`9257` by :user:`Kumar Ashutosh <thechargedneutron>`.

- Added the :class:`preprocessing.TransformedTargetRegressor` which transforms
For some reason this had disappeared from what's new and I've just reinserted it :\
Continuation of #8988
TODO: