[MRG+2] TransformedTargetRegressor #9041
Conversation
@jnothman @amueller @dengemann @agramfort Can you have a look before I write some narrative doc accordingly?
Despite the multitude of comments, overall I think this is what we want. Good work
sklearn/preprocessing/target.py
Outdated
class TransformedTargetRegressor(BaseEstimator, RegressorMixin):
    """Meta-estimator to apply a transformation to the target before fitting.
"regress on a transformed target" might be simpler than "apply a ..."
sklearn/preprocessing/target.py
Outdated
    Parameters
    ----------
    estimator : object, (default=LinearRegression())
maybe should call this `regressor`, because `transformer` is also an estimator.
sklearn/preprocessing/target.py
Outdated
    Parameters
    ----------
    estimator : object, (default=LinearRegression())
        Estimator object derived from ``RegressorMixin``.
we don't usually require inheritance, as long as appropriate methods are available. You could say "such as derived from" ...
Perhaps mention that it will be cloned.
sklearn/preprocessing/target.py
Outdated
        Estimator object derived from ``RegressorMixin``.

    transformer : object, (default=None)
        Estimator object derived from ``TransformerMixin``. Cannot be set at
We don't usually require inheritance.
Perhaps mention that it will be cloned.
sklearn/preprocessing/target.py
Outdated
        ``func`` and ``inverse_func`` are ``None`` as well, the transformer
        will be an identity transformer.

    func : function, (default=None)
I'd prefer "optional" to "default=None" which has no clear semantics.
Though in this case the semantics of "optional" are not obvious either; the reader needs to look 2 lines below anyway.
I hardly see that as a problem, given the context.
Just sayin', if this says "optional" instead of "default=None", the docstring below should say "if not passed" instead of "If None". (Or the reader needs to scroll and check that the default value is None.)
sklearn/preprocessing/target.py
Outdated
        self._validate_transformer(y_2d)
        self.estimator_ = clone(self.estimator)
        self.estimator_.fit(X, self.transformer_.transform(y_2d),
                            sample_weight=sample_weight)
I think the current convention is to pass `sample_weight` only when `sample_weight` is not None.
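A sketch of that convention (a hypothetical helper, not the PR's code): only forward sample_weight when the caller supplied one, so regressors that do not accept the keyword still work.

from sklearn.base import clone

def _fit_regressor(regressor, X, y_trans, sample_weight=None):
    est = clone(regressor)
    if sample_weight is None:
        # omit the kwarg entirely: not all regressors accept sample_weight
        est.fit(X, y_trans)
    else:
        est.fit(X, y_trans, sample_weight=sample_weight)
    return est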
sklearn/preprocessing/target.py
Outdated
        return pred

    def score(self, X, y, sample_weight=None):
        """Returns the coefficient of determination R^2 of the prediction.
Should state here that scoring is performed in the original, not the transformed, space.
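To make the requested doc statement concrete, here is a simplified stand-in for what the method does (not the PR's exact body): R^2 is computed on predictions mapped back to the original space.

from sklearn.metrics import r2_score

def score(self, X, y, sample_weight=None):
    # predict() already applies transformer_.inverse_transform, so the
    # comparison with y happens in the original (untransformed) space
    return r2_score(y, self.predict(X), sample_weight=sample_weight)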
from sklearn.preprocessing.label import _inverse_binarize_thresholding
from sklearn.preprocessing.label import _inverse_binarize_multiclass

from sklearn import datasets

iris = datasets.load_iris()
friedman = datasets.make_friedman1(random_state=0)
revert?
    assert_array_almost_equal((y - y_mean) / y_std, y_tran)
    assert_array_almost_equal(y, np.ravel(clf.transformer_.inverse_transform(
        y_tran.reshape(-1, 1))))
    assert_equal(y.shape, pred.shape)
You've failed to test that `clf.estimator_` was passed the transformed y. A better test would just check the equivalence between `clf.estimator_.coef_` and `LinearRegression().fit(X, StandardScaler().fit_transform(y[:, None])[:, 0]).coef_`. You've also not tested the handling of sample_weight.
yeah this would be a good test, too.
I think testing coef_ and also testing pred would be good.
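A sketch of the suggested test, written against the released API (the class ended up in sklearn.compose, and later in this thread the estimator parameter was renamed to regressor):

import numpy as np
from numpy.testing import assert_allclose
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

X, y = make_friedman1(random_state=0)
clf = TransformedTargetRegressor(regressor=LinearRegression(),
                                 transformer=StandardScaler())
clf.fit(X, y)
# reference model fitted directly on the standardized target
lr = LinearRegression().fit(
    X, StandardScaler().fit_transform(y[:, None])[:, 0])
assert_allclose(clf.regressor_.coef_, lr.coef_)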
sklearn/preprocessing/target.py
Outdated
        # memorize if y should be a multi-output
        self.y_ndim_ = y.ndim
        if y.ndim == 1:
            y_2d = y.reshape(-1, 1)
I suspect we don't want to do this when `func` and `inverse_func` are provided?
I come back to this point. I am not sure this is a great idea, since it changes the behaviour depending on whether you pass a function or a transformer. We could still reshape to 2d and build the FunctionTransformer with `validate=True`, and the behaviour would always be the same. I don't see a case in which a user would define a function that works on a 1D array but fails on a 2D array.
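A small sketch of that point: with validate=True, FunctionTransformer coerces its input to a 2D array before calling func, so the user-supplied function sees the same shape whether y arrived as 1D or 2D.

import numpy as np
from sklearn.preprocessing import FunctionTransformer

ft = FunctionTransformer(func=np.log, inverse_func=np.exp, validate=True)
y_2d = np.array([1.0, 2.0, 3.0]).reshape(-1, 1)
y_trans = ft.fit_transform(y_2d)  # func receives a (3, 1) array
assert np.allclose(ft.inverse_transform(y_trans), y_2d)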
As long as the value of having inverse_func is clear in the docs, I don't mind.
On 8 Jun 2017, Guillaume Lemaitre commented on sklearn/preprocessing/target.py:
> + ----------
+ estimator : object, (default=LinearRegression())
+ Estimator object derived from ``RegressorMixin``.
+
+ transformer : object, (default=None)
+ Estimator object derived from ``TransformerMixin``. Cannot be set at
+ the same time as ``func`` and ``inverse_func``. If ``None`` and
+ ``func`` and ``inverse_func`` are ``None`` as well, the transformer
+ will be an identity transformer.
+
+ func : function, (default=None)
+ Function to apply to ``y`` before passing to ``fit``. Cannot be set at
+ the same time than ``transformer``. If ``None`` and ``transformer`` is
+ ``None`` as well, the function used will be the identity function.
+
+ inverse_func : function, (default=None)
Since None will lead to the identity function, and we don't enforce func and inverse_func to actually be the inverse of each other, I am not sure that inverse_func should be required.
@jnothman I missed the what's new, but I added some doc and addressed almost all comments. I am just unsure about fit + transform vs fit_transform and their implications.
sklearn/preprocessing/target.py
Outdated
        self._fit_transformer(y_2d, sample_weight)
        self.regressor_ = clone(self.regressor)
        if sample_weight is not None:
            self.regressor_.fit(X, self.transformer_.transform(y_2d),
I mean that we should really be using fit_transform to produce the downstream y here. fit_transform is mostly used for efficiency.
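A minimal sketch of that suggestion, using attribute names from this PR (regressor, transformer_); this is an illustrative rewrite, not the PR's actual code:

from sklearn.base import clone

def fit(self, X, y_2d, sample_weight=None):
    # fit_transform may use an optimized combined path in the transformer
    self.transformer_ = clone(self.transformer)
    y_trans = self.transformer_.fit_transform(y_2d)
    self.regressor_ = clone(self.regressor)
    if sample_weight is None:
        self.regressor_.fit(X, y_trans)
    else:
        self.regressor_.fit(X, y_trans, sample_weight=sample_weight)
    return self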
doc/whats_new.rst
Outdated
@@ -31,6 +31,10 @@ Changelog
New features
............

- Added the :class:`sklearn.preprocessing.TransformedTargetRegressor` which
  is a meta-estimator to regress on a modified ``y``. :issue:`9041` by
just to make this more relatable, describe a small use case, maybe "for example, to perform regression in log-space" or something?
ok
How about "andydoesntcryhimselftosleepatnightanymorebecausehefinallyhasthetoolshewants"? Or too generic?
Then this PR will be invaded by a pink unicorn :)
If glue is too generic, how about sklearn.adhesive? haha...
You nailed my problem with "glue". Glue is not something that I use to connect tubes. How about "tubes", "connectors".
A couple of small things to test. And the naming/placement questions stand. Apart from which LGTM.
>>> def inverse_func(x):
...     return x
>>> regr = TransformedTargetRegressor(regressor=regressor,
...                                    func=func,
indentation
This is not fixed
# non-negative and (ii) applying an exponential function to obtain non-linear
# targets which cannot be fitted using a simple linear model.
#
# Therefore, a logarithmic and an exponential functions will be used to
functions -> function
regr_trans = TransformedTargetRegressor(
    regressor=RidgeCV(),
    transformer=QuantileTransformer(output_distribution='normal'))
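For readers skimming the diff, here is a self-contained version of this example on synthetic data (the data and the score call are made up here; the PR's example uses a real dataset):

import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import QuantileTransformer

rng = np.random.RandomState(0)
X = rng.randn(1000, 3)
y = np.exp(X @ rng.randn(3) + 0.1 * rng.randn(1000))  # skewed target

regr_trans = TransformedTargetRegressor(
    regressor=RidgeCV(),
    transformer=QuantileTransformer(output_distribution='normal'))
regr_trans.fit(X, y)
print(regr_trans.score(X, y))  # R^2 in the original target space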
sklearn/preprocessing/target.py
Outdated
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.preprocessing import TransformedTargetRegressor
>>> tt = TransformedTargetRegressor(regressor=LinearRegression(),
...                                 func=np.log, inverse_func=np.exp)
indentation
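For context, a runnable completion of the docstring snippet above (the data is made up; the final doctest in the PR may differ):

import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

tt = TransformedTargetRegressor(regressor=LinearRegression(),
                                func=np.log, inverse_func=np.exp)
X = np.arange(1, 5, dtype=float).reshape(-1, 1)
y = np.exp(2 * X).ravel()  # log(y) is exactly linear in X
tt.fit(X, y)
print(tt.predict(X))  # recovers y almost exactly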
sklearn/preprocessing/target.py
Outdated
    -----
    Internally, the target ``y`` is always converted into a 2-dimensional array
    to be used by scikit-learn transformers. At the time of prediction, the
    output will be reshaped to a have the same number of dimension as ``y``.
*dimensions
    regr = TransformedTargetRegressor(regressor=LinearRegression(),
                                      func=np.sqrt, inverse_func=np.log,
                                      check_inverse=False)
    # the transformer/functions are not checked to be invertible the fitting
I'd drop this comment, but would make it clearer by replacing the previous statement with regr.set_params(check_inverse=False)
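A sketch of that suggestion (synthetic data; names mirror the test above):

import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(50, 2)
y = rng.rand(50) + 1.0  # positive, so sqrt and log are both defined
regr = TransformedTargetRegressor(regressor=LinearRegression(),
                                  func=np.sqrt, inverse_func=np.log)
regr.set_params(check_inverse=False)
regr.fit(X, y)  # no warning: the non-inverse pair goes unchecked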
    y_tran = regr.transformer_.transform(y)
    assert_allclose(np.log(y), y_tran)
    assert_allclose(y, regr.transformer_.inverse_transform(y_tran))
    assert_equal(y.shape, y_pred.shape)
with pytest, we can just use a bare `assert y.shape == y_pred.shape`, which I find much more legible.
    assert_allclose(regr.regressor_.coef_, lr.coef_)


def test_transform_target_regressor_1d_transformer_multioutput():
It is hard to see how similar or different the code is here from the previous test. Perhaps use a loop or a check function or `pytest.mark.parametrize`.
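A sketch of the parametrize route (test name and data are hypothetical):

import numpy as np
import pytest
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

@pytest.mark.parametrize("y_shape", [(100,), (100, 2)])
def test_transform_target_regressor_shapes(y_shape):
    rng = np.random.RandomState(0)
    X = rng.randn(100, 5)
    y = rng.randn(*y_shape)
    regr = TransformedTargetRegressor(regressor=LinearRegression(),
                                      transformer=StandardScaler())
    y_pred = regr.fit(X, y).predict(X)
    assert y.shape == y_pred.shape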
    assert_allclose(regr.regressor_.coef_, lr.coef_)


def test_transform_target_regressor_2d_transformer_multioutput():
same here that it's hard to see how similar or different the tests are.
    # check that the target ``y`` passed to the transformer will always be a
    # numpy array
    X, y = friedman
    tt = TransformedTargetRegressor(transformer=DummyTransformer(),
Can you please check similarly that the predictor receives X as a list? Thanks.
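A sketch of the requested check (DummyCheckerListRegressor is a hypothetical helper that asserts the wrapped regressor really receives X as a list, i.e. the meta-estimator does not convert it):

from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression

class DummyCheckerListRegressor(LinearRegression):
    def fit(self, X, y, sample_weight=None):
        assert isinstance(X, list)
        return super().fit(X, y, sample_weight)

    def predict(self, X):
        assert isinstance(X, list)
        return super().predict(X)

X, y = make_friedman1(random_state=0)
tt = TransformedTargetRegressor(regressor=DummyCheckerListRegressor(),
                                check_inverse=False)
tt.fit(X.tolist(), y.tolist())
tt.predict(X.tolist())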
Done
the real world example is not very convincing... though I'm not sure there's a better one with the data we have....
It's your +1 in the title, I think, or at least it's not mine...
I think this is also just waiting on where to put it...
The +1 is from Andy during the scikit-learn sprint :D
I think it's fine where it is ;)
We can always move before the release, I think delaying features for module naming bike-shedding will get us in trouble...
I am ok with having it in preprocessing in the interest of moving forward.
Let's do it! Thanks, @glemaitre.
One less hack for my class! This is moving forward quite nicely lol. (PowerTransformer was another). Can we do KNN imputation, missing value features and ColumnTransformer next? Oh and balanced random forests (though actually imblearn has it now :)? I think then I'm good... just need to implement a decent time series library in python, or something...
Add a new meta-regressor which transforms y for training.
@@ -77,6 +77,11 @@ Model evaluation
- Added :class:`multioutput.RegressorChain` for multi-target
  regression. :issue:`9257` by :user:`Kumar Ashutosh <thechargedneutron>`.

- Added the :class:`preprocessing.TransformedTargetRegressor` which transforms
For some reason this had disappeared from what's new and I've just reinserted it :\
Continuation of #8988
TODO: