
transform should not accept y #8174


Closed
jnothman opened this issue Jan 8, 2017 · 21 comments

Comments

@jnothman
Member

jnothman commented Jan 8, 2017

I suspect that whenever y=None is included in the transform method signature (after mandatory X), this is done in error. The y parameter should be deprecated and eventually removed. It only confuses the user about the isolation of y from transformation.

@jnothman
Member Author

jnothman commented Jan 8, 2017

$ git grep def.transform.*y=None
sklearn/cluster/birch.py:    def transform(self, X, y=None):
sklearn/cluster/k_means_.py:    def transform(self, X, y=None):
sklearn/decomposition/base.py:    def transform(self, X, y=None):
sklearn/decomposition/dict_learning.py:    def transform(self, X, y=None):
sklearn/decomposition/fastica_.py:    def transform(self, X, y=None, copy=True):
sklearn/decomposition/pca.py:    def transform(self, X, y=None):
sklearn/feature_extraction/dict_vectorizer.py:    def transform(self, X, y=None):
sklearn/feature_extraction/hashing.py:    def transform(self, raw_X, y=None):
sklearn/feature_extraction/text.py:    def transform(self, X, y=None):
sklearn/kernel_approximation.py:    def transform(self, X, y=None):
sklearn/kernel_approximation.py:    def transform(self, X, y=None):
sklearn/kernel_approximation.py:    def transform(self, X, y=None):
sklearn/neighbors/approximate.py:    def transform(self, X, y=None):
sklearn/preprocessing/_function_transformer.py:    def transform(self, X, y=None):
sklearn/preprocessing/data.py:    def transform(self, X, y=None, copy=None):
sklearn/preprocessing/data.py:    def transform(self, X, y=None):
sklearn/preprocessing/data.py:    def transform(self, X, y=None):
sklearn/preprocessing/data.py:    def transform(self, X, y=None):
sklearn/preprocessing/data.py:    def transform(self, X, y=None, copy=None):
sklearn/preprocessing/data.py:    def transform(self, X, y=None, copy=None):
sklearn/preprocessing/data.py:    def transform(self, K, y=None, copy=True):
sklearn/preprocessing/data.py:    def transform(self, X, y=None):
sklearn/random_projection.py:    def transform(self, X, y=None):
sklearn/tests/test_base.py:        def transform(self, X, y=None):
sklearn/tests/test_pipeline.py:    def transform(self, X, y=None):
sklearn/tests/test_pipeline.py:    def transform(self, X, y=None):

@Akshay0724
Contributor

Akshay0724 commented Jan 8, 2017

Hello @jnothman, I'd like to work on this issue.

Are you talking about a modification like this?

def transform(self, X, y=None):
    if y is not None:
        warnings.warn("Parameter y is deprecated for this method, as it is not "
                      "required, and will be removed in a future version.",
                      DeprecationWarning)
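A runnable sketch of the deprecation pattern under discussion (DemoTransformer and its doubling transform are hypothetical stand-ins, not scikit-learn code):

```python
import warnings

class DemoTransformer:
    # Hypothetical transformer: y is accepted only to warn, never used.
    def transform(self, X, y=None):
        if y is not None:
            warnings.warn(
                "Passing y to transform is deprecated; it is ignored "
                "and will be removed in a future version.",
                DeprecationWarning,
            )
        return [x * 2 for x in X]  # stand-in transformation

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = DemoTransformer().transform([1, 2, 3], y=[0, 1, 0])

print(result)       # [2, 4, 6]
print(len(caught))  # 1: the DeprecationWarning fired
```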

@jnothman
Member Author

jnothman commented Jan 8, 2017 via email

@jnothman
Member Author

jnothman commented Jan 8, 2017 via email

@GaelVaroquaux
Member

GaelVaroquaux commented Jan 8, 2017 via email

@jnothman
Member Author

jnothman commented Jan 8, 2017

@GaelVaroquaux please clarify examples. IMO:

  • I can't think of any case where y is used in transform atm
  • It is certainly not forwarded by Pipeline or FeatureUnion
  • We have inconsistency as to whether y=None is included in transform's signature
  • But I think including it in transform's signature when it is generally unused confuses users and people writing their own transformers.
  • The documentation has transform with a single positional arg as the prototype.
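A minimal sketch of the second point, that Pipeline never forwards y to transform (RecordingTransformer is a hypothetical helper; inside the pipeline, fit receives y but transform is called with X alone):

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.dummy import DummyClassifier
from sklearn.pipeline import Pipeline

class RecordingTransformer(BaseEstimator, TransformerMixin):
    # Hypothetical transformer that records whether y reached fit.
    def fit(self, X, y=None):
        self.fit_saw_y_ = y is not None
        return self

    def transform(self, X):  # no y parameter: Pipeline never passes one
        return X

pipe = Pipeline([("t", RecordingTransformer()),
                 ("clf", DummyClassifier(strategy="most_frequent"))])
pipe.fit([[0], [1], [2]], [0, 1, 1])
print(pipe.named_steps["t"].fit_saw_y_)  # True: y reaches fit...
# ...while transform ran with X only; a y argument there would be unused.
print(pipe.predict([[3]]))               # [1]
```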

@jnothman
Member Author

jnothman commented Jan 8, 2017 via email

@Akshay0724
Contributor

Hello @jnothman, I think the right thing to do before starting work on a PR is to ask whether someone else is already working on it. I had started working on this issue and was going to make a PR, but in the meantime @tzano directly made a PR.

@jnothman
Member Author

jnothman commented Jan 9, 2017 via email

@Akshay0724
Contributor

Is there still work to be done on this issue?

@jnothman
Member Author

jnothman commented Feb 19, 2017 via email

@GaelVaroquaux
Member

GaelVaroquaux commented Feb 19, 2017 via email

@jnothman
Member Author

jnothman commented Feb 19, 2017 via email

@amueller
Member

amueller commented Mar 3, 2017

I argued for this before, but @GaelVaroquaux wanted to wait on a ruling on what we're gonna do in terms of a new interface before deprecating anything. I'm still totally for it.

@jnothman
Member Author

jnothman commented Mar 4, 2017 via email

@jnothman
Member Author

Fixed

@matlotpib

matlotpib commented May 9, 2019

@jnothman
@GaelVaroquaux

"I can't think of any case y is used in transform atm"

"So, I agree that transform cannot accept y."

I have a use case for a transform accepting 'y'.

During the training phase:

fit(X, y):

  • for a given feature X['feature'], fit model that predicts that feature based on 'y' (and maybe other features)

transform(X, y):

  • if an instance of the given feature is determined to be an outlier, replace it with the value predicted by the model created in 'fit' (which requires 'y').

...Then, during the predict phase:

transform(X, y=None):

  • there is no 'y', so do nothing

To summarise: during training, you use a model based on 'y' to predict 'x', and use it to replace outliers in X.
This may allow you to train a more reasonable classifier.

Then, during predict, you do nothing in the transform, since 'y' is what you are predicting.

For this, you want the option of being able to pass in 'y' during the training phase, but not passing it in during the predict phase.

Edit: but maybe there is a better way of doing this? I just wanted everything to be in the pipeline, rather than doing this step separately.

@jnothman
Member Author

@matlotpib isn't this what fit_transform allows for? Or am I missing something?
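A hedged sketch of what that suggestion looks like in practice (OutlierReplacer and its z-score rule are hypothetical, not the commenter's actual code): fit_transform carries y for train-time-only behaviour, while transform stays a no-op for the predict path.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class OutlierReplacer(BaseEstimator, TransformerMixin):
    # Hypothetical: at train time (fit_transform, y available), replace
    # outliers in column 0 with the mean of that sample's class;
    # at predict time (transform, no y), do nothing.
    def __init__(self, z_thresh=1.5):
        self.z_thresh = z_thresh

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        col = X[:, 0]
        self.mean_, self.std_ = col.mean(), col.std() or 1.0
        if y is not None:
            y = np.asarray(y)
            self.class_means_ = {c: col[y == c].mean() for c in np.unique(y)}
        return self

    def fit_transform(self, X, y=None):
        self.fit(X, y)
        X = np.asarray(X, dtype=float).copy()
        if y is not None:
            y = np.asarray(y)
            z = np.abs(X[:, 0] - self.mean_) / self.std_
            for i in np.where(z > self.z_thresh)[0]:
                X[i, 0] = self.class_means_[y[i]]
        return X

    def transform(self, X):  # predict path: y unavailable, so a no-op
        return np.asarray(X, dtype=float)

X = [[1.0], [1.0], [1.0], [1.0], [100.0]]
y = [0, 0, 0, 1, 1]
Xt = OutlierReplacer().fit_transform(X, y)
print(Xt[:, 0])  # the 100.0 outlier is replaced; the rest untouched
```

The caveat is that Pipeline only calls fit_transform on intermediate steps during fit, so this only covers train-time replacement, which is exactly the use case described above.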

@DavidMertz

I am using FunctionTransformer for the first time, and I absolutely need y for my purposes. What I wish to do is remove "non-modal" (i.e. atypical) target values. I have a training set where I can assume that these uncommon target values are bad data. Only one of the mode values should be permitted for a particular combination of features (by default the identity of a person, but the interface allows selecting other fields to define "equivalent" persons).

from functools import partial
from sklearn.preprocessing import FunctionTransformer

def _modal_filter(X, y, fields='_member_key'):
    df = X.copy()
    df['TARGET'] = y
    df = modal_filter(df, fields)
    df.drop([c for c in df.columns if c.startswith(('TARGET', '_'))],
            axis=1, inplace=True)
    return df

def ModalFilter(fields):
    fn = partial(_modal_filter, fields=fields)
    return FunctionTransformer(fn, validate=False, pass_y=True)

If I cannot pass in y, there does not seem to be any way to accommodate this in a pipeline. You can see that the underlying function modal_filter() works fine, but it accepts a DataFrame that has a TARGET column in it. This is a good function, but it cannot be pipelined directly (as far as I can see).

@jnothman
Member Author

jnothman commented Jun 20, 2019 via email

@mhorbach-tibco

mhorbach-tibco commented Jun 9, 2022

y is needed in this case: you want to make a FunctionTransformer that calls either CountEncoder or TargetEncoder, with the choice passed in as method='target_encoder' (the default). Both encoders are from category_encoders. TargetEncoder encodes X but not y; it just uses y to do the encoding.
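For reference, the reason target encoding still fits the standard API: y is consumed at fit time only, so transform keeps the plain transform(X) signature. A hypothetical minimal version (MiniTargetEncoder is illustrative only, not the category_encoders implementation):

```python
from collections import defaultdict

class MiniTargetEncoder:
    # Hypothetical target encoder: y is consumed entirely at fit time,
    # so transform needs only X.
    def fit(self, X, y):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for cat, target in zip(X, y):
            sums[cat] += target
            counts[cat] += 1
        self.mapping_ = {c: sums[c] / counts[c] for c in sums}
        self.global_mean_ = sum(y) / len(y)  # fallback for unseen categories
        return self

    def transform(self, X):  # no y parameter needed here
        return [self.mapping_.get(cat, self.global_mean_) for cat in X]

enc = MiniTargetEncoder().fit(["a", "a", "b"], [1, 0, 1])
print(enc.transform(["a", "b", "c"]))  # [0.5, 1.0, 0.666...]
```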


7 participants