LabelBinarizer and LabelEncoder fit and transform signatures not compatible with Pipeline #3112
Comments
I think you're right to fix that. On 26 April 2014 19:37, hxu notifications@github.com wrote:
In #3113 we have decided this is not to be fixed because label encoding doesn't really belong in a
@jnothman, just to know: what should I be doing instead if I happen to need to vectorize a categorical feature in a pipeline?
You might be best off writing your own
Instead of using LabelBinarizer in a pipeline I just implemented my own transformer:
Seems to do the trick! Edit: this is a better solution:
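The commenter's own code is not preserved in this copy of the thread. As a rough illustration of the idea only, here is a minimal sketch of a wrapper that exposes LabelBinarizer through the fit(X, y=None) / transform(X) signature that Pipeline expects. The class name `PipelineLabelBinarizer` and the toy data are hypothetical, not taken from the thread.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer


class PipelineLabelBinarizer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper: LabelBinarizer behind a Pipeline-style signature."""

    def fit(self, X, y=None):
        # y is accepted only so that Pipeline can pass it along; it is ignored.
        self.binarizer_ = LabelBinarizer().fit(np.asarray(X).ravel())
        return self

    def transform(self, X):
        # Flatten the single categorical column before binarizing it.
        return self.binarizer_.transform(np.asarray(X).ravel())


# Toy data (an assumption for illustration): one categorical column.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

pipe = Pipeline([("binarize", PipelineLabelBinarizer())])
encoded = pipe.fit_transform(colors, [0, 1, 0, 1])
print(encoded)
```

The wrapper only handles a single column. Note also that LabelBinarizer returns one column instead of two when there are exactly two classes, which may not be what you want for features.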
I see that there have been a lot of negative reactions on this page. I think there has been a long misunderstanding of the purpose of LabelBinarizer and LabelEncoder. These are for targets, not features. Although admittedly they were designed (and poorly named) before my time. Although I think users could have been using CountVectorizer (or DictVectorizer with I hope this satisfies the needs of a clearly disgruntled populace. I must say that as someone who has been volunteering enormous quantities of free time for the development of this project for nearly five years now (and recently has been employed to work on it too), seeing the magnitude of negative reactions, rather than constructive contributions to the library, is quite saddening. Although admittedly my response above, that you should write a new Pipeline-like thing rather than a new transformer for categorical inputs, was a misunderstanding on my part (and should/could have been corrected by others), which I hope is understandable while working through the enormous workload that is maintaining this project.
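To make the targets-versus-features distinction concrete, here is a minimal sketch that keeps LabelEncoder on the target and uses DictVectorizer for the categorical feature inside the Pipeline. The rest of the DictVectorizer parenthetical in the comment above is not preserved, so the list-of-dicts input, the toy data, and the choice of DecisionTreeClassifier are all assumptions made for illustration.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data (assumed): one categorical feature per sample, as a dict.
records = [{"color": "red"}, {"color": "green"}, {"color": "blue"}, {"color": "green"}]
labels = ["stop", "go", "wait", "go"]

# LabelEncoder is applied to the target, outside the Pipeline.
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# DictVectorizer one-hot encodes string-valued features and accepts fit(X, y=None).
pipe = Pipeline([
    ("onehot", DictVectorizer(sparse=False)),
    ("clf", DecisionTreeClassifier(random_state=0)),
])
pipe.fit(records, y)

feature_names = pipe.named_steps["onehot"].feature_names_
predicted = pipe.predict([{"color": "green"}])
print(feature_names, encoder.inverse_transform(predicted))
```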
I get this error when I try to use LabelBinarizer and LabelEncoder in a Pipeline:
It seems like this is because the classes' fit and transform signatures are different from most other estimators and only accept a single argument.
I think this is a pretty easy fix (just change the signature to def(self, X, y=None)) that I'd be happy to send a pull request for, but I wanted to check if there were any other reasons that the signatures are the way they are that I didn't think of.
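The traceback from the original report is not preserved in this copy. As a hedged reconstruction of the mismatch it describes, the sketch below compares the fit signatures with `inspect` and then triggers the failure in a one-step Pipeline. StandardScaler is used only as an assumed example of a "typical" transformer, and the exact error message will depend on the installed scikit-learn version.

```python
import inspect

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer, StandardScaler

# Compare parameter names: LabelBinarizer.fit takes only y,
# while a typical transformer takes X and an optional y.
binarizer_params = list(inspect.signature(LabelBinarizer.fit).parameters)
scaler_params = list(inspect.signature(StandardScaler.fit).parameters)
print(binarizer_params)
print(scaler_params)

# Pipeline passes both X and y to the step, so the call fails.
pipe = Pipeline([("binarize", LabelBinarizer())])
try:
    pipe.fit_transform(["a", "b", "c"], [0, 1, 0])
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```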