[MRG] add drop_first option to OneHotEncoder #12884

Closed

Member

@NicolasHug NicolasHug commented Dec 28, 2018

Reference Issues/PRs

Closes #6488

What does this implement/fix? Explain your changes.

This PR adds a drop_first option to OneHotEncoder.
Each feature is encoded into n_unique_values - 1 columns instead of n_unique_values columns. The column for the first category is dropped, so that category is represented by zeros in all of the remaining columns.
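
To illustrate the encoding described above, here is a minimal pure-Python sketch. It is not the code from this PR: the function name `one_hot_drop_first` is hypothetical, and treating the first category in sorted order as the dropped one is an assumption made for the example.

```python
def one_hot_drop_first(column):
    """Encode one feature into n_unique_values - 1 indicator columns.

    Hypothetical helper for illustration only; the "first" category is
    assumed to be the first one in sorted order.
    """
    categories = sorted(set(column))
    kept = categories[1:]  # drop the column of the first category
    encoded = [[1 if value == cat else 0 for cat in kept] for value in column]
    return categories, encoded


categories, encoded = one_hot_drop_first(["red", "green", "blue", "green"])
for value, row in zip(["red", "green", "blue", "green"], encoded):
    print(value, row)
```

With three unique values, each row has two columns, and the row for the first category ("blue" here) contains only zeros.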

Any other comments?

This is incompatible with handle_unknown='ignore' because ignored unknown categories are encoded with all of the one-hot columns set to zero, which is also how the first category is encoded when drop_first=True. If both were allowed, there would be no way to distinguish an unknown category from the first one.
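
The collision can be reproduced with a small pure-Python sketch. The `encode` function and its `drop_first` / `ignore_unknown` arguments are hypothetical stand-ins for the behaviour discussed here, not the encoder's real implementation.

```python
def encode(value, categories, drop_first, ignore_unknown):
    """One-hot encode a single value (hypothetical illustration).

    A value absent from `categories` is encoded as all zeros when
    `ignore_unknown` is True, and raises an error otherwise.
    """
    if value not in categories and not ignore_unknown:
        raise ValueError(f"unknown category: {value!r}")
    kept = categories[1:] if drop_first else categories
    return [1 if value == cat else 0 for cat in kept]


categories = ["blue", "green", "red"]

# With both options enabled, the first category and an unseen category
# produce the same all-zero row.
first_row = encode("blue", categories, drop_first=True, ignore_unknown=True)
unknown_row = encode("purple", categories, drop_first=True, ignore_unknown=True)
collision = first_row == unknown_row

# Without dropping the first column, the two rows stay distinguishable.
full_first = encode("blue", categories, drop_first=False, ignore_unknown=True)
full_unknown = encode("purple", categories, drop_first=False, ignore_unknown=True)
```

In this sketch the ambiguity only appears when both options are on at once, which is why the combination is rejected rather than silently accepted.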

@NicolasHug changed the title from "[WIP] add drop_first option to OneHotEncoder" to "[MRG] add drop_first option to OneHotEncoder" on Dec 29, 2018
@NicolasHug
Member Author

Note to reviewers: #12908 is more general, so it may be better to review that one instead.

Development

Successfully merging this pull request may close these issues.

OneHotEncoder - add option for 1 of k-1 encoding