[MRG] Added CategoricalEncoder class deprecating OneHotEncoder #6559
@MechCoder @amueller
All the tests pass now. I would be glad if someone could review this.
Apologies if this comment is not helpful. Isn't the name
I don't think it's "famous". I think most people don't know what it means. People from R are very confused that you need to do anything to work with categorical data.
@rvraghav93 Can you do a review of the code?
sparse : boolean, default=True
    Will return a sparse matrix if set to True, else will return an array.

handle_unknown : str, 'error' or 'ignore'
default?
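A minimal pure-Python sketch of what the sparse and handle_unknown options describe, assuming each input column comes with an explicit list of known categories. The function name one_hot_encode is hypothetical, the dict of (row, column) keys stands in for a SciPy sparse matrix, and the 'error' default is an assumption chosen for the sketch, not something this PR confirms.

```python
from typing import Dict, List, Sequence, Tuple, Union

Dense = List[List[int]]
Sparse = Dict[Tuple[int, int], int]


def one_hot_encode(
    X: Sequence[Sequence[object]],
    classes: Sequence[Sequence[object]],
    sparse: bool = True,
    handle_unknown: str = "error",
) -> Union[Dense, Sparse]:
    """Hypothetical sketch: one-hot encode rows of categorical values.

    classes[j] lists the known categories of input column j. With
    sparse=True the result maps (row, output_column) -> 1 for nonzero
    entries only; with sparse=False it is a dense list of lists.
    """
    if handle_unknown not in ("error", "ignore"):
        raise ValueError("handle_unknown must be 'error' or 'ignore'")

    # Give each category of each column its own output column, offset by
    # the widths of the preceding input columns.
    lookup: List[Dict[object, int]] = []
    offset = 0
    for cats in classes:
        lookup.append({c: offset + k for k, c in enumerate(cats)})
        offset += len(cats)

    entries: Sparse = {}
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            col = lookup[j].get(value)
            if col is None:
                if handle_unknown == "error":
                    raise ValueError(f"unknown category {value!r} in column {j}")
                continue  # 'ignore': this column's block stays all zeros
            entries[(i, col)] = 1

    if sparse:
        return entries
    # Dense output: materialize every cell, filling in the nonzeros.
    dense = [[0] * offset for _ in range(len(X))]
    for (i, col), v in entries.items():
        dense[i][col] = v
    return dense


X = [["red", "S"], ["blue", "M"], ["green", "S"]]
classes = [["blue", "red"], ["M", "S"]]
print(one_hot_encode(X, classes, sparse=False, handle_unknown="ignore"))
# → [[0, 1, 0, 1], [1, 0, 1, 0], [0, 0, 0, 1]]
```

The unknown "green" in the third row leaves that row's first two output columns at zero; with handle_unknown='error' the same call raises ValueError instead.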
Somewhat along the lines of @rvraghav93 (and why reviewing the whole thing might not be so interesting until this is resolved): It seems as if this is functionally a superset of the current
@jnothman Because the attributes of
The point is that the vast majority of users of
@jnothman What will those property getters return?
In the cases where the new functionality is identical to the former, they will provide
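The thread does not show what the property getters would return. One possibility, sketched under that assumption: derive the legacy OneHotEncoder attributes n_values_ and feature_indices_ from per-column classes. The class name EncoderWithLegacyAttributes is hypothetical, and the derived values agree with the old integer-input behavior only when each column's classes are exactly 0..k-1.

```python
from typing import List, Sequence


class EncoderWithLegacyAttributes:
    """Hypothetical sketch: expose legacy OneHotEncoder-style attributes
    as read-only properties computed from per-column classes."""

    def __init__(self, classes: Sequence[Sequence[int]]):
        # classes[j] holds the sorted categories seen in column j.
        self.classes_ = [list(c) for c in classes]

    @property
    def n_values_(self) -> List[int]:
        # Number of categories per column. Matches the old "max value + 1"
        # semantics only when column j's classes are 0..k-1.
        return [len(c) for c in self.classes_]

    @property
    def feature_indices_(self) -> List[int]:
        # Cumulative offsets: input column j maps to output columns
        # feature_indices_[j] through feature_indices_[j + 1] - 1.
        indices = [0]
        for n in self.n_values_:
            indices.append(indices[-1] + n)
        return indices


enc = EncoderWithLegacyAttributes([[0, 1, 2], [0, 1]])
print(enc.n_values_, enc.feature_indices_)  # → [3, 2] [0, 3, 5]
```

Using properties keeps the old names readable without storing duplicate state, and assigning to them raises AttributeError because no setter is defined.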
Closing in favor of #6602
Following the discussion with @amueller and @MechCoder
New Features in CategoricalEncoder
- A classes parameter instead of n_values.
- A LabelEncoder instance for each column.

Changes
- Changed _transform_selected to _apply_selected, giving it the ability to optionally not return transformed values.
- _apply_selected can no longer accept lists. It has to be given a np.array object. This is done because the input can be of np.object type, and it cannot always be cast as a whole to np.int or np.float type. The transformed and non-transformed parts of the array are converted to the specified type before returning.
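The casting rationale above can be illustrated with a small NumPy sketch. apply_selected is a hypothetical stand-in for the private _apply_selected helper, not its actual code, and the meaning given to return_transformed=False (run the transform, e.g. for fitting, but return nothing) is an assumption; the PR text does not spell it out.

```python
import numpy as np


def apply_selected(X, transform, selected, dtype=np.float64, return_transformed=True):
    """Hypothetical sketch of an _apply_selected-style helper.

    Applies `transform` to the columns listed in `selected` and passes the
    other columns through. Each part is cast to `dtype` separately, because
    an object array mixing strings and numbers cannot be cast as a whole.
    """
    if not isinstance(X, np.ndarray):
        # Mirrors the PR's rule that plain lists are no longer accepted.
        raise TypeError("X must be a numpy array")

    # Boolean mask over columns: True for the columns to transform.
    mask = np.zeros(X.shape[1], dtype=bool)
    mask[np.asarray(selected, dtype=int)] = True

    X_sel = transform(X[:, mask])
    if not return_transformed:
        # Assumed fit-only mode: the transform ran, its output is discarded.
        return None

    # Cast the transformed and passthrough parts independently.
    X_sel = np.asarray(X_sel).astype(dtype)
    X_rest = X[:, ~mask].astype(dtype)
    # Transformed columns first, then the untouched ones.
    return np.hstack([X_sel, X_rest])


# Column 0 holds strings, column 1 numbers: an object-dtype array.
X = np.array([["a", 1.5], ["b", 2.0], ["a", 3.0]], dtype=object)


def encode_letters(cols):
    # Toy encoder for the selected (string) column.
    return [[{"a": 0, "b": 1}[v]] for v in cols[:, 0]]


print(apply_selected(X, encode_letters, [0]))
```

Calling X.astype(float) on the whole array fails on the string column, whereas casting only the passthrough part, plus the already-encoded selected part, yields a float array whose first column holds the codes 0, 1, 0.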