Please provide option to set unknown_values during test time to same as encoded min_frequency in OrdinalEncoder(Infrequent categories) #27629

@abhishek0093

Describe the workflow you want to enable

OneHotEncoder supports handle_unknown='infrequent_if_exist', but OrdinalEncoder has no equivalent option. Currently, the value passed as unknown_value and the code that min_frequency assigns to the infrequent group are different. One workaround is to work out which code the infrequent group received and map unknown values to it by hand, but an option like OneHotEncoder's handle_unknown='infrequent_if_exist' would be more intuitive, since we usually want categories unseen during training to be treated as infrequent ones. I'm not sure whether this feature already exists and I'm just missing it.

Describe your proposed solution

Add an option to OrdinalEncoder similar to OneHotEncoder's handle_unknown='infrequent_if_exist', so that unknown categories (values not seen during training) receive the same encoding that the infrequent categories received during training.

Describe alternatives you've considered, if relevant

No response

Additional context

No response
