Closed
Description
Describe the workflow you want to enable
I want to limit the number of output levels produced by OrdinalEncoder, and have a new "unknown_value" (a category not seen during fit) map to the same encoded level as the grouped infrequent categories. This is motivated by the following bug report
Describe your proposed solution
The functionality should work the same way as in OneHotEncoder, for consistency:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
There are three components:
- max_categories parameter
- min_frequency parameter
- 'infrequent_if_exist' option for the handle_unknown parameter
Describe alternatives you've considered, if relevant
I currently group the infrequent categories myself before encoding, but this is limited by the lack of integration with handle_unknown
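
A rough sketch of that kind of workaround and where it falls short. The "infrequent" placeholder label, the threshold, and the `collapse` helper are hypothetical choices for illustration, not part of scikit-learn.

```python
from collections import Counter

import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Toy training column (illustrative data).
train = ["a"] * 5 + ["b"] * 4 + ["c", "d"]

# Manual pre-processing step: find categories seen fewer than 2 times during training.
counts = Counter(train)
rare = {value for value, n in counts.items() if n < 2}

def collapse(values):
    """Hypothetical helper: replace known rare categories with a placeholder label."""
    return np.array(
        [["infrequent" if v in rare else v] for v in values], dtype=object
    )

enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit(collapse(train))

# "c" is a known rare category; "z" was never seen during training.
codes = enc.transform(collapse(["a", "c", "z"]))
print(codes.ravel())  # "c" gets the placeholder's code, "z" gets unknown_value (-1)
```

Here the rare category and the unseen category land on two different codes, because the encoder knows nothing about the manual grouping. Getting them onto the same level requires extra bookkeeping outside the encoder.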
Additional context
No response