Skip to content

Suggest to use SimpleImputer for str categorical data with None values #16739

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

ogrisel
Copy link
Member

@ogrisel ogrisel commented Mar 21, 2020

This is a follow up to #16713 to better tackle #16702 and #16703.

I found the this specific case when loading the Ames housing dataset from fetch_openml:

X, y = fetch_openml("house_prices", as_frame=True, return_X_y=True)

Note that #11996 also discusses how to natively handle missing values in categorical encoders which could be an alternative to this PR, assuming we support both None and np.nan as missing value markers in object-dtyped columns.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this to raise this error for LabelEncoder?

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit_transform([None, 'a', 'b', 'a'])

@ogrisel
Copy link
Member Author

ogrisel commented May 2, 2020

Indeed, displaying this message in the context of LabelEncoder is weird.

Base automatically changed from master to main January 22, 2021 10:52
@thomasjpfan
Copy link
Member

Now that we have #19069 and #17317 merged this is less of an issue?

OneHotEncoder encodes None and np.nan separately. OrdinalEncoder passes np.nan through and encodes None. (OrdinalEncoder encoding of None was on main before #19069)

@ogrisel
Copy link
Member Author

ogrisel commented Apr 13, 2021

Maybe yes. Let's close.

@ogrisel ogrisel closed this Apr 13, 2021
@ogrisel ogrisel deleted the more-helpful-error-message-encoders branch April 13, 2021 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants