[MRG] Support unknown_value=np.nan in OrdinalEncoder #18406

NicolasHug · 2020-09-16T00:57:46Z

This PR adds support for unknown_value=np.nan in OrdinalEncoder.

(The parameter was introduced in #17406 by @FelixWick.)
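A minimal usage sketch, assuming scikit-learn 0.24 or later (the first release that ships this change). With `handle_unknown="use_encoded_value"`, categories not seen during `fit` are encoded as `unknown_value`, here `np.nan`. The default float64 output dtype is kept because NaN cannot be stored in an integer array. The toy data is illustrative only.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Toy training data: one categorical column (illustrative values).
X_train = np.array([["cat"], ["dog"], ["bird"]], dtype=object)

# Unknown categories at transform time are encoded as NaN instead of raising.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=np.nan)
enc.fit(X_train)

# Categories are sorted during fit: bird -> 0, cat -> 1, dog -> 2.
X_test = np.array([["dog"], ["fish"]], dtype=object)
X_test_enc = enc.transform(X_test)
# "dog" is encoded as 2.0; "fish" was never seen, so it becomes nan.
```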

CC @thomasjpfan @ogrisel

thomasjpfan

Thank you for working on this!

sklearn/preprocessing/_encoders.py

thomasjpfan

LGTM

thomasjpfan

LGTM

rth

Thanks @NicolasHug!

mayer79 · 2020-12-04T12:02:37Z

Excellent work. Just to clarify: will the new options allow us to both

1. keep np.nan values as they are, and
2. map new unknown categories to np.nan?

If yes, this will be super good news for fitting boosted trees!

NicolasHug · 2020-12-04T12:08:44Z

This PR supports 2, but 1 is still not supported: an error is raised when NaNs are present in the training data. It's unclear where to map them, since the output of OrdinalEncoder is meant to be interpreted as ordered quantities.

NicolasHug · 2020-12-04T12:10:14Z

> If yes, this will be super good news for fitting boosted trees

HistGradientBoostingClassifier and the corresponding regressor now natively support both missing values (as NaNs) and categorical data :)

https://scikit-learn.org/stable/modules/ensemble.html#histogram-based-gradient-boosting

mfeurer · 2020-12-04T13:03:02Z

It's great to see all the progress in handling categorical data with native scikit-learn components!

Is @mayer79's point 1 what's proposed in #17123?

NicolasHug · 2020-12-04T13:08:01Z

Yes, with 1 and 2 inverted.

mayer79 · 2020-12-04T13:34:42Z

@NicolasHug: Thanks for clarifying. From a practical perspective, it is undesirable for remaining NaNs to raise an error. If my downstream model cannot natively handle NaNs, I can simply add an imputer after the encoder, and voilà.
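A sketch of the "imputer after the encoder" workflow, applied to the NaNs this PR produces for unknown categories (assuming scikit-learn 0.24 or later). The encoder maps unseen categories to NaN, and `SimpleImputer` then replaces NaN with a constant so that a NaN-intolerant downstream model could follow. `fill_value=-1` is an arbitrary illustrative choice, not a recommendation from the thread.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Encoder: unseen categories -> NaN. Imputer: NaN -> -1 (illustrative value).
pre = make_pipeline(
    OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=np.nan),
    SimpleImputer(strategy="constant", fill_value=-1),
)

# Categories are sorted during fit: high -> 0, low -> 1, mid -> 2.
X_train = np.array([["low"], ["mid"], ["high"]], dtype=object)
pre.fit(X_train)

# "extreme" is unseen: encoded as NaN, then imputed to -1.
X_test_out = pre.transform(np.array([["mid"], ["extreme"]], dtype=object))
```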

FelixWick · 2020-12-05T18:43:12Z

@mayer79 Couldn't you just run the encoder on the non-NaN values only to get the desired behavior?
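A sketch of that suggestion for a single column, assuming scikit-learn 0.24 or later. `encode_non_nan` and `_is_nan` are hypothetical helpers written for this illustration, not scikit-learn API. The encoder is fitted only on the non-NaN training values; existing NaNs stay NaN, and categories unseen during fit are mapped to NaN via `unknown_value=np.nan`.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder


def _is_nan(value):
    # Hypothetical helper: True only for float NaN entries in an object array.
    return isinstance(value, float) and np.isnan(value)


def encode_non_nan(train_col, test_col):
    """Hypothetical helper: ordinal-encode one column while keeping NaNs."""
    train_col = np.asarray(train_col, dtype=object)
    test_col = np.asarray(test_col, dtype=object)
    train_mask = np.array([not _is_nan(v) for v in train_col])
    test_mask = np.array([not _is_nan(v) for v in test_col])

    # Fit only on the non-NaN training values.
    enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=np.nan)
    enc.fit(train_col[train_mask].reshape(-1, 1))

    # Start from an all-NaN output and fill in only the non-NaN positions.
    out = np.full(test_col.shape[0], np.nan)
    if test_mask.any():
        out[test_mask] = enc.transform(test_col[test_mask].reshape(-1, 1)).ravel()
    return out


encoded = encode_non_nan(["a", np.nan, "b", "a"], ["b", np.nan, "z"])
# "b" -> 1.0, the original NaN stays NaN, unseen "z" -> NaN.
```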

support np.nan

b495815

NicolasHug mentioned this pull request Sep 16, 2020

ENH Add Categorical support for HistGradientBoosting #18394

Merged

github-actions bot added the module:preprocessing label Sep 16, 2020

thomasjpfan reviewed Sep 16, 2020

sklearn/preprocessing/_encoders.py (outdated, resolved)

sklearn/preprocessing/_encoders.py (outdated, resolved)

NicolasHug added 3 commits September 15, 2020 22:21

use is_scalar_nan

4fc2758

Merge branch 'master' of github.com:scikit-learn/scikit-learn into oe…

9e59dcd

…_nan

disallow object dtype

28bf0db

thomasjpfan approved these changes Sep 17, 2020

cmarmo added the Waiting for Reviewer label Sep 21, 2020

whatsnew

acea21c

rth approved these changes Sep 23, 2020

rth merged commit 4aada4e into scikit-learn:master Sep 23, 2020

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020

ENH Support unknown_value=np.nan in OrdinalEncoder (scikit-learn#18406)

ed0b81a
