[MRG] Fixes get_feature_names results when using drop functionality #13894

jamesmyatt · 2019-05-16T14:39:18Z

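The new "drop" parameter of OneHotEncoder, added in v0.21, is not taken into account by the "get_feature_names" method: the dropped categories still appear in the returned names, so the names no longer line up with the columns produced by "transform". This fixes that problem.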
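A minimal pure-Python sketch of the naming rule this fix is after. It assumes names of the form "x{i}_{category}" and at most one dropped category index per input feature, loosely modelled on the fitted "categories_" and "drop_idx_" attributes. "feature_names_with_drop" is a hypothetical helper for illustration, not scikit-learn's implementation:

```python
from typing import List, Optional, Sequence


def feature_names_with_drop(
    categories: Sequence[Sequence[object]],
    drop_idx: Optional[Sequence[Optional[int]]] = None,
    prefix: str = "x",
) -> List[str]:
    """Build one-hot output names, skipping each feature's dropped category.

    categories[i] lists the categories of input feature i; drop_idx[i] is the
    index of the category dropped for feature i, or None to keep them all.
    (Hypothetical helper, for illustration only.)
    """
    names = []
    for i, cats in enumerate(categories):
        dropped = None if drop_idx is None else drop_idx[i]
        for j, cat in enumerate(cats):
            if j == dropped:
                continue  # no output column exists for the dropped category
            names.append("{}{}_{}".format(prefix, i, cat))
    return names


categories = [["Female", "Male"], [1, 41, 51], ["boy", "girl"]]
print(feature_names_with_drop(categories))
# Dropping index 0 of every feature mimics drop="first"
print(feature_names_with_drop(categories, drop_idx=[0, 0, 0]))
```

The point is that the name list and the output columns must be filtered by the same dropped indices; otherwise they drift apart as soon as "drop" is used.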

jeremiedbb

I agree this behavior is better. LGTM.

NicolasHug · 2019-05-16T16:12:27Z

sklearn/preprocessing/tests/test_encoders.py

@@ -590,6 +590,25 @@ def test_one_hot_encoder_feature_names_unicode():
    assert_array_equal(['n👍me_c❤t1', 'n👍me_dat2'], feature_names)


+def test_one_hot_encoder_feature_names_drop():
+    # Assume that this is OK for manual drop, if OK for first


I'd rather not assume this ;)

I suspected not 😏

Maybe you could parametrize this with something like:

@pytest.mark.parametrize('drop, expected_feature_names', [
    ('first', [...]),
    (['Female', 41, 'girl', 10], [...]),
])
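A filled-in sketch of that suggestion, with the expected names written out literally for each case. The categories, the test name and the "names_after_drop" stand-in for a fitted encoder's "get_feature_names" are all hypothetical, chosen only to keep the example self-contained:

```python
import pytest

# Hypothetical fitted categories for three input features
CATEGORIES = [["Female", "Male"], [1, 41, 51], ["boy", "girl"]]


def names_after_drop(categories, drop):
    """Hypothetical stand-in for get_feature_names on a fitted encoder.

    drop is None, "first", or a list giving the category to drop per feature.
    """
    names = []
    for i, cats in enumerate(categories):
        if drop is None:
            dropped = None
        elif drop == "first":
            dropped = cats[0]
        else:
            dropped = drop[i]
        names.extend("x{}_{}".format(i, c) for c in cats if c != dropped)
    return names


@pytest.mark.parametrize(
    "drop, expected_feature_names",
    [
        ("first", ["x0_Male", "x1_41", "x1_51", "x2_girl"]),
        (["Male", 41, "boy"], ["x0_Female", "x1_1", "x1_51", "x2_girl"]),
    ],
    ids=["first", "manual"],
)
def test_feature_names_drop(drop, expected_feature_names):
    assert names_after_drop(CATEGORIES, drop) == expected_feature_names
```

With few categories per feature, each expected list stays short enough to read and check by eye.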

I think that the "expected_feature_names" might get too long.

I have implemented your suggestion, @NicolasHug. Could you give your approval so that this can go into the release?

jamesmyatt · 2019-05-16T20:34:24Z

sklearn/preprocessing/tests/test_encoders.py

+    expected_names = list(ohe_base.get_feature_names())
+    for i, cat_drop in enumerate(drop_cats):
+        feat_drop = "x{}_{}".format(i, cat_drop)
+        expected_names.remove(feat_drop)


This is a little involved, but it seems like a good way to construct "expected_names" without just reimplementing the logic under test. It's also robust to any changes in the basic ordering.

I don't think you benefit from this logic. Simpler (and more obviously correct) just to have the expected list of feature names as input to the function.

Fair enough; you're probably right. I hadn't realised how much I'd simplified the test case: the expected names are now quite short.
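For comparison, a self-contained sketch of the derivation approach from the reviewed snippet, next to the literal list the reviewer preferred. The base names and dropped categories are hypothetical example data, and "derive_expected_names" is a hypothetical helper:

```python
def derive_expected_names(base_names, drop_cats, prefix="x"):
    """Start from the names produced without dropping and remove the one
    dropped name per feature (list.remove raises ValueError if absent)."""
    expected = list(base_names)
    for i, cat_drop in enumerate(drop_cats):
        expected.remove("{}{}_{}".format(prefix, i, cat_drop))
    return expected


# Hypothetical names from an encoder fitted without "drop"
base = ["x0_Female", "x0_Male", "x1_1", "x1_41", "x1_51", "x2_boy", "x2_girl"]

derived = derive_expected_names(base, ["Male", 41, "boy"])
# The hard-coded alternative: nothing to reason about beyond the list itself
literal = ["x0_Female", "x1_1", "x1_51", "x2_girl"]
assert derived == literal
```

Both give the same answer here, but the derived version rebuilds the "x{i}_{category}" format inside the test, which is exactly the kind of logic the test is meant to check; the literal list avoids that circularity.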

jnothman

Thanks!

jnothman · 2019-05-21T11:10:00Z

sklearn/preprocessing/tests/test_encoders.py

+    expected_names = list(ohe_base.get_feature_names())
+    for i, cat_drop in enumerate(drop_cats):
+        feat_drop = "x{}_{}".format(i, cat_drop)
+        expected_names.remove(feat_drop)


I don't think you benefit from this logic. Simpler (and more obviously correct) just to have the expected list of feature names as input to the function.

jnothman · 2019-05-21T11:10:29Z

Please add an entry to the change log at doc/whats_new/v0.21.rst under Version 0.21.1. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:.

thomasjpfan · 2019-05-21T11:45:29Z

We are trying to move from using :issue: to :pr: (although they both work)

jnothman · 2019-05-21T11:48:52Z

Yes, I need to update my boilerplate message, sorry

jnothman · 2019-05-21T23:24:06Z

And FWIW, @jamesmyatt, I'd like to include this in the 0.21.2 release, which might happen as soon as Friday, to ship a fix for a severe and hidden bug in Euclidean distances.

jnothman · 2019-05-23T00:19:19Z

@jamesmyatt, I'll wrap this up for you so it can be put into the release

jamesmyatt · 2019-05-23T09:16:32Z

@jnothman, thanks for updating this. I agree it's better like this. I hadn't realised how much simpler the expected names are now than they were before.

Is there anything else that you need from me to get this merged? The CI failures look unrelated.

jnothman · 2019-05-23T09:27:26Z

No, I'm waiting for another core developer to give their approval.

doc/whats_new/v0.21.rst

ogrisel · 2019-05-23T12:40:26Z

Thank you @jamesmyatt. @jnothman, I'll let you backport this as part of the 0.21.2 release PR.

Fix get_feature_names results when using drop functionality

d08efd1

jeremiedbb approved these changes May 16, 2019

NicolasHug reviewed May 16, 2019

Test names for all drop methods

15d1eaf

jamesmyatt commented May 16, 2019

jamesmyatt changed the title ~~Fixes get_feature_names results when using drop functionality~~ [MRG] Fixes get_feature_names results when using drop functionality May 16, 2019

jnothman added this to the 0.21.2 milestone May 21, 2019

jnothman reviewed May 21, 2019

jnothman added the Bug label May 21, 2019

jnothman added 3 commits May 23, 2019 10:23

TST Make expected_names a parameter

fadc4de

Merge branch 'master' into HEAD

f92cf9e

DOC what's new

72f7369

jnothman approved these changes May 23, 2019

jnothman mentioned this pull request May 23, 2019

Release 0.21.2 #13915

Merged

NicolasHug approved these changes May 23, 2019

doc/whats_new/v0.21.rst (outdated)

ogrisel approved these changes May 23, 2019

[ci skip] fix formatting

8d9eb61

ogrisel merged commit e35f040 into scikit-learn:master May 23, 2019

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request May 23, 2019

[MRG] Fixes get_feature_names results when using drop functionality (scikit-learn#13894)

45ab4c6

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

[MRG] Fixes get_feature_names results when using drop functionality (scikit-learn#13894)

94347fb