[MRG] Fix incorrect error when OneHotEncoder.transform called prior to fit #12443

dillongardner · 2018-10-23T17:02:13Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Fix checks for categories_ attribute when transform is called.

This covers an edge case: when the class is instantiated with categorical_features set to all False, _legacy_mode is set to True but _legacy_fit_transform never gets called, so categories_ is never set. To fix this, the code now explicitly tests for this case and sets categories_ to an empty list. This fix also allows get_feature_names to return an empty array.

import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([[3, 2, 1], [0, 1, 1]])

# Edge case: all non-categorical
cat = [False, False, False]
enc = OneHotEncoder(categorical_features=cat)
enc.fit(X)
print('Categories are {}'.format(enc.categories_))
print('Feature names are {}'.format(enc.get_feature_names()))

Categories are []
Feature names are []

Any other comments?

rth

Please add a non-regression test and rename the title to a summary of what this PR does.

rth · 2018-10-23T17:48:24Z

sklearn/preprocessing/_encoders.py

+                n_features = X.shape[1]
+                sel = np.zeros(n_features, dtype=bool)
+                sel[np.asarray(self.categorical_features)] = True
+                if sum(sel) == 0:


I have not looked at the code in detail, but isn't this equivalent to

not self.categorical_features.any()

or something similar?

categorical_features can be an array of indices or a mask. This works in either case. I followed the pattern in sklearn.preprocessing.base._transform_selected.
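To illustrate the point about indices versus masks, here is a minimal sketch (not the PR's code) that wraps the same pattern in a hypothetical helper, no_feature_selected. It assumes a non-empty categorical_features and, for masks, a length equal to n_features. It also shows why calling .any() directly on the raw parameter can mislead when the parameter holds indices: column index 0 is falsy.

```python
import numpy as np


def no_feature_selected(categorical_features, n_features):
    """Return True when categorical_features selects no column.

    Hypothetical helper for illustration. Accepts either a boolean mask
    of length n_features or a non-empty array of integer column indices.
    """
    sel = np.zeros(n_features, dtype=bool)
    # Boolean input acts as a mask; integer input acts as positions.
    sel[np.asarray(categorical_features)] = True
    return not sel.any()


# A mask that selects nothing: the edge case discussed in this PR.
mask_none = no_feature_selected([False, False, False], 3)

# Index form selecting only column 0.
index_zero = no_feature_selected([0], 3)

# Naive check on the raw parameter: wrong for the index form,
# because the value 0 is falsy even though it selects a column.
naive_index_zero = not np.asarray([0]).any()

print(mask_none, index_zero, naive_index_zero)
```

The helper normalises both input forms to a boolean selection before testing it, which is what lets a single check serve both cases.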

Can you add a comment above this code block to explain the specific case that this code is checking for?

amueller · 2018-10-23T22:02:56Z

The failure is because you don't ignore the FutureWarning, I think.
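As a rough sketch of what ignoring the warning could look like, the snippet below uses only the standard library warnings module. The function legacy_call is a hypothetical stand-in for the deprecated code path, and the "warnings as errors" setup is an assumption about how the failing check behaves, not something confirmed in this thread.

```python
import warnings


def legacy_call():
    """Hypothetical stand-in for a deprecated code path."""
    warnings.warn("this behaviour is deprecated", FutureWarning)
    return "result"


def call_ignoring_future_warning():
    # Suppress only FutureWarning, and only inside this block.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)
        return legacy_call()


def call_with_warnings_as_errors(func):
    # Assumed test configuration: every warning becomes an exception.
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        return func()


# The guarded call succeeds even under the strict configuration.
print(call_with_warnings_as_errors(call_ignoring_future_warning))

# The unguarded call raises FutureWarning as an exception.
try:
    call_with_warnings_as_errors(legacy_call)
except FutureWarning as exc:
    print("unguarded call failed:", exc)
```

Scoping the filter with catch_warnings keeps the suppression local, so other warnings in the same test are still reported.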

sklearn/preprocessing/_encoders.py

jnothman · 2018-10-28T07:05:27Z

sklearn/preprocessing/tests/test_encoders.py

+        X_tr = enc.fit_transform(X)
+    expected_features = np.array(list(), dtype='object')
+    assert_array_equal(X, X_tr)
+    assert_equal(enc.categories_, list())


With the adoption of pytest, we are phasing out use of test helpers assert_equal, assert_true, etc. Please use bare assert statements, e.g. assert x == y, assert not x, etc.

Is this true for assert_array_equal? Is it preferable to do assert np.array_equal(x, y)?
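For illustration only (this does not settle which form the project prefers), the sketch below puts the two styles side by side. The variable names are made up for the example.

```python
import numpy as np
from numpy.testing import assert_array_equal

categories = []
expected = np.array([1, 2, 3])
actual = np.array([1, 2, 3])

# Bare asserts read naturally for scalars and plain Python containers.
assert categories == []
assert not categories

# For arrays, == is elementwise, so reduce to a single bool first.
assert np.array_equal(actual, expected)

# The NumPy helper makes the same comparison, but on failure its
# AssertionError message describes the mismatching elements.
assert_array_equal(actual, expected)
```

One practical difference is the failure output: a bare assert on np.array_equal only reports that the expression was False, while the helper describes the mismatch.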

jorisvandenbossche

Add a note in the whatsnew file?

jorisvandenbossche · 2018-10-29T21:16:27Z

amueller · 2018-11-12T16:14:33Z

This looks good. I'm not sure if we need a whatsnew for a better error message. I'm fine with doing it this way for the 0.20.1 release and we can do a no-fitting transformation later. Wdyt? @jnothman @ogrisel?

jnothman · 2018-11-12T22:49:59Z

Thanks @dillongardner

arnau126 · 2018-11-26T10:30:45Z

This change breaks transform in _legacy_mode: it raises NotFittedError even when the encoder is actually fitted. This is because check_is_fitted checks categories_, whereas _legacy_transform doesn't use it at all.

In _legacy_mode we should check _feature_indices_, _n_values_ and _active_features_ instead.

I propose:

def transform(self, X):
    if self._legacy_mode:
        check_is_fitted(self, ('_feature_indices_', '_n_values_', '_active_features_'))
        return _transform_selected(X, self._legacy_transform, self.dtype,
                                   self._categorical_features, copy=True)
    else:
        check_is_fitted(self, 'categories_')
        return self._transform_new(X)
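The idea behind the proposal can be sketched without scikit-learn. Everything below is a simplified stand-in: NotFittedError and check_is_fitted mimic the real names but are reimplemented here, and ToyEncoder is a hypothetical class whose fit only sets the attributes relevant to each mode.

```python
class NotFittedError(ValueError):
    """Stand-in for sklearn.exceptions.NotFittedError."""


def check_is_fitted(estimator, attributes):
    """Simplified stand-in: raise unless every named attribute exists."""
    if isinstance(attributes, str):
        attributes = (attributes,)
    missing = [name for name in attributes if not hasattr(estimator, name)]
    if missing:
        raise NotFittedError("missing fitted attributes: {}".format(missing))


class ToyEncoder:
    """Hypothetical encoder with a legacy and a new code path."""

    def __init__(self, legacy_mode):
        self._legacy_mode = legacy_mode

    def fit(self, X):
        if self._legacy_mode:
            # In this toy, the legacy path never sets categories_.
            self._feature_indices_ = [0]
            self._n_values_ = [0]
            self._active_features_ = []
        else:
            self.categories_ = []
        return self

    def transform(self, X):
        # Check only the attributes that the active code path relies on.
        if self._legacy_mode:
            check_is_fitted(self, ("_feature_indices_", "_n_values_",
                                   "_active_features_"))
        else:
            check_is_fitted(self, "categories_")
        return X


X = [[3, 2, 1]]
print(ToyEncoder(legacy_mode=True).fit(X).transform(X))
print(ToyEncoder(legacy_mode=False).fit(X).transform(X))
```

With a single check on categories_ in both branches, the fitted legacy encoder in this toy would raise NotFittedError, which mirrors the false alarm described above.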

dillongardner added 2 commits October 21, 2018 15:42

Check if fitted prior to transform

b7d94e4

Fix edge case of no categorical

fc326f5

rth reviewed Oct 23, 2018

dillongardner changed the title ~~[MRG] Fix #12395~~ [MRG] Fix incorrect error when OneHotEncoder.transform called prior to fit Oct 23, 2018

Added tests

7e68098

Add deprecation

b6dae0c

jnothman reviewed Oct 28, 2018

Addressed style issues

79c08ae

jorisvandenbossche approved these changes Oct 29, 2018

Added comment

b825212

amueller approved these changes Nov 12, 2018

jnothman merged commit 2afee93 into scikit-learn:master Nov 12, 2018

thoo pushed a commit to thoo/scikit-learn that referenced this pull request Nov 13, 2018

FIX incorrect error when OneHotEncoder.transform called prior to fit (s…

509604f

…cikit-learn#12443)

thoo pushed a commit to thoo/scikit-learn that referenced this pull request Nov 14, 2018

FIX incorrect error when OneHotEncoder.transform called prior to fit (s…

dc90014

…cikit-learn#12443)

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Nov 14, 2018

FIX incorrect error when OneHotEncoder.transform called prior to fit (s…

368d3ca

…cikit-learn#12443)

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Nov 14, 2018

FIX incorrect error when OneHotEncoder.transform called prior to fit (s…

b519452

…cikit-learn#12443)

jnothman mentioned this pull request Nov 26, 2018

False alarm NotFittedError in transform for legacy OneHotEncoder #12680

Closed

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

FIX incorrect error when OneHotEncoder.transform called prior to fit (s…

6d389ba

…cikit-learn#12443)

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "FIX incorrect error when OneHotEncoder.transform called prior…

7edaffa

… to fit (scikit-learn#12443)" This reverts commit 6d389ba.

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "FIX incorrect error when OneHotEncoder.transform called prior…

1387a03

… to fit (scikit-learn#12443)" This reverts commit 6d389ba.

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

FIX incorrect error when OneHotEncoder.transform called prior to fit (s…

4563f63

…cikit-learn#12443)
