-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Fix incorrect error when OneHotEncoder.transform called prior to fit #12443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a non regression test and rename the title to a summary of what this PR does.
n_features = X.shape[1] | ||
sel = np.zeros(n_features, dtype=bool) | ||
sel[np.asarray(self.categorical_features)] = True | ||
if sum(sel) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not looked at the code in detail, but isn't this equivalent to,
not self.categorical_features.any()
or something similar ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
categorical_features
can be a array of indices or a mask. This works in either case. I followed the pattern in sklearn.preprocessing.base._transform_selected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment above this code block to explain the specific case that this code is checking for?
the failure is because you don't ignore the FutureWarning, I think. |
X_tr = enc.fit_transform(X) | ||
expected_features = np.array(list(), dtype='object') | ||
assert_array_equal(X, X_tr) | ||
assert_equal(enc.categories_, list()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the adoption of pytest, we are phasing out use of test helpers assert_equal
, assert_true
, etc. Please use bare assert
statements, e.g. assert x == y
, assert not x
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true for assert_array_equal
? Is it preferable to do assert np.array_equal(x, y)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a note in the whatsnew file?
n_features = X.shape[1] | ||
sel = np.zeros(n_features, dtype=bool) | ||
sel[np.asarray(self.categorical_features)] = True | ||
if sum(sel) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment above this code block to explain the specific case that this code is checking for?
Thanks @dillongardner |
…ybutton * upstream/master: Fix max_depth overshoot in BFS expansion of trees (scikit-learn#12344) TST don't test utils.fixes docstrings (scikit-learn#12576) DOC Fix typo (scikit-learn#12563) FIX Workaround limitation of cloudpickle under PyPy (scikit-learn#12566) MNT bare asserts (scikit-learn#12571) FIX incorrect error when OneHotEncoder.transform called prior to fit (scikit-learn#12443)
…ikit-learn into add_codeblock_copybutton * 'add_codeblock_copybutton' of https://github.com/thoo/scikit-learn: Move an extension under sphinx_copybutton/ Move css/js file under sphinxext/ Fix max_depth overshoot in BFS expansion of trees (scikit-learn#12344) TST don't test utils.fixes docstrings (scikit-learn#12576) DOC Fix typo (scikit-learn#12563) FIX Workaround limitation of cloudpickle under PyPy (scikit-learn#12566) MNT bare asserts (scikit-learn#12571) FIX incorrect error when OneHotEncoder.transform called prior to fit (scikit-learn#12443) Retrigger travis:max time limit error DOC: Clarify `cv` parameter description in `GridSearchCV` (scikit-learn#12495) FIX remove FutureWarning in _object_dtype_isnan and add test (scikit-learn#12567) DOC Add 's' to "correspond" in docs for Hamming Loss. (scikit-learn#12565) EXA Fix comment in plot-iris-logistic example (scikit-learn#12564) FIX stop words validation in text vectorizers with custom preprocessors / tokenizers (scikit-learn#12393) DOC Add skorch to related projects (scikit-learn#12561) MNT Don't change self.n_values in OneHotEncoder.fit (scikit-learn#12286) MNT Remove unused assert_true imports (scikit-learn#12560) TST autoreplace assert_true(...==...) with plain assert (scikit-learn#12547) DOC: add a testimonial from JP Morgan (scikit-learn#12555)
This change breaks In I propose:
|
… to fit (scikit-learn#12443)" This reverts commit 6d389ba.
… to fit (scikit-learn#12443)" This reverts commit 6d389ba.
Reference Issues/PRs
Fixes #12395
What does this implement/fix? Explain your changes.
Fix checks for
categories_
attribute when transform is called.The edge case when class is instantiated with
categorical_features
set to all False,_legacy_mode
is set to True but_legacy_fit_transform
never gets called. To fix this, explicitly tested for this and setcategories_
to an empty list. This fix also allowsget_feature_names
to return an empty array.Any other comments?