ENH Checks n_features_in_ in preprocessing module #18577


Merged · 13 commits · Oct 21, 2020

Conversation

thomasjpfan (Member)

Reference Issues/PRs

Continues #18514 for the preprocessing module.

    raise ValueError('X does not have the same number of features as'
                     ' the previously fitted data. Got {} instead of'
                     ' {}.'.format(X.shape[1], self.quantiles_.shape[1]))
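The per-estimator check quoted above is what this PR replaces with the shared `n_features_in_` validation. A minimal sketch of the pattern, with a hypothetical `EstimatorSketch` class standing in for any preprocessing estimator (this is not scikit-learn's actual implementation, which lives in private validation helpers):

```python
# Hypothetical sketch of the common n_features_in_ pattern: fit records the
# feature count, and a shared helper validates it on later calls with one
# standardized message instead of per-estimator wording.
class EstimatorSketch:
    def fit(self, X):
        # X is a list of equal-length rows; remember the fitted feature count.
        self.n_features_in_ = len(X[0])
        return self

    def _check_n_features(self, X):
        # Shared check, raising in the standardized message shape
        # ("X has {} features, but {} is expecting {} features as input.").
        n_features = len(X[0])
        if n_features != self.n_features_in_:
            raise ValueError(
                f"X has {n_features} features, but "
                f"{type(self).__name__} is expecting "
                f"{self.n_features_in_} features as input."
            )

    def transform(self, X):
        self._check_n_features(X)
        return X
```

Because the check and the message live in one place, removing the bespoke `QuantileTransformer` branch does not lose any validation.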

Good catch!

ogrisel (Member) left a comment

LGTM, except that this breaks statelessness for the Normalizer. I think we should preserve the ability to call transform directly, without calling fit, on stateless transformers.

WDYT?
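The concern can be made concrete with a small sketch. The class below is hypothetical (it is not scikit-learn's Normalizer): a stateless transformer learns nothing in fit, so transform must work unfitted, and the feature-count check can only run when `n_features_in_` actually exists:

```python
import math

# Hypothetical sketch of a stateless transformer: fit learns nothing, so
# transform is callable without fit, and the n_features_in_ check is applied
# only when the attribute exists (i.e. fit was actually called).
class StatelessNormalizerSketch:
    def fit(self, X, y=None):
        # Nothing to learn; record the feature count for API consistency.
        self.n_features_in_ = len(X[0])
        return self

    def transform(self, X):
        # Enforce the feature-count check only after fit.
        if hasattr(self, "n_features_in_") and len(X[0]) != self.n_features_in_:
            raise ValueError(
                f"X has {len(X[0])} features, but this transformer "
                f"is expecting {self.n_features_in_} features as input."
            )
        # L2-normalize each row (the Normalizer's default norm).
        out = []
        for row in X:
            norm = math.sqrt(sum(v * v for v in row)) or 1.0
            out.append([v / norm for v in row])
        return out
```

For example, `StatelessNormalizerSketch().transform([[3.0, 4.0]])` succeeds without a prior fit call and returns `[[0.6, 0.8]]`.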

ogrisel (Member) left a comment

I like this PR the way it is, but we need to reach an agreement on #18011 before merging.


ogrisel commented Oct 13, 2020

Now that #18011 is merged, I believe this is ready for merge.

Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

ogrisel commented Oct 14, 2020

Weird, there are still conflicts in the whats_new/0.24.rst merge commit. I believe there should not be any changes to that file from this PR, right? I will give it a shot.


ogrisel commented Oct 14, 2020

Ok, hopefully that fixed it.


ogrisel commented Oct 14, 2020

@NicolasHug I think this is ready for quick merge.

NicolasHug (Member) left a comment


Thanks @thomasjpfan

@@ -1310,12 +1310,10 @@ def test_quantile_transform_check_error():

     X_bad_feat = np.transpose([[0, 25, 50, 0, 0, 0, 75, 0, 0, 100],
                                [0, 0, 2.6, 4.1, 0, 0, 2.3, 0, 9.5, 0.1]])
-    err_msg = ("X does not have the same number of features as the previously"
-               " fitted data. Got 2 instead of 3.")
+    err_msg = ("X has 2 features, but QuantileTransformer is expecting "

Should we remove this check since we have a common check?

@@ -104,7 +104,8 @@ def test_fit_transform_n_bins_array(strategy, expected):

 def test_invalid_n_features():
     est = KBinsDiscretizer(n_bins=3).fit(X)
     bad_X = np.arange(25).reshape(5, -1)
-    err_msg = "Incorrect number of features. Expecting 4, received 5"
+    err_msg = ("X has 5 features, but KBinsDiscretizer is expecting 4 "
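The test-side change in both hunks follows the same pattern: assert on the standardized message from the common check rather than each estimator's old bespoke wording. A hedged sketch of that pattern, using a hypothetical `assert_raises_message` helper (the real tests use pytest's `raises(ValueError, match=err_msg)` machinery instead):

```python
import re

# Hypothetical helper, not a scikit-learn test utility: run func, expect a
# ValueError, and require its message to match the given regex pattern.
def assert_raises_message(pattern, func, *args, **kwargs):
    try:
        func(*args, **kwargs)
    except ValueError as exc:
        assert re.search(pattern, str(exc)), f"unexpected message: {exc}"
    else:
        raise AssertionError("expected ValueError, none was raised")
```

Matching on a regex prefix such as `r"X has 5 features, but KBinsDiscretizer is expecting 4"` keeps the tests robust if the tail of the shared message ever changes.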

Same here, should this check just be removed?

@glemaitre glemaitre self-requested a review October 21, 2020 15:53
glemaitre (Member) left a comment

LGTM. I just added in_fit to one of the docstrings because it was missing.

@glemaitre glemaitre merged commit d933c20 into scikit-learn:master Oct 21, 2020
thomasjpfan added a commit to thomasjpfan/scikit-learn that referenced this pull request Oct 28, 2020
Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>
Co-authored-by: Joel Nothman <joel.nothman@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>