ENH Checks n_features_in_ in preprocessing module #18577
Conversation
raise ValueError('X does not have the same number of features as'
                 ' the previously fitted data. Got {} instead of'
                 ' {}.'.format(X.shape[1],
                               self.quantiles_.shape[1]))
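For context, manual checks like the one above are superseded by the shared n_features_in_ validation that estimators get through BaseEstimator._validate_data. A minimal sketch of the pattern (the exact calls are an assumption based on scikit-learn's developer API around this release, not code from this PR):

from sklearn.base import BaseEstimator, TransformerMixin

class MyTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        # reset=True (the default) records X.shape[1] as self.n_features_in_
        X = self._validate_data(X)
        return self

    def transform(self, X):
        # reset=False re-checks X.shape[1] against n_features_in_ and raises
        # "X has N features, but MyTransformer is expecting M features as input."
        X = self._validate_data(X, reset=False)
        return X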
Good catch!
LGTM, except that this breaks statelessness for the Normalizer. I think we should respect the fact that it's possible to call transform directly without calling fit on stateless transformers. WDYT?
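To illustrate the concern, here is a hypothetical minimal stateless transformer (made-up names, not the actual Normalizer code):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class UnitScaler(TransformerMixin, BaseEstimator):
    """Stateless: fit learns nothing, so transform needs no prior fit."""

    def fit(self, X, y=None):
        # Nothing to learn; fit exists only for API compatibility.
        return self

    def transform(self, X):
        # Works even if fit was never called.
        X = np.asarray(X, dtype=float)
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        return X / np.where(norms == 0, 1.0, norms)

UnitScaler().transform([[3.0, 4.0]])  # array([[0.6, 0.8]])

Checking n_features_in_ in transform would force such transformers to be fitted first, which is the behavior change being questioned here.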
I like this PR the way it is, but we first need to reach an agreement on #18011 before merging.
Now that #18011 is merged, I believe this is ready for merge.
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Weird, there are still conflicts in the …
Ok, hopefully that fixed it.
…features_in_preprocessing
@NicolasHug I think this is ready for quick merge.
Thanks @thomasjpfan
@@ -1310,12 +1310,10 @@ def test_quantile_transform_check_error():
     X_bad_feat = np.transpose([[0, 25, 50, 0, 0, 0, 75, 0, 0, 100],
                                [0, 0, 2.6, 4.1, 0, 0, 2.3, 0, 9.5, 0.1]])
-    err_msg = ("X does not have the same number of features as the previously"
-               " fitted data. Got 2 instead of 3.")
+    err_msg = ("X has 2 features, but QuantileTransformer is expecting "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove this check since we have a common check?
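For reference, after this change a feature-count mismatch surfaces through the common validation. A sketch of what a user would see (assuming a scikit-learn build that includes this PR):

import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.RandomState(0)
qt = QuantileTransformer(n_quantiles=10).fit(rng.rand(10, 3))

# Transforming data with 2 features after fitting on 3 raises:
# ValueError: X has 2 features, but QuantileTransformer is expecting
# 3 features as input.
qt.transform(rng.rand(10, 2))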
@@ -104,7 +104,8 @@ def test_fit_transform_n_bins_array(strategy, expected):
 def test_invalid_n_features():
     est = KBinsDiscretizer(n_bins=3).fit(X)
     bad_X = np.arange(25).reshape(5, -1)
-    err_msg = "Incorrect number of features. Expecting 4, received 5"
+    err_msg = ("X has 5 features, but KBinsDiscretizer is expecting 4 "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, should this just be removed?
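As an aside, here is a self-contained version of the updated test asserting the common message (X is constructed locally for illustration; the real test uses a module-level fixture):

import re
import numpy as np
import pytest
from sklearn.preprocessing import KBinsDiscretizer

def test_invalid_n_features():
    X = np.random.RandomState(0).rand(10, 4)  # stand-in for the test fixture
    est = KBinsDiscretizer(n_bins=3).fit(X)
    bad_X = np.arange(25).reshape(5, -1)  # 5 features instead of 4
    err_msg = ("X has 5 features, but KBinsDiscretizer is expecting 4 "
               "features as input.")
    with pytest.raises(ValueError, match=re.escape(err_msg)):
        est.transform(bad_X)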
LGTM. I just added in_fit in one of the docstrings because it was missing.
Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>
Co-authored-by: Joel Nothman <joel.nothman@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Reference Issues/PRs
Continues #18514 for the preprocessing module.