MNT add n_features_in_ through the feature_extraction module #20180
# test idf transform with incompatible n_features
X = [[1, 1, 5],
     [1, 1, 0]]
t3.fit(X)
X_incompt = [[1, 3],
             [1, 3]]
with pytest.raises(ValueError):
    t3.transform(X_incompt)
This is now checked through the common tests
sklearn/utils/estimator_checks.py
if ("2darray" not in tags["X_types"] and "sparse" not in tags["X_types"] or
        tags["no_validation"]):
TfidfTransformer has tags["X_types"] = "sparse". I guess it does not hurt to allow "sparse" as well, since sparse input is 2D.
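To make the operator precedence in the condition above explicit: `and` binds tighter than `or`, so the check is skipped either when the estimator accepts neither dense 2D nor sparse input, or when it does no validation at all. A minimal sketch, assuming the tags are a plain dict; the helper name `skips_n_features_check` is hypothetical and is not part of scikit-learn:

```python
def skips_n_features_check(tags):
    """Return True when the n_features_in_ check would be skipped.

    Hypothetical helper that mirrors the condition in the diff:
    ("2darray" not in X_types and "sparse" not in X_types) or no_validation
    """
    x_types = tags["X_types"]
    # The estimator takes 2D input if it accepts dense arrays or sparse matrices.
    accepts_2d = "2darray" in x_types or "sparse" in x_types
    return (not accepts_2d) or tags["no_validation"]


# A sparse-only estimator is still checked.
print(skips_n_features_check({"X_types": ["sparse"], "no_validation": False}))
# A text-only estimator is skipped.
print(skips_n_features_check({"X_types": ["string"], "no_validation": False}))
```

Under this reading, an estimator whose only declared input type is "sparse" still goes through the check, which is the behaviour the comment argues for.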
LGTM
LGTM!
Part of #19333
The feature_extraction module contains CountVectorizer, DictVectorizer, FeatureHasher, HashingVectorizer, PatchExtractor, TfidfTransformer and TfidfVectorizer. n_features_in_ is only relevant for the TfidfTransformer.
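As an illustration of the convention involved, here is a pure-Python sketch of an estimator that records n_features_in_ during fit and rejects input with a different number of columns in transform. `TinyTransformer` and `rejects_mismatched_input` are hypothetical stand-ins written for this example. They are not scikit-learn code, and the real common check in sklearn/utils/estimator_checks.py may differ in its details.

```python
class TinyTransformer:
    """Hypothetical stand-in that follows the n_features_in_ convention."""

    def fit(self, X):
        # Record the number of columns seen during fit.
        self.n_features_in_ = len(X[0])
        return self

    def transform(self, X):
        n_features = len(X[0])
        if n_features != self.n_features_in_:
            raise ValueError(
                f"X has {n_features} features, but TinyTransformer "
                f"is expecting {self.n_features_in_} features as input."
            )
        # Identity transform: only the validation matters in this sketch.
        return X


def rejects_mismatched_input(transformer, X_fit, X_bad):
    """Return True if transform raises ValueError on X_bad after fitting on X_fit."""
    transformer.fit(X_fit)
    try:
        transformer.transform(X_bad)
    except ValueError:
        return True
    return False


# Same data as the estimator-specific test removed in this PR.
X = [[1, 1, 5], [1, 1, 0]]
X_incompt = [[1, 3], [1, 3]]
print(rejects_mismatched_input(TinyTransformer(), X, X_incompt))
```

A generic check of this shape can run against every estimator that accepts 2D input, which is why a separate per-estimator test becomes redundant.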