-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+2] add get_feature_names to PolynomialFeatures #6372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ping @jakevdp ;) |
48eb9e7
to
30e591e
Compare
@@ -1182,6 +1182,35 @@ def powers_(self): | |||
return np.vstack(np.bincount(c, minlength=self.n_input_features_) | |||
for c in combinations) | |||
|
|||
def get_feature_names(self, input_features=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been suggesting for a while that this is an appropriate way to deal with feature names in extractor/transformer pipelines. I'm happy to see this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Very nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been wanting to do it but didn't have time. I'm just writing the book chapter about preprocessing and I'm embarrassed not having that feature ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, who knew writing books would be good for something?
LGTM. |
I qualify that LGTM. I think we need to decide whether A further small issue: should these be |
---------- | ||
input_features : list of string, length n_features, optional | ||
String names for input features if available. By default, | ||
"x0", "x1", ... "x_n_features" is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have underscores (x_0, x_1
) to match code below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be cleaner without underscores? I find things like x_1^2 x_2^1
harder to visually parse than x1^2 x2^1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah... well I was thinking about latex. But maybe that's silly.
30e591e
to
ec4d92d
Compare
I changed the naming scheme to |
ec4d92d
to
bff115c
Compare
added a unicode test. Is that what you had in mind @jnothman ? |
# test some unicode | ||
poly = PolynomialFeatures(degree=1, include_bias=True).fit(X) | ||
feature_names = poly.get_feature_names([u"\u0001F40D", u"\u262E", u"\u05D0"]) | ||
assert_array_equal(["1", u"\u0001F40D^1", u"\u262E^1", u"\u05D0^1"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit awkward that 1 is not unicode
. Can we find a simple convention for such things?
That consistency (I assume you mean with visualisation) might be worthwhile, though I know that the use of the first subscript (as opposed to |
hm, I can't reproduce the error... it seems to be in |
|
bff115c
to
ddc1740
Compare
should be fixed. I have no opinion re |
I would prefer |
9a11f46
to
897f8b6
Compare
+1 for merge |
+1 |
|
||
Returns | ||
------- | ||
output_feature_names : list of string, length n_output_features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a nitpick here:
Maybe there should be a "." in the end of this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conventionally not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see.
So this line:
https://github.com/amueller/scikit-learn/blob/poly_feature_names/sklearn/preprocessing/data.py#L1586
ends with a "." because it's a new sentence used to describe output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is description, not type.
should be good now. |
Once tests pass I'd say we can merge |
Thanks @amueller! |
[MRG+2] add get_feature_names to PolynomialFeatures
Fixes #6185, replaces #6216