ENH Adds feature_names_out to impute module #21078
Conversation
@thomasjpfan Thanks for your dedication to feature names!
sklearn/impute/_base.py (Outdated)
        Transformed feature names.
        """
        input_features = _check_feature_names_in(self, input_features)
        return input_features[self.features_]
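For context, a minimal sketch of what the outdated snippet above computes; the names and indices below are hypothetical, not taken from the PR:

    import numpy as np

    # Hypothetical values: input_features as validated by _check_feature_names_in,
    # and features_ as the indices of the columns the indicator kept.
    input_features = np.array(["a", "b", "c"])
    features_ = np.array([0, 2])

    # The outdated implementation simply returned the selected input names,
    # with no prefix added.
    print(input_features[features_])  # ['a' 'c']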
What is our rule for when to prefix and when not? The indicator output column of SimpleImputer is prefixed with "indicator_".
I do not think we have a rule. I went with SimpleImputer prefixing "indicator_" because it needed to distinguish between the imputed features and the indicator.
Looking at this again, I think we can go with MissingIndicator adding the prefix. This way SimpleImputer only needs to combine them.
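A minimal sketch of that division of labor, assuming a scikit-learn release that includes this feature and the lowercased-class-name prefix settled on later in this thread; the printed output is indicative:

    import numpy as np
    import pandas as pd
    from sklearn.impute import MissingIndicator

    X = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

    # MissingIndicator adds the prefix itself, so SimpleImputer only has to
    # concatenate these names with the names of the imputed features.
    indicator = MissingIndicator().fit(X)
    print(indicator.get_feature_names_out())  # e.g. ['missingindicator_a']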
I also think it is fine to preserve the original names for the imputed features.
Thanks for the PR! Apart from the previous review comments, I would like to use a more explicit prefix:
LGTM!
LGTM
Remark: The convention introduced here is to prefix with prefix = self.__class__.__name__.lower().
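A small illustration of that naming convention; the class below is only a stand-in, not the real estimator:

    import numpy as np

    class MissingIndicator:  # stand-in only, not the real sklearn class
        pass

    # The convention: prefix output names with the lowercased class name.
    prefix = MissingIndicator.__name__.lower()
    input_features = np.array(["a", "b"])
    names_out = np.asarray([f"{prefix}_{name}" for name in input_features])
    print(names_out)  # ['missingindicator_a' 'missingindicator_b']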
Before merging, should we wait for #21334 and then also use …
I do not think we need to wait. This PR is slightly different.
FYI: This PR has been merged into the …
Took me quite a while to figure that out.
It is still not included in 1.0.2. Should it be this way?
@ademyanchuk Yes, it should be this way. I believe scikit-learn follows the process of semantic versioning:
Thank you, @timotk
As info: Scikit-learn does not follow SemVer. But new features such as this one are usually rolled out in a minor version like 1.1.0 or 1.2.0. Bugfix versions like 1.0.1 are for bug fixes only.
Reference Issues/PRs
Continues #18444
Fixes #21200
What does this implement/fix? Explain your changes.
Adds feature_names_out to the impute module.
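A short usage sketch of the added API, assuming a scikit-learn version that includes this change; the printed values are indicative:

    import numpy as np
    import pandas as pd
    from sklearn.impute import SimpleImputer

    X = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

    # Without an indicator, the original column names are preserved.
    print(SimpleImputer().fit(X).get_feature_names_out())
    # e.g. ['a' 'b']

    # With add_indicator=True, the prefixed indicator names are appended.
    print(SimpleImputer(add_indicator=True).fit(X).get_feature_names_out())
    # e.g. ['a' 'b' 'missingindicator_a']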
Any other comments?
There is an edge case I am concerned about:
For this edge case, we can add sklearn to the prefix, resulting in sklearn_indicator_a?
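The concrete edge case from the comment is not reproduced above; one plausible reading, sketched here purely as an assumption, is a clash between a user column that already looks like an indicator output and the generated name, which a library-level prefix such as sklearn_ would avoid:

    # Hypothetical collision scenario (an assumption, not taken from the PR):
    # the input already contains a column literally named "indicator_a".
    input_features = ["a", "indicator_a"]

    # An indicator generated for column "a" with a plain "indicator_" prefix
    # duplicates an existing input name:
    plain = [f"indicator_{name}" for name in ["a"]]
    assert plain[0] in input_features           # 'indicator_a' clashes

    # A library-level prefix such as "sklearn_" avoids the clash:
    prefixed = [f"sklearn_indicator_{name}" for name in ["a"]]
    assert prefixed[0] not in input_features    # 'sklearn_indicator_a'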