-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Handle with type problem for OneHotEncoder.get_feature_names #16593 #21754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- cast features names into strings - raise a warning to inform user that non-string features names could cause problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we intended it the way it was but in the face of use-cases we can maybe change our mind.
A PR can be a good way to have a concrete discussions but it requires detailed test to set the explicit behavior we want to implement on concrete cases.
Furthermore, if we decide to go this way, we should probably factor-out this input validation logic into a utility function _check_input_feature_names
in charge of raising the warning with an informative error message and using this function consistently in all the get_feature_names
methods of scikit-learn. This means that we will also need a new common test to check this on invalid inputs for all the scikit-learn estimators with a get_feature_names
method.
Even if we decide against the automatic str
conversion with the warning, we probably want to instead raise a TypeError
with an informative error message instead.
@@ -692,7 +692,9 @@ def get_feature_names(self, input_features=None): | |||
|
|||
feature_names = [] | |||
for i in range(len(cats)): | |||
names = [input_features[i] + "_" + str(t) for t in cats[i]] | |||
if type(input_features) is not str: | |||
warnings.warn("Features names aren't strings, errors could occur") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide always the observed invalid value in error message, for instance:
warnings.warn(
"Unexpected type of feature name feature_names[{i}]={feature_names[i]!r}. "
"Converting to str now. To silence this warning, please make sure encode "
"feature names as `str` from the start."
)
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Reference Issues/PRs
Handle with type problem for get_feature_names #16593
Hello, I have to work on this issue for a school project.
I propose to cast integer-columns to string-columns and put a warning to inform the user that it is possible that others errors will happen later.
I don't think we should force the user to put strings because some of the cases stated above are common (like csv files without headers). We can't change all the consistency problems of Pandas. """