Skip to content

get_feature_names handles integer column names #16670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

arunavkonwar
Copy link
Contributor

Fixes #16593

What does this implement/fix? Explain your changes.

OneHotEncoder.get_feature_names now handles integer column names.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the conclusion was to raise when input feature names are integers, not silently convert them to strings.

@arunavkonwar
Copy link
Contributor Author

Hi, @adrinjalali thanks for the comment. I have updated the code to reflect the needed changes for the issue. Please have a look. Thanks :)

@amueller
Copy link
Member

amueller commented Mar 13, 2020

I think we should discuss this more widely. So basically we're saying that integer column names are fine as long as we don't want feature names. What would that do once we propagate feature names?

@adrinjalali
Copy link
Member

I'd say as soon as we have to "look" at the feature names, we raise if they're not strings. We could raise in fit w/o feature name propagation if the user passes integer column names, but I'm okay with being "lazy" there ans raising only if they're used.

@Sahanave
Copy link

Sahanave commented Jun 6, 2020

I am not sure whether it's merged. I have a suggestion. Do you think we should have a function that raises an error in utils?[or where ever common functions are being stored] For example:
function check_whether_X_has_interger_column_names:
if yes:
raise error
I am wondering if there would be other places where such a functionality check is required.
I would really like to hear what you all think ( @adrinjalali ,@arunavkonwar and @amueller).

@adrinjalali
Copy link
Member

I think this would be fixed in #17407

@amueller
Copy link
Member

amueller commented Nov 30, 2020

I'd say as soon as we have to "look" at the feature names, we raise if they're not strings. We could raise in fit w/o feature name propagation if the user passes integer column names, but I'm okay with being "lazy" there ans raising only if they're used.

I disagree and would fight not to have this solution. I don't care that what pandas does is bad. I'm happy to deprecate integer column names once pandas deprecated them [i.e. I veto this change so you can't do it without a core dev vote or convincing me :P ].

Base automatically changed from master to main January 22, 2021 10:52
@thomasjpfan
Copy link
Member

Closing as get_feature_names is deprecated in favor of get_feature_names_out.

@thomasjpfan thomasjpfan closed this Feb 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OneHotEncoder.get_feature_names doesn't work with integer column names
5 participants