[MRG] Check dataframe columns names for StandardScaler methods #11607

Closed

NicolasHug
Member

Reference Issues/PRs

This addresses #7242 for the StandardScaler class. The goal of this PR is to evaluate how many changes would be needed to completely fix #7242.

What does this implement/fix? Explain your changes.

This PR adds the _check_column_names method to BaseEstimator and uses it in StandardScaler.

_check_column_names is a helper that verifies that the dataframes passed to fit(), predict(), transform(), etc. have the same column names. This prevents user errors such as the one reported in #7242.
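To make the idea concrete, here is a minimal sketch of what such a check could look like. It is not the code from this PR: the mixin name `ColumnNameCheckMixin`, the `reset` parameter, and the `_fit_column_names` attribute are assumptions made for illustration, and the sketch treats any object with a `.columns` attribute as a dataframe.

```python
class ColumnNameCheckMixin:
    """Hypothetical mixin for illustration; not the code from this PR."""

    def _check_column_names(self, X, reset=False):
        # Only dataframe-like inputs expose `.columns`; arrays and lists do not.
        columns = getattr(X, "columns", None)
        names = None if columns is None else list(columns)

        if reset:
            # Called from fit(): remember the names seen at fit time
            # (None when the input had no column names).
            self._fit_column_names = names
            return

        expected = getattr(self, "_fit_column_names", None)
        # List comparison is order-sensitive, so reordered columns are caught too.
        if names is not None and expected is not None and names != expected:
            raise ValueError(
                "Column names differ from those seen in fit: "
                f"expected {expected}, got {names}"
            )
```

Under these assumptions, an estimator would call `self._check_column_names(X, reset=True)` in fit() and `self._check_column_names(X)` in transform() or predict(), which matches the "only a line here and there" scale of change described below.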

Any other comments?

I definitely don't have a sufficiently clear overview of the whole project to be certain, but the changes to the estimators code seem relatively small: only a line here and there.

cc @amueller

@amueller
Member

Not sure if you heard the discussion earlier but I'll write a proposal for this. It's good to have the implementation, though.

@NicolasHug NicolasHug changed the title from "[WIP] Check dataframe columns names for StandardScaler methods" to "[MRG] Check dataframe columns names for StandardScaler methods" on Jul 17, 2018
Member

@jnothman jnothman left a comment

This is quite nice, the biggest problem being that it adds additional validation to every estimator, where perhaps it should be replacing check_X_y with a stateful version on the class.
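One way to read "a stateful version on the class" is a single validation method that both checks the input and remembers what it saw during fit, so estimators make one call instead of two. The sketch below is an assumption-laden illustration in plain Python, not scikit-learn code: `StatefulValidationMixin`, `_validate_input`, `_n_features_seen`, and `MeanCenterer` are all hypothetical names.

```python
class StatefulValidationMixin:
    """Hypothetical mixin: validation that remembers fit-time state."""

    def _validate_input(self, X, reset):
        # Stateless part: basic shape checks on a list of rows.
        rows = [list(row) for row in X]
        if not rows:
            raise ValueError("X must contain at least one sample")
        n_features = len(rows[0])
        if any(len(row) != n_features for row in rows):
            raise ValueError("all samples must have the same number of features")

        # Stateful part: record at fit time, compare afterwards.
        if reset:
            self._n_features_seen = n_features
        else:
            expected = getattr(self, "_n_features_seen", None)
            if expected is None:
                raise ValueError("estimator is not fitted yet")
            if n_features != expected:
                raise ValueError(
                    f"X has {n_features} features, but {expected} were seen in fit"
                )
        return rows


class MeanCenterer(StatefulValidationMixin):
    """Toy estimator (hypothetical) that uses the single validation call."""

    def fit(self, X):
        rows = self._validate_input(X, reset=True)
        n_samples = len(rows)
        self.means_ = [sum(col) / n_samples for col in zip(*rows)]
        return self

    def transform(self, X):
        rows = self._validate_input(X, reset=False)
        return [[v - m for v, m in zip(row, self.means_)] for row in rows]
```

With this shape, the consistency check lives in one place on the class rather than being added as an extra step in every estimator.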

@NicolasHug
Member Author

Closing; superseded by a lot of later work (n_features_in_ and the subsequent checks).

@NicolasHug NicolasHug closed this Feb 23, 2020
Labels
Needs Decision Requires decision
Development

Successfully merging this pull request may close these issues.

Potential error caused by different column order
3 participants