[MRG] Check dataframe columns names for StandardScaler methods #11607
Reference Issues/PRs
This addresses #7242 for the StandardScaler class. The goal of this PR is to evaluate how many changes would be needed to completely fix #7242.
What does this implement/fix? Explain your changes.
This PR adds the _check_column_names method to BaseEstimator and uses it in StandardScaler. _check_column_names is a helper to verify that the dataframes passed as input to fit(), predict(), transform() (etc.) have the same column names. This prevents user issues (#7242).
Any other comments?
I definitely don't have a sufficiently clear overview of the whole project to be certain, but the changes to the estimators code seem relatively small: only a line here and there.
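To give a rough idea of what "a line here and there" could mean, here is a minimal sketch of such a check. It is not the code from this PR: ColumnCheckMixin, ToyScaler, FakeFrame, the reset parameter, and the _fit_columns attribute are all hypothetical names chosen for illustration, and treating a different column order as a mismatch is an assumption.

```python
class ColumnCheckMixin:
    """Hypothetical mixin illustrating a column-name check (not the PR's code)."""

    def _check_column_names(self, X, reset=False):
        # Inputs without a `columns` attribute (e.g. arrays, lists) carry no names.
        columns = getattr(X, "columns", None)
        names = None if columns is None else list(columns)

        if reset:
            # Called from fit(): remember the names seen during fitting.
            self._fit_columns = names
            return

        expected = getattr(self, "_fit_columns", None)
        if names is None or expected is None:
            # Nothing to compare against, so the check is skipped.
            return
        if names != expected:
            # Assumption: a different order counts as a mismatch too.
            raise ValueError(
                "Column names differ from those seen in fit: "
                f"expected {expected}, got {names}"
            )


class ToyScaler(ColumnCheckMixin):
    """Toy estimator: each method needs only one extra line for the check."""

    def fit(self, X):
        self._check_column_names(X, reset=True)
        return self

    def transform(self, X):
        self._check_column_names(X)
        return X


class FakeFrame:
    """Minimal stand-in for a dataframe; only exposes `columns`."""

    def __init__(self, columns):
        self.columns = columns


scaler = ToyScaler().fit(FakeFrame(["a", "b"]))
scaler.transform(FakeFrame(["a", "b"]))      # same names: passes silently
try:
    scaler.transform(FakeFrame(["b", "a"]))  # reordered names: rejected
except ValueError as exc:
    print(exc)
```

In this sketch the estimator-side change is a single call in fit() and another in transform(), which is the scale of change described above.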
cc @amueller