Skip to content

Conversation

jeremiedbb
Copy link
Member

We are trying to make meta-estimators delegate input validation to their underlying estimator. However some meta-estimators perform some transformation on y before passing it to their underlying estimator (e.g. OutputCodeClassifier). For those it's more convenient to allow _validate_data to only perform the "y part" of check_X_y.

I think it's ok to have such a limited validation on y (column_or_1d + assert_all_finite) because we only need the minimum to pass the required transformations and it will be eventually validated by the underlying estimator.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I am fine with this change. I can see a use case in HalvingSearchCV as well.

@jeremiedbb
Copy link
Member Author

it's also useful for #19692

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I just pushed an improvement to be stricter with complex values not being accepted in y. I think this was an oversight.

It's technically a breaking change for subclasses of scikit-learn that rely on _validate_date but since this is private API, I think this is fine.

@ogrisel ogrisel merged commit 536f3cc into scikit-learn:main Jun 15, 2021
@ogrisel
Copy link
Member

ogrisel commented Jun 15, 2021

Merged! Thanks @jeremiedbb!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants