MNT Allow to only validate y in _validate_data #20227
LGTM
I am fine with this change. I can see a use case in HalvingSearchCV as well.
It's also useful for #19692.
LGTM. I just pushed an improvement to be stricter with complex values not being accepted in y. I think this was an oversight.
It's technically a breaking change for subclasses of scikit-learn estimators that rely on _validate_data, but since this is private API, I think this is fine.
Merged! Thanks @jeremiedbb!
We are trying to make meta-estimators delegate input validation to their underlying estimator. However, some meta-estimators transform y before passing it to their underlying estimator (e.g. OutputCodeClassifier). For those, it's more convenient to allow _validate_data to perform only the "y part" of check_X_y.
I think it's ok to have such limited validation on y (column_or_1d + assert_all_finite) because we only need the minimum for the required transformations to work, and y will eventually be validated by the underlying estimator.
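For illustration, the limited y validation described above can be sketched with the public utilities column_or_1d and assert_all_finite. This is only a rough approximation, not the code in this PR: the helper name validate_y_only is hypothetical, and the private _validate_data method may differ in its details.

```python
import numpy as np
from sklearn.utils import assert_all_finite, column_or_1d


def validate_y_only(y):
    """Hypothetical helper: run minimal checks on y and leave X untouched."""
    # Accept shape (n_samples,) or (n_samples, 1); raise ValueError otherwise.
    y = column_or_1d(y, warn=False)
    # Raise ValueError if y contains NaN or infinity.
    assert_all_finite(y)
    return y


# A column vector is flattened to 1d, ready for a transformation on y.
y = validate_y_only(np.array([[0], [1], [2]]))
print(y.shape)
```

A meta-estimator could run a check like this before transforming y, and leave the full validation of X and the transformed y to the underlying estimator's fit.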