Skip to content

[MRG] MNT requires_y tag with y=None validation #16622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 106 commits into from
Apr 22, 2020

Conversation

NicolasHug
Copy link
Member

@NicolasHug NicolasHug commented Mar 3, 2020

Follow up to #16112

EDIT: the tag is now requires_y

This PR adds a is_supervised tag and uses it to raise an error in _validate_data() if:

  • a supervised estimator is passed y=None
  • a non-supervised estimator is passed y!=None

This PR is mostly motivated by the fact that since #16112, supervised estimators that are passed y=None will fail with a tuple unpacking error instead of a nicer error message.

This comes with a bunch of complications:

  • This forces supervised estimators to call _validate_data(X, y), instead of validating X and y separately. Since calling check_array(X, ...) and then check_array(y, ...) isn't in general equivalent to calling check_X_y(X, y, ...), I had to introduce a way to check X and y separately when calling _validate_data(X, y, ...). This is really ugly. It will also definitely not work for third-party estimators that inherit from ours.

  • Some estimators like OneClassSVM, IsolationForest, and RandomTreesEmbedding are unsupervised, but they generate a random y that is passed down to their base classes and validated there with _validate_data(X, y). This makes _validate_data fail because of the check mentioned above. So we need custom checks in the validation of the base classes in each of these. This is really, really ugly.

  • Since _validate_data assumes the tag exists, this means that all estimators using validate_data must also support the tag. IMO, that's fine.

…ranch 'master' of github.com:scikit-learn/scikit-learn into n_features_in
@jnothman
Copy link
Member

jnothman commented Apr 16, 2020 via email

@NicolasHug
Copy link
Member Author

Admittedly the whole
idea of being optionally supervised in cross decomposition invalidates the
point of this tag.

Do you feel the same about a require_y tag?

@jnothman
Copy link
Member

Do you feel the same about a require_y tag?

No, that more explicitly matches what the tag is used to ensure.

@NicolasHug NicolasHug changed the title [MRG] MNT is_supervised_tag with y=None validation [MRG] MNT requires_y tag with y=None validation Apr 20, 2020
@NicolasHug
Copy link
Member Author

The tag is now requires_y and it defaults to False.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @NicolasHug

@cmarmo
Copy link
Contributor

cmarmo commented Apr 21, 2020

Hi @scikit-learn/core-devs, time to merge this one?

@agramfort
Copy link
Member

agramfort commented Apr 21, 2020 via email

@jnothman
Copy link
Member

The tag is now requires_y and it defaults to False.

Wonderful. More precise/explicit, and I think also a shorter PR. Thanks so much for your efforts, @NicolasHug.

Only remaining issue: should the new tag be mentioned in what's new?

@jnothman jnothman merged commit 089c8a1 into scikit-learn:master Apr 22, 2020
@jnothman
Copy link
Member

Thanks @NicolasHug

@agramfort
Copy link
Member

thanks heaps @NicolasHug

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants