[MRG] MNT requires_y tag with y=None validation #16622

NicolasHug · 2020-03-03T17:00:26Z

Follow up to #16112

EDIT: the tag is now requires_y

This PR adds an is_supervised tag and uses it to raise an error in _validate_data() if:

- a supervised estimator is passed y=None
- ~~a non-supervised estimator is passed y!=None~~

This PR is mostly motivated by the fact that since #16112, supervised estimators that are passed y=None will fail with a tuple unpacking error instead of a nicer error message.
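
To make the intended behaviour concrete, here is a minimal sketch. It is not scikit-learn's actual implementation: `validate_data`, `ToyRegressor`, and `ToyTransformer` are hypothetical names, and the tag lookup is reduced to a plain class attribute. It only illustrates how a tag-driven check can replace an obscure unpacking failure with an explicit error.

```python
class ToyTransformer:
    # Hypothetical stand-in for an estimator's tags: y is optional here.
    tags = {"requires_y": False}


class ToyRegressor:
    # Hypothetical supervised estimator: y must be provided.
    tags = {"requires_y": True}


def validate_data(estimator, X, y=None):
    """Hypothetical sketch of a y=None check driven by a tag."""
    if y is None:
        if estimator.tags.get("requires_y", False):
            raise ValueError(
                f"{type(estimator).__name__} requires y to be passed, "
                "but the target y is None."
            )
        # With no y, only X is returned, so `X, y = validate_data(...)`
        # in a supervised estimator would otherwise fail while unpacking.
        return X
    return X, y
```

With this sketch, `validate_data(ToyRegressor(), X)` raises a `ValueError` that names the estimator, while `validate_data(ToyTransformer(), X)` simply returns `X`.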

This comes with a bunch of complications:

- This forces supervised estimators to call _validate_data(X, y) instead of validating X and y separately. Since calling check_array(X, ...) and then check_array(y, ...) isn't in general equivalent to calling check_X_y(X, y, ...), I had to introduce a way to check X and y separately when calling _validate_data(X, y, ...). This is really ugly. It will also definitely not work for third-party estimators that inherit from ours.
- Some estimators like OneClassSVM, IsolationForest, and RandomTreesEmbedding are unsupervised, but they generate a random y that is passed down to their base classes and validated there with _validate_data(X, y). This makes _validate_data fail because of the check mentioned above. So we need custom checks in the validation of the base classes in each of these. This is really, really ugly.
- Since _validate_data assumes the tag exists, all estimators using _validate_data must also support the tag. IMO, that's fine.
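
The first complication can be sketched as follows. This is a simplified illustration in plain Python, not the code from this PR: `check_array_sketch`, `check_X_y_sketch`, `validate_data_sketch`, and the `separately` parameter are all hypothetical. The joint check adds a cross-array test (here, matching lengths) that two independent per-array checks never perform, which is one reason the two paths are not interchangeable.

```python
def check_array_sketch(a, *, allow_empty=False):
    """Hypothetical per-array check: it only ever looks at one array."""
    if not allow_empty and len(a) == 0:
        raise ValueError("Found an empty array.")
    return list(a)


def check_X_y_sketch(X, y):
    """Hypothetical joint check: per-array checks plus a cross-check."""
    X, y = check_array_sketch(X), check_array_sketch(y)
    if len(X) != len(y):
        raise ValueError(
            f"Inconsistent numbers of samples: {len(X)} != {len(y)}"
        )
    return X, y


def validate_data_sketch(X, y, *, separately=None):
    """Validate jointly by default.

    If `separately` is a pair of keyword dicts, X and y are instead
    checked on their own, each with its own options.
    """
    if separately is None:
        return check_X_y_sketch(X, y)
    x_kwargs, y_kwargs = separately
    return (
        check_array_sketch(X, **x_kwargs),
        check_array_sketch(y, **y_kwargs),
    )
```

In this sketch an estimator that needs different options for X and y passes `separately=({...}, {...})` and gives up the cross-check, which is the kind of escape hatch the comment above calls ugly.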

jnothman · 2020-04-16T22:22:50Z

It can work out, but making it neat would take more work than I could put in at the time. It frustrated me that some classes inherit from RegressorMixin despite being optionally unsupervised. Admittedly the whole idea of being optionally supervised in cross decomposition invalidates the point of this tag.

NicolasHug · 2020-04-17T12:14:17Z

> Admittedly the whole idea of being optionally supervised in cross decomposition invalidates the point of this tag.

Do you feel the same about a require_y tag?

jnothman · 2020-04-19T14:57:52Z

> Do you feel the same about a require_y tag?

No, that more explicitly matches what the tag is used to ensure.

NicolasHug · 2020-04-20T14:17:33Z

The tag is now requires_y and it defaults to False.
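
A default of False means estimators must opt in to the check. The sketch below imitates that pattern in plain Python; `DEFAULT_TAGS`, `BaseSketch`, `more_tags`, and `get_tags` are hypothetical names chosen for illustration and should not be read as the library's API.

```python
# Hypothetical table of default tag values.
DEFAULT_TAGS = {"requires_y": False}


class BaseSketch:
    def more_tags(self):
        # Subclasses override this to change individual tags.
        return {}

    def get_tags(self):
        # Start from the defaults, then apply the class's overrides.
        tags = dict(DEFAULT_TAGS)
        tags.update(self.more_tags())
        return tags


class ClustererSketch(BaseSketch):
    """Unsupervised: inherits requires_y=False without any extra code."""


class ClassifierSketch(BaseSketch):
    """Supervised: explicitly opts in to the y=None check."""

    def more_tags(self):
        return {"requires_y": True}
```

Under this design, estimators that never declare the tag keep accepting y=None, so only classes that explicitly set it to True are affected.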

adrinjalali

thanks @NicolasHug

cmarmo · 2020-04-21T18:57:27Z

Hi @scikit-learn/core-devs, time to merge this one?

agramfort · 2020-04-21T19:17:43Z

+1 on my side. Maybe @ogrisel or @GaelVaroquaux want to have a look?

jnothman · 2020-04-21T23:32:23Z

> The tag is now requires_y and it defaults to False.

Wonderful. More precise/explicit, and I think also a shorter PR. Thanks so much for your efforts, @NicolasHug.

Only remaining issue: should the new tag be mentioned in what's new?

jnothman · 2020-04-22T12:41:55Z

Thanks @NicolasHug

agramfort · 2020-04-23T06:56:46Z

thanks heaps @NicolasHug

NicolasHug added 30 commits April 9, 2019 14:12

Basic validate_X and validate_X_y methods for _n_features_in attribute

7d9dcc4

created NonRectangularInputMixin

f117745

Merge remote-tracking branch 'upstream/master' into n_features_in

95b330c

resolved conflicts

e56592b

Merge branch 'master' of github.com:scikit-learn/scikit-learn into n_features_in

3bdcb5c

_validate** is not private

8ecc690

Added support for pipeline and grid search

60e4cea

pep8

ff19f22

Trigger CI??

a44318b

Merge branch 'master' of github.com:scikit-learn/scikit-learn into n_features_in

42249fb

Added to decision tree for gridsearch tests to pass

abdc94e

Merge branch 'master' of github.com:scikit-learn/scikit-learn into n_features_in

a50e76f

Added support for ColumnTransformer and FeatureUnion

62fc42e

pep8

6845788

Merge branch 'master' of github.com:scikit-learn/scikit-learn into n_features_in

3246436

BaseSearchCV now raises AttributeError

ee2598b

Merge branch 'master' of github.com:scikit-learn/scikit-learn into n_features_in

6a14e4b

Merge branch 'master' of github.com:scikit-learn/scikit-learn into n_features_in

3f2d44f

Added common test + used _validate_XXX on most estimators

25fda0f

Fixed some test

9bdfb65

fixed issues for some estimators

be76ef4

Merge branch 'n_features_in' of github.com:NicolasHug/scikit-learn; branch 'master' of github.com:scikit-learn/scikit-learn into n_features_in

b464f86

fixed tests in test_data.py

70dc4ed

Fixed some tests

988f9c4

validate twice for Kmeans and FastICA

fd9b72c

again

4f3d6ff

and again

08f7192

Merge branch 'master' of github.com:scikit-learn/scikit-learn into n_features_in

5a41275

should fix dep warning error

f0e7b41

removed superfluous tests

193fda1

NicolasHug added 3 commits April 20, 2020 08:30

Merge branch 'master' of github.com:scikit-learn/scikit-learn into is_supervised_tag

627c8f0

Renamed tag into requires_y and changed default to False

4672cba

fixed test

4894908

NicolasHug changed the title ~~[MRG] MNT is_supervised_tag with y=None validation~~ [MRG] MNT requires_y tag with y=None validation Apr 20, 2020

NicolasHug added 2 commits April 20, 2020 10:25

probably reduced diff

c03bcd5

more diff

3fd613b

adrinjalali reviewed Apr 20, 2020

NicolasHug and others added 6 commits April 20, 2020 12:13

Apply suggestions from code review

89de5de

Co-Authored-By: Adrin Jalali <adrin.jalali@gmail.com>

addressed comments

f91352c

Merge branch 'is_supervised_tag' of github.com:NicolasHug/scikit-learn into is_supervised_tag

5f23988

Merge branch 'master' of github.com:scikit-learn/scikit-learn into is_supervised_tag

ec7a11b

probably simplified neighbors logic

504d146

Fixed test when column_or_1d is called first

2630960

adrinjalali approved these changes Apr 20, 2020

agramfort approved these changes Apr 20, 2020

Added whatsnew + UG entry

48069cb

jnothman merged commit 089c8a1 into scikit-learn:master Apr 22, 2020

gio8tisu pushed a commit to gio8tisu/scikit-learn that referenced this pull request May 15, 2020

[MRG] MNT requires_y tag with y=None validation (scikit-learn#16622)

8e44161

amueller mentioned this pull request Jun 15, 2020

Allow determining whether a model is supervised programmatically #16468

Closed

viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020

[MRG] MNT requires_y tag with y=None validation (scikit-learn#16622)

80b340c

NicolasHug mentioned this pull request Jul 3, 2020

MNT Deprecates _estimator_type and replaces by a estimator tag #17806

Closed
