
MNT Add estimator check for not calling __array_function__ #14702


Merged (31 commits) on Oct 24, 2019

Conversation

amueller
Member

Test for #14687.

I don't think we test against numpy 1.17, so this will only fail on the cron job, not here?

@amueller
Member Author

This is not a complete fix, and it's also breaking some things.
I feel like we might want to allow may_share_memory? If someone implements that wrongly and doesn't return True or False, they can figure that out themselves ;)

@amueller
Member Author

hm this is actually nearly ok but I gotta run ;)

@amueller
Member Author

Now added a test for sample_weight as well. That wasn't tested with NotAnArray before (so it didn't work), and I needed to add some logic to the slicing in GridSearchCV to make it work with NotAnArray fit_params.

@amueller
Member Author

Should be good now?

@amueller
Member Author

oh coverage fails because we don't test on new numpy I guess. Should we? I think waiting till it's on conda might be ok?


def __array__(self, dtype=None):
return self.data

def __array_function__(self, func, types, args, kwargs):
Member

Returning True and raising TypeError needs to be tested?

Member Author

You mean explicitly tested? Sure, I could do that. I thought that was a bit overkill for a test helper. They are obviously used in the tests, but on CI there's no __array_function__ protocol.

Member

I would be more onboard if NotAnArray was private, which goes back to "public vs private utils" #6616 (comment)

On the other hand, we are depending on this raising when something is wrong. When everything is working, our tests do not run __array_function__.

Member Author

That's a good point. I'll add a test (that won't be run)
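Such a test (which would only exercise the protocol on numpy >= 1.17) might look like the following sketch. The class body is modeled on the `__array__`/`__array_function__` snippet under review; the class name and error message here are illustrative, not the actual sklearn helper:

```python
import numpy as np


class NotAnArray:
    """Sketch of an array-like test helper: convertible via __array__,
    but failing loudly if anything dispatches through NEP 18."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # The only sanctioned conversion path for estimators.
        return self.data

    def __array_function__(self, func, types, args, kwargs):
        # Raise so any code relying on NEP 18 dispatch fails the test.
        raise TypeError(
            "Don't want to call __array_function__ for %s!" % func.__name__)


x = NotAnArray([1, 2, 3])

# Materializing via np.asarray uses __array__ and works fine:
assert np.asarray(x).sum() == 6

# A numpy function that dispatches via __array_function__ must raise:
try:
    np.concatenate([x, x])
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError from __array_function__")
```

The key property is the asymmetry: `np.asarray` (the "materialize first" policy) succeeds, while any direct dispatching call such as `np.concatenate` errors out.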

@adrinjalali
Member

> oh coverage fails because we don't test on new numpy I guess. Should we? I think waiting till it's on conda might be ok?

I'm not sure how many people install numpy from pypi, but my guess is "many". I'd rather have that, or conda-forge as the "test against the latest" in the CI, rather than conda. WDYT?

@@ -118,6 +118,7 @@ def fit(self, X, y, sample_weight=None):
self.sparse_output_ = sp.issparse(y)

if not self.sparse_output_:
y = np.asarray(y)
Member

do we not want to have a more complex atleast_1d instead?

Member Author

you mean column_or_1d? I honestly don't know why we would need atleast_1d. We usually want scalars to be treated differently.

Member Author

@amueller commented Aug 22, 2019

there's also check_array(ensure_2d=False) which might be more suitable here?

Member

Yep, I tend to forget that we have both column_or_1d and atleast_1d. Ideally, I think I'd rather have ensure_ndims=n in check_array, maybe.
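The distinction being discussed can be illustrated with a simplified sketch. This is not sklearn's actual implementation of column_or_1d, just a minimal version of its contract: accept 1-d input or a single column, and reject everything else, including scalars, which `np.atleast_1d` would silently promote:

```python
import numpy as np


def column_or_1d_sketch(y):
    """Simplified sketch of column_or_1d's contract (not the real
    sklearn implementation): 1-d arrays pass through, single-column
    2-d arrays are flattened, anything else is an error."""
    y = np.asarray(y)
    if y.ndim == 1:
        return y
    if y.ndim == 2 and y.shape[1] == 1:
        return y.ravel()
    raise ValueError("bad input shape {}".format(y.shape))


# np.atleast_1d, by contrast, silently promotes scalars:
assert np.atleast_1d(5).shape == (1,)

# The sketch flattens a column vector but would reject a scalar:
assert column_or_1d_sketch([[1], [2]]).tolist() == [1, 2]
```

This is why "we usually want scalars to be treated differently": column_or_1d raises on a 0-d input, whereas atleast_1d hides the problem.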

@@ -433,6 +434,8 @@ def fit(self, X, y, sample_weight=None):
self.n_outputs_ = y.shape[1]

check_consistent_length(X, y, sample_weight)
if sample_weight is not None:
Member

I feel like this belongs to check_array too, doesn't it?

Member Author

Not sure what you mean. Using check_array(sample_weights, ensure_2d=True) instead of asarray? We don't have any tests for NaN in sample weights, do we? I'm not sure how much I want to make this PR about adding way more checks to sample weights.

Member

we have _check_sample_weight in validation

Member Author

that does a lot more though, right?
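For context, the core of what such a validator does can be sketched as below. This is a deliberately simplified stand-in, not sklearn's private _check_sample_weight, which indeed does more (e.g. scalar weights and dtype handling):

```python
import numpy as np


def check_sample_weight_sketch(sample_weight, X, dtype=np.float64):
    """Simplified sketch of a sample-weight validator: materialize
    the weights as a float ndarray of length n_samples."""
    n_samples = len(X)
    if sample_weight is None:
        # Default to uniform weights rather than passing None around.
        return np.ones(n_samples, dtype=dtype)
    # np.asarray goes through __array__, never __array_function__,
    # which is exactly the conversion policy this PR enforces.
    sample_weight = np.asarray(sample_weight, dtype=dtype)
    if sample_weight.ndim != 1 or sample_weight.shape[0] != n_samples:
        raise ValueError(
            "sample_weight has shape {}, expected ({},)".format(
                sample_weight.shape, n_samples))
    return sample_weight


X = np.zeros((4, 2))
assert check_sample_weight_sketch(None, X).tolist() == [1.0] * 4
assert check_sample_weight_sketch([1, 2, 3, 4], X).dtype == np.float64
```

Even this minimal version shows why routing sample_weight through a single checker is attractive: the NotAnArray case is handled for free by the asarray call.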

@amueller
Member Author

The test failure is due to me using pytest in test_estimator_checks :-/ That will be fixed in #14381 I think.

I'm ok with moving the "latest" CI to conda-forge

@amueller amueller added the High Priority High priority issues and pull requests label Aug 26, 2019
# Conflicts:
#	sklearn/model_selection/tests/test_search.py
@amueller
Member Author

Never mind, #14381 doesn't fix that, I just won't use pytest in that file.

@amueller
Member Author

amueller commented Sep 3, 2019

any takers? Should I include the change to CI in this PR?

@adrinjalali
Member

I was gonna do the CI change, if you don't, I'll do in a separate PR.

@amueller amueller added this to the 0.22 milestone Sep 27, 2019
@amueller
Member Author

this is quite hard to keep in sync with master :-/

@amueller
Member Author

greeeeen

Member

@ogrisel left a comment

LGTM.

We need a changelog entry (and maybe a section in the doc) to explain that sklearn estimators will not rely on __array_function__ by default and instead materialize fit parameters as numpy arrays internally.

@amueller
Member Author

amueller commented Oct 7, 2019

@ogrisel like this?

Member

@ogrisel left a comment

Maybe mention NEP 18 explicitly to improve google-ability.

@@ -623,6 +627,10 @@ These changes mostly affect library developers.
Such classifiers need to have the `binary_only=True` estimator tag.
:pr:`13875` by `Trevor Stephens`_.

- Estimators are expected to convert input data (``X``, ``y``,
``sample_weights``) to numpy ``ndarray`` and never call
``__array_function__`` on the original datatype that is passed.
Member

Suggested change:
- ``__array_function__`` on the original datatype that is passed.
+ ``__array_function__`` on the original datatype that is passed. This means that, by default, estimators are not expected to support the array function dispatching mechanism of `NEP 18`_.

Member Author

I don't think this wording is particularly clear. Estimators are forbidden from supporting the mechanism of NEP 18 with this PR.

Member

I did not want to phrase it in a way that this will always be the case in the future. But phrase it as you wish as long as NEP 18 is mentioned :)

amueller and others added 3 commits October 8, 2019 17:45
Co-Authored-By: Olivier Grisel <olivier.grisel@ensta.org>
Co-Authored-By: Olivier Grisel <olivier.grisel@ensta.org>
Member

@jnothman left a comment

otherwise lgtm

@@ -623,6 +628,11 @@ These changes mostly affect library developers.
Such classifiers need to have the `binary_only=True` estimator tag.
:pr:`13875` by `Trevor Stephens`_.

- Estimators are expected to convert input data (``X``, ``y``,
``sample_weights``) to numpy ``ndarray`` and never call
Member

why not use :class:`numpy.ndarray` to get the cross-reference?

@@ -601,6 +601,11 @@ Changelog
Miscellaneous
.............

- |API| Scikit-learn currently converts any input data structure implementing a duck array
Member

Do you mean "now" instead of "currently"?

@NicolasHug
Member

The only test is that estimators don't complain when you pass _NotAnArray, but it doesn't make sure __array_function__ isn't used, which is what we ultimately want, right? At least that's what the what's new says.

I think we should have _NotAnArray.__array_function__ raise an exception to properly cover that?

@amueller I addressed Joel's comments and merged with master. I'm happy to address future comments if you need


Regarding all the changes to sample_weight, I agree with @glemaitre that we could/should be using _check_sample_weight.
But I'm fine with doing that in another PR.

@jnothman
Member

jnothman commented Oct 24, 2019 via email

@NicolasHug
Member

Ah, indeed. I wasn't reading that well for some reason

7 participants