Classifiers may not work with arrays defining __array_function__ #14687

#### Description

With [NEP-18](https://www.numpy.org/neps/nep-0018-array-function-protocol.html), NumPy functions that previously converted an array-like to an ndarray may no longer do that (implicit) conversion. `dask.array` recently implemented `__array_function__`, so calling `np.unique` on a `dask.array.Array` now returns a `dask.array.Array`.


Some more details in https://github.com/dask/dask-ml/issues/541

#### Steps/Code to Reproduce

```python
import dask.array as da
import dask_ml.datasets
import sklearn.linear_model

X, y = dask_ml.datasets.make_classification(chunks=50)

clf = sklearn.linear_model.LogisticRegression()
clf.fit(X, y)
```

#### Expected Results

No error, and the same output as `clf.fit(X.compute(), y.compute())`
or as running with the environment variable `NUMPY_EXPERIMENTAL_ARRAY_FUNCTION='0'` set.

#### Actual Results

That raises

```pytb
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-b0953fbb1d6e> in <module>
----> 1 clf.fit(X, y)

~/Envs/dask-dev/lib/python3.7/site-packages/sklearn/linear_model/logistic.py in fit(self, X, y, sample_weight)
   1536
   1537         multi_class = _check_multi_class(self.multi_class, solver,
-> 1538                                          len(self.classes_))
   1539
   1540         if solver == 'liblinear':

TypeError: 'float' object cannot be interpreted as an integer
```

This is because `self.classes_ = np.unique(y)` is a Dask Array with unknown length (its shape is `(nan,)`, so `len(self.classes_)` has no integer to return)

```python
In [2]: np.unique(da.arange(12))
Out[2]: dask.array<getitem, shape=(nan,), dtype=int64, chunksize=(nan,), chunktype=numpy.ndarray>
```

since Dask is lazy and doesn't know the unique elements until compute time.

#### Versions

```
System:
    python: 3.7.3 (default, Apr  5 2019, 14:56:38)  [Clang 10.0.1 (clang-1001.0.46.3)]
executable: /Users/taugspurger/Envs/dask-dev/bin/python
   machine: Darwin-18.6.0-x86_64-i386-64bit

Python deps:
       pip: 19.2.1
setuptools: 41.0.1
   sklearn: 0.21.3
     numpy: 1.18.0.dev0+5e7e74b
     scipy: 1.2.0
    Cython: 0.29.9
    pandas: 0.25.0+169.g5de4e55d6

```

I think reproducing this needs NumPy>=1.17 and Dask>=2.0.0.

---

Possible solution: Explicitly convert array-likes to concrete ndarrays where necessary (this is a bit hard to determine though). For example https://github.com/scikit-learn/scikit-learn/blob/148491867920cc2af0e7e5700a0299be4a5d1c9f/sklearn/linear_model/logistic.py#L1517 would be `self.classes_ = np.asarray(np.unique(y))`. That may not be ideal for other libraries implementing `__array_function__` (like pydata/sparse).
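
A minimal sketch of that workaround, runnable with NumPy alone. `LazyArray` is a hypothetical stand-in for `dask.array.Array`, and its nan-valued `__len__` is an assumption made to mimic the unknown length; it is not Dask's actual implementation.

```python
import numpy as np


class LazyArray:
    """Hypothetical stand-in for a lazy array type such as dask.array.Array."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Unwrap LazyArray arguments, call NumPy, then re-wrap the result so
        # that np.unique(LazyArray) returns a LazyArray rather than an ndarray.
        unwrapped = tuple(a._data if isinstance(a, LazyArray) else a for a in args)
        return LazyArray(func(*unwrapped, **kwargs))

    def __array__(self, dtype=None, copy=None):
        # Explicit conversion hook used by np.asarray.
        return np.asarray(self._data, dtype=dtype)

    def __len__(self):
        # Assumption: model an unknown length as a float nan.
        return float("nan")


y = LazyArray([0, 1, 1, 2, 0])

# With NEP-18 active, np.unique dispatches to LazyArray.__array_function__.
classes = np.unique(y)
try:
    len(classes)
    len_failed = False
except TypeError:
    # 'float' object cannot be interpreted as an integer
    len_failed = True

# The explicit conversion proposed above yields a concrete ndarray.
concrete = np.asarray(np.unique(y))
n_classes = len(concrete)
```

Here `np.asarray` goes through `__array__` rather than `__array_function__`, which is why wrapping the `np.unique` call forces a concrete result.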
