sklearn.metrics.silhouette_samples does not work with sparse matrices #18524

@moi90

Describe the bug

silhouette_samples(X, y, metric="precomputed") fails with "ValueError: diag requires an array of at least two dimensions" if X is a scipy.sparse matrix.

Steps/Code to Reproduce

from sklearn.metrics import silhouette_samples
from sklearn.neighbors import kneighbors_graph
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=1000, centers=5, n_features=10)

pdist = kneighbors_graph(X, 5, mode='distance')

# pdist is scipy.sparse.csr.csr_matrix

silhouette_samples(pdist, y, metric="precomputed")

Expected Results

No error is thrown.

Actual Results

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-140890024e6c> in <module>
----> 1 silhouette_samples(pdist, y, metric="precomputed")

/data1/mschroeder/miniconda3/envs/20-assdc/lib/python3.7/site-packages/sklearn/metrics/cluster/_unsupervised.py in silhouette_samples(X, labels, metric, **kwds)
    216     if metric == 'precomputed':
    217         atol = np.finfo(X.dtype).eps * 100
--> 218         if np.any(np.abs(np.diagonal(X)) > atol):
    219             raise ValueError(
    220                 'The precomputed distance matrix contains non-zero '

<__array_function__ internals> in diagonal(*args, **kwargs)

/data1/mschroeder/miniconda3/envs/20-assdc/lib/python3.7/site-packages/numpy/core/fromnumeric.py in diagonal(a, offset, axis1, axis2)
   1615         return asarray(a).diagonal(offset=offset, axis1=axis1, axis2=axis2)
   1616     else:
-> 1617         return asanyarray(a).diagonal(offset=offset, axis1=axis1, axis2=axis2)
   1618 
   1619 

ValueError: diag requires an array of at least two dimensions
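
The traceback ends inside numpy's asanyarray(a).diagonal(...) call. A minimal sketch of why that is a problem, assuming current numpy/scipy behaviour: numpy does not know how to convert a scipy.sparse matrix, so asanyarray wraps it in a 0-dimensional object array, which has no diagonal to extract. The variable names below are illustrative only.

```python
import numpy as np
from scipy import sparse

# A small sparse matrix with a zero diagonal, standing in for `pdist`.
m = sparse.csr_matrix(np.array([[0.0, 2.0], [2.0, 0.0]]))

# numpy cannot densify a sparse matrix on its own; it wraps the whole
# object in a 0-d array with dtype=object.
wrapped = np.asanyarray(m)
print(wrapped.ndim, wrapped.dtype)

# The sparse matrix's own method returns the diagonal as a dense 1-D array.
print(m.diagonal())
```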

Possible fix

Change np.diagonal(X) to X.diagonal(), because the diagonal() method is implemented by numpy.ndarray as well as by scipy.sparse.csr_matrix. Or should this rather be fixed in numpy (so that np.diagonal does the right thing for sparse matrices)?
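
A rough sketch of the proposed change, isolated from scikit-learn: has_nonzero_diagonal is a hypothetical helper written for this illustration (it is not part of sklearn). It mirrors the check shown in the traceback but calls X.diagonal() instead of np.diagonal(X).

```python
import numpy as np
from scipy import sparse


def has_nonzero_diagonal(X):
    """Hypothetical helper: True if any diagonal entry of X exceeds the tolerance."""
    # Same tolerance expression as in the traceback above.
    atol = np.finfo(X.dtype).eps * 100
    # X.diagonal() exists on numpy.ndarray and on scipy.sparse matrices,
    # and returns a dense 1-D array in both cases.
    return bool(np.any(np.abs(X.diagonal()) > atol))


dense = np.array([[0.0, 1.0], [1.0, 0.0]])
sparse_zero_diag = sparse.csr_matrix(dense)
sparse_bad_diag = sparse.csr_matrix(np.array([[0.5, 1.0], [1.0, 0.0]]))

print(has_nonzero_diagonal(dense))             # False
print(has_nonzero_diagonal(sparse_zero_diag))  # False
print(has_nonzero_diagonal(sparse_bad_diag))   # True
```

This only covers the diagonal check; whether the rest of silhouette_samples handles sparse input correctly is not shown here.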

Versions

System:
    python: 3.7.6 | packaged by conda-forge | (default, Jun  1 2020, 18:57:50)  [GCC 7.5.0]
executable: /data1/mschroeder/miniconda3/envs/20-assdc/bin/python
   machine: Linux-4.4.0-109-generic-x86_64-with-debian-stretch-sid

Python dependencies:
       pip: 20.1.1
setuptools: 47.3.1.post20200616
   sklearn: 0.22.2.post1
     numpy: 1.18.5
     scipy: 1.4.1
    Cython: None
    pandas: 1.0.5
matplotlib: 3.2.1
    joblib: 0.15.1

Built with OpenMP: True
