Description
DBSCAN returns an incorrect labels array for precomputed sparse input when the first row contains only zeros.
Steps/Code to Reproduce
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import dbscan
# Create an example distance matrix.
# With eps=0.2, DBSCAN should leave the 1st row unclustered (noise), put the
# 2nd and 3rd rows in one cluster, and put the 4th and 5th rows in another.
ar = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.3],
    [0.0, 0.2, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.1],
    [0.0, 0.3, 0.0, 0.1, 0.0],
])
matrix = csr_matrix(ar)
# The function is called directly just for reference; DBSCAN.fit() gives the same result.
print(dbscan(matrix, metric='precomputed', eps=0.2, min_samples=2))
Expected Results
(array([1, 2, 3, 4]), array([-1, 0, 0, 1, 1]))
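To double-check the expected output by hand, here is a minimal dependency-free sketch of DBSCAN on a sparse precomputed matrix. It assumes that only stored (nonzero) entries with distance <= eps count as neighbors, that absent entries mean "not a neighbor", and that every point is its own neighbor even though its zero diagonal is not stored. `dbscan_sparse_precomputed` is a hypothetical helper for illustration, not scikit-learn code.

```python
from collections import deque


def dbscan_sparse_precomputed(n, entries, eps, min_samples):
    """Reference DBSCAN (hypothetical helper, for illustration only).

    Assumption: `entries` maps (i, j) -> distance for stored (nonzero)
    entries only; absent pairs are not neighbors, and each point is
    always its own neighbor.
    """
    # eps-neighborhoods, always including the point itself
    neighbors = [{i} for i in range(n)]
    for (i, j), d in entries.items():
        if d <= eps:
            neighbors[i].add(j)
    core = [i for i in range(n) if len(neighbors[i]) >= min_samples]
    core_set = set(core)
    labels = [-1] * n  # -1 = noise until reached from a core point
    cluster = 0
    for seed in core:
        if labels[seed] != -1:
            continue
        # breadth-first expansion from an unlabeled core point
        labels[seed] = cluster
        queue = deque([seed])
        while queue:
            p = queue.popleft()
            if p not in core_set:
                continue  # border points join the cluster but do not expand it
            for q in sorted(neighbors[p]):
                if labels[q] == -1:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    return core, labels


rows = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.3],
    [0.0, 0.2, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.1],
    [0.0, 0.3, 0.0, 0.1, 0.0],
]
# keep only the entries a sparse matrix would actually store
entries = {(i, j): d for i, row in enumerate(rows)
           for j, d in enumerate(row) if d != 0.0}
core, labels = dbscan_sparse_precomputed(len(rows), entries, eps=0.2, min_samples=2)
print(core, labels)  # [1, 2, 3, 4] [-1, 0, 0, 1, 1]
```

Under these assumptions the 0.3 entries fall outside eps, so rows 1 and 2 pair up, rows 3 and 4 pair up, and row 0 has only itself as a neighbor, which matches the expected result.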
Actual Results
(array([0, 2, 3, 4]), array([0, 0, 0, 1, 0]))
The error appears only if the first row of the input matrix consists entirely of zeros.
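The report does not show the DBSCAN source, so the following is only a hypothesis about the kind of bug that fails exactly when the first row is empty. If per-row neighborhoods are split out of CSR arrays by looking up a running count at position `indptr[r + 1] - 1`, an empty first row yields index -1, which silently wraps to the last element. `eps_neighborhoods` is a hypothetical helper in which plain lists stand in for `csr_matrix.data`, `.indices` and `.indptr`; it omits the self (diagonal) neighbors.

```python
from itertools import accumulate


def eps_neighborhoods(data, indices, indptr, eps, naive=False):
    """Split stored CSR entries with distance <= eps into per-row lists.

    Hypothetical helper for illustration; self-neighbors are not added.
    """
    keep = [d <= eps for d in data]
    kept = [j for j, k in zip(indices, keep) if k]
    counts = list(accumulate(int(k) for k in keep))
    if naive:
        # Running count at each row's last stored position. For an empty
        # first row, indptr[1] - 1 == -1, which Python (and NumPy) treat as
        # "last element". Later empty rows land on the previous row's last
        # entry, which happens to give the right count.
        ends = [counts[p - 1] for p in indptr[1:]]
    else:
        # Prepend 0 so counts[p] = number of kept entries before position p,
        # which is well defined for every row, empty or not.
        counts = [0] + counts
        ends = [counts[p] for p in indptr[1:]]
    starts = [0] + ends[:-1]
    return [kept[s:e] for s, e in zip(starts, ends)]


# CSR fields of the matrix from the report (row 0 stores nothing).
data = [0.2, 0.3, 0.2, 0.1, 0.3, 0.1]
indices = [2, 4, 1, 4, 1, 3]
indptr = [0, 0, 2, 3, 4, 6]

print(eps_neighborhoods(data, indices, indptr, eps=0.2))
# [[], [2], [1], [4], [3]]
print(eps_neighborhoods(data, indices, indptr, eps=0.2, naive=True))
# [[2, 1, 4, 3], [], [1], [4], [3]]
```

With the naive lookup, row 0 absorbs every kept entry and row 1 gets none. If each point is also counted as its own neighbor, that would make sample 0 a core sample and drop sample 1, which is consistent with the core indices in the actual results; prepending a zero to the running count avoids the negative index.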
Versions
Linux-4.9.4-100.fc24.x86_64-x86_64-with-fedora-24-Twenty_Four
('Python', '2.7.12 |Anaconda custom (64-bit)| (default, Jul 2 2016, 17:42:40) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]')
('NumPy', '1.10.4')
('SciPy', '0.17.0')
('Scikit-Learn', '0.17.1')