DBSCAN gives incorrect result on precomputed sparse input if there are only zeros in first row #8306

Closed
@alxgrh

Description

DBSCAN returns an incorrect labels array for precomputed sparse input when the first row contains only zeros.

Steps/Code to Reproduce

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import dbscan

# Create an example distance matrix.
# With eps=0.2 and min_samples=2, DBSCAN should leave the first row unclustered (noise),
# put the 2nd and 3rd rows in one cluster, and put the 4th and 5th rows in another.
ar = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.3],
    [0.0, 0.2, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.1],
    [0.0, 0.3, 0.0, 0.1, 0.0],
])
matrix = csr_matrix(ar)

# The dbscan() function is called directly just for reference; DBSCAN.fit() gives the same result.
dbscan(matrix, metric='precomputed', eps=0.2, min_samples=2)

Expected Results

(array([1, 2, 3, 4]), array([-1, 0, 0, 1, 1]))
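
The expected result above can be checked without scikit-learn. The following is a minimal brute-force sketch and not the library's implementation. The helper name `reference_dbscan` is hypothetical. It assumes that off-diagonal zeros mean "no stored distance" (as in a sparse matrix) and that every point counts as its own neighbor.

```python
from collections import deque


def reference_dbscan(dist, eps, min_samples):
    """Brute-force DBSCAN over a precomputed distance matrix (nested lists).

    Assumption: an off-diagonal zero means "distance not stored", mirroring
    a sparse matrix; each point is always included in its own neighborhood.
    """
    n = len(dist)
    neighborhoods = [
        [j for j in range(n) if j == i or (dist[i][j] != 0 and dist[i][j] <= eps)]
        for i in range(n)
    ]
    # A core sample has at least min_samples points in its eps-neighborhood.
    core = [i for i in range(n) if len(neighborhoods[i]) >= min_samples]
    core_set = set(core)

    labels = [-1] * n  # -1 marks noise
    cluster = 0
    for i in core:
        if labels[i] != -1:
            continue  # already assigned to an earlier cluster
        labels[i] = cluster
        queue = deque([i])
        while queue:
            p = queue.popleft()
            if p not in core_set:
                continue  # border points join a cluster but do not expand it
            for q in neighborhoods[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    return core, labels


ar = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.3],
    [0.0, 0.2, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.1],
    [0.0, 0.3, 0.0, 0.1, 0.0],
]
core, labels = reference_dbscan(ar, eps=0.2, min_samples=2)
print(core, labels)  # [1, 2, 3, 4] [-1, 0, 0, 1, 1]
```

Under these assumptions the first row has no neighbors besides itself, so it stays noise, while rows 2-3 and rows 4-5 form two separate clusters.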

Actual Results

(array([0, 2, 3, 4]), array([0, 0, 0, 1, 0]))

The error appears only if the first row of the input matrix consists entirely of zeros.

Versions

Linux-4.9.4-100.fc24.x86_64-x86_64-with-fedora-24-Twenty_Four
('Python', '2.7.12 |Anaconda custom (64-bit)| (default, Jul 2 2016, 17:42:40) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]')
('NumPy', '1.10.4')
('SciPy', '0.17.0')
('Scikit-Learn', '0.17.1')
