HDBSCAN modifies input precomputed distance matrix #31907

@tolyanchik5

Describe the bug

When sklearn.cluster.HDBSCAN is used with metric="precomputed", calling fit_predict() modifies the input distance matrix in place. The original hdbscan package (v0.8.40) leaves the input untouched.

Steps/Code to Reproduce

import numpy as np
from sklearn.cluster import HDBSCAN

rmsd_matrix = np.random.rand(5, 5)
rmsd_matrix = (rmsd_matrix + rmsd_matrix.T) / 2
np.fill_diagonal(rmsd_matrix, 0)

print("Before HDBSCAN:")
print(rmsd_matrix)

hdb = HDBSCAN(metric="precomputed", min_cluster_size=2)
hdb.fit_predict(rmsd_matrix)

print("\nAfter HDBSCAN:")
print(rmsd_matrix)  # Matrix is changed!
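
The change can also be checked programmatically instead of comparing the printed matrices by eye. Below is a minimal sketch, assuming only NumPy; the helper name `mutates_input` is hypothetical and not part of scikit-learn or hdbscan. It snapshots the array, runs the callable, and reports whether the array differs afterwards.

```python
import numpy as np


def mutates_input(func, array):
    """Return True if calling func(array) changes `array` in place.

    Hypothetical helper for illustration; not a scikit-learn API.
    """
    snapshot = array.copy()  # independent copy taken before the call
    func(array)
    return not np.array_equal(snapshot, array)


data = np.array([[0.0, 0.3], [0.3, 0.0]])

# np.sum only reads its input, so nothing changes.
print(mutates_input(np.sum, data))  # False

# np.fill_diagonal writes into the array it is given.
print(mutates_input(lambda a: np.fill_diagonal(a, 1.0), data))  # True
```

With the reproduction above, passing `hdb.fit_predict` and `rmsd_matrix` to this helper would be expected to return `True` on the affected version, based on the behavior described in this report.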

Expected Results

The input matrix should remain unchanged, as it does with the original hdbscan package.

Actual Results

The input matrix is modified in place: its contents differ after fit_predict() returns.
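
A possible workaround, sketched under assumptions: hand the estimator a private copy of the matrix so the caller's array stays intact. The helper `fit_predict_on_copy` and the stand-in class `InPlaceEstimator` are hypothetical names introduced for illustration; the stand-in only mimics an estimator that overwrites its input and is not how HDBSCAN works internally.

```python
import numpy as np


def fit_predict_on_copy(estimator, distances):
    """Run estimator.fit_predict on a copy so `distances` is left unchanged.

    Hypothetical helper; works with any object exposing fit_predict(X).
    """
    working = np.array(distances, dtype=float, copy=True)
    return estimator.fit_predict(working)


class InPlaceEstimator:
    """Hypothetical stand-in that overwrites its input array."""

    def fit_predict(self, X):
        X += 1.0  # in-place write into whatever array was passed
        return np.zeros(X.shape[0], dtype=int)


distances = np.array([[0.0, 0.4], [0.4, 0.0]])
original = distances.copy()

labels = fit_predict_on_copy(InPlaceEstimator(), distances)
print(np.array_equal(distances, original))  # True
```

The same helper could be called with `HDBSCAN(metric="precomputed", min_cluster_size=2)` and `rmsd_matrix` from the reproduction. scikit-learn's HDBSCAN also documents a `copy` parameter (default `False`) intended to guard against in-place overwrites of the data passed to fit; whether `copy=True` covers this exact case on 1.7.0 is not confirmed in this report.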

Versions

System:
    python: 3.12.3 (main, Jun 18 2025, 17:59:45) [GCC 13.3.0]
executable: /home/username/project/bin/python3
   machine: Linux-6.14.0-27-generic-x86_64-with-glibc2.39

Python dependencies:
      sklearn: 1.7.0
          pip: 24.0
   setuptools: 80.9.0
        numpy: 2.2.6
        scipy: 1.16.0
       Cython: None
       pandas: 2.3.0
   matplotlib: 3.10.3
       joblib: 1.5.1
threadpoolctl: 3.6.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 18
         prefix: libscipy_openblas
       filepath: /home/username/project/lib/python3.12/site-packages/numpy.libs/libscipy_openblas64_-56d6093b.so
        version: 0.3.29
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 18
         prefix: libscipy_openblas
       filepath: /home/username/project/lib/python3.12/site-packages/scipy.libs/libscipy_openblas-68440149.so
        version: 0.3.28
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 18
         prefix: libgomp
       filepath: /home/username/project/lib/python3.12/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
