Closed as not planned
Description
Describe the bug
When using `sklearn.cluster.HDBSCAN` with `metric="precomputed"`, the input distance matrix is modified after calling `fit_predict()`. The original `hdbscan` package (v0.8.40) works correctly.
Steps/Code to Reproduce
import numpy as np
from sklearn.cluster import HDBSCAN

# Build a random symmetric distance matrix with a zero diagonal.
rmsd_matrix = np.random.rand(5, 5)
rmsd_matrix = (rmsd_matrix + rmsd_matrix.T) / 2
np.fill_diagonal(rmsd_matrix, 0)

print("Before HDBSCAN:")
print(rmsd_matrix)

# Cluster using the matrix as precomputed pairwise distances.
hdb = HDBSCAN(metric="precomputed", min_cluster_size=2)
hdb.fit_predict(rmsd_matrix)

print("\nAfter HDBSCAN:")
print(rmsd_matrix)  # Matrix is changed!
Expected Results
Input matrix should remain unchanged (as in original hdbscan).
Actual Results
The input matrix is modified in place: its contents after `fit_predict()` differ from those printed before the call.
Versions
System:
    python: 3.12.3 (main, Jun 18 2025, 17:59:45) [GCC 13.3.0]
    executable: /home/username/project/bin/python3
    machine: Linux-6.14.0-27-generic-x86_64-with-glibc2.39

Python dependencies:
    sklearn: 1.7.0
    pip: 24.0
    setuptools: 80.9.0
    numpy: 2.2.6
    scipy: 1.16.0
    Cython: None
    pandas: 2.3.0
    matplotlib: 3.10.3
    joblib: 1.5.1
    threadpoolctl: 3.6.0

Built with OpenMP: True

threadpoolctl info:
    user_api: blas
    internal_api: openblas
    num_threads: 18
    prefix: libscipy_openblas
    filepath: /home/username/project/lib/python3.12/site-packages/numpy.libs/libscipy_openblas64_-56d6093b.so
    version: 0.3.29
    threading_layer: pthreads
    architecture: Haswell

    user_api: blas
    internal_api: openblas
    num_threads: 18
    prefix: libscipy_openblas
    filepath: /home/username/project/lib/python3.12/site-packages/scipy.libs/libscipy_openblas-68440149.so
    version: 0.3.28
    threading_layer: pthreads
    architecture: Haswell

    user_api: openmp
    internal_api: openmp
    num_threads: 18
    prefix: libgomp
    filepath: /home/username/project/lib/python3.12/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
    version: None