Closed
Description
It seems there is a convergence issue in the PAM implementation. @TimotheeMathieu
To reproduce on Google Colab:
!pip install https://github.com/scikit-learn-contrib/scikit-learn-extra/archive/master.zip
import sklearn, numpy
import sklearn_extra.cluster
# Data set 20news
import sklearn.datasets
X, y = sklearn.datasets.fetch_20newsgroups_vectorized(return_X_y=True)
# shuffle takes the arrays as separate positional arguments, not as one tuple
X, y = sklearn.utils.shuffle(X, y, random_state=1)
# Precompute cosine distance matrix
import sklearn.metrics.pairwise
diss = sklearn.metrics.pairwise.cosine_distances(X)
# run PAM from scikit-learn-extra
ske = sklearn_extra.cluster.KMedoids(n_clusters=20, metric="precomputed", method="pam", init="build")
ske.fit(diss)
It appears to hit the max_iter limit of 300 swap iterations and emits the warning: Maximum number of iteration reached before convergence.
It also takes a very long time. By comparison, the PAM implementation from the kmedoids
package takes just 3 iterations and 33883.32 ms.
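For reference, here is a minimal sketch of the termination behaviour I would expect from PAM: the SWAP phase stops as soon as no single medoid/non-medoid swap lowers the total deviation. This is plain Python on a toy dissimilarity matrix, not the code of scikit-learn-extra or the kmedoids package. The function names (`total_deviation`, `pam_build`, `pam_swap`) and the `tol` parameter are hypothetical and only for illustration.

```python
def total_deviation(diss, medoids):
    """Sum over all points of the dissimilarity to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in diss)


def pam_build(diss, k):
    """Greedy BUILD init: repeatedly add the point that lowers the loss most."""
    n = len(diss)
    medoids = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        best = min(candidates, key=lambda c: total_deviation(diss, medoids + [c]))
        medoids.append(best)
    return medoids


def pam_swap(diss, medoids, max_iter=300, tol=1e-12):
    """SWAP phase: apply the best strictly improving swap until none remains.

    `tol` is an illustrative guard (an assumption of this sketch) so that
    floating-point ties are never accepted as improvements.
    """
    n = len(diss)
    medoids = list(medoids)
    loss = total_deviation(diss, medoids)
    for n_iter in range(1, max_iter + 1):
        best_loss, best_swap = loss, None
        for i in range(len(medoids)):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:i] + [c] + medoids[i + 1:]
                trial_loss = total_deviation(diss, trial)
                if trial_loss < best_loss - tol:
                    best_loss, best_swap = trial_loss, (i, c)
        if best_swap is None:
            # The iteration that finds no improving swap is counted too.
            return medoids, loss, n_iter, True
        medoids[best_swap[0]] = best_swap[1]
        loss = best_loss
    return medoids, loss, max_iter, False  # hit max_iter without converging


# Toy data: two well-separated groups on a line, absolute difference as dissimilarity.
points = [0, 1, 2, 10, 11, 12]
diss = [[abs(a - b) for b in points] for a in points]
medoids, loss, n_iter, converged = pam_swap(diss, pam_build(diss, 2))
print(medoids, loss, n_iter, converged)  # [1, 4] 4 2 True
```

Because every accepted swap strictly decreases the loss and there are finitely many medoid sets, this loop always terminates. With a BUILD initialization it usually needs only a few iterations, so running into 300 iterations looks suspicious.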