
There was a “ValueError: array is too big...” when computing silhouette_samples of KMeans on a large amount of data #8878

Closed
@daydayup1

Description


I ran the following code:
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples

y = np.array(arr)
kmeans = KMeans(n_clusters=10, init='k-means++', random_state=None).fit(y)
sample_silhouette_values = silhouette_samples(y, kmeans.labels_)

The "arr" in first line was a two-dimensional array that had 270,000 rows and 30 columns.
And I got error as follow:

Traceback (most recent call last):
File "D:\eclipse-neon64\workSpace\cluster-sklearn\cluster\cluster-sklearn.py", line 41, in <module>
sample_silhouette_values = silhouette_samples(y, kmeans.labels_)
File "D:\python\lib\site-packages\sklearn\metrics\cluster\unsupervised.py", line 168, in silhouette_samples
distances = pairwise_distances(X, metric=metric, **kwds)
File "D:\python\lib\site-packages\sklearn\metrics\pairwise.py", line 1240, in pairwise_distances
return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
File "D:\python\lib\site-packages\sklearn\metrics\pairwise.py", line 1083, in _parallel_pairwise
return func(X, Y, **kwds)
File "D:\python\lib\site-packages\sklearn\metrics\pairwise.py", line 245, in euclidean_distances
distances = safe_sparse_dot(X, Y.T, dense_output=True)
File "D:\python\lib\site-packages\sklearn\utils\extmath.py", line 189, in safe_sparse_dot
return fast_dot(a, b)
ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.

I don't think this is the same as #4701 or #4197, but they have some similarities.
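For context on why this fails: silhouette_samples materializes the full pairwise distance matrix, so 270,000 samples means roughly 270,000² ≈ 7.3×10¹⁰ float64 entries, on the order of 580 GB, far beyond what can be allocated. A common workaround is to score a random subsample rather than the full dataset. Below is a minimal sketch assuming the same variable names (y, kmeans) from the snippet above; the sample_size of 10,000 is an arbitrary illustration value.

import numpy as np
from sklearn.metrics import silhouette_samples

# Pick a random subsample so the pairwise distance matrix needs
# sample_size**2 entries instead of 270,000**2 (sample_size is an
# assumed illustration value, not from the original report).
sample_size = 10000
rng = np.random.RandomState(42)
idx = rng.choice(y.shape[0], size=sample_size, replace=False)

# Silhouette values for the subsample only. This is an approximation:
# the a/b terms are computed within the subsample, not the full data.
sample_silhouette_values = silhouette_samples(y[idx], kmeans.labels_[idx])

If only the mean score is needed, silhouette_score already accepts sample_size and random_state parameters that perform this subsampling internally.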
