KMeans processes the n_init runs sequentially #23366

Hi,

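I was looking into the KMeans code and noticed that the `n_init` loop quoted below could be parallelized. Each iteration of the `for` loop performs one initialization and one k-means run, and the runs look independent of one another; only the final "keep the best result" comparison combines them. I expect that running the iterations in parallel would reduce the runtime. Please take a look.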

https://github.com/scikit-learn/scikit-learn/blob/84f8409dc5c485729649c5332e66fd5602549b50/sklearn/cluster/_kmeans.py#L1406-L1438

	for i in range(self._n_init):
	    # Initialize centers
	    centers_init = self._init_centroids(
	        X, x_squared_norms=x_squared_norms, init=init, random_state=random_state
	    )
	    if self.verbose:
	        print("Initialization complete")

	    # run a k-means once
	    labels, inertia, centers, n_iter_ = kmeans_single(
	        X,
	        sample_weight,
	        centers_init,
	        max_iter=self.max_iter,
	        verbose=self.verbose,
	        tol=self._tol,
	        x_squared_norms=x_squared_norms,
	        n_threads=self._n_threads,
	    )

	    # determine if these results are the best so far
	    # we chose a new run if it has a better inertia and the clustering is
	    # different from the best so far (it's possible that the inertia is
	    # slightly better even if the clustering is the same with potentially
	    # permuted labels, due to rounding errors)
	    if best_inertia is None or (
	        inertia < best_inertia
	        and not _is_same_clustering(labels, best_labels, self.n_clusters)
	    ):
	        best_labels = labels
	        best_centers = centers
	        best_inertia = inertia
	        best_n_iter = n_iter_
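
To make the suggestion concrete, here is a minimal sketch of the structure I have in mind. It is not a patch against scikit-learn and does not use its internals: `kmeans_once` and `best_of_n_init` are hypothetical helpers, the k-means is a toy 1-D Lloyd's algorithm in pure Python, and each run gets its own seed (an assumption on my part, since the loop above shares one `random_state`). It also uses threads only to show the shape of the idea; in pure Python the GIL means this will not actually run faster, and the quoted call already passes `n_threads`, so any real gain would depend on how the two levels of parallelism interact.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
        for p in points
    ]


def kmeans_once(points, k, seed, max_iter=100):
    """One toy k-means run on 1-D data (hypothetical helper).

    Returns (inertia, centers, labels).
    """
    rng = random.Random(seed)  # private RNG, so runs share no state
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Leave a center where it is if its cluster is empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum((p - centers[lab]) ** 2 for p, lab in zip(points, labels))
    return inertia, centers, labels


def best_of_n_init(points, k, n_init=10, base_seed=0, max_workers=4):
    """Run n_init independent restarts concurrently and keep the best one."""
    seeds = [base_seed + i for i in range(n_init)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so the result is deterministic.
        results = list(pool.map(lambda s: kmeans_once(points, k, s), seeds))
    # The only step that combines runs: pick the lowest inertia.
    return min(results, key=lambda r: r[0])


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
    inertia, centers, labels = best_of_n_init(data, k=2, n_init=5)
    print(inertia, sorted(centers), labels)
```

The point of the sketch is that the restarts only meet in the final `min`, which mirrors the "best so far" comparison at the end of the loop above.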
