Commit a1c3d87

gpablo6, jmloyola, and glemaitre committed
DOC Ensures that sklearn.cluster._kmeans.k_means passes numpydoc validation (#21423)
Co-authored-by: Juan Martin Loyola <jmloyola@outlook.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
1 parent c2700f5 commit a1c3d87

2 files changed

+20 -24 lines changed

maint_tools/test_docstrings.py

Lines changed: 0 additions & 1 deletion
@@ -22,7 +22,6 @@
     "sklearn._config.get_config",
     "sklearn.base.clone",
     "sklearn.cluster._affinity_propagation.affinity_propagation",
-    "sklearn.cluster._kmeans.k_means",
     "sklearn.cluster._kmeans.kmeans_plusplus",
     "sklearn.cluster._mean_shift.estimate_bandwidth",
     "sklearn.cluster._mean_shift.get_bin_seeds",

sklearn/cluster/_kmeans.py

Lines changed: 20 additions & 23 deletions
@@ -269,7 +269,7 @@ def k_means(
     algorithm="auto",
     return_n_iter=False,
 ):
-    """K-means clustering algorithm.
+    """Perform K-means clustering algorithm.

     Read more in the :ref:`User Guide <k_means>`.

@@ -285,30 +285,27 @@ def k_means(
         centroids to generate.

     sample_weight : array-like of shape (n_samples,), default=None
-        The weights for each observation in X. If None, all observations
+        The weights for each observation in `X`. If `None`, all observations
         are assigned equal weight.

     init : {'k-means++', 'random'}, callable or array-like of shape \
             (n_clusters, n_features), default='k-means++'
         Method for initialization:

-        'k-means++' : selects initial cluster centers for k-mean
-        clustering in a smart way to speed up convergence. See section
-        Notes in k_init for more details.
-
-        'random': choose `n_clusters` observations (rows) at random from data
-        for the initial centroids.
-
-        If an array is passed, it should be of shape (n_clusters, n_features)
-        and gives the initial centers.
-
-        If a callable is passed, it should take arguments X, n_clusters and a
-        random state and return an initialization.
+        - `'k-means++'` : selects initial cluster centers for k-mean
+          clustering in a smart way to speed up convergence. See section
+          Notes in k_init for more details.
+        - `'random'`: choose `n_clusters` observations (rows) at random from data
+          for the initial centroids.
+        - If an array is passed, it should be of shape `(n_clusters, n_features)`
+          and gives the initial centers.
+        - If a callable is passed, it should take arguments `X`, `n_clusters` and a
+          random state and return an initialization.

     n_init : int, default=10
         Number of time the k-means algorithm will be run with different
         centroid seeds. The final results will be the best output of
-        n_init consecutive runs in terms of inertia.
+        `n_init` consecutive runs in terms of inertia.

     max_iter : int, default=300
         Maximum number of iterations of the k-means algorithm to run.
@@ -328,22 +325,22 @@ def k_means(

     copy_x : bool, default=True
         When pre-computing distances it is more numerically accurate to center
-        the data first. If copy_x is True (default), then the original data is
+        the data first. If `copy_x` is True (default), then the original data is
         not modified. If False, the original data is modified, and put back
         before the function returns, but small numerical differences may be
         introduced by subtracting and then adding the data mean. Note that if
         the original data is not C-contiguous, a copy will be made even if
-        copy_x is False. If the original data is sparse, but not in CSR format,
-        a copy will be made even if copy_x is False.
+        `copy_x` is False. If the original data is sparse, but not in CSR format,
+        a copy will be made even if `copy_x` is False.

     algorithm : {"auto", "full", "elkan"}, default="auto"
-        K-means algorithm to use. The classical EM-style algorithm is "full".
-        The "elkan" variation is more efficient on data with well-defined
+        K-means algorithm to use. The classical EM-style algorithm is `"full"`.
+        The `"elkan"` variation is more efficient on data with well-defined
         clusters, by using the triangle inequality. However it's more memory
         intensive due to the allocation of an extra array of shape
-        (n_samples, n_clusters).
+        `(n_samples, n_clusters)`.

-        For now "auto" (kept for backward compatibility) chooses "elkan" but it
+        For now `"auto"` (kept for backward compatibility) chooses `"elkan"` but it
         might change in the future for a better heuristic.

     return_n_iter : bool, default=False
@@ -355,7 +352,7 @@ def k_means(
         Centroids found at the last iteration of k-means.

     label : ndarray of shape (n_samples,)
-        label[i] is the code or index of the centroid the
+        The `label[i]` is the code or index of the centroid the
         i'th observation is closest to.

     inertia : float
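
To illustrate the `init='random'` and `n_init` behaviour described in the docstring above, here is a minimal pure-Python sketch of restarting a Lloyd-style k-means several times and keeping the run with the lowest inertia. It is not scikit-learn's implementation: the names `squared_distance`, `lloyd_single_run`, `k_means_sketch`, and the `seed` parameter are hypothetical, and the toy data is made up for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_single_run(X, n_clusters, max_iter, rng):
    """One k-means run with 'random' init (assumes max_iter >= 1)."""
    # 'random' init: choose n_clusters observations (rows) at random.
    centers = [list(row) for row in rng.sample(X, n_clusters)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the closest center for each row.
        new_labels = [
            min(range(n_clusters), key=lambda k: squared_distance(row, centers[k]))
            for row in X
        ]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(n_clusters):
            members = [row for row, lab in zip(X, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia: sum of squared distances of rows to their assigned center.
    inertia = sum(squared_distance(row, centers[lab]) for row, lab in zip(X, labels))
    return centers, labels, inertia


def k_means_sketch(X, n_clusters, n_init=10, max_iter=300, seed=0):
    """Run n_init consecutive runs and keep the best one in terms of inertia."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        result = lloyd_single_run(X, n_clusters, max_iter, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best


# Toy data: two well-separated groups of three points each.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels, inertia = k_means_sketch(X, n_clusters=2)
print(labels, inertia)
```

As in the documented return values, `labels[i]` is the index of the center that the i'th observation is closest to. On this toy data the first three points and the last three points should receive different labels.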
