DOC link to example explaining init usage in KMeans #26981

Merged: 16 commits merged on Aug 12, 2023

23 changes: 13 additions & 10 deletions sklearn/cluster/_kmeans.py
@@ -1224,22 +1224,25 @@ class KMeans(_BaseKMeans):
 (n_clusters, n_features), default='k-means++'
 Method for initialization:

-'k-means++' : selects initial cluster centroids using sampling based on
-an empirical probability distribution of the points' contribution to the
-overall inertia. This technique speeds up convergence. The algorithm
-implemented is "greedy k-means++". It differs from the vanilla k-means++
-by making several trials at each sampling step and choosing the best centroid
-among them.
+* 'k-means++' : selects initial cluster centroids using sampling \
+based on an empirical probability distribution of the points' \
+contribution to the overall inertia. This technique speeds up \
+convergence. The algorithm implemented is "greedy k-means++". It \
+differs from the vanilla k-means++ by making several trials at \
+each sampling step and choosing the best centroid among them.

-'random': choose `n_clusters` observations (rows) at random from data
-for the initial centroids.
+* 'random': choose `n_clusters` observations (rows) at random from \
+data for the initial centroids.

-If an array is passed, it should be of shape (n_clusters, n_features)
+* If an array is passed, it should be of shape (n_clusters, n_features)\
 and gives the initial centers.

-If a callable is passed, it should take arguments X, n_clusters and a
+* If a callable is passed, it should take arguments X, n_clusters and a\
 random state and return an initialization.
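The 'k-means++' bullet above describes greedy k-means++ only in words. Below is a minimal pure-Python sketch of that seeding, next to the 'random' strategy, assuming squared Euclidean distance and, as the default number of candidates per step, the `2 + int(log(n_clusters))` heuristic that scikit-learn uses as far as we know. The names `sq_dist`, `random_init` and `greedy_kmeans_pp` are hypothetical, and this is an illustration of the idea, not scikit-learn's implementation.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points (lists of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(X, n_clusters, rng):
    """'random' strategy: use n_clusters distinct rows of X as centers."""
    return [list(X[i]) for i in rng.sample(range(len(X)), n_clusters)]


def greedy_kmeans_pp(X, n_clusters, rng, n_local_trials=None):
    """Greedy k-means++ seeding (illustrative sketch, not scikit-learn's code).

    Assumes X has at least n_clusters distinct points, so the sampling
    weights never all become zero.
    """
    if n_local_trials is None:
        # Assumed default, mirroring the heuristic scikit-learn uses.
        n_local_trials = 2 + int(math.log(n_clusters))
    # First center: a uniformly random row.
    centers = [list(X[rng.randrange(len(X))])]
    # closest[i]: point i's current contribution to the inertia, i.e. its
    # squared distance to the nearest center chosen so far.
    closest = [sq_dist(x, centers[0]) for x in X]
    for _ in range(1, n_clusters):
        # Draw several candidates with probability proportional to closest.
        candidates = rng.choices(range(len(X)), weights=closest, k=n_local_trials)
        best = None
        for idx in candidates:
            # Per-point contributions if this candidate were added as a center.
            trial = [min(d, sq_dist(x, X[idx])) for d, x in zip(closest, X)]
            inertia = sum(trial)
            if best is None or inertia < best[0]:
                best = (inertia, idx, trial)
        # Greedy step: keep the candidate that lowers the inertia the most.
        _, idx, closest = best
        centers.append(list(X[idx]))
    return centers


X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0], [10.1, 0.0]]
print("random:   ", random_init(X, 3, random.Random(0)))
print("k-means++:", greedy_kmeans_pp(X, 3, random.Random(0)))
```

Drawing several candidates and keeping the one that lowers the inertia most is what separates the greedy variant from vanilla k-means++, which corresponds to keeping a single draw (`n_local_trials=1`).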

Member

I think I would like to use list items for the different options above (using -). That would make it obvious which part of the description covers the options for init and which part points the reader further (the new link to the example).

Contributor Author

Hi Guillaume, thank you for reviewing😄
What do you mean by 'using -'?

Contributor

> What do you mean by 'using -'?

Transforming the sentence into a list, e.g.

This, that, and everything

vs

- This
- That
- Everything

Contributor

If I understood the original suggestion correctly, I believe the intent was to make "If an array is passed..." and "If a callable is passed..." separate list entries, rather than to list the inputs of the callable as is done now.

Contributor Author

Thank you for your reply!
I modified it.

Contributor

@Micky774, Aug 8, 2023

Lines underneath a list symbol * whose text belongs to the body of the list entry need to be indented with two spaces (I think; maybe four spaces?). The formatting is a bit tricky sometimes. I still struggle with it myself😅
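One related detail: the new docstring lines end with a backslash. Assuming the `KMeans` docstring is an ordinary (non-raw) string literal, Python treats a backslash immediately before a newline as a line continuation, so each bullet reaches the docstring renderer as a single line, and the indentation of a continued line only adds spaces inside it. The strings below are hypothetical fragments in the same style, not copied from the source.

```python
# Hypothetical docstring fragments in the style of the new bullet lines.

# Regular (non-raw) literal: a backslash right before the newline is a line
# continuation, so Python drops both characters and joins the source lines.
joined = """* 'random': choose `n_clusters` observations (rows) at random from \
  data for the initial centroids."""

# Raw literal, for contrast: the backslash and the newline are both kept.
kept = r"""* 'random': choose `n_clusters` observations (rows) at random from \
  data for the initial centroids."""

# One line: the continuation indent survives only as extra spaces.
print(joined.splitlines())
# Two lines, the first still ending in a literal backslash.
print(kept.splitlines())
```

Inspecting `KMeans.__doc__` directly is a quick way to see exactly what text the renderer receives.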

Contributor Author

Thank you.
I fixed the indentation 😊

+For an example of how to use the different `init` strategies, see the example
+entitled :ref:`sphx_glr_auto_examples_cluster_plot_kmeans_digits.py`.

 n_init : 'auto' or int, default=10
 Number of times the k-means algorithm is run with different centroid
 seeds. The final result is the best output of `n_init` consecutive runs
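The new docstring text points to the digits example for `init` usage. As a hedged sketch of the four options the docstring lists, the snippet below fits `KMeans` on the bundled `load_digits` data with 'k-means++', 'random', an explicit array of PCA components (similar, as far as we recall, to what the linked example benchmarks), and a callable. The helper `random_rows` is hypothetical; the sketch assumes that a callable `init` is called as `init(X, n_clusters, random_state=...)` with a `RandomState` instance, matching the docstring's "X, n_clusters and a random state".

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Bundled 8x8 digits data: 1797 samples, 64 features, 10 classes.
X, _ = load_digits(return_X_y=True)
n_clusters = 10


def random_rows(X, n_clusters, random_state):
    """Hypothetical callable init: use n_clusters distinct rows of X.

    Assumes KMeans passes its RandomState instance as `random_state`.
    """
    idx = random_state.choice(X.shape[0], size=n_clusters, replace=False)
    return X[idx]


# Array init: PCA components, shape (n_clusters, n_features) == (10, 64).
pca_centers = PCA(n_components=n_clusters).fit(X).components_

strategies = {
    "k-means++": {"init": "k-means++", "n_init": 4},
    "random": {"init": "random", "n_init": 4},
    # Fixed starting centers give the same start every run, so one run suffices.
    "PCA-based": {"init": pca_centers, "n_init": 1},
    "callable": {"init": random_rows, "n_init": 4},
}

results = {}
for name, params in strategies.items():
    km = KMeans(n_clusters=n_clusters, random_state=0, **params).fit(X)
    results[name] = km
    print(f"{name:>10}: inertia={km.inertia_:.0f}, iterations={km.n_iter_}")
```

For the sampled strategies, each of the `n_init` runs starts from new centers and the run with the lowest inertia is kept, which is why the array-based init is paired with `n_init=1` here.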