DOC Add links to KMeans examples in docstrings and the user guide #27799
Thanks for the PR @marenwestermann! Here is a batch of comments :)
doc/modules/clustering.rst
Outdated
@@ -218,7 +222,9 @@ initializations of the centroids. One method to help address this issue is the
 k-means++ initialization scheme, which has been implemented in scikit-learn
 (use the ``init='k-means++'`` parameter). This initializes the centroids to be
 (generally) distant from each other, leading to probably better results than
-random initialization, as shown in the reference.
+random initialization, as shown in the reference. For a detailed example of
+comaparing different initialization schemes refer to
Suggested change:
- comaparing different initialization schemes refer to
+ comparing different initialization schemes, refer to
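For readers following along, here is a minimal sketch of the comparison the user-guide passage describes. It is not the linked example script; the synthetic data from `make_blobs`, the cluster count, and the seeds are assumptions chosen only for illustration. `n_init=1` is set so that each initialization scheme gets a single run.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (an assumption for this sketch, not part of the PR).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

inertias = {}
for init in ("k-means++", "random"):
    # A single initialization per scheme, so the schemes are compared directly.
    km = KMeans(n_clusters=4, init=init, n_init=1, random_state=0).fit(X)
    inertias[init] = km.inertia_
    print(f"init={init!r}: inertia={km.inertia_:.2f}, iterations={km.n_iter_}")
```

Lower inertia means tighter clusters. A single seed proves little either way, so a real comparison would repeat this over many seeds.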
doc/modules/clustering.rst
Outdated
@@ -231,7 +237,17 @@ weight of 2 to a sample is equivalent to adding a duplicate of that sample
 to the dataset :math:`X`.

 K-means can be used for vector quantization. This is achieved using the
-transform method of a trained model of :class:`KMeans`.
+transform method of a trained model of :class:`KMeans`. For an example of
Suggested change:
- transform method of a trained model of :class:`KMeans`. For an example of
+ `transform` method of a trained model of :class:`KMeans`. For an example of
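As a rough illustration of the sentence under review, the sketch below shows what `transform` returns on a fitted `KMeans` and how a quantized version of the data can be derived from it. The random input array and the choice of 8 clusters are assumptions made for this sketch only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.rand(200, 3)  # synthetic stand-in for, e.g., RGB pixel values

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)

# transform returns the distance from each sample to every cluster center.
distances = km.transform(X)             # shape (n_samples, n_clusters)
codes = distances.argmin(axis=1)        # index of the nearest center per sample
quantized = km.cluster_centers_[codes]  # each sample replaced by its code vector
```

In practice `km.predict(X)` gives the nearest-center index directly; the `argmin` over `transform` is shown only to make the link to the distance space explicit.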
doc/modules/clustering.rst
Outdated
using the iris dataset

* :ref:`sphx_glr_auto_examples_text_plot_document_clustering.py`: Document clustering
using KMeans and MiniBatchKMeans based on sparse data
Suggested change:
- using KMeans and MiniBatchKMeans based on sparse data
+ using :class:`KMeans` and :class:`MiniBatchKMeans` based on sparse data
doc/modules/clustering.rst
Outdated
.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_cluster_plot_cluster_iris.py`: Example usage of K-Means
Suggested change:
- * :ref:`sphx_glr_auto_examples_cluster_plot_cluster_iris.py`: Example usage of K-Means
+ * :ref:`sphx_glr_auto_examples_cluster_plot_cluster_iris.py`: Example usage of :class:`KMeans`
doc/modules/clustering.rst
Outdated
* :ref:`sphx_glr_auto_examples_cluster_plot_mini_batch_kmeans.py`: Comparison of KMeans and
MiniBatchKMeans
Suggested change:
- * :ref:`sphx_glr_auto_examples_cluster_plot_mini_batch_kmeans.py`: Comparison of KMeans and
- MiniBatchKMeans
+ * :ref:`sphx_glr_auto_examples_cluster_plot_mini_batch_kmeans.py`: Comparison of
+ :class:`KMeans` and :class:`MiniBatchKMeans`
doc/modules/clustering.rst
Outdated
* :ref:`sphx_glr_auto_examples_text_plot_document_clustering.py`: Document clustering using sparse
MiniBatchKMeans
* :ref:`sphx_glr_auto_examples_text_plot_document_clustering.py`: Document clustering
using KMeans and MiniBatchKMeans based on sparse data
Suggested change:
- using KMeans and MiniBatchKMeans based on sparse data
+ using :class:`KMeans` and :class:`MiniBatchKMeans` based on sparse data
- top right: What the effect of a bad initialization is
- top right: What using three clusters would deliver.

- bottom left: What the effect of a bad initialization is
Maybe this can be done in another PR, but currently it seems that the initialization is good. I would rather pass a fixed `random_state` to `KMeans` instead of setting a global `np.random.seed`.
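To illustrate the suggestion, here is a hedged sketch of seeding the estimator itself rather than NumPy's global state. The blob data and the specific seed values are assumptions for this sketch. `init="random"` with `n_init=1` is used to mimic a single, possibly poor, initialization, which may or may not match what the example script ends up doing.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

# The seed is scoped to the estimator; no global np.random.seed call is needed.
km_a = KMeans(n_clusters=3, init="random", n_init=1, random_state=0).fit(X)
km_b = KMeans(n_clusters=3, init="random", n_init=1, random_state=0).fit(X)

# Two fits with the same random_state on the same data give the same centers.
same = np.allclose(km_a.cluster_centers_, km_b.cluster_centers_)
print("reproducible:", same)
```

Scoping the seed this way keeps the result independent of any other code that draws from the global NumPy generator before the fit.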
# using the model results itself. In that case, the :ref:`Silhouette Coefficient
# <sphx_glr_auto_examples_cluster_plot_kmeans_silhouette_analysis.py>` comes in handy.
I would rather say something similar to "In that case the Silhouette analysis comes in handy. See `sphx_glr_auto_examples_cluster_plot_kmeans_silhouette_analysis.py` for an example on how to do it."
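For context, a minimal sketch of evaluating a clustering from the model results alone, using the mean silhouette coefficient from `sklearn.metrics.silhouette_score`. This is a simplified stand-in for the full silhouette analysis in the linked example; the synthetic data and the candidate range of `k` are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (an assumption for this sketch).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.7, random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Mean silhouette coefficient over all samples; no ground-truth labels used.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, "-> highest mean silhouette at k =", best_k)
```

The mean score is only a summary; the per-sample silhouette plots in the referenced example give a fuller picture of cluster quality.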
@@ -41,7 +41,7 @@
 china = load_sample_image("china.jpg")

 # Convert to floats instead of the default 8 bits integer coding. Dividing by
-# 255 is important so that plt.imshow behaves works well on float data (need to
+# 255 is important so that plt.imshow works well on float data (need to
Nice catch!
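As a small aside on the comment being fixed, the conversion it describes can be sketched as below. A tiny hand-made `uint8` array stands in for the sample image (an assumption, so that the snippet has no image-loading dependency).

```python
import numpy as np

# Stand-in for an 8-bit RGB image of shape (height, width, 3).
img_uint8 = np.array([[[0, 128, 255], [64, 32, 16]]], dtype=np.uint8)

# Convert to floats and rescale from [0, 255] to [0, 1], the range that
# plt.imshow expects for floating-point RGB data.
img_float = np.array(img_uint8, dtype=np.float64) / 255
```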
Now it does LGTM, thanks @marenwestermann and sorry for taking so long to answer! (I was/still am off on holidays)
Reference Issues/PRs
towards #26927
What does this implement/fix? Explain your changes.
Adds links to examples in the docstrings and the user guide which demonstrate how to use K-Means.
Any other comments?
I started with the example `plot_cluster_iris.py` and then realised that it probably makes sense to group all the links related to K-Means examples in one PR. So I will keep working on adding links to examples which show how to use K-Means.

Edit: the examples are
Note: there can be more than one PR per example script because they might be referenced in different locations. For example, there is an existing open PR for plot_document_clustering.py which links this example in the docs of another estimator.