Skip to content

DOC Linked examples for clustering algorithms in their docstrings (#26927) #30127

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Feb 16, 2025

Conversation

SuccessMoses
Copy link
Contributor

Reference Issues/PRs

Adds links to examples/cluster mentioned in #26927

What does this implement/fix? Explain your changes.

Adds links to auto-generated examples for classes AffinityPropagation, SpectralClustering, DBSCAN, HDBSCAN and OPTICS. The example shows comparison between different clustering methods.

Any other comments?

Examples linked:

  • examples/cluster
    plot_cluster_comparison.py

Copy link

github-actions bot commented Oct 21, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 2a34caa. Link to the linter CI: here

@marenwestermann marenwestermann self-requested a review October 23, 2024 07:58
Copy link
Member

@marenwestermann marenwestermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SuccessMoses! Thank you for your PR.

Could you move the references to the end of the respective "Examples" sections in the docstrings of the estimators? (We haven't been consistent with placing the references in the past and are trying to be more consistent now. 🙂 )

Could you also reference this example in the same way in the docstrings of
MeanShift, MiniBatchKMeans, AgglomerativeClustering, AgglomerativeClustering, Birch and GaussianMixture?

@SuccessMoses
Copy link
Contributor Author

I am working on it

@marenwestermann
Copy link
Member

Could you also make a new rubric "Examples" at the end of the section here and link the example there (compare other PRs if needed).

@SuccessMoses
Copy link
Contributor Author

@marenwestermann

Copy link
Member

@marenwestermann marenwestermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small comments. Otherwise it looks good.

.. rubric:: Examples

* :ref:`sphx_glr_auto_examples_cluster_plot_cluster_comparison.py`: Shows the
characteristics of different clustering algorithms of 2D datasets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nitpick: could you replace "of 2D datasets" with "on 2D datasets"?

@@ -483,6 +483,9 @@ class Birch(
Birch(n_clusters=None)
>>> brc.predict(X)
array([0, 0, 0, 1, 1, 1])

For a comparison of BIRCH clustering algorithm with other clustering algorithms, see
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nitpick: could you write "of the BIRCH .."

@@ -140,6 +140,11 @@ model with equal covariance per component.
:term:`inductive` clustering methods) are not designed to be applied to new,
unseen data.

.. rubric:: Examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrinjalali are you ok with having the example linked here? As noted before it's not obvious that it's possible to click on the images and access examples that way. Also, we haven't been consistent with linking examples in the user guide. There are quite a few places in the user guide where a link to an example is listed under the rubric "Examples" even though you can access the respective example through clicking on the image in that section (such as here: https://github.com/scikit-learn/scikit-learn/blob/main/doc/modules/clustering.rst?plain=1#L306)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair question. I'm not sure what I think about the question in general, but in this particular case, I think this is kinda repeating the same information right after where it's mentioned, and hence unnecessary.

However, your point on the images not being clearly clickable is valid, and we can probably improve the image caption to mention they can be clicked?

I'm not sure, maybe @Charlie-XIAO or @glemaitre have a better idea of how to improve this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I think the link is redundant, there's a nice description with a hyperlink above here.

However, it's a good point that it's not clear the the user can click on them. I wonder if @Charlie-XIAO would have an idea of how we could improve that UX.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SuccessMoses! Could you remove this link including the "Examples" rubric you made? We're then ready to merge. I'll open an issue for discussing how to best improve the UX.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

@@ -140,6 +140,11 @@ model with equal covariance per component.
:term:`inductive` clustering methods) are not designed to be applied to new,
unseen data.

.. rubric:: Examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair question. I'm not sure what I think about the question in general, but in this particular case, I think this is kinda repeating the same information right after where it's mentioned, and hence unnecessary.

However, your point on the images not being clearly clickable is valid, and we can probably improve the image caption to mention they can be clicked?

I'm not sure, maybe @Charlie-XIAO or @glemaitre have a better idea of how to improve this.

@@ -140,6 +140,11 @@ model with equal covariance per component.
:term:`inductive` clustering methods) are not designed to be applied to new,
unseen data.

.. rubric:: Examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I think the link is redundant, there's a nice description with a hyperlink above here.

However, it's a good point that it's not clear the the user can click on them. I wonder if @Charlie-XIAO would have an idea of how we could improve that UX.

@StefanieSenger
Copy link
Contributor

StefanieSenger commented Jan 10, 2025

Hi @SuccessMoses, I have just seen that meanwhile a few PRs have been merged that are touching on what you are doing here:

As you go forward with this PR, please consider to align this.

@marenwestermann
Copy link
Member

The best way to align with the other PRs is to pull the changes from upstream main into your feature branch like so:

git fetch upstream
git merge upstream/main

@marenwestermann marenwestermann merged commit 7e861bc into scikit-learn:main Feb 16, 2025
33 checks passed
@marenwestermann
Copy link
Member

marenwestermann commented Feb 16, 2025

I finished this PR because it was blocking another PR from getting merged. Thank you for your contribution @SuccessMoses!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants