Skip to content

[MRG][DOC] Fix inconsistencies in clustering doc. #13946

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
May 30, 2019

Conversation

ab-anssi
Copy link

@ab-anssi ab-anssi commented May 25, 2019

What does this implement/fix? Explain your changes.

This pull request concerns only the clustering algorithms that can take as input
affinity or distance matrices. It could be extended to the other clustering algorithms.

  • fit and fit_predict have now the same descriptions for their input
    arguments X and y.
  • The commit follows the documentation conventions detailed in issue Consistency in documentation #3791.
  • The documentation now indicates the preferred sparse matrix format if relevant.
  • AgglomerativeClustering. The parameter affinity is misleading (see issue [AgglomerativeClustering] confusing parameter 'affinity' #13945).
    The documentation of the fit method now clarifies that a distance matrix is expected (not an affinity / similarity matrix).

A few questions about documentation conventions

  • Does the documentation of the fit methods should indicate that self is returned ? In this case, what is the returned type ? object ?
  • How should be called the object returned by predict of clustering algorithms ? y or labels ? What bout the description ? "Cluster labels" or "Index of the cluster each sample belongs to." ?

@ab-anssi ab-anssi changed the title [DOC] Fix inconsistencies in clustering doc. [MRG][DOC] Fix inconsistencies in clustering doc. May 26, 2019
Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise lgtm

@ab-anssi
Copy link
Author

Thanks @jnothman for your review.

What about the following questions about documentation conventions ?

  • Does the documentation of the fit methods should indicate that self is returned ? In this case, what is the returned type ? object ?
  • How should be called the object returned by predict of clustering algorithms ? y or labels ? What about the description ? "Cluster labels" or "Index of the cluster each sample belongs to." ?

Currently it is not consistent between AffinityPropagation, DBSCAN, and AgglemerativeClustering. Do you think it should be updated for more consistency or remain as it is ?

@jnothman
Copy link
Member

jnothman commented May 27, 2019 via email

@jnothman
Copy link
Member

jnothman commented May 27, 2019 via email

@ab-anssi
Copy link
Author

ab-anssi commented May 27, 2019

Thanks for your feedback. I have updated the documentation of DBSCAN, AgglomerativeClustering and AffinityPropagation accordingly. I have also updated SpectralClustering since it also takes as input a feature or an affinity matrix.

I have noticed that there are the same inconsistencies in the documentation of other clustering algorithms (fit and fit_predict do not share the same descriptions for their input arguments, preferred sparse matrix format not indicated, etc.).
Is it ok for you if I update the documentation of other clustering algorithms (that take only feature matrix as input) in this pull request ?

Anaël Beaugnon and others added 13 commits May 28, 2019 11:48
This commit concerns only the clustering algorithms that can take as input
affinity or distance matrices.

* fit and fit_predict have now the same descriptions for their input
arguments X and y.

* The commit follows the documentation conventions detailed in issue scikit-learn#3791.

* Precise the preferred sparse matrix format if relevant.

* AgglomerativeClustering. The documentation of the fit method now
clarifies that a distance matrix is expected (not an affinity /
similarity matrix). The parameter affinity is misleading (see issue scikit-learn#13945).
Only the documentation is updated.

Updating the name of the ̀ affinity` parameter would involve an API
change, with a proper deprecation cycle.
Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>
Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>
Description: "Cluster labels."
Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lgtm

@amueller amueller merged commit 896a76e into scikit-learn:master May 30, 2019
@amueller
Copy link
Member

thanks!

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
* Fix inconsistencies in clustering doc.

This commit concerns only the clustering algorithms that can take as input
affinity or distance matrices.

* fit and fit_predict have now the same descriptions for their input
arguments X and y.

* The commit follows the documentation conventions detailed in issue scikit-learn#3791.

* Precise the preferred sparse matrix format if relevant.

* AgglomerativeClustering. The documentation of the fit method now
clarifies that a distance matrix is expected (not an affinity /
similarity matrix). The parameter affinity is misleading (see issue scikit-learn#13945).

* Answer partly to issue scikit-learn#13945.

Only the documentation is updated.

Updating the name of the ̀ affinity` parameter would involve an API
change, with a proper deprecation cycle.

* Wrong arguments for fit_predict of AffinityPropag.

No argument sample_weight.

* Wrong arguments for fit_predict of AgglomerativeCl

No argument sample_weight.

* Add a comma (review by jnohman)

Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>

* Rm modification (review by jnothman)

* Add commas.

* Update sklearn/cluster/affinity_propagation_.py

Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>

* Fix back quotes (review by jnothman)

* add commas

* fit_predict and predict return labels.

Description: "Cluster labels."

* Doc fit: returns "self".

* Same modifications for SpectralClustering.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants