[MRG][DOC] Fix inconsistencies in clustering doc. #13946

ab-anssi · 2019-05-25T16:56:03Z

What does this implement/fix? Explain your changes.

This pull request covers only the clustering algorithms that accept an affinity
or distance matrix as input. It could be extended to the other clustering algorithms.

fit and fit_predict now have the same descriptions for their input
arguments X and y.
The commit follows the documentation conventions detailed in issue Consistency in documentation #3791.
The documentation now indicates the preferred sparse matrix format where relevant.
AgglomerativeClustering: the parameter affinity is misleading (see issue [AgglomerativeClustering] confusing parameter 'affinity' #13945).
The documentation of the fit method now clarifies that a distance matrix is expected (not an affinity / similarity matrix).
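
To illustrate why the distance / affinity distinction matters, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: `closest_pair` is a hypothetical helper that mimics a single agglomerative merge step by picking the pair with the smallest matrix entry, and the `1 / (1 + d)` mapping from distance to similarity is an assumption chosen only for the example.

```python
# Toy illustration only: this is NOT scikit-learn's implementation.

def closest_pair(matrix):
    """Return the pair (i, j), i < j, with the smallest off-diagonal entry."""
    n = len(matrix)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda p: matrix[p[0]][p[1]])


# Pairwise distances: samples 0 and 1 are close, sample 2 is far from both.
distances = [
    [0.0, 1.0, 9.0],
    [1.0, 0.0, 8.0],
    [9.0, 8.0, 0.0],
]

# One possible similarity derived from the distances (assumed mapping).
similarities = [[1.0 / (1.0 + d) for d in row] for row in distances]

# A step that expects distances merges the two nearest samples.
merge_from_distances = closest_pair(distances)

# Fed similarities instead, the same step merges the two LEAST similar samples.
merge_from_similarities = closest_pair(similarities)

print(merge_from_distances, merge_from_similarities)
```

With a distance matrix the nearest samples are merged first; with a similarity matrix passed by mistake, the same rule picks the most dissimilar pair, which is the kind of confusion the clarified fit documentation is meant to prevent.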

A few questions about documentation conventions

Should the documentation of the fit methods indicate that self is returned? If so, what is the return type? object?
What should the object returned by predict of clustering algorithms be called: y or labels? What about the description: "Cluster labels" or "Index of the cluster each sample belongs to."?

sklearn/cluster/hierarchical.py

sklearn/cluster/affinity_propagation_.py

jnothman

Otherwise lgtm

ab-anssi · 2019-05-27T10:40:10Z

Thanks @jnothman for your review.

What about the following questions about documentation conventions?

Should the documentation of the fit methods indicate that self is returned? If so, what is the return type? object?
What should the object returned by predict of clustering algorithms be called: y or labels? What about the description: "Cluster labels" or "Index of the cluster each sample belongs to."?

Currently this is not consistent between AffinityPropagation, DBSCAN, and AgglomerativeClustering. Do you think it should be updated for more consistency, or left as it is?

jnothman · 2019-05-27T12:23:26Z

I'm okay with it being updated for consistency, but am fairly ambivalent about which choice. I suppose "labels" is appropriate. "Index of the cluster" doesn't make sense unless you have something to index into. That makes sense for kmeans, but not dbscan, where the labelling is arbitrary except when it is -1. This description should probably be the same as that of labels_. And fit's return type should be the name of the current class ...

jnothman · 2019-05-27T12:24:00Z

But often for fit's return value, we just write "self". No type needed
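
The convention suggested in these two comments can be sketched with a toy class. `ToyThresholdClustering` is hypothetical and exists only to show the docstring wording (fit documents its return value as plain "self", fit_predict returns "labels" described as "Cluster labels."); it is not taken from scikit-learn, and it uses plain lists rather than arrays to stay self-contained.

```python
class ToyThresholdClustering:
    """Hypothetical 1-D clusterer used only to illustrate docstring wording."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        """Fit the clustering from 1-D values.

        Parameters
        ----------
        X : list of float, length n_samples
            Values to cluster.

        y : Ignored
            Not used, present here for API consistency by convention.

        Returns
        -------
        self
        """
        # Values below the threshold get label 0, all others get label 1.
        self.labels_ = [0 if x < self.threshold else 1 for x in X]
        return self

    def fit_predict(self, X, y=None):
        """Fit the clustering from 1-D values and return cluster labels.

        Parameters
        ----------
        X : list of float, length n_samples
            Values to cluster.

        y : Ignored
            Not used, present here for API consistency by convention.

        Returns
        -------
        labels : list of int, length n_samples
            Cluster labels.
        """
        return self.fit(X, y).labels_


model = ToyThresholdClustering(threshold=2.0)
returned = model.fit([0.5, 1.5, 3.0])
labels = model.fit_predict([0.5, 1.5, 3.0])
print(returned is model, labels)
```

Keeping the X and y descriptions identical in fit and fit_predict, and reusing the labels_ description for the returned labels, is the consistency this pull request aims for.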

ab-anssi · 2019-05-27T20:42:38Z

Thanks for your feedback. I have updated the documentation of DBSCAN, AgglomerativeClustering and AffinityPropagation accordingly. I have also updated SpectralClustering, since it also takes a feature matrix or an affinity matrix as input.

I have noticed the same inconsistencies in the documentation of other clustering algorithms (fit and fit_predict do not share the same descriptions for their input arguments, the preferred sparse matrix format is not indicated, etc.).
Would it be OK if I also updated the documentation of the other clustering algorithms (those that take only a feature matrix as input) in this pull request?


jnothman

This lgtm

amueller · 2019-05-30T17:02:47Z

thanks!

* Fix inconsistencies in clustering doc.

  This commit concerns only the clustering algorithms that can take as input affinity or distance matrices.

  * fit and fit_predict have now the same descriptions for their input arguments X and y.
  * The commit follows the documentation conventions detailed in issue scikit-learn#3791.
  * Precise the preferred sparse matrix format if relevant.
  * AgglomerativeClustering. The documentation of the fit method now clarifies that a distance matrix is expected (not an affinity / similarity matrix). The parameter affinity is misleading (see issue scikit-learn#13945).

* Answer partly to issue scikit-learn#13945.

  Only the documentation is updated. Updating the name of the `affinity` parameter would involve an API change, with a proper deprecation cycle.

* Wrong arguments for fit_predict of AffinityPropag.

  No argument sample_weight.

* Wrong arguments for fit_predict of AgglomerativeCl

  No argument sample_weight.

* Add a comma (review by jnohman)

  Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>

* Rm modification (review by jnothman)
* Add commas.
* Update sklearn/cluster/affinity_propagation_.py

  Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>

* Fix back quotes (review by jnothman)
* add commas
* fit_predict and predict return labels.

  Description: "Cluster labels."

* Doc fit: returns "self".
* Same modifications for SpectralClustering.

ab-anssi force-pushed the doc_clustering branch from 73d5f3f to d02edbc Compare May 26, 2019 04:18

ab-anssi changed the title ~~[DOC] Fix inconsistencies in clustering doc.~~ [MRG][DOC] Fix inconsistencies in clustering doc. May 26, 2019

jnothman reviewed May 26, 2019

sklearn/cluster/hierarchical.py (outdated, resolved)

sklearn/cluster/hierarchical.py (outdated, resolved)

jnothman reviewed May 27, 2019

sklearn/cluster/affinity_propagation_.py (outdated, resolved)

jnothman reviewed May 27, 2019


Anaël Beaugnon and others added 13 commits May 28, 2019 11:48

Answer partly to issue scikit-learn#13945.

a43bef0

Only the documentation is updated. Updating the name of the `affinity` parameter would involve an API change, with a proper deprecation cycle.

Wrong arguments for fit_predict of AffinityPropag.

c03d447

No argument sample_weight.

Wrong arguments for fit_predict of AgglomerativeCl

96d8b72

No argument sample_weight.

Add a comma (review by jnohman)

9661b28

Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>

Rm modification (review by jnothman)

e0b334c

Add commas.

b4c3ae7

Update sklearn/cluster/affinity_propagation_.py

95184a5

Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>

Fix back quotes (review by jnothman)

51fab7a

add commas

51a9c42

fit_predict and predict return labels.

607b716

Description: "Cluster labels."

Doc fit: returns "self".

598af66

Same modifications for SpectralClustering.

ca2420a

ab-anssi force-pushed the doc_clustering branch from 1af7eb4 to ca2420a Compare May 28, 2019 09:59

jnothman approved these changes May 30, 2019


amueller merged commit 896a76e into scikit-learn:master May 30, 2019
