-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG][DOC] Fix inconsistencies in clustering doc. #13946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise lgtm
Thanks @jnothman for your review. What about the following questions about documentation conventions ?
Currently it is not consistent between AffinityPropagation, DBSCAN, and AgglemerativeClustering. Do you think it should be updated for more consistency or remain as it is ? |
I'm okay with it being updated for consistency, but am fairly ambivalent to
which choice. I suppose "labels" is appropriate. "Index of the cluster"
doesn't make sense unless you have something to index into. That makes
sense for kmeans, but not dbscan where the labelling is arbitrary except
when it is -1. This description should probably be the same as that of
labels_.
And fit's return type should be the name of the current class ...
|
But often for fit's return value, we just write "self". No type needed
|
Thanks for your feedback. I have updated the documentation of I have noticed that there are the same inconsistencies in the documentation of other clustering algorithms ( |
This commit concerns only the clustering algorithms that can take as input affinity or distance matrices. * fit and fit_predict have now the same descriptions for their input arguments X and y. * The commit follows the documentation conventions detailed in issue scikit-learn#3791. * Precise the preferred sparse matrix format if relevant. * AgglomerativeClustering. The documentation of the fit method now clarifies that a distance matrix is expected (not an affinity / similarity matrix). The parameter affinity is misleading (see issue scikit-learn#13945).
Only the documentation is updated. Updating the name of the ̀ affinity` parameter would involve an API change, with a proper deprecation cycle.
No argument sample_weight.
No argument sample_weight.
Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>
Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>
Description: "Cluster labels."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lgtm
thanks! |
* Fix inconsistencies in clustering doc. This commit concerns only the clustering algorithms that can take as input affinity or distance matrices. * fit and fit_predict have now the same descriptions for their input arguments X and y. * The commit follows the documentation conventions detailed in issue scikit-learn#3791. * Precise the preferred sparse matrix format if relevant. * AgglomerativeClustering. The documentation of the fit method now clarifies that a distance matrix is expected (not an affinity / similarity matrix). The parameter affinity is misleading (see issue scikit-learn#13945). * Answer partly to issue scikit-learn#13945. Only the documentation is updated. Updating the name of the ̀ affinity` parameter would involve an API change, with a proper deprecation cycle. * Wrong arguments for fit_predict of AffinityPropag. No argument sample_weight. * Wrong arguments for fit_predict of AgglomerativeCl No argument sample_weight. * Add a comma (review by jnohman) Co-Authored-By: Joel Nothman <joel.nothman@gmail.com> * Rm modification (review by jnothman) * Add commas. * Update sklearn/cluster/affinity_propagation_.py Co-Authored-By: Joel Nothman <joel.nothman@gmail.com> * Fix back quotes (review by jnothman) * add commas * fit_predict and predict return labels. Description: "Cluster labels." * Doc fit: returns "self". * Same modifications for SpectralClustering.
What does this implement/fix? Explain your changes.
This pull request concerns only the clustering algorithms that can take as input
affinity or distance matrices. It could be extended to the other clustering algorithms.
arguments X and y.
AgglomerativeClustering
. The parameter affinity is misleading (see issue [AgglomerativeClustering] confusing parameter 'affinity' #13945).The documentation of the fit method now clarifies that a distance matrix is expected (not an affinity / similarity matrix).
A few questions about documentation conventions
fit
methods should indicate thatself
is returned ? In this case, what is the returned type ?object
?predict
of clustering algorithms ?y
orlabels
? What bout the description ? "Cluster labels" or "Index of the cluster each sample belongs to." ?