-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Consistency in documentation #3791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
array-like and arrays are 2 different things. We should strive to support For the other remarks, I agree, and I would refer to the numpy doc |
Also, I am closing this issue, as we have way too many open issues in are |
Thanks for pointing to numpy doc. Point 4 is specific to scikit-learn, adding something about it to guidelines would be helpful. |
I am a bit confused with regards to point 4. OK, I think that I am getting it: how should we be calling in plain English "X" and "y"? I must admit that I am not sure that there is a global agreement. We tend to call them "data" and "target". But the wording "input data" and "prediction target" would probably be more precise. |
Yes, that's what I meant. In the current state |
I agree that the question should be raised and documented. As it is more My suggestion is "input data", and "prediction target". |
I leave it to you to initiate this discussion on mailing list then, when you think it's a right time. |
Answers are:
|
Can someone add that to the docs? |
Tagging as moderate as this involves editing quite a few docstrings... (And addressing #7758 along the way). I think @karandesai-96 maybe interested in this? :) |
Thx! |
This commit concerns only the clustering algorithms that can take as input affinity or distance matrices. * fit and fit_predict have now the same descriptions for their input arguments X and y. * The commit follows the documentation conventions detailed in issue scikit-learn#3791. * Precise the preferred sparse matrix format if relevant. * AgglomerativeClustering. The documentation of the fit method now clarifies that a distance matrix is expected (not an affinity / similarity matrix). The parameter affinity is misleading (see issue scikit-learn#13945).
This commit concerns only the clustering algorithms that can take as input affinity or distance matrices. * fit and fit_predict have now the same descriptions for their input arguments X and y. * The commit follows the documentation conventions detailed in issue scikit-learn#3791. * Precise the preferred sparse matrix format if relevant. * AgglomerativeClustering. The documentation of the fit method now clarifies that a distance matrix is expected (not an affinity / similarity matrix). The parameter affinity is misleading (see issue scikit-learn#13945).
* Fix inconsistencies in clustering doc. This commit concerns only the clustering algorithms that can take as input affinity or distance matrices. * fit and fit_predict have now the same descriptions for their input arguments X and y. * The commit follows the documentation conventions detailed in issue #3791. * Precise the preferred sparse matrix format if relevant. * AgglomerativeClustering. The documentation of the fit method now clarifies that a distance matrix is expected (not an affinity / similarity matrix). The parameter affinity is misleading (see issue #13945). * Answer partly to issue #13945. Only the documentation is updated. Updating the name of the ̀ affinity` parameter would involve an API change, with a proper deprecation cycle. * Wrong arguments for fit_predict of AffinityPropag. No argument sample_weight. * Wrong arguments for fit_predict of AgglomerativeCl No argument sample_weight. * Add a comma (review by jnohman) Co-Authored-By: Joel Nothman <joel.nothman@gmail.com> * Rm modification (review by jnothman) * Add commas. * Update sklearn/cluster/affinity_propagation_.py Co-Authored-By: Joel Nothman <joel.nothman@gmail.com> * Fix back quotes (review by jnothman) * add commas * fit_predict and predict return labels. Description: "Cluster labels." * Doc fit: returns "self". * Same modifications for SpectralClustering.
* Fix inconsistencies in clustering doc. This commit concerns only the clustering algorithms that can take as input affinity or distance matrices. * fit and fit_predict have now the same descriptions for their input arguments X and y. * The commit follows the documentation conventions detailed in issue scikit-learn#3791. * Precise the preferred sparse matrix format if relevant. * AgglomerativeClustering. The documentation of the fit method now clarifies that a distance matrix is expected (not an affinity / similarity matrix). The parameter affinity is misleading (see issue scikit-learn#13945). * Answer partly to issue scikit-learn#13945. Only the documentation is updated. Updating the name of the ̀ affinity` parameter would involve an API change, with a proper deprecation cycle. * Wrong arguments for fit_predict of AffinityPropag. No argument sample_weight. * Wrong arguments for fit_predict of AgglomerativeCl No argument sample_weight. * Add a comma (review by jnohman) Co-Authored-By: Joel Nothman <joel.nothman@gmail.com> * Rm modification (review by jnothman) * Add commas. * Update sklearn/cluster/affinity_propagation_.py Co-Authored-By: Joel Nothman <joel.nothman@gmail.com> * Fix back quotes (review by jnothman) * add commas * fit_predict and predict return labels. Description: "Cluster labels." * Doc fit: returns "self". * Same modifications for SpectralClustering.
We now have a Guidelines for writing documentation that details how to write docstrings. |
I think common things in documentation should be written in a consistent way. Right now I can think of the following points:
Whether to name entities which can be transformed to
ndarray
asarray
orarray-like
.How to list different input options. Should curly braces always be used? For example:
Optional arguments / default values. Some convention should be adopted, like:
or
The word
optional
should be dropped anyway as redundant.How to name
X
andy
arrays infit
andpredict
methods. The one adopted convention would be helpful.How to mention shapes of arrays. I was told that
X, shape (n_samples, n_features)
was adopted, is it true? How about 1-d arrays:(n_samples)
or(n_samples,)
?Anyway, do you think it is an issue at all? Maybe core developers could decide and update coding guidelines? I listed only some points.
The text was updated successfully, but these errors were encountered: