Skip to content

Consistency in documentation #3791

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nmayorov opened this issue Oct 21, 2014 · 13 comments
Closed

Consistency in documentation #3791

nmayorov opened this issue Oct 21, 2014 · 13 comments
Labels
Documentation Moderate Anything that requires some knowledge of conventions and best practices

Comments

@nmayorov
Copy link
Contributor

I think common things in documentation should be written in a consistent way. Right now I can think of the following points:

  1. Whether to name entities which can be transformed to ndarray as array or array-like.

  2. How to list different input options. Should curly braces always be used? For example:

    X : {array-like, sparse matrix}
    
  3. Optional arguments / default values. Some convention should be adopted, like:

    kernel : string, default ’rbf’
    

    or

    kernel : string (default='rbf')
    

    The word optional should be dropped anyway as redundant.

  4. How to name X and y arrays in fit and predict methods. The one adopted convention would be helpful.

  5. How to mention shapes of arrays. I was told that X, shape (n_samples, n_features) was adopted, is it true? How about 1-d arrays: (n_samples) or (n_samples,)?


Anyway, do you think it is an issue at all? Maybe core developers could decide and update coding guidelines? I listed only some points.

@GaelVaroquaux
Copy link
Member

  1. Whether to name entities which can be transformed to ndarray as array or
    array-like.

array-like and arrays are 2 different things. We should strive to support
as much as possible array-like everywhere.

For the other remarks, I agree, and I would refer to the numpy doc
standard, which is the one that we should be following.

@GaelVaroquaux
Copy link
Member

Also, I am closing this issue, as we have way too many open issues in are
tracker. It's not that I don't agree with it. Feel free to send in PRs
for the parts of the documentation that do not respect the numpy doc
standard.

@nmayorov
Copy link
Contributor Author

Thanks for pointing to numpy doc.

Point 4 is specific to scikit-learn, adding something about it to guidelines would be helpful.

@GaelVaroquaux
Copy link
Member

I am a bit confused with regards to point 4.

OK, I think that I am getting it: how should we be calling in plain English "X" and "y"? I must admit that I am not sure that there is a global agreement. We tend to call them "data" and "target". But the wording "input data" and "prediction target" would probably be more precise.

@nmayorov
Copy link
Contributor Author

Yes, that's what I meant.

In the current state X and y are called differently in every module.

@GaelVaroquaux
Copy link
Member

In the current state X and y are called differently in every module.

I agree that the question should be raised and documented. As it is more
a question that calls for general discussion amongst the developer, I
feel that we should rather use the mailing list, rather than an issue,
for this discussion.

My suggestion is "input data", and "prediction target".

@nmayorov
Copy link
Contributor Author

I leave it to you to initiate this discussion on mailing list then, when you think it's a right time.

@amueller
Copy link
Member

Answers are:

  1. ndarray if it's a numpy array, array-like for input that can be other things.
  2. use "or"
  3. kernel : string (default='rbf')
  4. (n_samples, n_features), (n_samples,)

@amueller
Copy link
Member

Can someone add that to the docs?

@raghavrv
Copy link
Member

Tagging as moderate as this involves editing quite a few docstrings... (And addressing #7758 along the way). I think @karandesai-96 maybe interested in this? :)

@kdexd
Copy link

kdexd commented Oct 28, 2016

@raghavrv I'll take this up. I was looking at a separate issue, @amueller gave me a direction for it. I'll do that if I am able to, this one comes next in the line 😄

@raghavrv
Copy link
Member

Thx!

ab-anssi pushed a commit to ab-anssi/scikit-learn that referenced this issue May 25, 2019
This commit concerns only the clustering algorithms that can take as input
affinity or distance matrices.

* fit and fit_predict have now the same descriptions for their input
arguments X and y.

* The commit follows the documentation conventions detailed in issue scikit-learn#3791.

* Precise the preferred sparse matrix format if relevant.

* AgglomerativeClustering. The documentation of the fit method now
clarifies that a distance matrix is expected (not an affinity /
similarity matrix). The parameter affinity is misleading (see issue scikit-learn#13945).
ab-anssi pushed a commit to ab-anssi/scikit-learn that referenced this issue May 28, 2019
This commit concerns only the clustering algorithms that can take as input
affinity or distance matrices.

* fit and fit_predict have now the same descriptions for their input
arguments X and y.

* The commit follows the documentation conventions detailed in issue scikit-learn#3791.

* Precise the preferred sparse matrix format if relevant.

* AgglomerativeClustering. The documentation of the fit method now
clarifies that a distance matrix is expected (not an affinity /
similarity matrix). The parameter affinity is misleading (see issue scikit-learn#13945).
amueller pushed a commit that referenced this issue May 30, 2019
* Fix inconsistencies in clustering doc.

This commit concerns only the clustering algorithms that can take as input
affinity or distance matrices.

* fit and fit_predict have now the same descriptions for their input
arguments X and y.

* The commit follows the documentation conventions detailed in issue #3791.

* Precise the preferred sparse matrix format if relevant.

* AgglomerativeClustering. The documentation of the fit method now
clarifies that a distance matrix is expected (not an affinity /
similarity matrix). The parameter affinity is misleading (see issue #13945).

* Answer partly to issue #13945.

Only the documentation is updated.

Updating the name of the ̀ affinity` parameter would involve an API
change, with a proper deprecation cycle.

* Wrong arguments for fit_predict of AffinityPropag.

No argument sample_weight.

* Wrong arguments for fit_predict of AgglomerativeCl

No argument sample_weight.

* Add a comma (review by jnohman)

Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>

* Rm modification (review by jnothman)

* Add commas.

* Update sklearn/cluster/affinity_propagation_.py

Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>

* Fix back quotes (review by jnothman)

* add commas

* fit_predict and predict return labels.

Description: "Cluster labels."

* Doc fit: returns "self".

* Same modifications for SpectralClustering.
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this issue Jul 12, 2019
* Fix inconsistencies in clustering doc.

This commit concerns only the clustering algorithms that can take as input
affinity or distance matrices.

* fit and fit_predict have now the same descriptions for their input
arguments X and y.

* The commit follows the documentation conventions detailed in issue scikit-learn#3791.

* Precise the preferred sparse matrix format if relevant.

* AgglomerativeClustering. The documentation of the fit method now
clarifies that a distance matrix is expected (not an affinity /
similarity matrix). The parameter affinity is misleading (see issue scikit-learn#13945).

* Answer partly to issue scikit-learn#13945.

Only the documentation is updated.

Updating the name of the ̀ affinity` parameter would involve an API
change, with a proper deprecation cycle.

* Wrong arguments for fit_predict of AffinityPropag.

No argument sample_weight.

* Wrong arguments for fit_predict of AgglomerativeCl

No argument sample_weight.

* Add a comma (review by jnohman)

Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>

* Rm modification (review by jnothman)

* Add commas.

* Update sklearn/cluster/affinity_propagation_.py

Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>

* Fix back quotes (review by jnothman)

* add commas

* fit_predict and predict return labels.

Description: "Cluster labels."

* Doc fit: returns "self".

* Same modifications for SpectralClustering.
@thomasjpfan
Copy link
Member

We now have a Guidelines for writing documentation that details how to write docstrings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Moderate Anything that requires some knowledge of conventions and best practices
Projects
None yet
Development

No branches or pull requests

6 participants