Skip to content

add example of a good docstring for defaults and examples #12356

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Aug 7, 2019
27 changes: 27 additions & 0 deletions doc/developers/contributing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -603,6 +603,33 @@ Finally, follow the formatting rules below to make it consistently good:
SelectKBest : Select features based on the k highest scores.
SelectFpr : Select features based on a false positive rate test.

* When documenting the parameters and attributes, here is a list of some
well-formatted examples::

n_clusters : int, default=3
The number of clusters detected by the algorithm.

some_param : {'hello', 'goodbye'}, bool or int, default=True
The parameter description goes here, which can be either a string
literal (either `hello` or `goodbye`), a bool, or an int. The default
value is True.

array_parameter : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features) or (n_samples,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uhh did we have a discussion of "of shape" vs "shape=". Did someone count that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"of shape": 513 lines; "shape=" 338 lines; "shape =" 1316 lines. "of shape =": 162 lines, "of shape=" 1 line.

These are regex matches, so non-exclusive, so "of shape" includes "of shape =".
So we have 350 "of shape (...)" and 1115 "shape = (..)"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes shape= would result in a smaller diff. Since this PR will result in many docstring changes, we should decide based on which is subjectively better.

(I have been +0.25 for shape= because it saves a few more characters.) Do you have a opinion on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference is shape= but I don't mind that much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are often short on space in that line.

array-like of shape (n_samples,) vs array-like, shape=(n_samples,) saves 2 characters. Not sure if that's a good argument?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can fix those cases that don't comply with this with a sed, so I don't think the number of instances of each case are important that much anyway. I kinda liked not having the comma cause the comma is separating other things:

int, array-like of shape (n_samples,), or None, default=None (note that we don't have the oxford comma in our guide now, but I rather have it)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like not having the comma :P

I prefer of shape if there is no comma. I find it more difficult to visually parse it.
I also find it closer to literal English --- even if we are writing documentation for expert computer scientists (I should not troll about Matlab post).

Anyway, I am going with the consensus (since it is merged it seems we are going for that one?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fine with keeping it.
@adrinjalali feel free to go ahead and do the sed and then fix all the merge conflicts :P
If whis is the way we want to go, we should probably do the sed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So are we doing this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so, I'll submit a PR.

This parameter accepts data in either of the mentioned forms, with one
of the mentioned shapes. The default value is
`np.ones(shape=(n_samples,))`.

In general have the following in mind:

1. Use Python basic types. (``bool`` instead of ``boolean``)
2. Use parenthesis for defining shapes: ``array-like of shape (n_samples,)``
or ``array-like of shape (n_samples, n_features)``
3. For strings with multiple options, use brackets:
``input: {'log', 'squared', 'multinomial'}``
4. 1D or 2D data can be a subset of
``{array-like, ndarray, sparse matrix, dataframe}``. Note that ``array-like``
can also be a ``list``, while ``ndarray`` is explicitly only a ``numpy.ndarray``.

* For unwritten formatting rules, try to follow existing good works:

* For "References" in docstrings, see the Silhouette Coefficient
Expand Down