Skip to content

DOC Update docs guideline regarding docstring formatting #18243

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jan 8, 2021

Conversation

glemaitre
Copy link
Member

We start to use the following convention and we should mention it in the documentation guideline:

multioutput_array : ndarray of (n_samples, n_classes) or list of thereof

of thereof refer to a container of the previous type.

Copy link
Member

@alfaro96 alfaro96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be happy to see this across the documentation!

@@ -734,7 +736,8 @@ In general have the following in mind:
5. Specify ``dataframe`` when "frame-like" features are being used, such
as the column names.
6. When specifying the data type of a list, use ``of`` as a delimiter:
``list of int``.
``list of int``. If the data type was also mentioned as possible output,
you can use ``list of thereof`` to be more concise.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unable to easily parse "list of thereof". What is thereof referencing to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int or list of thereof -> int or list of int
ndarray of shape (n_samples, n_features), dtype=np.int64 or list of thereof -> ndarray of shape (n_samples, n_features), dtype=np.int64 or list of ndarray of shape (n_samples, n_features), dtype=np.int64

If I recall @NicolasHug mentioned that Django (or was it another library?) are using this way for this particular case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross reference: #17306 (comment)

Copy link
Member

@NicolasHug NicolasHug Aug 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be "of thereof" I believe.

And I think we should give a more concrete example:

For example if a parameter supports ints, floats, lists of ints, and list of float, you can use "int, floats, or list thereof".

Also I don't think using "thereof" is warranted when it refers to only one simple thing: "int or list thereof" should probably just be "list or list of int"

Then again, not a native speaker...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean now. As a native speaker, I feel that this phrasing will be confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case we can also suggest "... or list of these"

Since "of these" / "of that" is precisely what "thereof" means

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is correct english, but I have not seen it commonly used. For example:

ndarray of shape (n_samples, n_classes) or list of thereof

The thereof is referring to ndarray. I think this type of "going back through the sentence" and having to think about what thereof is referring will be difficult for some.

We can try it out and see if it leads to any confusion.

XREF: https://english.stackexchange.com/questions/27654/why-do-we-use-the-word-thereof

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So let's try out. It is probably as confusing as the difference between an ndarray and an array-like at the difference that is documented in an English dictionary instead of our Glossary :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It makes more sense.

@adrinjalali
Copy link
Member

FWIW, I find list of such arrays much easier to parse than list of thereof. But happy with whatever y'all decide on.

@glemaitre glemaitre changed the title DOC mention 'of thereof' in doc guideline DOC mention 'thereof' in doc guideline Aug 25, 2020
@glemaitre
Copy link
Member Author

OK. So I added the different possibilities. An additional question since we are modifying now the guide, I want to state 2 things:

  • how to document when there is multiple dtype. I recall that we used a set array-like of shape (n_samples,), dtype={int, float}. Does it seem ok?
  • how to document the shape/length of a list. I can see some occurrence of list of shape (n_samples,). Do we want to keep this? Or do we want a specific way to state the shape of a list list of length n_samples? I think that the problem does not exist when we state array-like`.

@glemaitre glemaitre changed the title DOC mention 'thereof' in doc guideline DOC update documentation guideline regarding docstring formatting Aug 26, 2020
marenwestermann added a commit to marenwestermann/scikit-learn that referenced this pull request Aug 27, 2020
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also find the thereof terminology not so clear (even if correct).

When documenting the length of the outer list of a nested list structure, I think it's important to document it explicitly as such. See example below, the other occurrences in this PR will need a similar update if your agree with this suggestion.

``ndarray of shape (n_samples,), dtype=np.int32``.
``ndarray of shape (n_samples,), dtype=np.int32``. You can specify
multiple dtype as a set:
``array-like of shape (n_samples,), dtype={int, float}``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not recommend to use dtype=int in documentation. Either use a specific precision level (np.int32) or "integer dtype" or floating point dtype" if you want to mention arbitrary precision. Or just "numeric dtype" if both integer or floating point dtypes are accepted (but maybe dtype can be completely omitted in this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With numpy, passing dtype=int and dtype=float will result in np.int64 and np.float64 as actual dtypes but:

  • this is implicit;
  • this might be platform specific and could lead to subtle bugs (though I am not sure it's still the case with recent numpy versions and Python 3).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be platform specific and could lead to subtle bugs (though I am not sure it's still the case with recent numpy versions and Python 3).

I think this is still the case with the integer (or at least last week during a course :))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the correction avoiding using int and float and advise for using integer, floating and numeric.
I added some explanation in the glossary as well.

glemaitre and others added 2 commits October 9, 2020 20:58
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@glemaitre
Copy link
Member Author

When documenting the length of the outer list of a nested list structure, I think it's important to document it explicitly as such. See example below, the other occurrences in this PR will need a similar update if your agree with this suggestion.

I update these cases as well.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comment about the recommendation about the dtype doc recommendations.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Comment on lines 761 to 763
* ``array-like of shape (n_samples,) or list thereof``;
* ``array-like of shape (n_samples,) or list of these``;
* ``array-like of shape (n_samples,) or list of such arrays``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does one choose between these three options? Personally, I would always go with list of such ___

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of such ___, it is. Let's move forward with this.

@thomasjpfan thomasjpfan changed the title DOC update documentation guideline regarding docstring formatting DOC Update docs guideline regarding docstring formatting Jan 8, 2021
@thomasjpfan thomasjpfan merged commit 266a11b into scikit-learn:master Jan 8, 2021
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Jan 18, 2021
…n#18243)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
jeremiedbb added a commit that referenced this pull request Jan 19, 2021
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants