-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
DOC Update docs guideline regarding docstring formatting #18243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be happy to see this across the documentation!
doc/developers/contributing.rst
Outdated
@@ -734,7 +736,8 @@ In general have the following in mind: | |||
5. Specify ``dataframe`` when "frame-like" features are being used, such | |||
as the column names. | |||
6. When specifying the data type of a list, use ``of`` as a delimiter: | |||
``list of int``. | |||
``list of int``. If the data type was also mentioned as possible output, | |||
you can use ``list of thereof`` to be more concise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unable to easily parse "list of thereof". What is thereof referencing to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int or list of thereof
-> int or list of int
ndarray of shape (n_samples, n_features), dtype=np.int64 or list of thereof
-> ndarray of shape (n_samples, n_features), dtype=np.int64 or list of ndarray of shape (n_samples, n_features), dtype=np.int64
If I recall @NicolasHug mentioned that Django (or was it another library?) are using this way for this particular case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cross reference: #17306 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be "of thereof" I believe.
And I think we should give a more concrete example:
For example if a parameter supports ints, floats, lists of ints, and list of float, you can use "int, floats, or list thereof".
Also I don't think using "thereof" is warranted when it refers to only one simple thing: "int or list thereof" should probably just be "list or list of int"
Then again, not a native speaker...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean now. As a native speaker, I feel that this phrasing will be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case we can also suggest "... or list of these"
Since "of these" / "of that" is precisely what "thereof" means
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is correct english, but I have not seen it commonly used. For example:
ndarray of shape (n_samples, n_classes) or list of thereof
The thereof
is referring to ndarray
. I think this type of "going back through the sentence" and having to think about what thereof
is referring will be difficult for some.
We can try it out and see if it leads to any confusion.
XREF: https://english.stackexchange.com/questions/27654/why-do-we-use-the-word-thereof
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So let's try out. It is probably as confusing as the difference between an ndarray and an array-like at the difference that is documented in an English dictionary instead of our Glossary :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glemaitre could you apply the changes proposed in https://github.com/scikit-learn/scikit-learn/pull/18243/files#r475250655 ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. It makes more sense.
FWIW, I find |
OK. So I added the different possibilities. An additional question since we are modifying now the guide, I want to state 2 things:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also find the thereof terminology not so clear (even if correct).
When documenting the length of the outer list of a nested list structure, I think it's important to document it explicitly as such. See example below, the other occurrences in this PR will need a similar update if your agree with this suggestion.
doc/developers/contributing.rst
Outdated
``ndarray of shape (n_samples,), dtype=np.int32``. | ||
``ndarray of shape (n_samples,), dtype=np.int32``. You can specify | ||
multiple dtype as a set: | ||
``array-like of shape (n_samples,), dtype={int, float}``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather not recommend to use dtype=int
in documentation. Either use a specific precision level (np.int32
) or "integer dtype" or floating point dtype" if you want to mention arbitrary precision. Or just "numeric dtype" if both integer or floating point dtypes are accepted (but maybe dtype can be completely omitted in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With numpy, passing dtype=int
and dtype=float
will result in np.int64
and np.float64
as actual dtypes but:
- this is implicit;
- this might be platform specific and could lead to subtle bugs (though I am not sure it's still the case with recent numpy versions and Python 3).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be platform specific and could lead to subtle bugs (though I am not sure it's still the case with recent numpy versions and Python 3).
I think this is still the case with the integer (or at least last week during a course :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the correction avoiding using int
and float
and advise for using integer
, floating
and numeric
.
I added some explanation in the glossary as well.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
I update these cases as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comment about the recommendation about the dtype doc recommendations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
doc/developers/contributing.rst
Outdated
* ``array-like of shape (n_samples,) or list thereof``; | ||
* ``array-like of shape (n_samples,) or list of these``; | ||
* ``array-like of shape (n_samples,) or list of such arrays``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does one choose between these three options? Personally, I would always go with list of such ___
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of such ___
, it is. Let's move forward with this.
…n#18243) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
We start to use the following convention and we should mention it in the documentation guideline:
of thereof
refer to a container of the previous type.