-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
DOC Ensures that LatentDirichletAllocation passes numpydoc validation #20402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC Ensures that LatentDirichletAllocation passes numpydoc validation #20402
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @g4brielvs !
|
||
Returns | ||
------- | ||
self | ||
self: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self: | |
self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self : object
for consistency with other places in the package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but I don't think it's correct. The base class for all estimators is BaseEstimator not object, so I think there is no point in indicating the object dtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
||
Returns | ||
------- | ||
self | ||
self: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self: | |
self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self : object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial version was correct for cases where there are no types, and it's a bit pointless to indicate object
dtype for estimators which is also not accurate https://numpydoc.readthedocs.io/en/latest/format.html#parameters
cc @glemaitre
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a bit pointless to indicate object dtype for estimators which is also not accurate
I agree. However, we have this way of documenting everywhere. I would prefer to keep this inaccurate convention until we do a find/replace regex in another PR. Would it not be easier to make the change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but my point is there is no issue with this line of docstring. It should pass the numpydoc validation, I think? And if so I would rather we didn't change it instead of introducing unnecessary code churn with a solution we know is not helpful / correct.
self : object
is used elsewhere, but 27% of estimators still use the current conversion,
$ rg "^\s+self$" | grep self | wc -l
38
$ rg "^\s+self\s*:$" | grep self | wc -l
2
$ rg "^\s+self\s*:\s*object$" | grep self | wc -l
103
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rth Thank you so much for reviewing my PR. The numpydoc validation requires a description and that's the reason a added a tautology. I'll be more than happy to make any changes according to your guidance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the description is ok. The point of @rth is about self : object
vs self
alone on the current line, not the description below.
[2] "Stochastic Variational Inference", Matthew D. Hoffman, David M. Blei, | ||
Chong Wang, John Paisley, 2013 | ||
|
||
[3] Matthew D. Hoffman's onlineldavb code. Link: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think,
[2] "Stochastic Variational Inference", Matthew D. Hoffman, David M. Blei, | |
Chong Wang, John Paisley, 2013 | |
[3] Matthew D. Hoffman's onlineldavb code. Link: | |
.. [2] "Stochastic Variational Inference", Matthew D. Hoffman, David M. Blei, | |
Chong Wang, John Paisley, 2013 | |
.. [3] Matthew D. Hoffman's onlineldavb code. Link: |
but need to check the rendering (ci/circleci: doc artifact
CI job), currently it doesn't look ideal: https://142780-843222-gh.circle-artifacts.com/0/doc/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might raise an error if the article is not linked (somewhere we expect [2]_
) in the docstring. If this is not the case, we could either link where it is meaningful, otherwise, we could as well remove the reference.
Updating with correct spelling: #DataUmbrella sprint |
OK.
…On Mon, 28 Jun 2021 at 11:44, Roman Yurchak ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sklearn/decomposition/_lda.py
<#20402 (comment)>
:
>
Returns
-------
- self
+ self:
Right, but my point is there is no issue with this line of docstring. It
should pass the numpydoc validation, I think? And if so I would rather we
didn't change it instead of introducing unnecessary code churn with a
solution we know is not helpful / correct.
self : object is used elsewhere, but 27% of estimators use the current
conversion,
$ rg "^\s+self$" | grep self | wc -l
38
$ rg "^\s+self\s*:$" | grep self | wc -l
2
$ rg "^\s+self\s*:\s*object$" | grep self | wc -l
103
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#20402 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABY32P3LWJK22RLIEIMMEW3TVBAAXANCNFSM47LWDQNA>
.
--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The numpydoc validation requires a description and that's the reason a added a tautology.
If untyped self
doesn't pass validation than it's an instance of numpy/numpydoc#242. LGTM for the rest except for the https://github.com/scikit-learn/scikit-learn/pull/20402/files#r659270234 for which I'm still not sure.
I made the changes in another PR and merge with the current branch and authorship. |
Reference Issues/PRs
Addresses #20308
What does this implement/fix? Explain your changes.
This PR ensures
LatentDirichletAllocation
is compatible with numpydocdLatentDirichletAllocation
fromDOCSTRING_IGNORE_LIST
Any other comments?
Thanks #DataUmbrella