Skip to content

DOC Ensures that LatentDirichletAllocation passes numpydoc validation #20402

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

DOC Ensures that LatentDirichletAllocation passes numpydoc validation #20402

wants to merge 1 commit into from

Conversation

g4brielvs
Copy link
Contributor

@g4brielvs g4brielvs commented Jun 26, 2021

Reference Issues/PRs

Addresses #20308

What does this implement/fix? Explain your changes.

This PR ensures LatentDirichletAllocation is compatible with numpydocd

  • Remove LatentDirichletAllocation from DOCSTRING_IGNORE_LIST
  • Minor style fixes.

Any other comments?

Thanks #DataUmbrella

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @g4brielvs !


Returns
-------
self
self:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self:
self

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self : object for consistency with other places in the package

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but I don't think it's correct. The base class for all estimators is BaseEstimator not object, so I think there is no point in indicating the object dtype.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


Returns
-------
self
self:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self:
self

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self : object

Copy link
Member

@rth rth Jun 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial version was correct for cases where there are no types, and it's a bit pointless to indicate object dtype for estimators which is also not accurate https://numpydoc.readthedocs.io/en/latest/format.html#parameters

cc @glemaitre

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit pointless to indicate object dtype for estimators which is also not accurate

I agree. However, we have this way of documenting everywhere. I would prefer to keep this inaccurate convention until we do a find/replace regex in another PR. Would it not be easier to make the change?

Copy link
Member

@rth rth Jun 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but my point is there is no issue with this line of docstring. It should pass the numpydoc validation, I think? And if so I would rather we didn't change it instead of introducing unnecessary code churn with a solution we know is not helpful / correct.

self : object is used elsewhere, but 27% of estimators still use the current conversion,

$ rg "^\s+self$"  | grep self | wc -l
38
$ rg "^\s+self\s*:$"  | grep self | wc -l
2
$ rg "^\s+self\s*:\s*object$"  | grep self | wc -l
103

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rth Thank you so much for reviewing my PR. The numpydoc validation requires a description and that's the reason a added a tautology. I'll be more than happy to make any changes according to your guidance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the description is ok. The point of @rth is about self : object vs self alone on the current line, not the description below.

Comment on lines +292 to +295
[2] "Stochastic Variational Inference", Matthew D. Hoffman, David M. Blei,
Chong Wang, John Paisley, 2013

[3] Matthew D. Hoffman's onlineldavb code. Link:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think,

Suggested change
[2] "Stochastic Variational Inference", Matthew D. Hoffman, David M. Blei,
Chong Wang, John Paisley, 2013
[3] Matthew D. Hoffman's onlineldavb code. Link:
.. [2] "Stochastic Variational Inference", Matthew D. Hoffman, David M. Blei,
Chong Wang, John Paisley, 2013
.. [3] Matthew D. Hoffman's onlineldavb code. Link:

but need to check the rendering (ci/circleci: doc artifact CI job), currently it doesn't look ideal: https://142780-843222-gh.circle-artifacts.com/0/doc/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might raise an error if the article is not linked (somewhere we expect [2]_) in the docstring. If this is not the case, we could either link where it is meaningful, otherwise, we could as well remove the reference.

@reshamas
Copy link
Member

Updating with correct spelling: #DataUmbrella sprint

@glemaitre
Copy link
Member

glemaitre commented Jun 28, 2021 via email

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The numpydoc validation requires a description and that's the reason a added a tautology.

If untyped self doesn't pass validation than it's an instance of numpy/numpydoc#242. LGTM for the rest except for the https://github.com/scikit-learn/scikit-learn/pull/20402/files#r659270234 for which I'm still not sure.

@glemaitre glemaitre self-requested a review July 7, 2021 10:07
@glemaitre glemaitre assigned glemaitre and unassigned glemaitre Jul 7, 2021
@glemaitre glemaitre requested review from glemaitre and removed request for glemaitre July 7, 2021 10:14
@glemaitre glemaitre self-assigned this Jul 20, 2021
@glemaitre glemaitre removed their request for review July 20, 2021 16:30
@glemaitre
Copy link
Member

I made the changes in another PR and merge with the current branch and authorship.
Thanks @g4brielvs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants