-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
DOC Ensures that LatentDirichletAllocation passes numpydoc validation #20402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -139,7 +139,7 @@ def _update_doc_distribution( | |||||
|
||||||
|
||||||
class LatentDirichletAllocation(TransformerMixin, BaseEstimator): | ||||||
"""Latent Dirichlet Allocation with online variational Bayes algorithm | ||||||
"""Latent Dirichlet Allocation with online variational Bayes algorithm. | ||||||
|
||||||
.. versionadded:: 0.17 | ||||||
|
||||||
|
@@ -188,7 +188,7 @@ class LatentDirichletAllocation(TransformerMixin, BaseEstimator): | |||||
``n_samples``, the update method is same as batch learning. In the | ||||||
literature, this is called kappa. | ||||||
|
||||||
learning_offset : float, default=10. | ||||||
learning_offset : float, default=10.0 | ||||||
A (positive) parameter that downweights early iterations in online | ||||||
learning. It should be greater than 1.0. In the literature, this is | ||||||
called tau_0. | ||||||
|
@@ -278,6 +278,23 @@ class LatentDirichletAllocation(TransformerMixin, BaseEstimator): | |||||
Prior of topic word distribution `beta`. If the value is None, it is | ||||||
`1 / n_components`. | ||||||
|
||||||
See Also | ||||||
-------- | ||||||
sklearn.discriminant_analysis.LinearDiscriminantAnalysis: | ||||||
A classifier with a linear decision boundary, generated by fitting | ||||||
class conditional densities to the data and using Bayes’ rule. | ||||||
|
||||||
References | ||||||
---------- | ||||||
.. [1] "Online Learning for Latent Dirichlet Allocation", Matthew D. | ||||||
Hoffman, David M. Blei, Francis Bach, 2010 | ||||||
|
||||||
[2] "Stochastic Variational Inference", Matthew D. Hoffman, David M. Blei, | ||||||
Chong Wang, John Paisley, 2013 | ||||||
|
||||||
[3] Matthew D. Hoffman's onlineldavb code. Link: | ||||||
https://github.com/blei-lab/onlineldavb | ||||||
|
||||||
Examples | ||||||
-------- | ||||||
>>> from sklearn.decomposition import LatentDirichletAllocation | ||||||
|
@@ -293,18 +310,6 @@ class LatentDirichletAllocation(TransformerMixin, BaseEstimator): | |||||
>>> lda.transform(X[-2:]) | ||||||
array([[0.00360392, 0.25499205, 0.0036211 , 0.64236448, 0.09541846], | ||||||
[0.15297572, 0.00362644, 0.44412786, 0.39568399, 0.003586 ]]) | ||||||
|
||||||
References | ||||||
---------- | ||||||
.. [1] "Online Learning for Latent Dirichlet Allocation", Matthew D. | ||||||
Hoffman, David M. Blei, Francis Bach, 2010 | ||||||
|
||||||
[2] "Stochastic Variational Inference", Matthew D. Hoffman, David M. Blei, | ||||||
Chong Wang, John Paisley, 2013 | ||||||
|
||||||
[3] Matthew D. Hoffman's onlineldavb code. Link: | ||||||
https://github.com/blei-lab/onlineldavb | ||||||
|
||||||
""" | ||||||
|
||||||
def __init__( | ||||||
|
@@ -539,10 +544,12 @@ def partial_fit(self, X, y=None): | |||||
Document word matrix. | ||||||
|
||||||
y : Ignored | ||||||
Not used, present here for API consistency by convention. | ||||||
|
||||||
Returns | ||||||
------- | ||||||
self | ||||||
self: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, but I don't think it's correct. The base class for all estimators is BaseEstimator not object, so I think there is no point in indicating the object dtype. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok |
||||||
Partially fitted estimator. | ||||||
""" | ||||||
self._check_params() | ||||||
first_time = not hasattr(self, "components_") | ||||||
|
@@ -587,10 +594,12 @@ def fit(self, X, y=None): | |||||
Document word matrix. | ||||||
|
||||||
y : Ignored | ||||||
Not used, present here for API consistency by convention. | ||||||
|
||||||
Returns | ||||||
------- | ||||||
self | ||||||
self: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The initial version was correct for cases where there are no types, and it's a bit pointless to indicate cc @glemaitre There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I agree. However, we have this way of documenting everywhere. I would prefer to keep this inaccurate convention until we do a find/replace regex in another PR. Would it not be easier to make the change? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, but my point is there is no issue with this line of docstring. It should pass the numpydoc validation, I think? And if so I would rather we didn't change it instead of introducing unnecessary code churn with a solution we know is not helpful / correct.
$ rg "^\s+self$" | grep self | wc -l
38
$ rg "^\s+self\s*:$" | grep self | wc -l
2
$ rg "^\s+self\s*:\s*object$" | grep self | wc -l
103 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @rth Thank you so much for reviewing my PR. The numpydoc validation requires a description and that's the reason a added a tautology. I'll be more than happy to make any changes according to your guidance. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the description is ok. The point of @rth is about |
||||||
Fitted estimator. | ||||||
""" | ||||||
self._check_params() | ||||||
X = self._check_non_neg_array( | ||||||
|
@@ -799,6 +808,7 @@ def score(self, X, y=None): | |||||
Document word matrix. | ||||||
|
||||||
y : Ignored | ||||||
Not used, present here for API consistency by convention. | ||||||
|
||||||
Returns | ||||||
------- | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think,
but need to check the rendering (
ci/circleci: doc artifact
CI job), currently it doesn't look ideal: https://142780-843222-gh.circle-artifacts.com/0/doc/modules/generated/sklearn.decomposition.LatentDirichletAllocation.htmlThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might raise an error if the article is not linked (somewhere we expect
[2]_
) in the docstring. If this is not the case, we could either link where it is meaningful, otherwise, we could as well remove the reference.