Apply numpydoc validation to docstrings #15440

Closed
rth opened this issue Nov 2, 2019 · 34 comments · Fixed by #15461
Labels
good first issue Easy with clear instructions to resolve Sprint

Comments

@rth
Member

rth commented Nov 2, 2019

It would be useful to enforce docstring style with numpydoc. Currently only a small fraction of docstrings pass that validation.

To improve a docstring, follow the steps below:

  1. Install scikit-learn from sources (see contribution guide).

  2. Install numpydoc master with,

    pip install https://github.com/numpy/numpydoc/archive/master.zip
    
  3. Run the docstring validation on all docstrings,

    pytest maint_tools/test_docstrings.py -v
    

    and choose an estimator with an XFAIL status (meaning that it is a known failure). Write down all of its methods.

  4. Run,

    python maint_tools/test_docstrings.py import_path
    

    to see the list of validation errors for a particular method, where import_path can be for instance sklearn.linear_model.LogisticRegression (for the main estimator docstring) or sklearn.linear_model.LogisticRegression.fit (for the docstring of the fit method).

  5. Fix the docstring until validation passes. Repeat on all public methods of the chosen estimator.

  6. Add the estimator to the whitelist in maint_tools/test_docstrings.py. Note that this list accepts regular expressions, so LogisticRegression will match all methods of that estimator, and potentially other estimators as well, e.g. LogisticRegressionCV. For instance, one can use LogisticRegression$ to match only the main estimator docstring. When rerunning the validation from step 3, checks for the modified estimators should then pass.
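
To make the whitelist behaviour in step 6 concrete, here is a minimal sketch. It assumes the patterns are applied with search semantics (a match anywhere in the import path), which is consistent with the description above but is not taken from the test script itself. The `matched` helper and the list of paths are illustrative only.

```python
import re

# Example import paths, used only for illustration.
paths = [
    "sklearn.linear_model.LogisticRegression",
    "sklearn.linear_model.LogisticRegression.fit",
    "sklearn.linear_model.LogisticRegressionCV",
    "sklearn.linear_model.LogisticRegressionCV.fit",
]


def matched(pattern, candidates):
    """Return the candidates in which `pattern` is found anywhere."""
    regex = re.compile(pattern)
    return [path for path in candidates if regex.search(path)]


# An unanchored pattern also catches methods and LogisticRegressionCV.
broad = matched("LogisticRegression", paths)

# Anchoring with `$` keeps only paths that end with the estimator name.
anchored = matched(r"LogisticRegression$", paths)

print(broad)
print(anchored)
```

Under that assumption, the unanchored pattern selects all four paths, while the anchored one selects only the main estimator docstring.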

Please comment on this issue with the estimator you are planning to work on. Note that some methods are shared between estimators and are located in other files, e.g. estimator.set_params.
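
For reference when fixing a docstring (step 5), the sketch below shows the general numpydoc section layout on a hypothetical helper, `scale_values`, which is not part of scikit-learn. Whether a given docstring passes every check depends on the numpydoc version in use, so treat this as an illustration of the format rather than a guaranteed pass.

```python
def scale_values(values, factor=1.0):
    """Multiply each value by a constant factor.

    This hypothetical helper exists only to illustrate the section layout.

    Parameters
    ----------
    values : list of float
        Input values to scale.
    factor : float, default=1.0
        Multiplier applied to every value.

    Returns
    -------
    list of float
        The scaled values, in the same order as the input.

    See Also
    --------
    sum : Built-in that adds up the items of an iterable.

    Examples
    --------
    >>> scale_values([1.0, 2.0], factor=3.0)
    [3.0, 6.0]
    """
    return [value * factor for value in values]


# Quick check that the usual section headers are present in the docstring.
expected = ("Parameters", "Returns", "See Also", "Examples")
sections = [name for name in expected if name in scale_values.__doc__]
print(sections)
```

Each section header is underlined with dashes of the same length, and each parameter is written as `name : type` followed by an indented description.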

@rth rth added Sprint good first issue Easy with clear instructions to resolve labels Nov 2, 2019
@tolaa001
Contributor

tolaa001 commented Nov 2, 2019

I am working on the docstrings for LogisticRegression estimator

@norvan
Contributor

norvan commented Nov 2, 2019

Taking a shot at docstrings for sklearn.pipeline.Pipeline.

@LalliAcqua
Contributor

I am working on the docstrings for TSNE

@gbroccolo

looking to improve docstrings for sklearn.neighbors.KNeighborsClassifier

@paoloturati
Contributor

working on RadiusNeighborsClassifier

@tolaa001
Contributor

tolaa001 commented Nov 2, 2019

working on MinMaxScaler

@ghost

ghost commented Nov 2, 2019

Working on OneHotEncoder

@paoloturati
Contributor

Working on LabelBinarizer

@r-build
Contributor

r-build commented Nov 2, 2019

Working on DecisionTreeClassifier

@paoloturati
Contributor

Working on IsolationForest

@Yinglr
Contributor

Yinglr commented Nov 2, 2019

Working on sklearn.cluster.MiniBatchKMeans

@ghost

ghost commented Nov 2, 2019

Working on OrdinalEncoder

@bbuluttekin
Contributor

Working on LinearRegression.

@lbfin
Contributor

lbfin commented Nov 2, 2019

Worked on Lasso.path

@sam-dixon
Contributor

Working on KernelDensity

@alexdesiqueira

Working on PCA

@hailey0huong
Contributor

Working on TfidfVectorizer

@go-bears
Contributor

go-bears commented Nov 2, 2019

working on VotingClassifier

@alexdesiqueira

Working on DecisionTreeClassifier

@abbiepopa
Contributor

I am working on sklearn.cluster.KMeans

Submitted PR for KMeans: #15473

Now working on sklearn.ensemble.AdaBoostClassifier

@LalliAcqua
Contributor

working on sklearn.svm._classes.py SVC and LinearSVC classes

@LauraLangdon
Contributor

LauraLangdon commented Nov 2, 2019

I'm working on BernoulliNB.

@louishuynh
Contributor

I'm looking at neighbors.NearestNeighbors

@rsanjabi
Contributor

rsanjabi commented Nov 2, 2019

I'm working on the Random Forest Regressor.

RandomForestClassifier feature_importances_ is creating an error in the test_docstrings script:

Traceback (most recent call last):
  File "maint_tools/test_docstrings.py", line 173, in <module>
    msg = repr_errors(res, method=args.import_path)
  File "maint_tools/test_docstrings.py", line 112, in repr_errors
    for code, message in res["errors"]
TypeError: sequence item 0: expected str instance, NoneType found

We are no longer working on this estimator.

@hailey0huong
Contributor

I ran the command python maint_tools/test_docstrings.py sklearn.feature_extraction.text.TfidfVectorizer.fit and received the error " - YD01: No Yields section found", but this function does not yield anything. Is this a bug in the test code?
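
A Yields section normally only makes sense for generator functions. As an independent sanity check (this is not numpydoc's own detection logic, which is not shown in this thread), the standard library can tell whether a callable is a generator function. The two functions below are hypothetical stand-ins.

```python
import inspect


def fit(data):
    """Hypothetical regular function: returns a value, never yields."""
    return sorted(data)


def iter_batches(data, size):
    """Hypothetical generator: yields successive chunks of `data`."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


# Map each function name to whether Python treats it as a generator function.
is_gen = {
    func.__name__: inspect.isgeneratorfunction(func)
    for func in (fit, iter_batches)
}
print(is_gen)
```

If the method in question is reported as a regular function here, a missing Yields section would not be expected to count as a docstring defect.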

@poorna-kumar
Contributor

Working on sklearn.linear_model.SGDClassifier

Running into a linter issue with SGDClassifier.loss_functions like so:

> python maint_tools/test_docstrings.py sklearn.linear_model.SGDClassifier.loss_functions
Traceback (most recent call last):
  File "maint_tools/test_docstrings.py", line 173, in <module>
    msg = repr_errors(res, method=args.import_path)
  File "maint_tools/test_docstrings.py", line 112, in repr_errors
    for code, message in res["errors"]
TypeError: sequence item 0: expected str instance, NoneType found
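
This TypeError message is the one `str.join` produces when an element of the joined sequence is None. The sketch below reproduces the message and shows one defensive way to build such a report. It assumes, without having confirmed it against the script, that one of the joined parts is None for this kind of attribute. `join_report` and the sample parts are hypothetical.

```python
def join_report(parts):
    """Join report parts with blank lines, replacing None with a placeholder."""
    return "\n\n".join("<unknown>" if part is None else part for part in parts)


# Hypothetical report parts; the first is None, e.g. a missing file name.
parts = [None, "# Errors", " - XX00: example message"]

error_message = None
try:
    "\n\n".join(parts)  # str.join rejects any element that is not a str
except TypeError as exc:
    error_message = str(exc)

report = join_report(parts)
print(error_message)
print(report)
```

Substituting a placeholder keeps the report printable, although the underlying question of why a part is missing would still need to be answered in the script.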

@abbiepopa
Contributor

I will work on sklearn.cluster.Birch

@sam-dixon
Contributor

Picking up RandomForestClassifier (and probably RandomForestRegressor while I'm at it).

@rsanjabi
Contributor

rsanjabi commented Nov 2, 2019

Working on OPTICS
