
[WIP/RFC] Test docstring parameters (with order) #9023


Closed
wants to merge 24 commits

Conversation

agramfort
Member

@agramfort agramfort commented Jun 6, 2017

Reference Issue

Fixes #7758

What does this implement/fix? Explain your changes.

This adds a unit test to check that all parameters are documented
and in the right order.
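A minimal sketch of such a check (not the PR's actual implementation — a regex stands in for numpydoc's parser, and the helper names are illustrative):

```python
import inspect
import re


def check_parameters_match(func):
    """Return True iff the names and order of ``func``'s signature
    parameters match the numpydoc ``Parameters`` section."""
    sig_params = [
        name for name, p in inspect.signature(func).parameters.items()
        if p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD) and name != "self"
    ]
    doc = inspect.getdoc(func) or ""
    # Grab the Parameters section, up to the next underlined heading.
    match = re.search(r"Parameters\n-+\n(.*?)(?:\n\S+\n-+\n|\Z)", doc, re.S)
    doc_params = []
    if match:
        for line in match.group(1).splitlines():
            m = re.match(r"(\w+) :", line)
            if m:
                doc_params.append(m.group(1))
    return sig_params == doc_params


def good(a, b=1):
    """Do nothing.

    Parameters
    ----------
    a : int
        First.
    b : int
        Second.
    """


def bad(a, b=1):
    """Do nothing, with the parameters documented in the wrong order.

    Parameters
    ----------
    b : int
        Second.
    a : int
        First.
    """
```

With these toy functions, `check_parameters_match(good)` passes while `check_parameters_match(bad)` fails on the ordering.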

Any other comments?

For now I have only fixed the datasets module, which I think is the least controversial.

@jnothman
Member

jnothman commented Jun 7, 2017

See also #7793.

I think there's no doubt that something like this could improve documentation quality. One big challenge is handling exceptional cases, including deprecation and *args (train_test_split, for instance).
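Those exceptional cases are visible in the signature itself: `inspect` classifies `*args`/`**kwargs` as `VAR_POSITIONAL`/`VAR_KEYWORD`, so a check can simply skip them. An illustrative helper (not part of the PR):

```python
import inspect


def relevant_params(func):
    """Parameter names worth matching against the docstring: skip
    *args/**kwargs (they have no single documentable name) and the
    implicit ``self``."""
    skip = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return [name for name, p in inspect.signature(func).parameters.items()
            if p.kind not in skip and name != "self"]


def split_like(*arrays, **options):
    """Stand-in with the same calling convention as train_test_split."""


def plain(X, y, sample_weight=None):
    """An ordinary signature."""
```

For `split_like` there is nothing left to check, which is exactly why such functions need special handling.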

I had grand plans to make something like this more descriptive, outputting a diff between the parameters and the docstring, so that it effectively fixes things for you. I have some code somewhere; let me know if I should pull it out and share it. (Note that diffing docstrings is only feasible in some cases, e.g. where the docstring appears directly in the function in question and `\` isn't used; it also requires changes to numpydoc, changes I hoped to push there.) I also hoped to similarly handle cases where the parameter name did not have a space before the `:`.
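One way to produce such a diff with only the standard library (a sketch, not the code referred to above):

```python
import difflib


def param_diff(doc_params, sig_params):
    """Unified diff from the documented parameter order to the signature
    order, so a failure message effectively doubles as the fix."""
    lines = difflib.unified_diff(doc_params, sig_params,
                                 fromfile="docstring", tofile="signature",
                                 lineterm="")
    return "\n".join(lines)


# e.g. a docstring that lists two parameters in the wrong order:
print(param_diff(["b", "a"], ["a", "b"]))
```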

But what I'd really like to see a test for is that fitted model attributes correspond to those in the docstring.
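Such a test could compare attributes ending in `_` (the scikit-learn convention for fitted attributes) against the numpydoc `Attributes` section. A toy sketch, again with a regex standing in for numpydoc's parser:

```python
import inspect
import re


def fitted_attrs(estimator):
    # Public attributes ending in '_' set during fit: scikit-learn's
    # convention for fitted attributes.
    return sorted(a for a in vars(estimator)
                  if a.endswith("_") and not a.startswith("_"))


def documented_attrs(cls):
    # Names listed as "name_ : type" lines in the class docstring.
    return sorted(re.findall(r"^(\w+_) :", inspect.getdoc(cls) or "", re.M))


class ToyEstimator:
    """Toy estimator.

    Attributes
    ----------
    coef_ : list
        Learned coefficients.
    """

    def fit(self, X, y):
        self.coef_ = [0.0] * len(X[0])
        return self
```

The test would then assert that `fitted_attrs(cls().fit(...))` equals `documented_attrs(cls)` for every estimator.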

@ogrisel
Member

ogrisel commented Jun 7, 2017

But what I'd really like to see a test for is that fitted model attributes correspond to those in the docstring.

+1 but maybe for another PR once this or #7793 is reviewed and merged.

@agramfort
Member Author

agramfort commented Jun 7, 2017 via email

raise SkipTest(
"numpydoc is required to test the docstrings")

from numpydoc import docscrape
Member


The codecov browser extension tells me that this line is never executed by our CI servers. It seems that we need to add numpydoc to at least one Python 3.6 build job, and probably a Python 2.7 one, on Travis to properly cover this test code (and actually run the tests).

@ogrisel
Member

ogrisel commented Jun 7, 2017

@ogrisel do you think it's reasonable to fix all docstrings in this PR?

I think we can merge a first PR with the infrastructure to do the checks as long as it's sufficiently tested. Speaking of which, it would be great to have a couple of unit tests that exercise check_parameters_match itself, to make sure that the error message is informative enough on common invalid docstring cases.

otherwise we need to do the hack of flake8

As @jnothman said, there are probably functions that are exceptions to the behavioral rules encoded in your test function (e.g. train_test_split). So if we implement a systematic CI check, as done for the flake8 check, we need an easy way to disable the check for specific functions.
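A simple way to get that opt-out is an allowlist of fully qualified names, analogous to per-file ignores in a flake8 config (the entries and helper below are illustrative, not from the PR):

```python
# Hypothetical allowlist of known exceptions to the docstring check.
DOCSTRING_IGNORES = {
    "sklearn.model_selection.train_test_split",  # *arrays signature
}


def should_check(func):
    # Fully qualified name, e.g. "sklearn.datasets.make_blobs".
    qualname = "%s.%s" % (func.__module__, func.__name__)
    return qualname not in DOCSTRING_IGNORES
```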

@amueller
Member

amueller commented Jun 7, 2017

numpydoc? anyone wanna review #7355 ;) [currently it coughs up lots of errors but should still work]

@@ -1199,22 +1211,22 @@ def make_sparse_spd_matrix(dim=1, alpha=0.95, norm_diag=False,
The probability that a coefficient is zero (see notes). Larger values
enforce more sparsity.

norm_diag : boolean, optional (default=False)
Member


I guess we need to respect the order exactly?

Member Author


yes

'__neg__', '__hash__')

public_modules = [
# the list of modules users need to access for all functionality
Member


why not all? too much to do? leave for future PRS?

Member Author


you want a huge PR to review? :)

Member


sklearn.base should be there, no? Why not use pkgutil?

]


def check_parameters_match(func, doc=None):
Member


This maybe should move to utils.testing? If it deserves its own tests, I don't think it should live in a test file.

Member


This file currently contains three things from what I can see: a function that checks docstrings, tests for that function, and tests that run the function on sklearn. I feel these should live in two or three different files, since they are conceptually very separate.


def test_check_parameters_match():
check_parameters_match(f_ok)
assert_raise_message(RuntimeError, 'Unknown section Results',
Member


Nitpick: it would be better to have: 'Unknown section: Results'

Member


Or even:

'Invalid section: Results'

Member Author


this message comes from numpydoc, not from my code

@agramfort
Member Author

refactoring done

@@ -22,7 +22,7 @@ matrix:
# This environment tests that scikit-learn can be built against
# versions of numpy, scipy with ATLAS that comes with Ubuntu Trusty 14.04
- env: DISTRIB="ubuntu" PYTHON_VERSION="2.7" CYTHON_VERSION="0.23.4"
-      COVERAGE=true
+      COVERAGE=true TEST_DOCSTRINGS="false"
Member


No need to set TEST_DOCSTRINGS if you don't want to test the docstrings (similar to what we do with COVERAGE)
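The opt-in gate could then look like this in the Travis scripts (paths and commands below are illustrative, not taken from the PR):

```bash
should_test_docstrings() {
    # Opt-in: anything other than an explicit "true" disables the check,
    # mirroring how COVERAGE gates the coverage run.
    [[ "$TEST_DOCSTRINGS" == "true" ]]
}

if should_test_docstrings; then
    pip install numpydoc
    pytest sklearn/tests/test_docstring_parameters.py
fi
```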

@raghavrv
Member

Closing this in favor of #9206

@raghavrv raghavrv closed this Jun 23, 2017
Successfully merging this pull request may close these issues.

Add check that all parameters are properly documented
6 participants