-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] MAINT Parametrize common estimator tests with pytest #11063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sklearn/tests/test_common.py
Outdated
yield name, Estimator | ||
|
||
|
||
def _carthesian_product_checks(check_generator, estimators): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the idea is that we generate all valid checks for all estimators which is loosely a cartesian product (without the typo), but better naming ideas are welcome..
Just had a quick glance at it but this looks nice and promising! Just a question will this approach work for other tests in |
@@ -49,26 +51,52 @@ def test_all_estimators(): | |||
# properly | |||
assert_greater(len(estimators), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove (or move to the appropriate location) the comment above "Test that estimators are default-constructible etc ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
and address Loic's comments
Thanks for the feedback!
I just migrated the only remaining test that used yields in To my knowledge, |
To clarify: this only migrates As to the other common tests,
are already parametrized,
doesn't need to be, and
is still not parametrized but that should be more straightforward as it's just an iteration over metrics (with possible concerns about how to cache input data). Parametrization of common tests in metrics is out of scope for this PR.. |
sklearn/test/test_common.py was the only thing I had in mind, great stuff! Just being able to run a test for one check and a single estimator is a very nice improvement (on top of cleaning up and getting rid of deprecated yield tests of course). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like the right way to go.
Please update the development docs with the -k syntax
sklearn/tests/test_common.py
Outdated
|
||
@pytest.mark.parametrize( | ||
"name, Estimator, check", | ||
_cartesian_product_checks(_yield_all_checks, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having multiple @parametrize will already produce the Cartesian product. How do we benefit from this explicit product generator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is a figure of speech; it's not rigorously a Cartesian product because for each estimator we get a list of checks that apply to it from _yield_all_checks
not a predefined list of checks, so this cannot be writted as a two independent @parametrize decorators (or using itertools.product
)..
Would the name _combine_checks
or _yield_checks_combination
be better ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Change the name. Thanks
Thanks for the review, @jnothman ! I have added the |
doc/developers/tips.rst
Outdated
|
||
pytest sklearn/tests/test_common.py -v -k LogisticRegression | ||
|
||
will run all common tests for the ``LogisticRegression`` estimator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a :term: to reference the glossary for common tests, please
doc/developers/tips.rst
Outdated
@@ -64,8 +64,24 @@ will be displayed as a color background behind the line number. | |||
Useful pytest aliases and flags | |||
------------------------------- | |||
|
|||
We recommend using pytest to run unit tests. When a unit tests fail, the | |||
following tricks can make debugging easier: | |||
We recommend using pytest to run unit tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect with the dependence on parameterize, we should be more forceful than 'recommend'!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this sentence. After all, this is a section about pytest tips, so it already assumes that pytest is used.
The section about testing http://scikit-learn.org/dev/developers/contributing.html#testing-and-improving-test-coverage is explicit enough about the fact that pytest is needed.
Thanks for the comments @jnothman -- I made the requested changes... |
sklearn/tests/test_common.py
Outdated
yield name, Estimator | ||
|
||
|
||
def _combine_checks(check_generator, estimators): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would _generate_checks_per_estimator be a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it's probably more explicit - renamed it.
LGTM. However, we always believed that we did not want to move the common tests with some pytest features to not depend from it. Am I wrong (I am really not sure :)) |
As far as I understand, we don't want As to the scikit-learn test suite, we are already using |
Thanks the clarification. I'll review after asap ;) |
oh it is already green in fact. |
Thanks @rth!!! |
Thanks @rth! I believe this removes some of the more problematic yield tests in our test suite in a both pragmatic and neat way (not always easy to have both)! |
Thanks for all the reviews! FYI, there is a follow up attempt at removing all remaining yield based tests in #11074 |
Sweet! |
Reference Issues/PRs
Partially address #10728, also relevant to #7319
What does this implement/fix? Explain your changes.
This migrates
sklearn/tests/test_common.py
from yield based tests to pytest paramerized ones.Instead of the approaches proposed in #7319 (comment) or #10728 (comment) this simply parametrizes
sklearn/tests/test_common.py::test_non_meta_estimators
with both the estimator and the check to perform.sklearn/utils/estimator_checks.py
is left unchanged, and the parametrization relies on_yield_all_checks
to produce a list of valid checks for a given estimator.Here is an example of output (in verbose mode),
it's a bit verbose, but allows to select both common tests specific to an estimator, e.g.,
(specifying the path to the test file is not necessary - it only makes collection faster),
and to select all tests specific to a check, e.g.