Skip to content

[MRG+1] MAINT Parametrize common estimator tests with pytest #11063

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 7, 2018

Conversation

rth
Copy link
Member

@rth rth commented May 3, 2018

Reference Issues/PRs

Partially address #10728, also relevant to #7319

What does this implement/fix? Explain your changes.

This migrates sklearn/tests/test_common.py from yield based tests to pytest paramerized ones.

Instead of the approaches proposed in #7319 (comment) or #10728 (comment) this simply parametrizes sklearn/tests/test_common.py::test_non_meta_estimators with both the estimator and the check to perform.

sklearn/utils/estimator_checks.py is left unchanged, and the parametrization relies on _yield_all_checks to produce a list of valid checks for a given estimator.

Here is an example of output (in verbose mode),

$ pytest sklearn/tests/test_common.py::test_non_meta_estimators -v
sklearn/tests/test_common.py::test_non_meta_estimators[ARDRegression-ARDRegression-check_estimators_dtypes] PASSED                                    [  0%]
sklearn/tests/test_common.py::test_non_meta_estimators[ARDRegression-ARDRegression-check_fit_score_takes_y] PASSED                                    [  0%]
sklearn/tests/test_common.py::test_non_meta_estimators[ARDRegression-ARDRegression-check_dtype_object] PASSED                                         [  0%]
sklearn/tests/test_common.py::test_non_meta_estimators[ARDRegression-ARDRegression-check_sample_weights_pandas_series] PASSED                         [  0%]
[...]
sklearn/tests/test_common.py::test_non_meta_estimators[AdaBoostClassifier-AdaBoostClassifier-check_estimators_dtypes] PASSED                          [  0%]
sklearn/tests/test_common.py::test_non_meta_estimators[AdaBoostClassifier-AdaBoostClassifier-check_fit_score_takes_y] PASSED                          [  0%]
sklearn/tests/test_common.py::test_non_meta_estimators[AdaBoostClassifier-AdaBoostClassifier-check_dtype_object] PASSED

it's a bit verbose, but allows to select both common tests specific to an estimator, e.g.,

$ pytest sklearn/tests/test_common.py -v -k LogisticRegression
sklearn/tests/test_common.py::test_parameters_default_constructible[LogisticRegression-LogisticRegression] PASSED                                     [  0%]
sklearn/tests/test_common.py::test_parameters_default_constructible[LogisticRegressionCV-LogisticRegressionCV] PASSED                                 [  1%]
sklearn/tests/test_common.py::test_parameters_default_constructible[RandomizedLogisticRegression-RandomizedLogisticRegression] PASSED                 [  2%]
sklearn/tests/test_common.py::test_non_meta_estimators[LogisticRegression-LogisticRegression-check_estimators_dtypes] PASSED                          [  3%]
sklearn/tests/test_common.py::test_non_meta_estimators[LogisticRegression-LogisticRegression-check_fit_score_takes_y] PASSED                          [  4%]
sklearn/tests/test_common.py::test_non_meta_estimators[LogisticRegression-LogisticRegression-check_dtype_object] PASSED 
[...]
================================================ 105 passed, 4717 deselected, 1580 warnings in 8.67 seconds =================================================

(specifying the path to the test file is not necessary - it only makes collection faster),

and to select all tests specific to a check, e.g.

$ pytest sklearn/tests/test_common.py -v -k check_estimators_nan_inf
sklearn/tests/test_common.py::test_non_meta_estimators[ARDRegression-ARDRegression-check_estimators_nan_inf] PASSED                                   [  0%]
sklearn/tests/test_common.py::test_non_meta_estimators[AdaBoostClassifier-AdaBoostClassifier-check_estimators_nan_inf] PASSED                         [  1%]
sklearn/tests/test_common.py::test_non_meta_estimators[AdaBoostRegressor-AdaBoostRegressor-check_estimators_nan_inf] PASSED                           [  2%]
sklearn/tests/test_common.py::test_non_meta_estimators[AdditiveChi2Sampler-AdditiveChi2Sampler-check_estimators_nan_inf] PASSED                       [  2%]
sklearn/tests/test_common.py::test_non_meta_estimators[AffinityPropagation-AffinityPropagation-check_estimators_nan_inf] PASSED                       [  3%]
[...]
================================================= 148 passed, 4674 deselected, 128 warnings in 4.33 seconds =================================================

yield name, Estimator


def _carthesian_product_checks(check_generator, estimators):
Copy link
Member Author

@rth rth May 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the idea is that we generate all valid checks for all estimators which is loosely a cartesian product (without the typo), but better naming ideas are welcome..

@lesteve
Copy link
Member

lesteve commented May 3, 2018

Just had a quick glance at it but this looks nice and promising!

Just a question will this approach work for other tests in test_common.py as well or does it happen that it is not a straightforward cartesian product of checks x estimators.

@@ -49,26 +51,52 @@ def test_all_estimators():
# properly
assert_greater(len(estimators), 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove (or move to the appropriate location) the comment above "Test that estimators are default-constructible etc ..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@rth
Copy link
Member Author

rth commented May 3, 2018

Thanks for the feedback!

Just a question will this approach work for other tests in test_common.py as well or does it happen that it is not a straightforward cartesian product of checks x estimators.

I just migrated the only remaining test that used yields in test_common.py : test_class_weight_balanced_linear_classifiers. So all tests in test_common.py, for which it makes sense, should now be parametrized.

To my knowledge, test_non_meta_estimators is the only one that runs all checks x estimators it shouldn't be necessary for other common tests, unless I'm missing something?

@rth
Copy link
Member Author

rth commented May 3, 2018

To clarify: this only migrates sklearn/tests/test_common.py which are the most problematic.

As to the other common tests,

sklearn/preprocessing/tests/test_common.py
sklearn/metrics/cluster/tests/test_common.py

are already parametrized,

sklearn/datasets/tests/test_common.py

doesn't need to be, and

sklearn/metrics/tests/test_common.py

is still not parametrized but that should be more straightforward as it's just an iteration over metrics (with possible concerns about how to cache input data). Parametrization of common tests in metrics is out of scope for this PR..

@rth rth changed the title [MRG] MAINT Parametrize common tests with pytest [MRG] MAINT Parametrize common estimator tests with pytest May 3, 2018
@lesteve
Copy link
Member

lesteve commented May 3, 2018

To clarify: this only migrates sklearn/tests/test_common.py

sklearn/test/test_common.py was the only thing I had in mind, great stuff! Just being able to run a test for one check and a single estimator is a very nice improvement (on top of cleaning up and getting rid of deprecated yield tests of course).

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the right way to go.

Please update the development docs with the -k syntax


@pytest.mark.parametrize(
"name, Estimator, check",
_cartesian_product_checks(_yield_all_checks,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having multiple @parametrize will already produce the Cartesian product. How do we benefit from this explicit product generator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is a figure of speech; it's not rigorously a Cartesian product because for each estimator we get a list of checks that apply to it from _yield_all_checks not a predefined list of checks, so this cannot be writted as a two independent @parametrize decorators (or using itertools.product)..

Would the name _combine_checks or _yield_checks_combination be better ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Change the name. Thanks

@rth rth force-pushed the common-tests-pytest branch from 5f24a66 to 23a01f3 Compare May 4, 2018 09:58
@rth
Copy link
Member Author

rth commented May 4, 2018

Thanks for the review, @jnothman ! I have added the -k parameter documentation.


pytest sklearn/tests/test_common.py -v -k LogisticRegression

will run all common tests for the ``LogisticRegression`` estimator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a :term: to reference the glossary for common tests, please

@@ -64,8 +64,24 @@ will be displayed as a color background behind the line number.
Useful pytest aliases and flags
-------------------------------

We recommend using pytest to run unit tests. When a unit tests fail, the
following tricks can make debugging easier:
We recommend using pytest to run unit tests.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect with the dependence on parameterize, we should be more forceful than 'recommend'!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this sentence. After all, this is a section about pytest tips, so it already assumes that pytest is used.

The section about testing http://scikit-learn.org/dev/developers/contributing.html#testing-and-improving-test-coverage is explicit enough about the fact that pytest is needed.

@rth
Copy link
Member Author

rth commented May 6, 2018

Thanks for the comments @jnothman -- I made the requested changes...

yield name, Estimator


def _combine_checks(check_generator, estimators):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would _generate_checks_per_estimator be a better name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's probably more explicit - renamed it.

@jnothman jnothman changed the title [MRG] MAINT Parametrize common estimator tests with pytest [MRG+1] MAINT Parametrize common estimator tests with pytest May 7, 2018
@glemaitre
Copy link
Member

LGTM. However, we always believed that we did not want to move the common tests with some pytest features to not depend from it. Am I wrong (I am really not sure :))

@rth
Copy link
Member Author

rth commented May 7, 2018

However, we always believed that we did not want to move the common tests with some pytest features to not depend from it.

As far as I understand, we don't want check_estimator or check* from sklearn/utils/estimator_checks.py to depend on pytest because these are potentially used in scikit-learn contrib projects (and this PR doesn't change that).

As to the scikit-learn test suite, we are already using @pytest.mark.parametrize in some unit tests, so adding them in test_common.py is not really going to change anything: pytest will be needed anyway to run the test suite..

@glemaitre
Copy link
Member

Thanks the clarification. I'll review after asap ;)

@glemaitre
Copy link
Member

oh it is already green in fact.

@glemaitre glemaitre merged commit 67cc975 into scikit-learn:master May 7, 2018
@glemaitre
Copy link
Member

Thanks @rth!!!

@rth rth deleted the common-tests-pytest branch May 7, 2018 10:45
@rth rth mentioned this pull request May 7, 2018
3 tasks
@lesteve
Copy link
Member

lesteve commented May 7, 2018

Thanks @rth! I believe this removes some of the more problematic yield tests in our test suite in a both pragmatic and neat way (not always easy to have both)!

@rth
Copy link
Member Author

rth commented May 7, 2018

Thanks for all the reviews! FYI, there is a follow up attempt at removing all remaining yield based tests in #11074

@amueller
Copy link
Member

Sweet!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants