[MRG+1] MAINT Parametrize common estimator tests with pytest #11063

rth · 2018-05-03T13:03:06Z

Reference Issues/PRs

Partially address #10728, also relevant to #7319

What does this implement/fix? Explain your changes.

This migrates sklearn/tests/test_common.py from yield based tests to pytest paramerized ones.

Instead of the approaches proposed in #7319 (comment) or #10728 (comment) this simply parametrizes sklearn/tests/test_common.py::test_non_meta_estimators with both the estimator and the check to perform.

sklearn/utils/estimator_checks.py is left unchanged, and the parametrization relies on _yield_all_checks to produce a list of valid checks for a given estimator.

Here is an example of output (in verbose mode),

$ pytest sklearn/tests/test_common.py::test_non_meta_estimators -v
sklearn/tests/test_common.py::test_non_meta_estimators[ARDRegression-ARDRegression-check_estimators_dtypes] PASSED                                    [  0%]
sklearn/tests/test_common.py::test_non_meta_estimators[ARDRegression-ARDRegression-check_fit_score_takes_y] PASSED                                    [  0%]
sklearn/tests/test_common.py::test_non_meta_estimators[ARDRegression-ARDRegression-check_dtype_object] PASSED                                         [  0%]
sklearn/tests/test_common.py::test_non_meta_estimators[ARDRegression-ARDRegression-check_sample_weights_pandas_series] PASSED                         [  0%]
[...]
sklearn/tests/test_common.py::test_non_meta_estimators[AdaBoostClassifier-AdaBoostClassifier-check_estimators_dtypes] PASSED                          [  0%]
sklearn/tests/test_common.py::test_non_meta_estimators[AdaBoostClassifier-AdaBoostClassifier-check_fit_score_takes_y] PASSED                          [  0%]
sklearn/tests/test_common.py::test_non_meta_estimators[AdaBoostClassifier-AdaBoostClassifier-check_dtype_object] PASSED

it's a bit verbose, but allows to select both common tests specific to an estimator, e.g.,

$ pytest sklearn/tests/test_common.py -v -k LogisticRegression
sklearn/tests/test_common.py::test_parameters_default_constructible[LogisticRegression-LogisticRegression] PASSED                                     [  0%]
sklearn/tests/test_common.py::test_parameters_default_constructible[LogisticRegressionCV-LogisticRegressionCV] PASSED                                 [  1%]
sklearn/tests/test_common.py::test_parameters_default_constructible[RandomizedLogisticRegression-RandomizedLogisticRegression] PASSED                 [  2%]
sklearn/tests/test_common.py::test_non_meta_estimators[LogisticRegression-LogisticRegression-check_estimators_dtypes] PASSED                          [  3%]
sklearn/tests/test_common.py::test_non_meta_estimators[LogisticRegression-LogisticRegression-check_fit_score_takes_y] PASSED                          [  4%]
sklearn/tests/test_common.py::test_non_meta_estimators[LogisticRegression-LogisticRegression-check_dtype_object] PASSED 
[...]
================================================ 105 passed, 4717 deselected, 1580 warnings in 8.67 seconds =================================================

(specifying the path to the test file is not necessary - it only makes collection faster),

and to select all tests specific to a check, e.g.

$ pytest sklearn/tests/test_common.py -v -k check_estimators_nan_inf
sklearn/tests/test_common.py::test_non_meta_estimators[ARDRegression-ARDRegression-check_estimators_nan_inf] PASSED                                   [  0%]
sklearn/tests/test_common.py::test_non_meta_estimators[AdaBoostClassifier-AdaBoostClassifier-check_estimators_nan_inf] PASSED                         [  1%]
sklearn/tests/test_common.py::test_non_meta_estimators[AdaBoostRegressor-AdaBoostRegressor-check_estimators_nan_inf] PASSED                           [  2%]
sklearn/tests/test_common.py::test_non_meta_estimators[AdditiveChi2Sampler-AdditiveChi2Sampler-check_estimators_nan_inf] PASSED                       [  2%]
sklearn/tests/test_common.py::test_non_meta_estimators[AffinityPropagation-AffinityPropagation-check_estimators_nan_inf] PASSED                       [  3%]
[...]
================================================= 148 passed, 4674 deselected, 128 warnings in 4.33 seconds =================================================

rth · 2018-05-03T13:15:25Z

sklearn/tests/test_common.py

+        yield name, Estimator
+
+
+def _carthesian_product_checks(check_generator, estimators):


Here the idea is that we generate all valid checks for all estimators which is loosely a cartesian product (without the typo), but better naming ideas are welcome..

lesteve · 2018-05-03T13:53:43Z

Just had a quick glance at it but this looks nice and promising!

Just a question will this approach work for other tests in test_common.py as well or does it happen that it is not a straightforward cartesian product of checks x estimators.

lesteve · 2018-05-03T13:56:22Z

sklearn/tests/test_common.py

@@ -49,26 +51,52 @@ def test_all_estimators():
    # properly
    assert_greater(len(estimators), 0)


Remove (or move to the appropriate location) the comment above "Test that estimators are default-constructible etc ..."

and address Loic's comments

rth · 2018-05-03T14:32:15Z

Thanks for the feedback!

Just a question will this approach work for other tests in test_common.py as well or does it happen that it is not a straightforward cartesian product of checks x estimators.

I just migrated the only remaining test that used yields in test_common.py : test_class_weight_balanced_linear_classifiers. So all tests in test_common.py, for which it makes sense, should now be parametrized.

To my knowledge, test_non_meta_estimators is the only one that runs all checks x estimators it shouldn't be necessary for other common tests, unless I'm missing something?

rth · 2018-05-03T14:50:46Z

To clarify: this only migrates sklearn/tests/test_common.py which are the most problematic.

As to the other common tests,

sklearn/preprocessing/tests/test_common.py
sklearn/metrics/cluster/tests/test_common.py

are already parametrized,

sklearn/datasets/tests/test_common.py

doesn't need to be, and

sklearn/metrics/tests/test_common.py

is still not parametrized but that should be more straightforward as it's just an iteration over metrics (with possible concerns about how to cache input data). Parametrization of common tests in metrics is out of scope for this PR..

lesteve · 2018-05-03T15:01:59Z

To clarify: this only migrates sklearn/tests/test_common.py

sklearn/test/test_common.py was the only thing I had in mind, great stuff! Just being able to run a test for one check and a single estimator is a very nice improvement (on top of cleaning up and getting rid of deprecated yield tests of course).

jnothman

This looks like the right way to go.

Please update the development docs with the -k syntax

jnothman · 2018-05-03T22:08:53Z

sklearn/tests/test_common.py

+
+@pytest.mark.parametrize(
+        "name, Estimator, check",
+        _cartesian_product_checks(_yield_all_checks,


Having multiple @parametrize will already produce the Cartesian product. How do we benefit from this explicit product generator?

The name is a figure of speech; it's not rigorously a Cartesian product because for each estimator we get a list of checks that apply to it from _yield_all_checks not a predefined list of checks, so this cannot be writted as a two independent @parametrize decorators (or using itertools.product)..

Would the name _combine_checks or _yield_checks_combination be better ?

Yes. Change the name. Thanks

rth · 2018-05-04T10:34:11Z

Thanks for the review, @jnothman ! I have added the -k parameter documentation.

jnothman · 2018-05-06T00:57:02Z

doc/developers/tips.rst

+
+  pytest sklearn/tests/test_common.py -v -k LogisticRegression
+
+will run all common tests for the ``LogisticRegression`` estimator.


Use a :term: to reference the glossary for common tests, please

jnothman · 2018-05-06T00:57:51Z

doc/developers/tips.rst

@@ -64,8 +64,24 @@ will be displayed as a color background behind the line number.
 Useful pytest aliases and flags
 -------------------------------

-We recommend using pytest to run unit tests. When a unit tests fail, the
-following tricks can make debugging easier:
+We recommend using pytest to run unit tests.


I suspect with the dependence on parameterize, we should be more forceful than 'recommend'!

I removed this sentence. After all, this is a section about pytest tips, so it already assumes that pytest is used.

The section about testing http://scikit-learn.org/dev/developers/contributing.html#testing-and-improving-test-coverage is explicit enough about the fact that pytest is needed.

rth · 2018-05-06T19:56:40Z

Thanks for the comments @jnothman -- I made the requested changes...

jnothman · 2018-05-06T22:52:20Z

sklearn/tests/test_common.py

+        yield name, Estimator
+
+
+def _combine_checks(check_generator, estimators):


Would _generate_checks_per_estimator be a better name?

Sure, it's probably more explicit - renamed it.

glemaitre · 2018-05-07T10:01:18Z

LGTM. However, we always believed that we did not want to move the common tests with some pytest features to not depend from it. Am I wrong (I am really not sure :))

rth · 2018-05-07T10:09:15Z

However, we always believed that we did not want to move the common tests with some pytest features to not depend from it.

As far as I understand, we don't want check_estimator or check* from sklearn/utils/estimator_checks.py to depend on pytest because these are potentially used in scikit-learn contrib projects (and this PR doesn't change that).

As to the scikit-learn test suite, we are already using @pytest.mark.parametrize in some unit tests, so adding them in test_common.py is not really going to change anything: pytest will be needed anyway to run the test suite..

glemaitre · 2018-05-07T10:11:53Z

Thanks the clarification. I'll review after asap ;)

glemaitre · 2018-05-07T10:12:10Z

oh it is already green in fact.

glemaitre · 2018-05-07T10:13:11Z

Thanks @rth!!!

lesteve · 2018-05-07T14:53:24Z

Thanks @rth! I believe this removes some of the more problematic yield tests in our test suite in a both pragmatic and neat way (not always easy to have both)!

rth · 2018-05-07T15:17:53Z

Thanks for all the reviews! FYI, there is a follow up attempt at removing all remaining yield based tests in #11074

amueller · 2018-05-19T19:15:19Z

Sweet!

Parametrize test_common.py with pytest

73a348d

rth commented May 3, 2018

View reviewed changes

This was referenced May 3, 2018

yield tests are deprecated in pytest #10728

Closed

Move to pytest? #7319

Closed

lesteve reviewed May 3, 2018

View reviewed changes

Migrate test_class_weight_balanced_linear_classifiers

c1a9829

and address Loic's comments

rth changed the title ~~[MRG] MAINT Parametrize common tests with pytest~~ [MRG] MAINT Parametrize common estimator tests with pytest May 3, 2018

jnothman reviewed May 3, 2018

View reviewed changes

Document the pytest -k parameter

23a01f3

rth force-pushed the common-tests-pytest branch from 5f24a66 to 23a01f3 Compare May 4, 2018 09:58

Move the -k pytest parameter docs to doc/developers/tips.rst

136568e

jnothman reviewed May 6, 2018

View reviewed changes

Joel's comments

d246ba3

jnothman approved these changes May 6, 2018

View reviewed changes

Rename _combine_checks -> _generate_checks_per_estimator

69ebbeb

jnothman changed the title ~~[MRG] MAINT Parametrize common estimator tests with pytest~~ [MRG+1] MAINT Parametrize common estimator tests with pytest May 7, 2018

glemaitre merged commit 67cc975 into scikit-learn:master May 7, 2018

rth deleted the common-tests-pytest branch May 7, 2018 10:45

rth mentioned this pull request May 7, 2018

[MRG+2] Pytest parametrize unit tests #11074

Merged

3 tasks

amueller mentioned this pull request May 24, 2018

Segfault in LinearSVR on master #11131

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MRG+1] MAINT Parametrize common estimator tests with pytest #11063

[MRG+1] MAINT Parametrize common estimator tests with pytest #11063

rth commented May 3, 2018 •

edited

Loading

rth May 3, 2018 •

edited

Loading

lesteve commented May 3, 2018 •

edited

Loading

lesteve May 3, 2018

rth May 3, 2018

rth commented May 3, 2018

rth commented May 3, 2018 •

edited

Loading

lesteve commented May 3, 2018

jnothman left a comment

jnothman May 3, 2018

rth May 4, 2018

jnothman May 6, 2018

rth commented May 4, 2018

jnothman May 6, 2018

jnothman May 6, 2018

rth May 6, 2018

rth commented May 6, 2018

jnothman May 6, 2018

rth May 7, 2018

glemaitre commented May 7, 2018

rth commented May 7, 2018

glemaitre commented May 7, 2018

glemaitre commented May 7, 2018

glemaitre commented May 7, 2018

lesteve commented May 7, 2018 •

edited

Loading

rth commented May 7, 2018

amueller commented May 19, 2018

		yield name, Estimator


		def _carthesian_product_checks(check_generator, estimators):

		@@ -49,26 +51,52 @@ def test_all_estimators():
		# properly
		assert_greater(len(estimators), 0)


		pytest sklearn/tests/test_common.py -v -k LogisticRegression

		will run all common tests for the ``LogisticRegression`` estimator.

		yield name, Estimator


		def _combine_checks(check_generator, estimators):

[MRG+1] MAINT Parametrize common estimator tests with pytest #11063

[MRG+1] MAINT Parametrize common estimator tests with pytest #11063

Conversation

rth commented May 3, 2018 • edited Loading

Reference Issues/PRs

What does this implement/fix? Explain your changes.

rth May 3, 2018 • edited Loading

Choose a reason for hiding this comment

lesteve commented May 3, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rth commented May 3, 2018

rth commented May 3, 2018 • edited Loading

lesteve commented May 3, 2018

jnothman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rth commented May 4, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rth commented May 6, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

glemaitre commented May 7, 2018

rth commented May 7, 2018

glemaitre commented May 7, 2018

glemaitre commented May 7, 2018

glemaitre commented May 7, 2018

lesteve commented May 7, 2018 • edited Loading

rth commented May 7, 2018

amueller commented May 19, 2018

rth commented May 3, 2018 •

edited

Loading

rth May 3, 2018 •

edited

Loading

lesteve commented May 3, 2018 •

edited

Loading

rth commented May 3, 2018 •

edited

Loading

lesteve commented May 7, 2018 •

edited

Loading