[MRG] ENH Disassemble check estimator #14381

thomasjpfan · 2019-07-16T15:04:20Z

Reference Issues/PRs

Fixes #11622
Alternative to #13843

What does this implement/fix? Explain your changes.

Libraries will be able to write the following to run their tests:

from itertools import chain

import pytest
from sklearn.utils.estimator_checks import check_estimator

@pytest.mark.parametrize('estimator, check',
    chain.from_iterable(check_estimator(est, generate_only=True)
                        for est in estimators))
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)

Any other comments?

The name argument of the checks is set by check_estimator.

rth

Thanks! Please add a what's new entry as well.

rth · 2019-07-16T15:08:55Z

sklearn/utils/estimator_checks.py

+    generate_only : bool, optional (default=False)
+        When `False`, checks are evaluated when `check_estimator` is called.
+        When `True`, `check_estimator` generates the checks and the estimator.
+


Please add versionadded

rth · 2019-07-16T15:10:56Z

sklearn/tests/test_common.py

@@ -98,17 +100,15 @@ def _rename_partial(val):

 @pytest.mark.parametrize(
        "estimator, check",
-        _generate_checks_per_estimator(_yield_all_checks,


We can remove this function now?

rth · 2019-07-16T15:14:41Z

sklearn/utils/estimator_checks.py

+        @pytest.mark.parameterize(
+            'estimator, check',
+            chain.from_iterable(check_estimator(est, generate_only=True)
+                                for est in estimators))


One issue with this is that, because estimator is an instance and not a class, it is not going to be rendered nicely in the list of tests. That's why we use ids=_rename_partial. I guess there is no easy way of fixing this for users...

Without our _rename_partial, pytest names everything estimator213-check41. It could be slightly better if we yield the name as well from check_estimator, which will result in ARDRegression-estimator211-check21.

I think we should really try to give the tests sensible names. This is pretty useless otherwise.

Why not fix the fixme in _rename_partial and move it to estimator_checks and make it public and use it here? That solves the problem, right?

Using print_changed_only adds newlines sometimes, which leads to pytest names with newlines in them.

Making _rename_partial public is a good idea; it would need another name, maybe nicer_check_estimator_id? 🤔

How about using print_changed_only and removing the newlines? Also, if you're testing something that deeply nested, there's no way to give it a good name.
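
To make the naming idea in this thread concrete, here is a minimal sketch of an id helper. It is not the implementation from this PR: `make_test_id`, `FakeEstimator` and `check_fit` are hypothetical names, and the only assumption about pytest is that `ids` accepts a callable. A partial-wrapped check is named after its function, and any other object falls back to its repr with newlines and repeated whitespace collapsed.

```python
from functools import partial


def make_test_id(val):
    """Hypothetical helper: return a single-line id for a parametrized value."""
    if isinstance(val, partial):
        name = val.func.__name__
        if not val.keywords:
            return name
        kwargs = ",".join(f"{k}={v}" for k, v in val.keywords.items())
        return f"{name}({kwargs})"
    if callable(val) and hasattr(val, "__name__"):
        return val.__name__
    # Collapse every run of whitespace (including newlines) into one space.
    return " ".join(repr(val).split())


class FakeEstimator:
    """Hypothetical estimator whose repr spans two lines."""

    def __repr__(self):
        return "FakeEstimator(alpha=1.0,\n              beta=2.0)"


def check_fit(name, estimator):
    pass


ids = [
    make_test_id(FakeEstimator()),
    make_test_id(partial(check_fit, "FakeEstimator")),
    make_test_id(partial(check_fit, name="FakeEstimator")),
]
```

A helper like this would be passed as `ids=make_test_id` to `pytest.mark.parametrize`, so each test id stays on one line.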

thomasjpfan · 2019-07-16T20:36:28Z

We cannot use yield in check_estimator because Python will automatically convert the function to a generator. This happens even if the yield is guarded by an if.

This PR was updated to return a list of checks, rather than a generator.
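
A small sketch of the behaviour described above, using hypothetical function names rather than the actual check_estimator code. Any function whose body contains `yield` becomes a generator function, so calling it executes nothing until it is iterated, even when the `yield` sits behind a condition that is false. Returning a list avoids this.

```python
import inspect

calls = []


def run_or_yield(generate_only=False):
    """Hypothetical function mixing eager work with a guarded yield."""
    if generate_only:
        yield "check"
        return
    calls.append("ran eagerly")  # not reached until the generator is iterated


def run_or_list(generate_only=False):
    """Same idea, but returning a list keeps this an ordinary function."""
    if generate_only:
        return ["check"]
    calls.append("ran eagerly")
    return None


result = run_or_yield(generate_only=False)
is_generator = inspect.isgenerator(result)
calls_after_yield_version = list(calls)  # body has not executed yet

run_or_list(generate_only=False)
calls_after_list_version = list(calls)  # body executed immediately
```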

rth

Thanks @thomasjpfan !

amueller · 2019-07-19T19:45:56Z

sklearn/utils/estimator_checks.py

+
+        from itertools import chain
+
+        @pytest.mark.parameterize(


Suggested change

@pytest.mark.parameterize(

@pytest.mark.parametrize(

sklearn/utils/estimator_checks.py

amueller · 2019-07-19T21:02:31Z

Looks good apart from typos and setting the ids.

amueller · 2019-07-22T15:09:15Z

sklearn/utils/estimator_checks.py

@@ -265,7 +265,31 @@ def _yield_all_checks(name, estimator):
    yield check_fit_idempotent


-def check_estimator(Estimator):
+def readable_check_estimator_ids(val):


maybe just set_check_estimator_ids or make_check_estimator_ids?

amueller · 2019-07-22T15:13:59Z

sklearn/utils/estimator_checks.py

@@ -265,7 +265,31 @@ def _yield_all_checks(name, estimator):
    yield check_fit_idempotent


-def check_estimator(Estimator):
+def readable_check_estimator_ids(val):
+    """Create readable pytest ids for `check_estimator` when


It's also used internally without reference to check_estimator.

How about

"""Create pytest ids for checks. Returns string representations for pytest tests, to be used as id. Use together with ``check_estimator(..., generate_only=True)`` """

Though that makes documenting val possibly more difficult. But on the other hand you could just directly document what val needs to be instead of defining it implicitly.

Also: missing Returns section.

amueller

It would be nice if you actually tested the code that you recommend others will use ;)

amueller · 2019-07-22T15:17:38Z

sklearn/utils/estimator_checks.py

+
+        from itertools import chain
+        import pytest
+        from sklearn.utils.estimator_check import check_estimator


Suggested change

from sklearn.utils.estimator_check import check_estimator

from sklearn.utils.estimator_checks import check_estimator

amueller · 2019-07-22T15:17:47Z

sklearn/utils/estimator_checks.py

+        from itertools import chain
+        import pytest
+        from sklearn.utils.estimator_check import check_estimator
+        from sklearn.utils.estimator_check import readable_check_estimator_ids


Suggested change

from sklearn.utils.estimator_check import readable_check_estimator_ids

from sklearn.utils.estimator_checks import readable_check_estimator_ids

amueller · 2019-07-22T16:50:34Z

sklearn/utils/tests/test_estimator_checks.py

+    pass
+
+
+@pytest.mark.parametrize("val, expected", [


this seems to contradict the statement below ;)

amueller · 2019-07-22T16:53:12Z

Looks good apart from the pytest soft dependency issue and the name of the function.
I would replace "readable", or just remove it, because it doesn't really add any information.

jnothman

WDYT of adding this syntactic sugar?

def parametrize_with_checks(estimators):
    if hasattr(estimators, 'fit'):
        estimators = [estimators]
    return pytest.mark.parametrize(
        ['estimator', 'check'],
        check_estimator(estimator, generate_only=True),
        ids=set_check_estimator_ids)

Then

from sklearn.utils.estimator_checks import parametrize_with_checks
from sklearn.linear_model import LogisticRegression

@parametrize_with_checks(LogisticRegression)
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)

Then we could make set_check_estimator_ids private.

rth · 2019-08-22T07:59:43Z

WDYT of adding this syntactic sugar?

+1, though maybe with an itertools.chain somewhere in parametrize_with_checks, as check_estimator still doesn't support lists of estimators I think?

jnothman · 2019-08-22T08:29:45Z

Ahh yes, forgot that part from my draft implementation!
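
For illustration, the missing chaining step might look like the sketch below. It is an assumption-laden stand-in, not the code merged in this PR: `fake_check_estimator` is a hypothetical replacement for `check_estimator(est, generate_only=True)` so the block runs without scikit-learn, and `collect_pairs`, `A` and `B` are made-up names.

```python
from itertools import chain


def check_has_fit(estimator):
    assert hasattr(estimator, "fit")


def check_has_predict(estimator):
    assert hasattr(estimator, "predict")


def fake_check_estimator(estimator):
    """Hypothetical stand-in: yield (estimator, check) pairs for one estimator."""
    for check in (check_has_fit, check_has_predict):
        yield estimator, check


def collect_pairs(estimators):
    """Accept one estimator or an iterable of them and flatten all pairs."""
    if hasattr(estimators, "fit"):
        estimators = [estimators]
    return list(
        chain.from_iterable(fake_check_estimator(est) for est in estimators)
    )


class A:
    def fit(self, X, y=None):
        return self

    def predict(self, X):
        return [0 for _ in X]


class B(A):
    pass


pairs = collect_pairs([A(), B()])
single = collect_pairs(A())
```

The flattened list is the kind of value that would be handed to `pytest.mark.parametrize(['estimator', 'check'], pairs)` inside the decorator.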

jnothman · 2019-08-22T08:31:04Z

Having this interface actually means that we probably don't need the awkward thing where check_estimator generates the estimator as well as the checks.

rth · 2019-08-22T08:39:18Z

Having this interface actually means that we probably don't need the
awkward thing where check_estimator generates the estimator as well as the
checks.

parametrize_with_checks is tightly coupled to pytest (and depends on it), while check_estimator(estimator, generate_only=True) is much more general and could be used for other applications.

For instance, one could programmatically generate the list of checks for a given application, manually skip a few selected ones, and run the rest with pytest.mark.parametrize. That's an important use case for contrib projects, so I think it would be worth keeping.
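
A rough sketch of that workflow, under stated assumptions: `generate_checks` is a hypothetical stand-in for `check_estimator(estimator, generate_only=True)` that yields (estimator, check) pairs with the name argument already bound via `functools.partial`, and `without`, `TinyEstimator` and both checks are invented for the example.

```python
from functools import partial


def check_fit_returns_self(name, estimator):
    assert estimator.fit([[0.0]], [0]) is estimator


def check_supports_sparse(name, estimator):
    raise AssertionError(f"{name} does not support sparse input")


def generate_checks(estimator):
    """Hypothetical stand-in for check_estimator(estimator, generate_only=True)."""
    name = type(estimator).__name__
    for check in (check_fit_returns_self, check_supports_sparse):
        yield estimator, partial(check, name)


def without(pairs, skipped_names):
    """Drop pairs whose underlying check function is named in skipped_names."""
    for estimator, check in pairs:
        if check.func.__name__ not in skipped_names:
            yield estimator, check


class TinyEstimator:
    def fit(self, X, y=None):
        return self


selected = list(
    without(generate_checks(TinyEstimator()), {"check_supports_sparse"})
)
for estimator, check in selected:
    check(estimator)  # only the checks that were not skipped are run
```

In a contrib project the `selected` list could be passed to `pytest.mark.parametrize` instead of being looped over directly.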

jnothman · 2019-08-22T08:47:31Z

My comment was about whether it should generate (estimator, check) pairs or just check.

thomasjpfan · 2019-08-23T20:11:21Z

Currently we still need the estimator passed back because of how we do set_checking_parameters 😅

jnothman · 2019-08-24T13:52:45Z

Currently we still need the estimator passed back because of how we do set_checking_parameters 😅

I don't actually think we mean the same thing...

But yes, it's easier to have check_estimator generate pairs also to be unambiguous in the case of classes vs instances, etc.

thomasjpfan · 2019-08-26T20:48:48Z

Overall I like having the syntactic sugar, but the hard dependency on pytest is a little unsettling.

My comment was about whether it should generate (estimator, check) pairs or
just check.

@jnothman Can you expand on this?

jnothman · 2019-08-26T21:10:31Z

It's not a hard dependency, it's a soft dependency that is required for those who want to use this helper. If check_estimator generates check functions only you can still construct a generator within parametrize_with_checks that parameterizes the tests with (estimator, check) pairs. Can give an implementation example another time if you wish.
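
One possible shape for the approach described here, as a sketch only: `checks_only` is a hypothetical variant that yields bare check functions, and `pair_up` is an invented helper that re-attaches each estimator to its checks, as the wrapper inside parametrize_with_checks could do.

```python
def check_has_fit(estimator):
    assert callable(getattr(estimator, "fit", None))


def check_has_get_params(estimator):
    assert callable(getattr(estimator, "get_params", None))


def checks_only(estimator):
    """Hypothetical variant that yields bare check functions, no estimator."""
    yield check_has_fit
    yield check_has_get_params


def pair_up(estimators):
    """Re-attach each estimator to its checks inside the wrapper."""
    for estimator in estimators:
        for check in checks_only(estimator):
            yield estimator, check


class Stub:
    def fit(self, X, y=None):
        return self

    def get_params(self, deep=True):
        return {}


stubs = [Stub(), Stub()]
pairs = list(pair_up(stubs))
```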

thomasjpfan · 2019-08-26T23:19:16Z

Thank you for the clarification! As you said, the concern with only generating the checks is that it becomes harder to distinguish the class checks and the instance checks.

I will move forward with the parametrize_with_checks decorator. We will need to document how it names the tests, since set_check_estimator_ids will become private.

jnothman · 2019-08-26T23:41:58Z

I would assume that the target audience here will be happy to work out the naming pattern, but sure!

jnothman

It might be nice to mention the new decorator in the developer guide.

jnothman · 2019-08-27T20:49:39Z

@rth let's merge?

rth

Thanks again @thomasjpfan !

jnothman · 2019-08-28T06:12:21Z

Yay! Well done everyone!

amueller · 2019-09-03T15:49:38Z

Awesome!

thomasjpfan added 2 commits July 16, 2019 10:52

ENH Disassemble check estimator

a0d3735

DOC Update

c8a66ce

rth reviewed Jul 16, 2019

thomasjpfan added 3 commits July 16, 2019 11:21

DOC Adds whats new

31c5d21

CLN Removes _generate_checks_per_estimator

a79c374

WIP

b3664b3

thomasjpfan changed the title ~~[MRG] ENH Disassemble check estimator~~ [WIP] ENH Disassemble check estimator Jul 16, 2019

ENH Returns list

a48210b

thomasjpfan changed the title ~~[WIP] ENH Disassemble check estimator~~ [MRG] ENH Disassemble check estimator Jul 16, 2019

rth mentioned this pull request Jul 17, 2019

[MRG + 1] TST test that all default arguments are not mutable #4379

Merged

rth approved these changes Jul 19, 2019

amueller reviewed Jul 19, 2019

thomasjpfan added 4 commits July 20, 2019 13:49

CLN Spelling mistakes

f70a89e

DOC Adds check_estimator import

ff1aba2

ENH Makes readable_check_estimator_ids public

aad74aa

Merge remote-tracking branch 'upstream/master' into disassemble_check…

eb2ed8b

…_estimator

amueller reviewed Jul 22, 2019

thomasjpfan added 4 commits July 22, 2019 11:30

TST Adds tests for readable_check_estimator_ids

aa6411c

STY Flake

ee1335b

DOC Update names

a1297a9

TST Adds position argument test

b65c615

amueller reviewed Jul 22, 2019

RFC Move tests to test_common

1a5fd07

thomasjpfan added 3 commits August 21, 2019 16:00

DOC Address rth's comments

9cb1074

CLN Removes try except

578e8ce

CLN Address rth comments

649023f

jnothman reviewed Aug 21, 2019

CLN Removes try except

204e01c

amueller mentioned this pull request Aug 22, 2019

MNT Add estimator check for not calling __array_function__ #14702

Merged

amueller added the High Priority High priority issues and pull requests label Aug 26, 2019

Merge remote-tracking branch 'upstream/master' into disassemble_check…

d6ad9ad

…_estimator

thomasjpfan added 4 commits August 27, 2019 01:08

ENH Adds some sugar

bad253d

Merge remote-tracking branch 'upstream/master' into disassemble_check…

865484d

…_estimator

DOC Adds doc entry for sugar

ca9c786

DOC Updates whats_new

488d378

jnothman approved these changes Aug 27, 2019

DOC Adds generate_only=True in develop docs

2218021

rth approved these changes Aug 28, 2019

rth merged commit d9a12aa into scikit-learn:master Aug 28, 2019
