[WIP] Verbose flag displaying progress bar for check_estimator in sklearn.utils.estimator_checks #13843

scouvreur · 2019-05-09T15:05:31Z

Reference Issues/PRs

Enhancement proposal suggested by @cod3licious in issue #13748. Also related to issue #11622.

What does this implement/fix? Explain your changes.

As suggested by @jnothman :

1. ~~create 'verbose' parameter in check_estimator function with default value False~~
2. create version of check_estimator that generates tests for the pytest collector

Created 'verbose' parameter in check_estimator function from sklearn.utils.estimator_checks with default value False.

sklearn/utils/estimator_checks.py

Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>

jnothman · 2019-05-21T08:07:37Z

Please pursue step 2; step 1 is unnecessary, and we've correspondingly closed #13748

scouvreur · 2019-05-24T15:15:57Z

Thanks @jnothman - do you have any pointers on how I could get started on step 2 ? Or any pytest resources that could point me in the right direction to generate the tests for the collector ?

scouvreur · 2019-06-07T16:08:15Z

Sorry guys - is there any resource you could point me to for creating tests for the pytest collector ?

amueller · 2019-06-07T16:25:47Z

I think this might be a good entry point:
https://docs.pytest.org/en/latest/usage.html#calling-pytest-from-python-code

Here's how to use hooks:
https://docs.pytest.org/en/latest/example/simple.html#incremental-testing-test-steps

I think you want to work with the collection hooks?
https://docs.pytest.org/en/latest/reference.html#collection-hooks

scouvreur · 2019-06-07T16:54:19Z

Thanks @amueller ! I will look into it

Pytest runner added inside the check_estimator function, still need to generate tests for the collector. On branch check_estimator_progress_verbose_mode Your branch is up to date with 'origin/check_estimator_progress_verbose_mode'. Changes to be committed: modified: estimator_checks.py

Merge remote-tracking branch 'upstream/master' into check_estimator_progress_verbose_mode.

Merge branch 'check_estimator_progress_verbose' of github.com:scouvreur/scikit-learn into check_estimator_progress_verbose_mode.

jnothman · 2019-06-19T13:27:45Z

I wouldn't think we want to make use of pytest.run here.

I think we already have a usable solution internal to scikit-learn here:

scikit-learn/sklearn/tests/test_common.py

Lines 84 to 113 in 197f448

    
def _generate_checks_per_estimator(check_generator, estimators):
    with ignore_warnings(category=(DeprecationWarning, FutureWarning)):
        for name, estimator in estimators:
            for check in check_generator(name, estimator):
                yield estimator, check


def _rename_partial(val):
    if isinstance(val, functools.partial):
        kwstring = "".join(["{}={}".format(k, v)
                            for k, v in val.keywords.items()])
        return "{}({})".format(val.func.__name__, kwstring)
    # FIXME once we have short reprs we can use them here!
    if hasattr(val, "get_params") and not isinstance(val, type):
        return type(val).__name__


@pytest.mark.parametrize(
        "estimator, check",
        _generate_checks_per_estimator(_yield_all_checks,
                                       _tested_estimators()),
        ids=_rename_partial
)
def test_estimators(estimator, check):
    # Common tests for estimator instances
    with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
                                   UserWarning, FutureWarning)):
        set_checking_parameters(estimator)
        name = estimator.__class__.__name__
        check(name, estimator)

but at the moment it relies on the internal _yield_all_checks. If we made that public, we could provide a pytest hook that would improve the syntax and allow for that test_estimators to be reduced to

@pytest.mark.estimator_checks(_tested_estimators())
def test_estimators(estimator, check):
    # Common tests for estimator instances
    with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
                                   UserWarning, FutureWarning)):
        set_checking_parameters(estimator)
        name = estimator.__class__.__name__
        check(name, estimator)

or something even more succinct, such as not needing the pytest.mark syntax.

This assumes that it's fine to construct the estimator and identify which checks it qualifies for at collection time (we currently do this in scikit-learn). This may not be efficient. To avoid constructing the estimator at collection time, you'd need to generate all the checks and skip the inapplicable combinations at test (not collection) time. Here one option would be to create an estimator_check fixture that is itself parametrized, so that each check is collected as a separate item.

I'm happy with the former approach - a public alternative to our current use of _generate_checks_per_estimator - for now.
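To make the quoted pattern concrete, here is a minimal, dependency-free sketch of how the (estimator, check) pairs and their test IDs could be produced. ToyEstimator, the two checks, toy_check_generator and readable_id are hypothetical stand-ins, not scikit-learn code. Unlike the quoted _rename_partial, this version joins keyword arguments with a comma and falls back to the function name explicitly.

```python
import functools


class ToyEstimator:
    """Hypothetical stand-in for a scikit-learn estimator."""

    def get_params(self):
        return {}


def check_fit(name, estimator):
    # Toy check: the object looks like an estimator.
    assert hasattr(estimator, "get_params")


def check_tolerance(name, estimator, tol=1e-3):
    # Toy check that takes an extra keyword argument.
    assert tol > 0


def toy_check_generator(name, estimator):
    # Assumption: plays the role of _yield_all_checks for one estimator.
    yield check_fit
    yield functools.partial(check_tolerance, tol=0.5)


def generate_checks_per_estimator(check_generator, estimators):
    # Flatten (name, estimator) pairs into (estimator, check) pairs.
    for name, estimator in estimators:
        for check in check_generator(name, estimator):
            yield estimator, check


def readable_id(val):
    # Build a readable test ID for a check or an estimator instance.
    if isinstance(val, functools.partial):
        kwstring = ",".join(
            "{}={}".format(k, v) for k, v in val.keywords.items())
        return "{}({})".format(val.func.__name__, kwstring)
    if hasattr(val, "get_params") and not isinstance(val, type):
        return type(val).__name__
    return getattr(val, "__name__", None)


pairs = list(generate_checks_per_estimator(
    toy_check_generator, [("ToyEstimator", ToyEstimator())]))
ids = [(readable_id(est), readable_id(check)) for est, check in pairs]
```

With these stand-ins, ids holds ("ToyEstimator", "check_fit") and ("ToyEstimator", "check_tolerance(tol=0.5)"), which is the kind of label pytest would show for each collected item.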

scouvreur · 2019-06-24T16:35:50Z

Thanks for the feedback @jnothman - I will work on that and update you with progress !

Changed _yield_all_checks to a public method and changed references to its private version.

amueller · 2019-07-10T03:04:02Z

@jnothman I guess I wanted it to be a function, your solution would require the users to write a test, right? Which might be more sensible but not be a direct replacement for check_estimator.

jnothman · 2019-07-10T10:53:13Z

> I guess I wanted it to be a function, your solution would require the users to write a test, right? Which might be more sensible but not be a direct replacement for check_estimator.

check_estimator needs to be called in a test for library developers already, so I don't see what's changed. We could add magic like

test_my_estimator = make_check_estimator_test(MyEstimator())

but I don't really see the benefit.

jnothman · 2019-07-10T10:59:52Z

If you really want to follow the approach I suggested above (not sure if it's the right thing to do), you will need to implement something much like https://github.com/pytest-dev/pytest/blob/2c402f4bd9cff2c6faeccb86a97364e1fa122a16/src/_pytest/python.py#L119-L128
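A hook of that kind might look roughly like the sketch below, placed in a conftest.py or plugin module. This is only an illustration under assumptions: the estimator_checks marker name comes from the suggestion earlier in the thread, while toy_yield_checks and the two checks are hypothetical stand-ins for a public check generator. The hook relies on pytest's pytest_generate_tests, metafunc.definition.iter_markers and metafunc.parametrize.

```python
def check_has_fit(name, estimator):
    # Toy check applicable to every estimator.
    assert callable(getattr(estimator, "fit", None))


def check_predict_after_fit(name, estimator):
    # Toy check applicable only to estimators with predict.
    assert estimator.fit().predict() is not None


def toy_yield_checks(name, estimator):
    # Assumption: stands in for a public version of _yield_all_checks.
    yield check_has_fit
    if hasattr(estimator, "predict"):
        yield check_predict_after_fit


def pytest_configure(config):
    # Register the custom marker so pytest does not warn about it.
    config.addinivalue_line(
        "markers",
        "estimator_checks(estimators): run common checks on each estimator")


def pytest_generate_tests(metafunc):
    # Expand each estimator_checks marker into one collected item per
    # applicable (estimator, check) pair, decided at collection time.
    for marker in metafunc.definition.iter_markers(name="estimator_checks"):
        params, ids = [], []
        for estimator in marker.args[0]:
            name = type(estimator).__name__
            for check in toy_yield_checks(name, estimator):
                params.append((estimator, check))
                ids.append("{}-{}".format(name, check.__name__))
        metafunc.parametrize("estimator, check", params, ids=ids)
```

A test function decorated with @pytest.mark.estimator_checks([...]) and taking estimator and check arguments would then be collected once per pair. Because inapplicable checks are never generated, nothing needs to be skipped at run time.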

jnothman · 2019-07-10T11:02:07Z

but actually that is probably a nuisance since scikit-learn would have to provide a pytest plugin, which seems a bit silly? Maybe better off just having users do:

@pytest.mark.parametrize('check', yield_all_checks)
@pytest.mark.parametrize('estimator', _tested_estimators())
def test_estimators(estimator, check):
    # Common tests for estimator instances
    with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
                                   UserWarning, FutureWarning)):
        set_checking_parameters(estimator)
        name = estimator.__class__.__name__
        check(name, estimator)

rth · 2019-07-10T11:35:31Z

> @pytest.mark.parametrize('check', yield_all_checks)
> @pytest.mark.parametrize('estimator', _tested_estimators())

The limitation of this is that checks are estimator dependent, and having lots of skipped tests that will never run is also not ideal.

The solution proposed in #13843 (comment) sounds reasonable. It's fairly close to the one proposed in #11622 (comment)

ALL_ESTIMATORS = [Estimator1, Estimator2, etc]

@pytest.mark.parametrize(
    'check, name, estimator',
    itertools.chain.from_iterable(
         check_estimator(Estimator, evaluate=False)
         for Estimator in ALL_ESTIMATORS
    )
)

(maybe with an additional helper function), where check_estimator(Estimator, evaluate=False) yields checks for that estimator.

jnothman · 2019-07-10T13:50:04Z

Right. Thanks for the reminder that we are trying to not overwhelm it with skips.

amueller · 2019-07-10T21:10:38Z

Fair, let's go with something like that. Seems pretty straight-forward, right?

jnothman · 2019-07-10T22:29:47Z

I think we should have generate_only=True rather than evaluate=False, but yes, it's a good solution as far as I'm concerned.
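As a rough illustration of the agreed idea, the sketch below shows a function that either runs the checks or, with generate_only=True, returns them lazily for a test collector. It is not the scikit-learn implementation: check_estimator_sketch, the toy estimators and the toy checks are all hypothetical names chosen for the example.

```python
import itertools


class ToyClassifier:
    def fit(self):
        return self

    def predict(self):
        return [0]


class ToyTransformer:
    def fit(self):
        return self

    def transform(self):
        return [[0.0]]


def check_fit_returns_self(name, estimator):
    assert estimator.fit() is estimator


def check_predict(name, estimator):
    assert len(estimator.fit().predict()) > 0


def check_transform(name, estimator):
    assert len(estimator.fit().transform()) > 0


def _yield_checks(name, estimator):
    # Only yield the checks that apply to this estimator.
    yield check_fit_returns_self
    if hasattr(estimator, "predict"):
        yield check_predict
    if hasattr(estimator, "transform"):
        yield check_transform


def _iter_checks(Estimator):
    # Lazily produce (check, name, estimator) tuples for one class.
    estimator = Estimator()
    name = Estimator.__name__
    for check in _yield_checks(name, estimator):
        yield check, name, estimator


def check_estimator_sketch(Estimator, generate_only=False):
    # generate_only=True: hand the checks to the caller without running them.
    if generate_only:
        return _iter_checks(Estimator)
    # Default: run every applicable check immediately.
    for check, name, estimator in _iter_checks(Estimator):
        check(name, estimator)
    return None


ALL_ESTIMATORS = [ToyClassifier, ToyTransformer]

# The flattened list is what would be passed to pytest.mark.parametrize.
cases = list(itertools.chain.from_iterable(
    check_estimator_sketch(Estimator, generate_only=True)
    for Estimator in ALL_ESTIMATORS))
```

Here cases holds four (check, name, estimator) tuples. Stacking two parametrize decorators over the three checks and two estimators would instead collect six items, two of which could only be skipped.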

Added bool flag for check_estimators() and updated pytest decorator for test_estimators(). Changes to be committed: modified: tests/test_common.py modified: utils/estimator_checks.py

amueller · 2019-07-26T17:23:23Z

@scouvreur I think we're going with #14381 right now. Not sure if there has been some miscommunication anywhere? cc @thomasjpfan

thomasjpfan · 2019-07-26T20:51:41Z

Yes, it seems we are leaning toward #14381. Thank you for working on this issue @scouvreur !

scouvreur · 2019-07-29T09:50:36Z

Ah I see @amueller - should I close this PR then ?

jnothman · 2019-07-29T10:55:25Z

It's been closed. Thanks a lot for helping us nut this out.

Created stub for check_estimator verbose flag

75e71e7

Created 'verbose' parameter in check_estimator function from sklearn.utils.estimator_checks with default value False.

glemaitre reviewed May 9, 2019

sklearn/utils/estimator_checks.py (outdated)

Update sklearn/utils/estimator_checks.py

c96c992

Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>

scouvreur added 3 commits June 17, 2019 12:13

Merge remote-tracking branch 'upstream/master'

cab3fe1

Merge remote-tracking branch 'upstream/master' into check_estimator_progress_verbose_mode.

Merge branch 'check_estimator_progress_verbose'

367cd1c

Merge branch 'check_estimator_progress_verbose' of github.com:scouvreur/scikit-learn into check_estimator_progress_verbose_mode.

scouvreur added 3 commits July 8, 2019 16:58

Made _yield_all_checks public

c82677e

Changed _yield_all_checks to a public method and changed references to its private version.

Removed pytest import

8cc1a08

Updated import statement

a957270

thomasjpfan mentioned this pull request Jul 16, 2019

[MRG] ENH Disassemble check estimator #14381

Merged

scouvreur added 3 commits July 22, 2019 14:57

Merge remote-tracking branch 'upstream/master' into check_estimator_p…

e1be06f

…rogress_verbose_mode

Merge remote-tracking branch 'upstream/master' into check_estimator_p…

23c4f8e

…rogress_verbose_mode

Added Evaluate flag to check_estimators()

a9db592

Added bool flag for check_estimators() and updated pytest decorator for test_estimators(). Changes to be committed: modified: tests/test_common.py modified: utils/estimator_checks.py

thomasjpfan closed this Jul 26, 2019

scouvreur deleted the check_estimator_progress_verbose_mode branch July 29, 2019 10:47
