Skip to content

[WIP] Verbose flag displaying progress bar for check_estimator in sklearn.utils.estimator_checks #13843

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

scouvreur
Copy link
Contributor

@scouvreur scouvreur commented May 9, 2019

Reference Issues/PRs

Enhancement proposal suggested by @cod3licious in issue #13748. Also related to issue #11622.

What does this implement/fix? Explain your changes.

As suggested by @jnothman :

  • create 'verbose' parameter in check_estimator function with default value False
  • create version of check_estimator that generates tests for the pytest collector

Created 'verbose' parameter in check_estimator function
from sklearn.utils.estimator_checks with default value
False.
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@jnothman
Copy link
Member

Please pursue step 2; step 1 is unnecessary, and we've correspondingly closed #13748

@scouvreur
Copy link
Contributor Author

Thanks @jnothman - do you have any pointers on how I could get started on step 2 ? Or any pytest resources that could point me in the right direction to generate the tests for the collector ?

@scouvreur
Copy link
Contributor Author

Sorry guys - is there any resource you could point me to to create tests for the pystest collector ?

@amueller
Copy link
Member

amueller commented Jun 7, 2019

@scouvreur
Copy link
Contributor Author

Thanks @amueller ! I will look into it

Pytest runner added inside the check_estimator
function, still need to generate tests for the
collector.

On branch check_estimator_progress_verbose_mode
Your branch is up to date with 'origin/check_estimator_progress_verbose_mode'.

Changes to be committed:
	modified:   estimator_checks.py
Merge remote-tracking branch 'upstream/master'
into check_estimator_progress_verbose_mode.
Merge branch 'check_estimator_progress_verbose' of
github.com:scouvreur/scikit-learn into
check_estimator_progress_verbose_mode.
@jnothman
Copy link
Member

I wouldn't think we want to make use of pytest.run here.

I think we already have a usable solution internal to scikit-learn here:

def _generate_checks_per_estimator(check_generator, estimators):
with ignore_warnings(category=(DeprecationWarning, FutureWarning)):
for name, estimator in estimators:
for check in check_generator(name, estimator):
yield estimator, check
def _rename_partial(val):
if isinstance(val, functools.partial):
kwstring = "".join(["{}={}".format(k, v)
for k, v in val.keywords.items()])
return "{}({})".format(val.func.__name__, kwstring)
# FIXME once we have short reprs we can use them here!
if hasattr(val, "get_params") and not isinstance(val, type):
return type(val).__name__
@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,
_tested_estimators()),
ids=_rename_partial
)
def test_estimators(estimator, check):
# Common tests for estimator instances
with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
UserWarning, FutureWarning)):
set_checking_parameters(estimator)
name = estimator.__class__.__name__
check(name, estimator)

but at the moment it relies on the internal _yield_all_checks. If we made that public, we could provide a pytest hook that would improve the syntax and allow for that test_estimators to be reduced to

@pytest.mark.estimator_checks(_tested_estimators())
def test_estimators(estimator, check):
    # Common tests for estimator instances
    with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
                                   UserWarning, FutureWarning)):
        set_checking_parameters(estimator)
        name = estimator.__class__.__name__
        check(name, estimator)

or something even more succinct, such as not needing the pytest.mark syntax.

This assumes that it's fine to construct the estimator and identify which checks it qualifies at collection time (we currently do this in scikit-learn). This may not be efficient. To avoid constructing the estimator at collection time, you'd need to generate all the checks and skip those combinations that are inapplicable at test (not collection) time. Here one option would be to create an estimator_check fixture that is itself parametrized to consider each check a different collected item.

I'm happy with the former approach - a public alternative to our current use of _generate_checks_per_estimator - for now.

@scouvreur
Copy link
Contributor Author

Thanks for the feedback @jnothman - I will work on that and update you with progress !

scouvreur added 3 commits July 8, 2019 16:58
Changed _yield_all_checks to a public
method and changed references to its
private version.
@amueller
Copy link
Member

@jnothman I guess I wanted it to be a function, your solution would require the users to write a test, right? Which might be more sensible but not be a direct replacement for check_estimator.

@jnothman
Copy link
Member

I guess I wanted it to be a function, your solution would require the users to write a test, right? Which might be more sensible but not be a direct replacement for check_estimator.

check_estimator needs to be called in a test for library developers already, so I don't see what's changed. We could add magic like

test_my_estimator = make_check_estimator_test(MyEstimator())

but I don't really see the benefit.

@jnothman
Copy link
Member

If you really want to follow the approach I suggested above (not sure if it's the right thing to do), you will need to implement something much like https://github.com/pytest-dev/pytest/blob/2c402f4bd9cff2c6faeccb86a97364e1fa122a16/src/_pytest/python.py#L119-L128

@jnothman
Copy link
Member

but actually that is probably a nuisance since scikit-learn would have to provide a pytest plugin, which seems a bit silly? Maybe better off just having users do:

@pytest.mark.parametrize('check', yield_all_checks)
@pytest.mark.parametrize('estimator', _tested_estimators())
def test_estimators(estimator, check):
    # Common tests for estimator instances
    with ignore_warnings(category=(DeprecationWarning, ConvergenceWarning,
                                   UserWarning, FutureWarning)):
        set_checking_parameters(estimator)
        name = estimator.__class__.__name__
        check(name, estimator)

@rth
Copy link
Member

rth commented Jul 10, 2019

@pytest.mark.parametrize('check', yield_all_checks)
@pytest.mark.parametrize('estimator', _tested_estimators())

The limitation of this is that checks are estimator dependent, and having lots of skipped tests that will never run is also not ideal.

The solution proposed in #13843 (comment) sounds reasonable. It's fairly close to the one proposed in #11622 (comment)

ALL_ESTIMATORS = [Estimator1, Estimator2, etc]

@pytest.mark.parametrize(
    'check, name, estimator',
    itertools.chain.from_iterable(
         check_estimator(Estimator, evaluate=False)
         for Estimator in ALL_ESTIMATORS
    )
)

(maybe with an additional helper function), where check_estimator(Estimator, evaluate=False) yields checks for that estimator.

@jnothman
Copy link
Member

Right. Thanks for the reminder that we are trying to not overwhelm it with skips.

@amueller
Copy link
Member

Fair, let's go with something like that. Seems pretty straight-forward, right?

@jnothman
Copy link
Member

I think we should have generate_only=True rather than evaluate=False, but yes, it's a good solution as far as I'm concerned.

Added bool flag for check_estimators() and
updated pytest decorator for test_estimators().

Changes to be committed:
	modified:   tests/test_common.py
	modified:   utils/estimator_checks.py
@amueller
Copy link
Member

@scouvreur I think we're going with #14381 right now. Not sure if there has been some miscommunication anywhere? cc @thomasjpfan

@thomasjpfan
Copy link
Member

Yes, it seems we are leaning toward #14381. Thank you for working on this issue @scouvreur !

@scouvreur
Copy link
Contributor Author

Ah I see @amueller - should I close this PR then ?

@scouvreur scouvreur deleted the check_estimator_progress_verbose_mode branch July 29, 2019 10:47
@jnothman
Copy link
Member

jnothman commented Jul 29, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants