check_estimator is not sufficiently general #6715
Comments
Agreed that the estimator checks are not completely ready yet.
The plan is to address this using estimator tags. |
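For context (not part of the original comment): estimator tags let an estimator declare its limitations so that inapplicable common checks are skipped. A minimal sketch, assuming the pre-1.6 `_more_tags` hook and the `binary_only` tag available in recent scikit-learn releases; `BinaryOnlyClassifier` is a hypothetical example:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class BinaryOnlyClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical classifier that only handles two-class problems."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        if len(self.classes_) > 2:
            raise ValueError("This classifier only supports binary problems")
        # ... real fitting logic would go here ...
        return self

    def predict(self, X):
        # Placeholder: always predict the first class seen during fit.
        return np.full(len(X), self.classes_[0])

    def _more_tags(self):
        # Declares the limitation so that multiclass-specific common checks
        # are skipped rather than reported as failures.
        return {"binary_only": True}
```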
The point is, perhaps, to make clearer at scikit-learn-contrib that … |
To put some numbers on this issue, at present 41 estimators out of 147 do not pass the estimator checks. Tested with …, using … |
@rth I'd argue you should do … |
Also, finally, a minimal example of pytest parametrized testing that I can understand. That's quite nice - compare it against the current mess in check_estimators and test_common. |
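For illustration (not from the thread), a minimal pytest parametrization over estimators of the kind referred to above might look like the sketch below; the two scikit-learn estimators are just placeholders, and `check_estimator` is assumed to accept instances, as in recent releases:

```python
import pytest

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.estimator_checks import check_estimator


@pytest.mark.parametrize(
    "estimator",
    [LogisticRegression(), DecisionTreeClassifier()],
    ids=lambda est: type(est).__name__,
)
def test_estimator_compliance(estimator):
    # Each parametrized estimator becomes its own pytest test case.
    check_estimator(estimator)
```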
@qinhanmin2014 I don't think this is fully solved. Estimator tags do help, but I'm not sure they fully resolve this. #11622 could also be relevant. |
I'm following @amueller; he said: "We should probably reopen this with the intention to split off key remaining issues." |
Things that remain, IMO, are: … |
One of the big remaining issues IMO is the selection of appropriate data
for different estimators. And perhaps a convenient way to specify a series
of parameter settings to test with (esp. with meta-estimators), but at
least this can be done with explicit calling of check_estimator.
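For example, the "series of parameter settings" point can be handled today by calling check_estimator explicitly on a few configurations. A hedged sketch (the particular meta-estimator and settings are arbitrary):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.estimator_checks import check_estimator

# Exercise the meta-estimator under a couple of base-estimator configurations.
for base in (LogisticRegression(), DecisionTreeClassifier(max_depth=3)):
    check_estimator(BaggingClassifier(base, n_estimators=5))
```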
|
Yes, I think the last one can be done with explicit calls for now. The bigger issue is that none of the meta-estimators fulfill the API contracts at all (#9741). There are a couple of tags that are missing, like positive and sparse data, and then there's generating 1d data, text data, etc. in the tests. I'm not sure I understand @rth's last point. I guess we're not giving individual failures now but we should? That's "just" a matter of yielding the tests, right? Or does pytest allow us to run functions as tests? |
(It doesn't look like you can run pytest tests as plain functions.) |
We can run … |
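Presumably this refers to invoking pytest programmatically, which pytest does support via pytest.main; a minimal sketch (the test path is hypothetical):

```python
import pytest

# Run a specific test module programmatically; pytest.main returns an exit
# code (0 means every selected test passed).
exit_code = pytest.main(["-v", "tests/test_my_estimator.py"])
print("all checks passed" if exit_code == 0 else "some checks failed")
```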
Chiming in! I'm working on an expansion to … |
In which case, I guess I should either modify the checks or run checks individually? That feels awkward. Should the little winks and nods to sklearn estimators scattered throughout this module be generalised so that independent package maintainers can do something similar without a PR on …? |
If I change this (https://github.com/trevorstephens/gplearn/pull/141/files#diff-e868c30a1191845a36e17737e4611da8R289) to allow for something more elaborate than binary classification, I can get a lot further through the test suite ... but it eventually fails against a three-label problem. |
This is only a hacky workaround, @trevorstephens, but if it helps: in pygbm we ended up creating our own … |
Thanks @NicolasHug I'll do something similar for now then. |
Could you clarify what you mean by this? I think now that we have "estimator tags" controlling which checks are applicable to which estimators, it would be appropriate to open a separate issue for this, clearly explaining / giving an example for what estimator features / limitations are not catered to. |
Sorry, was typing too fast @jnothman and inverted what I was trying to say... The GitHub comment was corrected to:
Sorry for the confusion. My estimator is purely for binary classification and it fails about half the tests because they assume multi-label classification. I may be a bit behind these vogue new tag things though. Shall research. |
TL;DR: we should remove that requirement.
Historically, we have maintained the requirement that any binary classifier in scikit-learn should support multiclass, at least via OvR. We violated that with CalibratedClassifierCV, and it makes sense to do so for other probabilistic classifiers. Yes, it should be possible for every binary classifier to support multilabel using binary relevance, but I don't see why we should force this when it can add a lot of maintenance complexity.
A PR fixing this soon might yet make it into 0.21...
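For reference, one-vs-rest (and, for multilabel targets, binary relevance) can already be obtained by wrapping a binary classifier externally rather than requiring every classifier to implement it natively; a sketch using standard scikit-learn pieces:

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# A three-class toy problem.
X, y = make_classification(n_samples=100, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# OneVsRestClassifier fits one binary LinearSVC per class (OvR); with
# multilabel targets the same wrapper amounts to binary relevance.
clf = OneVsRestClassifier(LinearSVC()).fit(X, y)
print(clf.predict(X[:5]))
```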
|
Fair enough to have standards for scikit-learn's own native estimators; I'm not really wanting to push on that here. It was just very unexpected that multi-class support is required for passing scikit-learn compatibility checks. Acceptance into the main package as a first-class estimator and compatibility with scikit-learn are kinda different things. |
If there's some appetite to rethink some of these tests to relieve that requirement, and enough core team support, I might be able to help with that PR since it looks like I'm going to be re-writing a few hundred lines of test code for myself anyhow @jnothman :-D |
Binary classifiers are a fantastic example for consideration. These violate the "normal" expectation that multiclass problems are supported, but they can still be particularly useful for a specific problem. I'm also dealing with trying to create a classifier that has unusual class restrictions (much more unusual than binary), and I think what's probably really needed here is to identify the requirements according to what functionality an estimator class needs to support, e.g. what's required for use with …

To put it another way: it's not just that the method isn't general enough. It just makes way too many assumptions about the use cases a model needs to support to be of any use to a third party. If you want stricter standards inside scikit-learn, that's fine and maybe even great, but the rest of us writing our own bespoke models to deal with unusual situations need something that can give us some confidence that they will work with most scikit-learn code requiring a model as input. |
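As an illustration of the "which functionality actually requires what" angle (my sketch, not the commenter's code): utilities such as cross_val_score or GridSearchCV only need get_params/set_params (inherited from BaseEstimator), fit, and predict/score, which is a much smaller contract than the full check_estimator suite:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score


class MostFrequentClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical minimal classifier: always predicts the most frequent class.

    It implements only what cross_val_score actually relies on: fit and
    predict (score comes from ClassifierMixin, get_params/set_params from
    BaseEstimator).
    """

    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)


X, y = make_classification(n_samples=60, random_state=0)
print(cross_val_score(MostFrequentClassifier(), X, y, cv=3))
```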
Thanks for the feedback. Yes, the suite of checks has grown organically and is not documented. The new parametrize_with_checks in version 0.22 (and the nightly builds) will help you list which checks are failing. Can you please give us this list and share your use case more specifically?
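To make the suggestion above concrete, a sketch of how parametrize_with_checks (available since scikit-learn 0.22) reports each check as an individual pytest test; MyEstimator and mypackage are hypothetical placeholders:

```python
from sklearn.utils.estimator_checks import parametrize_with_checks

from mypackage import MyEstimator  # hypothetical third-party estimator


@parametrize_with_checks([MyEstimator()])
def test_sklearn_compatible_estimator(estimator, check):
    # Each (estimator, check) pair is its own test, so running
    # `pytest -v` lists exactly which checks pass or fail.
    check(estimator)
```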
|
560 lines of test code removed from my package, based on catching up to the latest … |
I think we can close this issue now; if there are specific issues that the current tags don't address, they should get dedicated issues. |
+1 with @amueller. We still have to improve the common tests, for sure. |
I've been writing a lot of scikit-learn compatible code lately, and I love the idea of the general checks in check_estimator to make sure that my code is scikit-learn compatible. But in nearly all cases, I'm finding that these checks are not general enough for the objects I'm creating (for example: MSTClustering, which fails because it doesn't support specification of the number of clusters through an n_clusters argument).

Digging through the code, there are a lot of hard-coded special cases for estimators within scikit-learn itself; this would imply that, absent those special cases, scikit-learn's own estimators would not pass the general estimator checks, which is obviously a huge issue.
Making these checks significantly more general would be hard; I imagine it would be a rather large project, and probably even involve designing an API so that estimators themselves can tune the checks that are run on them.
In the meantime, I think it would be better for us not to recommend that people run this code on their own estimators, or at least to let them know that failing the estimator checks does not necessarily imply incompatibility.