check_estimator is not sufficiently general #6715
Comments
Agreed that the estimator checks are not completely ready yet.
The plan is to address this using estimator tags. |
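For context (not part of the original comment): estimator tags let an estimator declare its limitations so that inapplicable common checks are skipped. A minimal sketch, assuming the pre-1.6 `_more_tags` hook and the `binary_only` tag available in recent scikit-learn releases; `BinaryOnlyClassifier` is a hypothetical example:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class BinaryOnlyClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical classifier that only handles two-class problems."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        if len(self.classes_) > 2:
            raise ValueError("This classifier only supports binary problems")
        # ... real fitting logic would go here ...
        return self

    def predict(self, X):
        # Placeholder: always predict the first class seen during fit.
        return np.full(len(X), self.classes_[0])

    def _more_tags(self):
        # Declares the limitation so that multiclass-specific common checks
        # are skipped rather than reported as failures.
        return {"binary_only": True}
```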
The point is, perhaps, to make clearer at scikit-learn-contrib that … |
To put some numbers on this issue, at present 41 estimators out of 147 do not pass the estimator checks. Tested with …, using … |
@rth I'd argue you should do … |
Also, finally, a minimal example of pytest parametrized testing that I can understand. That's quite nice - compare it against the current mess in check_estimators and test_common. |
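For illustration (not from the thread), a minimal pytest parametrization over estimators of the kind referred to above might look like the sketch below; the two scikit-learn estimators are just placeholders, and `check_estimator` is assumed to accept instances, as in recent releases:

```python
import pytest

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.estimator_checks import check_estimator


@pytest.mark.parametrize(
    "estimator",
    [LogisticRegression(), DecisionTreeClassifier()],
    ids=lambda est: type(est).__name__,
)
def test_estimator_compliance(estimator):
    # Each parametrized estimator becomes its own pytest test case.
    check_estimator(estimator)
```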
@qinhanmin2014 I don't think this is fully solved. Estimator tags do help, but I'm not sure they fully resolve this. #11622 could also be relevant. |
I'm following @amueller; he said: "We should probably reopen this with the intention to split off key remaining issues." |
Things that remain, IMO, are: … |
One of the big remaining issues IMO is the selection of appropriate data
for different estimators. And perhaps a convenient way to specify a series
of parameter settings to test with (esp. with meta-estimators), but at
least this can be done with explicit calling of check_estimator.
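For example, the "series of parameter settings" point can be handled today by calling check_estimator explicitly on a few configurations. A hedged sketch (the particular meta-estimator and settings are arbitrary):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.estimator_checks import check_estimator

# Exercise the meta-estimator under a couple of base-estimator configurations.
for base in (LogisticRegression(), DecisionTreeClassifier(max_depth=3)):
    check_estimator(BaggingClassifier(base, n_estimators=5))
```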
|
Yes, I think the last one can be done with explicit calls for now. The bigger issue is that none of the meta-estimators fulfill the API contracts at all (#9741). There are a couple of tags that are missing, like positive and sparse data, and then there's generating 1d data, text data, etc. in the tests. I'm not sure I understand @rth's last point. I guess we're not giving individual failures now but we should? That's "just" a matter of yielding the tests, right? Or does pytest allow us to run functions as tests? |
(It doesn't look like you can run pytest tests as plain functions.) |
We can run … |
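Presumably this refers to invoking pytest programmatically, which pytest does support via pytest.main; a minimal sketch (the test path is hypothetical):

```python
import pytest

# Run a specific test module programmatically; pytest.main returns an exit
# code (0 means every selected test passed).
exit_code = pytest.main(["-v", "tests/test_my_estimator.py"])
print("all checks passed" if exit_code == 0 else "some checks failed")
```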
Chiming in! I'm working on an expansion to … |
In which case, I guess I should either modify the checks or run checks individually? That feels awkward. Should the little winks and nods to sklearn estimators scattered throughout this module be generalised so that independent package maintainers can do something similar without a PR on …? |
If I change this (https://github.com/trevorstephens/gplearn/pull/141/files#diff-e868c30a1191845a36e17737e4611da8R289) to allow for something more elaborate than binary classification, I can get a lot further through the test suite ... but it eventually fails against a three-label problem. |
This is only a hacky workaround, @trevorstephens, but if it helps: in pygbm we ended up creating our own … |
Thanks @NicolasHug I'll do something similar for now then. |
Could you clarify what you mean by this? I think now that we have "estimator tags" controlling which checks are applicable to which estimators, it would be appropriate to open a separate issue for this, clearly explaining / giving an example for what estimator features / limitations are not catered to. |
Sorry, was typing too fast @jnothman and inverted what I was trying to say... The GitHub comment was corrected to:
Sorry for the confusion. My estimator is purely for binary classification and it fails about half the tests because they assume multi-label classification. I may be a bit behind these vogue new tag things though. Shall research. |
TL;DR: we should remove that requirement.
Historically, we have maintained the requirement that any binary classifier in scikit-learn should support multiclass, at least via OvR. We violated that with CalibratedClassifierCV, and it makes sense to do so for other probabilistic classifiers. Yes, it should be possible for every binary classifier to support multilabel using binary relevance, but I don't see why we should force this when it can add a lot of maintenance complexity.
A PR fixing this soon might yet make it into 0.21...
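For reference, one-vs-rest (and, for multilabel targets, binary relevance) can already be obtained by wrapping a binary classifier externally rather than requiring every classifier to implement it natively; a sketch using standard scikit-learn pieces:

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# A three-class toy problem.
X, y = make_classification(n_samples=100, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# OneVsRestClassifier fits one binary LinearSVC per class (OvR); with
# multilabel targets the same wrapper amounts to binary relevance.
clf = OneVsRestClassifier(LinearSVC()).fit(X, y)
print(clf.predict(X[:5]))
```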
|
Fair enough to have standards for scikit-learn's own native estimators; I'm not really wanting to push on that here. It was just very unexpected that multi-class support is required for passing scikit-learn compatibility checks. Acceptance into the main package as a first-class estimator and compatibility with scikit-learn are kinda different things. |
If there's some appetite to rethink some of these tests to relieve that requirement, and enough core team support, I might be able to help with that PR since it looks like I'm going to be re-writing a few hundred lines of test code for myself anyhow @jnothman :-D |
Binary classifiers are a fantastic example for consideration. These violate the "normal" expectation that multiclass problems are supported, but they can still be particularly useful for a specific problem. I'm also dealing with trying to create a classifier that has unusual class restrictions (much more unusual than binary), and I think what's probably really needed here is to identify the requirements according to what functionality an estimator class needs to support, e.g. what's required for use with …

To put it another way: it's not just that the method isn't general enough. It just makes way too many assumptions about the use cases a model needs to support to be of any use to a third party. If you want stricter standards inside scikit-learn, that's fine and maybe even great, but the rest of us writing our own bespoke models to deal with unusual situations need something that can give us some confidence that they will work with most scikit-learn code requiring a model as input. |
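As an illustration of the "which functionality actually requires what" angle (my sketch, not the commenter's code): utilities such as cross_val_score or GridSearchCV only need get_params/set_params (inherited from BaseEstimator), fit, and predict/score, which is a much smaller contract than the full check_estimator suite:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score


class MostFrequentClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical minimal classifier: always predicts the most frequent class.

    It implements only what cross_val_score actually relies on: fit and
    predict (score comes from ClassifierMixin, get_params/set_params from
    BaseEstimator).
    """

    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)


X, y = make_classification(n_samples=60, random_state=0)
print(cross_val_score(MostFrequentClassifier(), X, y, cv=3))
```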
Thanks for the feedback. Yes, the suite of checks has grown organically and is not documented. The new parametrize_with_checks in version 0.22 (and the nightly builds) will help you list which checks are failing. Can you please give us this list and share your use case more specifically?
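To make the suggestion above concrete, a sketch of how parametrize_with_checks (available since scikit-learn 0.22) reports each check as an individual pytest test; MyEstimator and mypackage are hypothetical placeholders:

```python
from sklearn.utils.estimator_checks import parametrize_with_checks

from mypackage import MyEstimator  # hypothetical third-party estimator


@parametrize_with_checks([MyEstimator()])
def test_sklearn_compatible_estimator(estimator, check):
    # Each (estimator, check) pair is its own test, so running
    # `pytest -v` lists exactly which checks pass or fail.
    check(estimator)
```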
|
560 lines of test code removed from my package, based on catching up to the latest … |
I think we can close this issue now; if there are specific issues that the current tags don't address, they should get dedicated issues. |
+1 with @amueller. We still have to improve the common tests, for sure. |
I've been writing a lot of scikit-learn compatible code lately, and I love the idea of the general checks in check_estimator to make sure that my code is scikit-learn compatible. But in nearly all cases, I'm finding that these checks are not general enough for the objects I'm creating (for example: MSTClustering, which fails because it doesn't support specification of the number of clusters through an n_clusters argument).

Digging through the code, there are a lot of hard-coded special cases for estimators within scikit-learn itself; this would imply that, absent those special cases, scikit-learn's own estimators would not pass the general estimator checks, which is obviously a huge issue.
Making these checks significantly more general would be hard; I imagine it would be a rather large project, and probably even involve designing an API so that estimators themselves can tune the checks that are run on them.
In the meantime, I think it would be better for us not to recommend that people run this code on their own estimators, or at least to let them know that failing the estimator checks does not necessarily imply incompatibility.