
[WIP] API specify test parameters via classmethod #11324


Open
jnothman wants to merge 10 commits into base: main

Conversation

@jnothman (Member) commented Jun 20, 2018

This is an initial implementation of what I suggested in #8022 (comment) (ping @amueller).

The idea is to put the test configuration for an estimator class on the class. This provides:

  • advantage: can instantiate meta-estimators which have required args
  • advantage: can see the test parametrisation clearly on the class
  • disadvantage: code limiting max_iter, etc., is repetitive (this PR adds lines and decentralises functionality)
  • disadvantage: harder for reviewers to point out to a contributor that these changes are needed

The key changes are to base.py, test_common.py and estimator_checks.py
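
Concretely, the classmethod looks roughly like this (a minimal sketch with a hypothetical meta-estimator; `_get_test_instances` is the name used in this PR's diff, and the exact semantics are still under discussion):

```python
from sklearn.base import BaseEstimator, MetaEstimatorMixin
from sklearn.linear_model import LogisticRegression


class MyMetaEstimator(MetaEstimatorMixin, BaseEstimator):
    """Hypothetical meta-estimator with a required argument."""

    def __init__(self, estimator):  # required arg, no default
        self.estimator = estimator

    @classmethod
    def _get_test_instances(cls):
        # The class declares how the common tests should instantiate it,
        # so meta-estimators with required args can still be checked.
        yield cls(estimator=LogisticRegression())
```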

TODO:

  • documentation
  • use _generate_test_params when calling check_estimator on a class
  • replace check_parameters_default_constructible
  • work out how to still do check_no_attributes_set_in_init
  • test exception when running check_estimator on class requiring parameters

@jnothman added the API label Jun 20, 2018
@@ -319,61 +319,6 @@ def _boston_subset(n_samples=200):
return BOSTON


def set_checking_parameters(estimator):
@jnothman (Member Author):

I suppose we need to leave this in for backwards compat?

@jnothman changed the title from "API specify test parameters via classmethod" to "[MRG] API specify test parameters via classmethod" Jun 20, 2018
@amueller (Member)

yes, this looks like a good idea. I might try to get the tags done first, though? Will be next week, though :-/

@jnothman (Member Author)

I think this is more-or-less orthogonal to tags.

@jnothman (Member Author)

Though it will simplify your tags implementation for meta-estimators, for instance.

@amueller (Member)

More or less, yes ;) Might be there are no actual issues here. You mean by providing ways to instantiate them? Well, it will make it easier to test more classes, which will mean there'll be many more failures lol ;)

@jnothman (Member Author) commented Jun 20, 2018 via email

@amueller (Member)

So you wouldn't test that all parameters are either specified by default or explicitly defined as required? We could easily miss "max_iter" not having a default value with this. Maybe not as big an issue for us but for third parties?
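
(For context, such a check can be written with `inspect` — a hypothetical sketch, not the actual `check_parameters_default_constructible` implementation:)

```python
import inspect


def check_params_have_defaults(Estimator, required=()):
    # Hypothetical check: every __init__ parameter either has a default
    # value or is explicitly declared as required (e.g. via a tag).
    sig = inspect.signature(Estimator.__init__)
    for name, param in sig.parameters.items():
        if name == "self" or param.kind in (param.VAR_POSITIONAL,
                                            param.VAR_KEYWORD):
            continue
        if param.default is param.empty and name not in required:
            raise AssertionError(
                f"{Estimator.__name__}.__init__ parameter {name!r} has no "
                f"default and is not declared as required"
            )
```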

@@ -594,6 +602,10 @@ def __init__(self, C=1.0, kernel='rbf', degree=3, gamma='auto_deprecated',
decision_function_shape=decision_function_shape,
random_state=random_state)

@classmethod
def _get_test_instances(cls):
yield cls(decision_function_shape='ovo')
Member:

This will fail, right?

Member:

Or rather: why doesn't this fail?

@jnothman (Member Author):

The current set_checking_parameters does this for *SVC

Member:

You're right. But why? I'm sure that was me. I should get some coffee and think about that. That makes the shape of the decision function wrong, doesn't it? Are we never testing for more than 3 classes? I'm quite confused.

@amueller (Member)

Did you set parameters on everything that was impacted by the old function? Or how did you determine when to set parameters? Maybe it's easier to merge this first; either way I'm unlikely to have a lot of bandwidth before next week :-/

@jnothman (Member Author)

I tried to do every edit that corresponded to the existing set_checking_parameters. I've deprecated that function now, but left the portable logic in, on the basis that some implementers may have used it.

@amueller (Member)

I'd really like to hear your opinion on the default parameter tests. It seems the tag for that was confusing to you and @rth. But I feel it's important to make sure that third party estimators don't have weird required parameters.

@amueller (Member)

And I think leaving the function in but deprecating doesn't really cost us much so I'd do it.

@jnothman (Member Author) commented Jun 20, 2018 via email

@jnothman (Member Author)

> And I think leaving the function in but deprecating doesn't really cost us much so I'd do it.

Done. See https://github.com/scikit-learn/scikit-learn/pull/11324/files#diff-a95fe0e40350c536a5e303e87ac979c4R323

@jnothman (Member Author)

I think you'll like the last commit.

@amueller (Member)

> The point is that this will also raise an error if they try to run check_estimator on the estimator unless they define params. I think this will help us clearly review any testing params required for contrib projects.

Yeah, but it requires a review, and check_estimator has wider use than contrib projects. It leaves open the possibility of someone accidentally forgetting to specify a default for a parameter that they set in testing. But maybe I'm being paranoid.

It could also be that if someone doesn't know our conventions they'll just add them to the testing function and not to the constructor. But maybe the docs can clarify that.

@amueller (Member)

Ok, that seems like a good solution ;) you pre-empted my complaints.
Needs dev docs, otherwise looks good (though I didn't go through all the cases).
Let's do this before tags then.

@jnothman changed the title from "[MRG] API specify test parameters via classmethod" to "[WIP] API specify test parameters via classmethod" Jun 20, 2018
@jnothman (Member Author) commented Jun 20, 2018 via email

@jnothman (Member Author)

The trouble with that is supporting check_no_attributes_set_in_init. Also, this PR should aim to replace all of check_parameters_default_constructible.

@jnothman (Member Author)

The easiest way I can think of doing check_no_attributes_set_in_init is to make _get_test_instances return a dict of params, rather than an instance, and have the construction happen in test_common/check_estimator.
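
(A rough sketch of that variant; `_generate_test_params` on a hypothetical meta-estimator and the helper below are illustrative, not the actual implementation:)

```python
import inspect

from sklearn.base import BaseEstimator, MetaEstimatorMixin
from sklearn.linear_model import LogisticRegression


class MyMetaEstimator(MetaEstimatorMixin, BaseEstimator):
    def __init__(self, estimator):
        self.estimator = estimator

    @classmethod
    def _generate_test_params(cls):
        # Yield kwargs dicts rather than instances, so the construction
        # itself happens inside test_common/check_estimator.
        yield {"estimator": LogisticRegression()}


def check_no_attributes_set_in_init(Estimator, params):
    # Sketch: construct from the yielded kwargs, then verify __init__
    # stored nothing beyond its own parameters.
    est = Estimator(**params)
    init_params = set(inspect.signature(Estimator.__init__).parameters) - {"self"}
    assert set(vars(est)) <= init_params, (
        "__init__ set attributes other than its parameters")
```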

@amueller (Member)

Hm, there are tests that test init stuff that are not covered by clone, I think.
I need to double check.

Also, from a use perspective it seems easier to pass an instance than to add to this function in the top estimator. Would we want all possible pipelines we want to test to be hardcoded in the Pipeline class? That seems weird to me, in particular if I'm testing things like NaN handling.

It would be nice for a third party to be able to check whether their pipeline construct works nicely without inheriting from / monkey-patching Pipeline.

@jnothman (Member Author)

Maybe I'm sold on the requires_params tag. I tried changing check_parameters_default_constructible to accept some params and make sure that either these or the defaults are set.... but maybe that's silly.

@amueller (Member) commented Jun 20, 2018

I'm not adamant about my solution at all, will think about your idea later (have to run).
I thought your error message was good, but if it's hard to implement that's another thing. I think it might be reasonable to assume that people don't start with implementing _generate_test_params, so the test will be meaningful in most cases.

Currently the distinction between class-level tests and instance-level tests is pretty ugly. I was not happy with that at my last refactoring. But I didn't see a good way to check default_constructible but also pass in complex meta-estimators that might not "belong" to a specific class.

@amueller (Member)

Do we want to have this in the release? I was kinda hoping for estimator tags, not sure if I have time to finish it, though...

@jnothman (Member Author)

It would be nice to have all these things... I'm not sure we can justify a delay to release, though!

@NicolasHug (Member)

@jnothman do you mind if I try solving the merge conflicts so we can move this forward?

@amueller (Member)

@NicolasHug I'm not sure there's consensus on this.

I think I'd rather do the meta-estimator initialization by using instance-based testing as in #9741 instead of class-based testing.

The main downside of that is that some estimators (like SelectKBest and GaussianRandomProjection) don't pass the common tests with their default parameters, while other estimators take "a long time" with their default parameters.

Having a decentralized architecture is better than the centralized architecture we have right now, but in the end I feel having "testing parameters" is a bit like playing VW because we're not actually testing the defaults.
I'm not sure if it is worth changing the defaults so they work, but that also seems a bit arbitrary. Or changing the tests so they work?

@amueller (Member)

Maybe we need a SLEP / discussion on if/how to remove set_checking_parameters.

@rth (Member) commented Jul 31, 2019

> I think I'd rather do the meta-estimator initialization by using instance-based testing as in #9741 instead of class-based testing.

@amueller Could you elaborate on that? Say I want to run common tests on an estimator with different solvers (#14063): I see how it can be done with this PR but not with instance-based approaches (short of manually creating appropriate tests).

@jnothman (Member Author) commented Jul 31, 2019 via email

@rth (Member) commented Jul 31, 2019

> this is more or less equivalent to just calling check_estimator on a bunch of instances of the estimator, with different solvers for example, and doing so in the estimator's test module.

I agree that would be the most straightforward approach. Is there a reason we haven't been doing that so far? There shouldn't be issues for non-meta-estimators, I think?
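
(For illustration, the instance-based approach would look something like this in an estimator's own test module, assuming check_estimator accepts instances; the test name is made up:)

```python
import pytest

from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator


@pytest.mark.parametrize(
    "estimator",
    [LogisticRegression(solver=solver)
     for solver in ("lbfgs", "liblinear", "saga")],
)
def test_common_per_solver(estimator):
    # Run the common checks on concrete instances, one per solver,
    # from the estimator's own test module.
    check_estimator(estimator)
```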

@amueller (Member)

@rth I think the reason we haven't done it is that the last time I was working on this the estimator tags were not merged, and so it was not easily possible.

Doing instance-level checks is my preferred solution to checking different code paths. However, it doesn't get rid of set_checking_parameters. I think it's still good to run everything through the common tests so we don't forget anything. But the problem is that some estimators are slow by default and some estimators don't pass the tests by default. Neither of these issues is solved by explicit instance-based testing.

@jnothman (Member Author) commented Aug 1, 2019 via email

@amueller (Member) commented Aug 2, 2019

That would still allow a lot of hackery, though. I feel like ideally we shouldn't need to tweak the estimators to pass the tests.

@jnothman (Member Author) commented Aug 3, 2019 via email

@adrinjalali (Member)

> That would still allow a lot of hackery, though. I feel like ideally we shouldn't need to tweak the estimators to pass the tests.

One way to fix the issue is to have more "auto" values for the params. The estimators would ideally set them properly when a small dataset is passed.

@amueller (Member) commented Aug 12, 2019

@adrinjalali that's only sort-of true. For example, if we're checking the API we don't need algorithms to converge, so we set max_iter or n_iter to something small.

@jnothman I wasn't necessarily talking about tags. My issue is that the point of the tests is to make sure the estimators behave correctly. Any mechanism that allows us to set max_iter to make the algorithm run faster also allows us to hide issues with the default parameters or work around bugs, i.e. cheat the tests.
From that perspective, this PR basically implements the Volkswagen method.

@jnothman (Member Author) commented Aug 12, 2019 via email

@amueller (Member)

And the fast runs use the method in this PR?

In principle that seems like a reasonable solution. Running with default params on the cron job and with "fast" params on PRs?

That would not resolve the issues with weird defaults (like SelectKBest), but that part could be resolved by being "smart" with "auto"?

Does that mean check_estimator gets a use_testing_parameters argument that propagates to everything, together with the strict_mode parameter?
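
(Purely hypothetical sketch of what that might look like; neither use_testing_parameters nor run_all_checks exists in the library:)

```python
def run_all_checks(estimator, strict_mode=True):
    ...  # stand-in for the individual yield-based checks


def check_estimator(Estimator, *, use_testing_parameters=True,
                    strict_mode=True):
    # Hypothetical signature: when use_testing_parameters is False,
    # ignore any test-parameter overrides on the class and run every
    # check with the defaults (e.g. on the cron job); PR builds would
    # keep the fast testing parameters.
    if use_testing_parameters and hasattr(Estimator, "_generate_test_params"):
        instances = [Estimator(**p) for p in Estimator._generate_test_params()]
    else:
        instances = [Estimator()]
    for instance in instances:
        run_all_checks(instance, strict_mode=strict_mode)
```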
