
[WIP] Prototype 2 for strict check_estimator mode #16890


Closed

Conversation

NicolasHug
Member

@NicolasHug NicolasHug commented Apr 10, 2020

Closes #14929
Fixes #13969
Fixes #16241

Alternative to #16882 and #17252

Here we require every single check to have a strict_mode parameter, even if it's not used

ping @rth :)

@NicolasHug
Member Author

Sounds like this doesn't play well with tests marked with an _xfail_test tag.

I honestly don't understand how that works; help welcome @thomasjpfan:

    if strict_mode:
        return check(*args, **kwargs)
    else:
        raise SkipTest(f"Skipping {check} since strict mode is off")
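
For context, a minimal sketch of how such a decorator could be structured (the wrapper name and the strict_mode plumbing are illustrative, not the actual code from this PR):

    from functools import wraps
    from unittest import SkipTest  # pytest reports this exception as a skip

    def _skip_unless_strict(check):
        # Hypothetical decorator: run the check only in strict mode,
        # otherwise raise SkipTest so the runner reports a skip.
        @wraps(check)
        def wrapped(*args, strict_mode=True, **kwargs):
            if strict_mode:
                return check(*args, **kwargs)
            raise SkipTest(f"Skipping {check.__name__} since strict mode is off")
        return wrapped
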
Member


I think the issue may be that this decorator shouldn't raise SkipTest (which might not be handled well in a pytest session), but rather only mark the test as skipped in some way. And we can skip it in the same place where we currently mark tests as xfail.

I'll try to have a more detailed look in a couple of days.
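
One possible shape for that, assuming the checks flow through pytest.param the way the current xfail marking does (the helper name is hypothetical):

    import pytest

    def _maybe_skip(estimator, check, strict_mode):
        # Hypothetical: rather than raising SkipTest inside the check,
        # attach a skip marker to the (estimator, check) pair in the same
        # place where xfail marks are currently applied.
        if strict_mode:
            return estimator, check
        return pytest.param(estimator, check,
                            marks=pytest.mark.skip(reason="strict mode is off"))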

@NicolasHug
Member Author

The issue comes from _set_check_estimator_ids(), which now returns check_methods_subset_invariance(strict_mode=True), while the tag in (e.g.) BernoulliRBM is 'check_methods_subset_invariance', so the two no longer match exactly.

So the fix is to change the tag value for every single estimator.

Or is there a way we can change _mark_xfail_checks() to make it more robust?
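
To make the mismatch concrete (a reproduction assuming this PR's strict_mode keyword, not code from the patch):

    from functools import partial
    from sklearn.utils.estimator_checks import (
        _set_check_estimator_ids, check_methods_subset_invariance)

    check = partial(check_methods_subset_invariance, strict_mode=True)
    _set_check_estimator_ids(check)
    # -> 'check_methods_subset_invariance(strict_mode=True)'
    # while the tag key in BernoulliRBM is just 'check_methods_subset_invariance'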

@thomasjpfan
Member

thomasjpfan commented Apr 14, 2020

The following is one way to help with the keyword issue:

--- a/sklearn/utils/estimator_checks.py
+++ b/sklearn/utils/estimator_checks.py
@@ -290,7 +290,7 @@ def _yield_all_checks(name, estimator):
         yield check_fit_non_negative
 
 
-def _set_check_estimator_ids(obj):
+def _set_check_estimator_ids(obj, include_keywords=True):
     """Create pytest ids for checks.
 
     When `obj` is an estimator, this returns the pprint version of the
@@ -306,6 +306,9 @@ def _set_check_estimator_ids(obj):
     obj : estimator or function
         Items generated by `check_estimator`
 
+    include_keywords : bool, default=True
+        Includes keywords in id
+
     Returns
     -------
     id : string or None
@@ -318,7 +321,7 @@ def _set_check_estimator_ids(obj):
         if not isinstance(obj, partial):
             return obj.__name__
 
-        if not obj.keywords:
+        if not obj.keywords or not include_keywords:
             return obj.func.__name__
 
         kwstring = ",".join(["{}={}".format(k, v)
@@ -378,7 +381,7 @@ def _mark_xfail_checks(estimator, check, pytest):
     if not xfail_checks:
         return estimator, check
 
-    check_name = _set_check_estimator_ids(check)
+    check_name = _set_check_estimator_ids(check, include_keywords=False)
     msg = xfail_checks.get(check_name, None)
 
     if msg is None:
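
With that patch, matching against the tag keys can use the keyword-free id; an illustrative sketch of the effect:

    from functools import partial
    from sklearn.utils.estimator_checks import (
        _set_check_estimator_ids, check_methods_subset_invariance)

    check = partial(check_methods_subset_invariance, strict_mode=True)
    _set_check_estimator_ids(check)
    # -> 'check_methods_subset_invariance(strict_mode=True)'
    _set_check_estimator_ids(check, include_keywords=False)
    # -> 'check_methods_subset_invariance', which matches the tag key again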

@NicolasHug
Member Author

NicolasHug commented Apr 14, 2020

Thanks for the suggestion @thomasjpfan .

This seems to have fixed the previous issue, but I'm concerned that this patch will prevent the selection of xfail tests based on a parameter (especially the strict_mode parameter), e.g.:

    def _more_tags(self):
        return {
            '_xfail_test': {
            'some_check(strict_mode=True)':
                'the strict case fails but the non-strict one should be just fine'
            }
        }

@thomasjpfan
Member

Ah yes, then we would need to update _mark_xfail_checks to be a little smarter.

@NicolasHug
Member Author

thoughts @rth @jnothman ?

@NicolasHug
Member Author

And @rth @thomasjpfan, would it be better for the xfail tag logic to rely on partial functions instead of string representations? I.e. we would specify a partial application of a check in the tag, instead of its test id.

@rth
Member

rth commented Apr 17, 2020

Thanks for working on this @NicolasHug. Sorry for the slow review.

would it be better for the xfail tag logic to rely on partial functions instead of string representations? I.e. we would specify partial application of a check in the tag, instead of its test id.

Could you give an example? Do you mean, e.g.

          '_xfail_test': {
                partial(check_methods_subset_invariance, some_param=3):
                'fails for the predict method'
            }

?
I guess technically partial functions are hashable and can serve as dict keys:

>>> def f(x, a=2):
...     return x + a
... 
>>> from functools import partial
>>> partial(f, a=3)
functools.partial(<function f at 0x7f2366b2bdc0>, a=3)
>>> hash(partial(f, a=3))
8736876476520

I still find it a bit convoluted to use it that way, particularly if the check_* functions one day become decorated by pytest with some marks (and then I'm not sure whether there is a case where they would no longer be hashable). But as a tuple (func, message), why not, though we would then need to iterate through all of the items manually. Maybe the comparison would indeed be more reliable, and it would ensure that the skipped check actually exists.
Though for contrib projects that use this, it means the check_* functions need to stay really backward compatible, as changes would break their imports.

So overall I have mixed feelings about it, possibly +0.5. Not sure if that helps :)
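
For illustration, a rough sketch of that tuple-based variant, where the matching would compare partials instead of generated id strings (all names here are hypothetical):

    from functools import partial
    from sklearn.utils.estimator_checks import check_methods_subset_invariance

    # Hypothetical tag value: a list of (check or partial, reason) pairs.
    xfail_checks = [
        (partial(check_methods_subset_invariance, strict_mode=True),
         'the strict case fails, the non-strict one is fine'),
    ]

    def _find_xfail_reason(check, xfail_checks):
        # Compare the underlying function and its keywords rather than a
        # generated id string; returns None when nothing matches.
        for xfailed, reason in xfail_checks:
            same_func = getattr(xfailed, 'func', xfailed) is getattr(check, 'func', check)
            same_kw = getattr(xfailed, 'keywords', None) == getattr(check, 'keywords', None)
            if same_func and same_kw:
                return reason
        return None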

@rth
Member

rth commented Apr 17, 2020

The issue comes from _set_check_estimator_ids() which will return check_methods_subset_invariance(strict_mode=True), but the tag in (e.g.) BernoulliRBM is 'check_methods_subset_invariance' so they don't exactly match now.

Just read the previous conversation in more detail. If the strict checks are unpleasant to make work with xfail_checks as strings, then +1 for partial functions.
As long as we can mitigate the above concerns about hashability (i.e. maybe not use a dict) and the backward compatibility of the check_* functions.

@NicolasHug
Member Author

BTW, as @adrinjalali noted, we should aim for a feature freeze soon. Considering that i) the solution for the strict mode will likely break backward compatibility of the xfail logic, and ii) we won't reach a solution within that timeframe, we should probably make it very clear in the user guide that the xfail_checks tag should basically not be used at all, since it will almost certainly change. (I know tags are experimental, but still.)

@rth
Member

rth commented Apr 18, 2020

we won't reach a solution within the timeframe

You think we won't? Disabling strict checks is likely the number one ask from maintainers of contrib projects. I'd be happy to help with this.

Do we have a date for the feature freeze?

@NicolasHug
Member Author

Thanks for the help! I'll try to make it work with partial functions and will report back. If it's too tricky I guess relying on the generated check id would be fine.

Feature freeze is planned for the end of next week apparently.
But this PR is only the first part, i.e. implementing the strict mode mechanism. Once it's merged, we'll still have to do part 2, i.e. bikeshed on which tests should or should not be strict ;)

@rth
Member

rth commented Apr 18, 2020

Feature freeze is planned for the end of next week apparently.

Yes, saw the message on gitter. Will respond there.

But this PR is only the first part, i.e. implementing the strict mode mechanism. Once it's merged, we'll still have to do part 2, i.e. bikeshed on which tests should or should not be strict ;)

I know, but at least we'll have the basic mechanism.

Member

@jnothman jnothman left a comment


Is this much better than using an estimator tag to indicate strict mode?

@NicolasHug
Member Author

You mean an additional tag on top of xfail_checks?

With #16958, I'm starting to believe we need to rethink the xfail_checks tag to make it more general and also support a strict mode

@jnothman
Member

jnothman commented Apr 19, 2020 via email

@NicolasHug
Member Author

Could you please help me understand with an example of how the tag would be used? Would it be something like

def _more_tags(self):
	return {
		'dont_execute_in_strict_mode': [check_1, check_2, ...]
	}

?

In any case, note that the changes proposed so far in this PR are orthogonal to the addition of a new tag.

Also, question for you: do you think we should allow checks to be partially strict? I.e. allow a check to have a strict section, and a non-strict section?

@adrinjalali
Member

Also, question for you: do you think we should allow checks to be partially strict? I.e. allow a check to have a strict section, and a non-strict section?

If that's the case for a test, should they not be two tests?

This also fixes #13969, doesn't it?

Putting this in 0.24 (although I'd be happy to have this in a minor release as well).

@adrinjalali adrinjalali added this to the 0.24 milestone Apr 20, 2020
@NicolasHug
Member Author

I think we should not release xfail_checks in its current form in 0.23.

@NicolasHug
Member Author

NicolasHug commented Apr 20, 2020

@adrinjalali

If that's the case for a test, should they not be two tests?

I personally tend to agree with you. For context, the reason I'm asking is that, if we don't need partially strict checks, then #16882 is enough. @rth seems to think it's worth having: #16882 (comment). But I'd like to hear @jnothman's opinion too.

@adrinjalali
Member

I personally would be much happier with this solution, especially having #16756 in mind.

@NicolasHug
Member Author

Sorry, what do you mean by "this" solution?

@rth
Member

rth commented Apr 20, 2020

I agree we should avoid over-engineering this. We can apply strict at the check level, but this means, e.g., that we can't assert error messages in common tests. Basically all assert_raises_regex and assert_raise_message calls would then need to be removed, unless they assert messages from public helper functions (e.g. check_array or check_is_fitted).

I think we should not release xfail_checks in its current form in 0.23.

Just a reminder that "the estimator tags are experimental and the API is subject to change".

Currently xfail_checks does address a valid use case, both internally and for contrib projects. Yes, the API could potentially change in the future due to the interaction with strict=True, which may not be ideal, but I don't see a reason not to release it.

@rth
Member

rth commented Apr 20, 2020

We can apply strict at the check level,

Just to be clear, I'm not opposed to that solution, as long as we break checks into smaller pieces when needed. Technically they might be semi-public API though (or at least they have been around long enough for contrib projects to build testing setups around them, e.g. https://github.com/rtavenar/tslearn/blob/f87744a4d62578f4c82aead6f3ee4e9d51fd2a54/tslearn/tests/test_estimators.py#L40).

@adrinjalali
Member

Sorry, what do you mean by "this" solution?

Sorry, I meant as opposed to #16882

@jnothman
Member

Could you please help me understand with an example of how the tag would be used?

I just meant:

def _more_tags(self):
	return { 'strict': True }

Then all assertions which are for a standard we hold in sklearn but are not required for general compatibility would be skipped if 'strict' is not set.
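
If I read the proposal right, a check would then guard its sklearn-only assertions on the tag, along these lines (a sketch; the guard placement is hypothetical):

    def check_something(name, estimator_orig):
        # ... assertions required for general compatibility ...
        if not estimator_orig._get_tags().get('strict', True):
            return
        # ... stricter assertions that only sklearn's own estimators must pass ...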

@NicolasHug
Member Author

@jnothman I think that turning on the strict mode should be a parameter to check_estimator, not a property of the estimator. One may want to run both the strict and the non-strict suite on the same instance.

Also please see the latest prototype at #17252

@jnothman
Member

One may want to run both the strict and the non-strict suite on the same instance.

In what case?

@NicolasHug
Member Author

In the case where we know an estimator doesn't pass the strict suite, but we still want to know which of the strict checks fail.

@NicolasHug
Member Author

Also note that changing the tag of an instance is basically unsupported right now. We only support setting tags on classes, as far as I can tell.

Having strict_mode as a tag would thus prevent us from using the strict suite with Est(some_param='a') and the non-strict suite with Est(some_param='b').
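
That is, something a check_estimator parameter allows but a class-level tag cannot express (Est is a placeholder; the strict_mode parameter is the one proposed in this PR):

    from sklearn.utils.estimator_checks import check_estimator

    check_estimator(Est(some_param='a'), strict_mode=True)   # full strict suite
    check_estimator(Est(some_param='b'), strict_mode=False)  # non-strict suite only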

@jnothman
Member

Okay, I'm alright with strict not being a tag.

Also note that changing the tag of an instance is basically unsupported right now. We only support setting tags on classes, as far as I can tell.

Untrue.

$ git grep -A3 more_tags | grep 'self\.'
sklearn/feature_extraction/_hash.py-        return {'X_types': [self.input_type]}
sklearn/feature_selection/_from_model.py-        estimator_tags = self.estimator._get_tags()
sklearn/feature_selection/_rfe.py-        estimator_tags = self.estimator._get_tags()
sklearn/impute/_base.py-        return {'allow_nan': is_scalar_nan(self.missing_values)}
sklearn/linear_model/_glm/glm.py-            _family_instance = self._family_instance
sklearn/preprocessing/_function_transformer.py-        return {'no_validation': not self.validate,

@NicolasHug
Member Author

I meant setting tags on the fly on an instance, i.e. some_instance.set_tags(...), but agreed, supporting init-parameter-dependent tags would be enough.
