TST Add sample order invariance to estimator_checks #17598
Conversation
Thanks @ngojason9, this is very useful! For the failing estimators:
- We can skip `RadiusNeighborsTransformer`, as it returns an (n_samples, n_samples) matrix and so won't verify this property. You can do that with the `_xfail_checks` estimator tag: https://scikit-learn.org/stable/developers/develop.html#estimator-tags
- For stochastic models such as `BernoulliRBM`, the defaults, the dataset, or the tolerances may need to be adjusted.
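The `_xfail_checks` tag mechanism mentioned above can be sketched roughly as follows (the class name and reason string here are hypothetical illustrations; see the linked developer guide for the authoritative interface):

```python
# Hypothetical estimator opting out of one check via the _xfail_checks
# estimator tag: the dict maps a check name to the reason it is
# expected to fail, and the test suite skips/xfails it accordingly.
class RadiusNeighborsLikeTransformer:
    def _more_tags(self):
        return {
            "_xfail_checks": {
                "check_methods_sample_order_invariance":
                    "returns an (n_samples, n_samples) matrix",
            }
        }

tags = RadiusNeighborsLikeTransformer()._more_tags()
print(tags["_xfail_checks"])
```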
Thanks @rth for the suggestion. I added it.
Thanks @ngojason9 ! A few last comments otherwise LGTM.
sklearn/utils/estimator_checks.py (Outdated)

    if hasattr(estimator, "n_components"):
        estimator.n_components = 1
    if hasattr(estimator, "n_clusters"):
        estimator.n_clusters = 1
Does it make sense to have one cluster for this test, though? The result will always be the same cluster/component, so we won't be testing much.
Hmm, I'm not entirely sure what `n_clusters` should be instead. Are you suggesting that if the estimator has 2 clusters, we should test each cluster individually? Apologies if I totally misunderstood your suggestion.
No, I mean that with `n_clusters=1`, `KMeans.predict`, for instance, will predict that one cluster whether there is sample order invariance or not, since there are no alternative values. Could we try changing this to 2 and see if the tests still pass?
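The concern can be illustrated with a numpy-only toy: the nearest-centroid assignment below stands in for `KMeans.predict` (it is not scikit-learn's implementation). With a single center the invariance holds vacuously; with two centers the labels actually vary, so the check has teeth.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(10, 3)
perm = rng.permutation(len(X))

def predict(X, centers):
    # nearest-centroid assignment, mimicking what KMeans.predict does
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# n_clusters=1: every sample maps to label 0, so
# predict(X[perm]) == predict(X)[perm] holds trivially.
one_center = X.mean(axis=0, keepdims=True)
assert np.array_equal(predict(X[perm], one_center),
                      predict(X, one_center)[perm])

# n_clusters=2: labels differ across samples, so the same identity
# becomes a meaningful sample-order-invariance test.
two_centers = X[:2]
assert np.array_equal(predict(X[perm], two_centers),
                      predict(X, two_centers)[perm])
```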
@rth I tried changing it to 2 and the tests still pass. I also tried deleting that block of code altogether, and the tests also pass. How would you like me to proceed?
Please set it to 2.
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
It looks good. Could you add an entry in what's new, since this check will be run by `check_estimator` for third-party libraries?
@rth could you have a final look at this one?
Setting `estimator.n_clusters = 2` would be preferable I think, otherwise LGTM.
sklearn/utils/estimator_checks.py (Outdated)

    if hasattr(estimator, "n_components"):
        estimator.n_components = 1
    if hasattr(estimator, "n_clusters"):
        estimator.n_clusters = 1
Please set it to 2.
I assume we have decided that we can break check_estimator backwards compatibility since new checks can be disabled by downstream libraries?
Hi @ngojason9! It looks like the CI is failing, but this PR is otherwise okay. @rth, is that correct? If so, Jason, please make the necessary fixes to ensure CI passes.
That and we should ideally merge some version of #17361 before the next release.
Merging master in might be enough. Once we fix #17943, that is.
@ngojason9 Could you please merge upstream/master to resolve CI issues?
@ngojason9 we finally introduced #17361 in master. So if you could merge master into your branch and make the above change, we will be able to merge this PR.
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Thanks @ngojason9, three approvals already! Once the lint is fixed, this PR will be ready for merging!
@@ -1168,6 +1169,41 @@ def check_methods_subset_invariance(name, estimator_orig, strict_mode=True):
                                atol=1e-7, err_msg=msg)


    @ignore_warnings(category=FutureWarning)
    def check_methods_sample_order_invariance(name, estimator_orig, strict_mode=True):
Suggested change:

    - def check_methods_sample_order_invariance(name, estimator_orig, strict_mode=True):
    + def check_methods_sample_order_invariance(name, estimator_orig,
    +                                           strict_mode=True):
I fixed the linting issue in #18570 because I could not push directly to this branch. Thanks @ngojason9 for the work.
Reference Issues/PRs
Fixes #8695
What does this implement/fix? Explain your changes.
Added the `check_methods_sample_order_invariance` function, which checks for method invariance under sample order (i.e. results should not change when samples are reordered before or after applying the method). For example, if we shuffle the indices, the results should still be the same.
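The property being checked can be sketched with a minimal numpy-only example. The `transform` below is a hypothetical stand-in for a fitted estimator method; the actual check in the PR applies the estimator's own `predict`, `transform`, etc. and compares with `assert_allclose_dense_sparse`.

```python
import numpy as np
from numpy.testing import assert_allclose

rng = np.random.RandomState(0)
X = rng.rand(20, 4)

# Hypothetical stand-in for a fitted estimator's transform: scale each
# sample by statistics "learned" from the training data.
mean_, scale_ = X.mean(axis=0), X.std(axis=0)

def transform(X):
    return (X - mean_) / scale_

# The invariance under test: shuffling the samples before a method call
# must give the same result as shuffling the method's output afterwards.
idx = rng.permutation(X.shape[0])
assert_allclose(transform(X[idx]), transform(X)[idx], atol=1e-9)
```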
Any other comments?
The original issue mentions using `assert_array_equal` to check for invariance. While this works for `predict`, `decision_function`, `score_samples`, and `predict_proba`, it fails for `transform`. Therefore, I have opted for `assert_allclose_dense_sparse` with `atol=1e-9` instead of `assert_array_equal`, which passes the tests for all methods. Furthermore, the original issue uses random sampling, but since we already have `check_methods_subset_invariance`, I just shuffled the indices instead of random sampling.
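As an aside, the reason exact equality is too strict can be shown with a generic floating-point example (this is an analogy, not the `transform` failure itself): reductions over reordered data may differ in the last bits because floating-point addition is not associative, so a comparison with a small tolerance is the safer choice.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(100_000).astype(np.float32)

s1 = x.sum()
s2 = x[rng.permutation(x.size)].sum()

# The two sums are computed over the same values in different orders;
# they may disagree in the last bits, so compare with a tolerance
# rather than demanding bitwise equality.
print(s1, s2, abs(float(s1) - float(s2)))
assert np.isclose(s1, s2, atol=1e-2)
```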