[MRG+1] Invariance tests for clustering metrics #8102 #8135
Conversation
Invariance tests for clustering metrics #8102
This code forms a common test for the different clustering metrics, based on their common properties.
3rd version
Please sort out your PEP8 compliance and I hope to review in a few days' time.
@jnothman Okay, I will try to sort out the PEP8 compliance.
What is 1.py?
Thanks!!
Please still fix PEP8 issues
Please also test input format invariance: each should accept an array or a list in its place.
Label permutation and format invariance should be tested for unsupervised metrics too.
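For illustration, a minimal sketch of such a format-invariance check might look like this (v_measure_score and the toy labels are just stand-ins, not taken from the PR):

import numpy as np

from sklearn.metrics import v_measure_score

labels_true = [0, 0, 1, 1]
labels_pred = [1, 1, 0, 0]

# The score should not depend on whether the inputs are lists or arrays.
score_from_lists = v_measure_score(labels_true, labels_pred)
score_from_arrays = v_measure_score(np.array(labels_true), np.array(labels_pred))
assert score_from_lists == score_from_arrays
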
1.py
Outdated
@author: user
"""

import numpy as np
We conventionally follow the following import order:
- standard library
- other external libraries
- sklearn
- perhaps testing imports come last
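An illustrative layout following that convention (the specific imports are placeholders, not the PR's actual ones; sklearn.utils.testing was the testing-utilities module at the time):

# standard library
from functools import partial

# other external libraries
import numpy as np

# sklearn
from sklearn.metrics.cluster import adjusted_rand_score

# testing imports come last
from sklearn.utils.testing import assert_almost_equal
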
test_common.py
Outdated
@author: anki08
"""

import numpy as np
We conventionally follow the following import order:
- standard library
- other external libraries
- sklearn
- perhaps testing imports come last
test_common.py
Outdated
@@ -0,0 +1,219 @@
# -*- coding: utf-8 -*-
Why is this not in sklearn/metrics/cluster/tests? Where it is, the tests are not being run by make test or the continuous integration servers.
test_common.py
Outdated
"adjusted_mutual_info_score" | ||
] | ||
|
||
#METRICS where permutations oflabels dont change score |
This should be true of all clustering metrics.
test_common.py
Outdated
"normalized_mutual_info_score" | ||
] | ||
|
||
#metrics which result in 0 when a class is split across different clusters |
I don't get this, nor is CLASS_BASED_METRICS used.
It is basically for supervised metrics which give a high score if the clusters are both homogeneous and complete.
test_common.py
Outdated
#test function for mtericshaving the property of not changing the score when
#the labels are permuted.
def permute_labels():
for name in METRICS_NORMALIZED_OUTPUT:
Wrong metric set
test_common.py
Outdated
assert_equal(adjusted_mutual_info_score(labels_a, labels_b), 0.0)
assert_equal(normalized_mutual_info_score(labels_a, labels_b), 0.0)

#test function for mtericshaving the property of not changing the score when
I think you're confusing what I meant by invariant to label permutation. I mean that for some (binary) y_true, y_pred:
metric(y_true, y_pred) == metric(1 - y_true, y_pred) == metric(y_true, 1 - y_pred) == metric(1 - y_true, 1 - y_pred)
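A minimal sketch of that property, assuming adjusted_rand_score as an example metric:

import numpy as np
from numpy.testing import assert_almost_equal

from sklearn.metrics import adjusted_rand_score

y_true = np.array([0, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 1, 1, 0, 0])

# Relabelling either side (swapping 0 and 1) must not change the score.
score = adjusted_rand_score(y_true, y_pred)
assert_almost_equal(score, adjusted_rand_score(1 - y_true, y_pred))
assert_almost_equal(score, adjusted_rand_score(y_true, 1 - y_pred))
assert_almost_equal(score, adjusted_rand_score(1 - y_true, 1 - y_pred))
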
For unsupervised metrics the input format is completely different: the parameters are X and labels. How will the label permutation work there? silhouette_score needs X.shape to execute, which is not present in labels, so it does not follow format invariance.
I just mean that metric(X, y_pred) = metric(X, 1 - y_pred)
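A sketch of what that could look like for an unsupervised metric, assuming silhouette_score and made-up data:

import numpy as np

from sklearn.metrics import silhouette_score

rng = np.random.RandomState(0)
X = rng.rand(8, 3)
y_pred = np.array([0, 1, 0, 1, 0, 1, 1, 0])

# Swapping the two cluster labels leaves the partition, and hence the score, unchanged.
np.testing.assert_allclose(silhouette_score(X, y_pred),
                           silhouette_score(X, 1 - y_pred))
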
test_common.py
Outdated
# If classes members are completely split across different clusters,
#the assignment is totally in-complete, hence the score of these metrics is 0
#they are perfect when the clusters are both homoneneous and complete
def class_based_clusters():
should start with test_
test_common.py
Outdated
#the assignment is totally in-complete, hence the score of these metrics is 0
#they are perfect when the clusters are both homoneneous and complete
def class_based_clusters():
for name in METRICS_NORMALIZED_OUTPUT:
wrong set of metrics
test_common.py
Outdated
var_2 = metric([0, 1, 0, 1, 0, 1], [2, 0, 1, 1, 0, 2])
assert_equal(var_1,var_2)

# If classes members are completely split across different clusters,
This seems much like test_exactly_zero_info_score...?
Added format invariance and changed label permutation tests
Please remove the two old files and only work on one.
test_common.py
Outdated
#test function for metrics whose output in range 0 to 1
def test_normalized_output():
for name in METRICS_NORMALIZED_OUTPUT:
What I meant is that the lower_bound will not be true for asymmetric measures, but probably is for symmetric ones.
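For example, a sketch with homogeneity_score (one of the asymmetric measures in question) and the same style of labellings as in the test:

from sklearn.metrics import homogeneity_score

labels_single_class = [0, 0, 0, 0, 0, 0]   # everything in one class
labels_all_clusters = [0, 1, 2, 3, 4, 5]   # every sample in its own cluster

# homogeneity_score is asymmetric: singleton clusters are trivially "pure",
# so one argument order gives 1.0 while the other gives the lower bound 0.0.
scores = [homogeneity_score(labels_single_class, labels_all_clusters),
          homogeneity_score(labels_all_clusters, labels_single_class)]
assert 0.0 in scores and 1.0 in scores
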
fowlkes_mallows and calinski_harabaz, at least, are not present in your imports.
]

# Metrics where permutations of labels dont change score( 0 and 1 exchchanged)
METRICS_PERMUTE_LABELS = ["homogeneity_score","v_measure_score",
This is true of all clustering in scikit-learn
I.e. it should not have its own group. Just test this property on all.
]

# Input parameters can be both in the form of arrays and lists
METRICS_WITH_FORMAT_INVARIANCE = ["homogeneity_score","v_measure_score",
This is true of all clustering in scikit-learn
for name in CLASS_BASED_METRICS:
metric = ALL_METRICS[name]
assert_equal(metric([0, 0, 0, 0],[0, 1, 2, 3]),0.0)
assert_equal(metric([0, 0, 1, 1],[0, 0, 1, 1]),1.0)
Surely this is testing being normalised, not class-based
metric = ALL_METRICS[name]
assert_equal(metric([0, 0, 0, 0],[0, 1, 2, 3]),0.0)
assert_equal(metric([0, 0, 1, 1],[0, 0, 1, 1]),1.0)
assert_equal(metric([0, 0, 1, 1],[0, 0, 1, 1]),1.0)
This appears to be a duplicate.
I could not run fowlkes_mallows and calinski_harabaz on my laptop, therefore I did not add them. Error:
NameError: name 'calinski_harabaz_score' is not defined
You apparently have an old version of scikit-learn installed. You need to be working with the current master.
"normalized_mutual_info_score","adjusted_rand_score" | ||
] | ||
|
||
# If classes members are completely split across different clusters, |
I still don't get what the point of this group is. adjusted_rand_accuracy and mutual_info_score and fowlkes_mallows_score would also pass the test as it stands.
I would really appreciate it if you used more meaningful commit messages. "Add files via upload" gives me no indication of what's changed. There are a lot of PRs to keep track of.
…Added fowlkes_mallows and calinsky_harabaz . Removed the groups for permute_labels and invariance_format
Please fix your flake8 issues. They make it much harder to focus on the actual work.
Maybe autopep8 would help on this occasion, though usually I'd recommend against it.
Thank you for that. Much appreciated. If you feel like this is basically what you hope to have merged (with some likely changes under review), please change the title prefix from [WIP] to [MRG].
Okay. Thanks a lot for your help.
I can't figure out why the code is failing in CircleCI. Could you please help me, @tguillemot?
@anki08 Don't waste your time, the problem appears in several PRs. It is not related to your work.
@anki08 CI problems seem solved now, can you try a rebase?
It's failing even now:
Don't worry about the CI.
I think this may be mergeable, though we should merge in master to check that CI runs fine, and ideally another core dev should give it a quick look.
What's with the weird sphinx error?
remove outdated comment
We might need to merge in master to get the updated circle script.
@glemaitre, want to review this?
I'll look at it.
I have some pytest updates in fact. The tests themselves seem ok.
for name in SYMMETRIC_METRICS:
metric = SUPERVISED_METRICS[name]
assert_almost_equal(metric(y1, y2), metric(y2, y1),
err_msg="%s is not symmetric" % name)
Shall we add: "... %s was expected to be symmetric"?
for name in NON_SYMMETRIC_METRICS:
metric = SUPERVISED_METRICS[name]
assert_not_equal(metric(y1, y2), metric(y2, y1),
msg="%s is symmetric" % name)
Same suggestion as before, but for the non-symmetric case.
assert_not_equal(metric(y1, y2), metric(y2, y1),
msg="%s is symmetric" % name)

assert_equal(sorted(SYMMETRIC_METRICS + NON_SYMMETRIC_METRICS),
assert sorted(SYMMETRIC_METRICS + NON_SYMMETRIC_METRICS) == sorted(SUPERVISED_METRICS)
for name in NON_SYMMETRIC_METRICS:
metric = SUPERVISED_METRICS[name]
assert_not_equal(metric(y1, y2), metric(y2, y1),
Question: can we have surprises if some numerical error means the equality is not triggered? In this regard, I would write:
assert metric(y1, y2) != pytest.approx(metric(y2, y1))
If we go in this direction, I would suggest replacing assert_almost_equal by
assert x == pytest.approx(y)
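A self-contained sketch of that style, with v_measure_score and homogeneity_score standing in for the symmetric and non-symmetric groups (the toy labellings are illustrative):

import pytest

from sklearn.metrics import homogeneity_score, v_measure_score

y1 = [0, 0, 1, 1, 2, 2]
y2 = [0, 0, 1, 2, 2, 2]

# Symmetric metric: both argument orders agree up to floating point noise.
assert v_measure_score(y1, y2) == pytest.approx(v_measure_score(y2, y1))

# Non-symmetric metric: the two orders are expected to genuinely differ,
# not merely by rounding error.
assert homogeneity_score(y1, y2) != pytest.approx(homogeneity_score(y2, y1))
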
upper_bound_2 = [0, 0, 0, 1, 1, 1]
for name in NORMALIZED_METRICS:
metric = SUPERVISED_METRICS[name]
assert_greater(metric([0, 0, 0, 1, 1], [0, 0, 0, 1, 2]), 0.0)
Replace:
- assert_greater(x, y) by assert x > y
- assert_less(x, y) by assert x < y
- assert_equal(x, y) by assert x == y
score_1 = metric(y_pred, y_label)
assert_almost_equal(score_1, metric(1 - y_pred, y_label),
err_msg="%s failed labels permutation" % name)
assert_almost_equal(score_1, metric(1 - y_pred, 1 - y_label),
we can use approx
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7]

def generate_formats(y):
we can parametrize using pytest
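For instance, a sketch of one way to parametrize (here over input containers; the fixture data and metric choice are illustrative, and parametrizing over metric names works the same way):

import numpy as np
import pytest

from sklearn.metrics import adjusted_rand_score

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7]


# Each input container becomes its own test case in pytest's report.
@pytest.mark.parametrize("container", [list, np.array], ids=["list", "array"])
def test_format_invariance(container):
    score = adjusted_rand_score(container(y_true), container(y_pred))
    assert score == pytest.approx(adjusted_rand_score(y_true, y_pred))
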
for name in UNSUPERVISED_METRICS:
metric = UNSUPERVISED_METRICS[name]
X = np.random.randint(10, size=(8, 10))
randint(..., dtype=np.float64) will avoid an astype afterwards.
Oops, my comment is wrong.
@glemaitre aside from cosmetics, does this look good to you? I'd like to merge this so we can be more confident about PRs like #10827.
@jnothman The only thing that I can see is that the upper bound is not tested, while the lower bound is. Otherwise, I am fine with the principle. Checking the other common tests for metrics, I see that we don't test:
- infinite or nan input
- single sample
Should these be considered in the tests?
assert_equal(metric(upper_bound_1, upper_bound_2), 1.0,
msg="%s has upper_bound greater than 1" % name)

lower_bound_1 = [0, 0, 0, 0, 0, 0]
Just out of curiosity, shall we test the upper bound as well?
My bad, it was just above.
metric = SUPERVISED_METRICS[name]
score = [metric(lower_bound_1, lower_bound_2),
metric(lower_bound_2, lower_bound_1)]
assert_true(0.0 in score,
assert_allclose
Non-finite and single sample both sound like good ideas.