[MRG+1] Invariance tests for clustering metrics #8102 #8135


Closed
wants to merge 42 commits

Conversation


@anki08 anki08 commented Dec 29, 2016

Reference Issue

What does this implement/fix? Explain your changes.

Any other comments?

@anki08
Author

anki08 commented Dec 30, 2016

Invariance tests for clustering metrics #8102

This code forms a common test for the different clustering metrics based on their common properties.

@jnothman
Member

Please sort out your PEP8 compliance and I hope to review in a few days' time.

@anki08
Author

anki08 commented Dec 30, 2016

@jnothman Okay, I will try to sort out the PEP8 compliance.
Thank you

@jnothman
Member

What is 1.py?

@jnothman jnothman (Member) left a comment

Thanks!!

Please still fix PEP8 issues
Please also test input format invariance: each should accept an array or a list in its place.

Label permutation and format invariance should be tested for unsupervised metrics too.
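For instance, a format-invariance check could look roughly like this (a minimal sketch; v_measure_score is used purely as an illustration):

import numpy as np
from sklearn.metrics.cluster import v_measure_score

labels_a = [0, 0, 1, 1, 2, 2]
labels_b = [1, 1, 0, 0, 2, 2]
# The score should not depend on whether the labels arrive as lists or arrays.
assert v_measure_score(labels_a, labels_b) == \
    v_measure_score(np.array(labels_a), np.array(labels_b))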

1.py Outdated
@author: user
"""

import numpy as np
Member

We conventionally use the following import order:

  • standard library
  • other external libraries
  • sklearn
  • perhaps testing imports come last
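For example (illustrative imports only), that ordering would look like:

# standard library
from functools import partial

# other external libraries
import numpy as np

# sklearn
from sklearn.metrics.cluster import adjusted_rand_score

# testing imports last
from sklearn.utils.testing import assert_almost_equal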

test_common.py Outdated
@author: anki08
"""

import numpy as np
Member

We conventionally use the following import order:

  • standard library
  • other external libraries
  • sklearn
  • perhaps testing imports come last

test_common.py Outdated
@@ -0,0 +1,219 @@
# -*- coding: utf-8 -*-
Member

Why is this not in sklearn/metrics/cluster/tests?

Where it is, the tests are not being run by make test or the continuous integration servers.

test_common.py Outdated
"adjusted_mutual_info_score"
]

#METRICS where permutations oflabels dont change score
Member

This should be true of all clustering metrics.

test_common.py Outdated
"normalized_mutual_info_score"
]

#metrics which result in 0 when a class is split across different clusters
Member

I don't get this, nor is CLASS_BASED_METRICS used

Author

It is basically for supervised metrics which give a high score if the clusters are both homogeneous and complete

test_common.py Outdated
#test function for mtericshaving the property of not changing the score when
#the labels are permuted.
def permute_labels():
for name in METRICS_NORMALIZED_OUTPUT:
Member

Wrong metric set

test_common.py Outdated
assert_equal(adjusted_mutual_info_score(labels_a, labels_b), 0.0)
assert_equal(normalized_mutual_info_score(labels_a, labels_b), 0.0)

#test function for mtericshaving the property of not changing the score when
Member

I think you're confusing what I meant by invariance to label permutation.

I mean that for some (binary) y_true, y_pred, metric(y_true, y_pred) == metric(1 - y_true, y_pred) == metric(y_true, 1 - y_pred) == metric(1-y_true, 1-y_pred)
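Concretely, a sketch of that property for one supervised metric (adjusted_rand_score is only an example):

import numpy as np
from sklearn.metrics.cluster import adjusted_rand_score

y_true = np.array([0, 0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0])
score = adjusted_rand_score(y_true, y_pred)
# Flipping the binary labels on either side should leave the score unchanged.
assert np.isclose(score, adjusted_rand_score(1 - y_true, y_pred))
assert np.isclose(score, adjusted_rand_score(y_true, 1 - y_pred))
assert np.isclose(score, adjusted_rand_score(1 - y_true, 1 - y_pred))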

@anki08 anki08 (Author) Jan 1, 2017

For unsupervised metrics the input format is completely different. The parameters are X and labels. How will the label permutation work there? silhouette_score needs X.shape to execute, which is not present in labels, so it does not follow format invariance.

Member

I just mean that metric(X, y_pred) = metric(X, 1 - y_pred)
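For an unsupervised metric, a rough sketch of the same idea (random data purely for illustration):

import numpy as np
from sklearn.metrics.cluster import silhouette_score

rng = np.random.RandomState(0)
X = rng.rand(10, 4)
y_pred = np.array([0, 1] * 5)
# Renaming the two clusters (swapping 0 and 1) should not change the score.
assert np.isclose(silhouette_score(X, y_pred), silhouette_score(X, 1 - y_pred))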

test_common.py Outdated
# If classes members are completely split across different clusters,
#the assignment is totally in-complete, hence the score of these metrics is 0
#they are perfect when the clusters are both homoneneous and complete
def class_based_clusters():
Member

should start with test_

test_common.py Outdated
#the assignment is totally in-complete, hence the score of these metrics is 0
#they are perfect when the clusters are both homoneneous and complete
def class_based_clusters():
for name in METRICS_NORMALIZED_OUTPUT:
Member

wrong set of metrics

test_common.py Outdated
var_2 = metric([0, 1, 0, 1, 0, 1], [2, 0, 1, 1, 0, 2])
assert_equal(var_1,var_2)

# If classes members are completely split across different clusters,
Member

This seems much like test_exactly_zero_info_score...?

Added format invariance and changed label permutation tests
@jnothman jnothman (Member) left a comment

Please remove the two old files and only work on one.

test_common.py Outdated

#test function for metrics whose output in range 0 to 1
def test_normalized_output():
for name in METRICS_NORMALIZED_OUTPUT:
Member

What I meant is that the lower_bound will not be true for asymmetric measures, but probably is for symmetric ones.
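To illustrate the distinction (a small sketch; the particular labelings are only an example):

from sklearn.metrics.cluster import homogeneity_score, v_measure_score

labels_a = [0, 0, 0, 0]   # a single class...
labels_b = [0, 1, 2, 3]   # ...split into singleton clusters

# The symmetric v_measure reaches its lower bound in both argument orders,
# while the asymmetric homogeneity_score only does so in one of them.
assert v_measure_score(labels_a, labels_b) == 0.0
assert v_measure_score(labels_b, labels_a) == 0.0
assert homogeneity_score(labels_b, labels_a) == 0.0
assert homogeneity_score(labels_a, labels_b) == 1.0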

test_common.py Outdated
assert_equal(adjusted_mutual_info_score(labels_a, labels_b), 0.0)
assert_equal(normalized_mutual_info_score(labels_a, labels_b), 0.0)

#test function for mtericshaving the property of not changing the score when
Member

I just mean that metric(X, y_pred) = metric(X, 1 - y_pred)

@jnothman jnothman (Member) left a comment

fowlkes_mallows and calinski_harabaz, at least, are not present in your imports.

]

# Metrics where permutations of labels dont change score( 0 and 1 exchchanged)
METRICS_PERMUTE_LABELS = ["homogeneity_score","v_measure_score",
Member

This is true of all clustering metrics in scikit-learn.

Member

I.e. it should not have its own group. Just test this property on all.

]

# Input parameters can be both in the form of arrays and lists
METRICS_WITH_FORMAT_INVARIANCE = ["homogeneity_score","v_measure_score",
Member

This is true of all clustering metrics in scikit-learn.

for name in CLASS_BASED_METRICS:
metric = ALL_METRICS[name]
assert_equal(metric([0, 0, 0, 0],[0, 1, 2, 3]),0.0)
assert_equal(metric([0, 0, 1, 1],[0, 0, 1, 1]),1.0)
Member

Surely this is testing being normalised, not class-based

metric = ALL_METRICS[name]
assert_equal(metric([0, 0, 0, 0],[0, 1, 2, 3]),0.0)
assert_equal(metric([0, 0, 1, 1],[0, 0, 1, 1]),1.0)
assert_equal(metric([0, 0, 1, 1],[0, 0, 1, 1]),1.0)
Member

this appears to be a duplicate

@anki08 anki08 (Author) Jan 2, 2017

I could not run fowlkes_mallows and calinski_harabaz on my laptop, therefore I did not add them.
ERROR:
NameError: name 'calinski_harabaz_score' is not defined

Member

You apparently have an old version of scikit-learn installed. You need to be working with the current master.

"normalized_mutual_info_score","adjusted_rand_score"
]

# If classes members are completely split across different clusters,
Member

I still don't get what the point of this group is. adjusted_rand_score, mutual_info_score and fowlkes_mallows_score would also pass the test as it stands.

@jnothman
Member

jnothman commented Jan 3, 2017

I would really appreciate if you used more meaningful commit messages. "Add files via upload" gives me no indication of what's changed. There are a lot of PRs to keep track of.

…Added fowlkes_mallows and calinsky_harabaz . Removed the groups for permute_labels and invariance_format
@jnothman
Member

jnothman commented Jan 4, 2017

Please fix your flake8 issues. They make it much harder to focus on the actual work:

sklearn/metrics/cluster/tests/test_common.py:10:1: F401 'KMeans' imported but unused
from sklearn.cluster import KMeans
^
sklearn/metrics/cluster/tests/test_common.py:12:1: F401 '_jaccard' imported but unused
from sklearn.metrics.cluster.bicluster import _jaccard
^
sklearn/metrics/cluster/tests/test_common.py:13:1: F401 'consensus_score' imported but unused
from sklearn.metrics.cluster import consensus_score
^
sklearn/metrics/cluster/tests/test_common.py:17:1: F401 'contingency_matrix' imported but unused
from sklearn.metrics.cluster import contingency_matrix
^
sklearn/metrics/cluster/tests/test_common.py:19:1: F401 'expected_mutual_information' imported but unused
from sklearn.metrics.cluster import expected_mutual_information
^
sklearn/metrics/cluster/tests/test_common.py:20:1: F401 'homogeneity_completeness_v_measure' imported but unused
from sklearn.metrics.cluster import homogeneity_completeness_v_measure
^
sklearn/metrics/cluster/tests/test_common.py:26:1: F401 'silhouette_samples' imported but unused
from sklearn.metrics.cluster import silhouette_samples
^
sklearn/metrics/cluster/tests/test_common.py:30:1: F401 'assert_false' imported but unused
from sklearn.utils.testing import assert_false
^
sklearn/metrics/cluster/tests/test_common.py:31:1: F401 'assert_array_equal' imported but unused
from sklearn.utils.testing import assert_array_equal
^
sklearn/metrics/cluster/tests/test_common.py:32:1: F401 'assert_raises_regexp' imported but unused
from sklearn.utils.testing import assert_raises_regexp
^
sklearn/metrics/cluster/tests/test_common.py:33:1: F401 'assert_raise_message' imported but unused
from sklearn.utils.testing import assert_raise_message
^
sklearn/metrics/cluster/tests/test_common.py:35:1: F401 'assert_true' imported but unused
from sklearn.utils.testing import assert_true
^
sklearn/metrics/cluster/tests/test_common.py:46:80: E501 line too long (92 > 79 characters)
#   - SUPERVISED_METRICS: all supervised cluster metrics - (when given a ground truth value)
                                                                               ^
sklearn/metrics/cluster/tests/test_common.py:53:52: W291 trailing whitespace
# Metrics used to test similarity between bicluster
                                                   ^
sklearn/metrics/cluster/tests/test_common.py:55:33: E231 missing whitespace after ':'
    "adjusted_mutual_info_score":adjusted_mutual_info_score ,
                                ^
sklearn/metrics/cluster/tests/test_common.py:55:60: E203 whitespace before ','
    "adjusted_mutual_info_score":adjusted_mutual_info_score ,
                                                           ^
sklearn/metrics/cluster/tests/test_common.py:56:26: E231 missing whitespace after ':'
    "adjusted_rand_score":adjusted_rand_score,
                         ^
sklearn/metrics/cluster/tests/test_common.py:57:25: E231 missing whitespace after ':'
    "completeness_score":completeness_score,
                        ^
sklearn/metrics/cluster/tests/test_common.py:58:24: E231 missing whitespace after ':'
    "homogeneity_score":homogeneity_score,
                       ^
sklearn/metrics/cluster/tests/test_common.py:59:24: E231 missing whitespace after ':'
    "mutual_info_score":mutual_info_score,
                       ^
sklearn/metrics/cluster/tests/test_common.py:60:35: E231 missing whitespace after ':'
    "normalized_mutual_info_score":normalized_mutual_info_score,
                                  ^
sklearn/metrics/cluster/tests/test_common.py:61:22: E231 missing whitespace after ':'
    "v_measure_score":v_measure_score,
                     ^
sklearn/metrics/cluster/tests/test_common.py:62:28: E231 missing whitespace after ':'
    "fowlkes_mallows_score":fowlkes_mallows_score
                           ^
sklearn/metrics/cluster/tests/test_common.py:64:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:66:23: E231 missing whitespace after ':'
    "silhouette_score":silhouette_score,
                      ^
sklearn/metrics/cluster/tests/test_common.py:67:29: E231 missing whitespace after ':'
    "calinski_harabaz_score":calinski_harabaz_score
                            ^
sklearn/metrics/cluster/tests/test_common.py:69:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:81:1: E265 block comment should start with '# '
#---------------------------------------------------------------------
^
sklearn/metrics/cluster/tests/test_common.py:85:26: E231 missing whitespace after ','
    "adjusted_rand_score","v_measure_score",
                         ^
sklearn/metrics/cluster/tests/test_common.py:86:24: E231 missing whitespace after ','
    "mutual_info_score","adjusted_mutual_info_score",
                       ^
sklearn/metrics/cluster/tests/test_common.py:87:35: E231 missing whitespace after ','
    "normalized_mutual_info_score","fowlkes_mallows_score"
                                  ^
sklearn/metrics/cluster/tests/test_common.py:89:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:90:45: E203 whitespace before ','
NON_SYMMETRIC_METRICS = ["homogeneity_score" , "completeness_score"]
                                            ^
sklearn/metrics/cluster/tests/test_common.py:94:35: E231 missing whitespace after ','
    "normalized_mutual_info_score","v_measure_score",
                                  ^
sklearn/metrics/cluster/tests/test_common.py:97:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:98:38: W291 trailing whitespace
# Metrics with output between 0 and 1
                                     ^
sklearn/metrics/cluster/tests/test_common.py:100:26: E231 missing whitespace after ','
    "adjusted_rand_score","homogeneity_score","completeness_score",
                         ^
sklearn/metrics/cluster/tests/test_common.py:100:46: E231 missing whitespace after ','
    "adjusted_rand_score","homogeneity_score","completeness_score",
                                             ^
sklearn/metrics/cluster/tests/test_common.py:101:22: E231 missing whitespace after ','
    "v_measure_score","adjusted_mutual_info_score","fowlkes_mallows_score",
                     ^
sklearn/metrics/cluster/tests/test_common.py:101:51: E231 missing whitespace after ','
    "v_measure_score","adjusted_mutual_info_score","fowlkes_mallows_score",
                                                  ^
sklearn/metrics/cluster/tests/test_common.py:104:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:105:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:106:23: E231 missing whitespace after ','
def assert_between(var,score_1,score_2):
                      ^
sklearn/metrics/cluster/tests/test_common.py:106:31: E231 missing whitespace after ','
def assert_between(var,score_1,score_2):
                              ^
sklearn/metrics/cluster/tests/test_common.py:110:26: E231 missing whitespace after ','
    if assert_greater(var,score_1) and assert_less(var,score_2):
                         ^
sklearn/metrics/cluster/tests/test_common.py:110:55: E231 missing whitespace after ','
    if assert_greater(var,score_1) and assert_less(var,score_2):
                                                      ^
sklearn/metrics/cluster/tests/test_common.py:115:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:122:38: E231 missing whitespace after ','
        assert_almost_equal(metric(y1,y2),
                                     ^
sklearn/metrics/cluster/tests/test_common.py:123:38: E231 missing whitespace after ','
                            metric(y2,y1))
                                     ^
sklearn/metrics/cluster/tests/test_common.py:124:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:127:54: E231 missing whitespace after ','
        assert_almost_equal(metric([0, 1, 2, 5, 4, 9],[0, 1, 9, 4, 3, 5]),
                                                     ^
sklearn/metrics/cluster/tests/test_common.py:128:54: E231 missing whitespace after ','
                            metric([0, 1, 9, 4, 3, 5],[0, 1, 2, 5, 4, 9]))
                                                     ^
sklearn/metrics/cluster/tests/test_common.py:130:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:134:27: E225 missing whitespace around operator
        labels_a, labels_b=(np.ones(i, dtype=np.int),
                          ^
sklearn/metrics/cluster/tests/test_common.py:135:31: E127 continuation line over-indented for visual indent
                              np.arange(i, dtype=np.int))
                              ^
sklearn/metrics/cluster/tests/test_common.py:137:19: E225 missing whitespace around operator
            metric=SUPERVISED_METRICS_DICT[name]
                  ^
sklearn/metrics/cluster/tests/test_common.py:138:48: E231 missing whitespace after ','
            assert_almost_equal(metric(labels_a,labels_b), 0.0)
                                               ^
sklearn/metrics/cluster/tests/test_common.py:140:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:145:15: E225 missing whitespace around operator
        metric=SUPERVISED_METRICS_DICT[name]
              ^
sklearn/metrics/cluster/tests/test_common.py:146:49: E231 missing whitespace after ','
        assert_between(metric([0, 0, 0, 1, 1, 1],[0, 0, 0, 1, 2, 2]), 0.0, 1.0)
                                                ^
sklearn/metrics/cluster/tests/test_common.py:147:49: E231 missing whitespace after ','
        assert_between(metric([0, 0, 1, 1, 2, 2],[0, 0, 1, 1, 1, 1]), 0.0, 1.0)
                                                ^
sklearn/metrics/cluster/tests/test_common.py:148:42: E231 missing whitespace after ','
        assert_equal(metric(upper_bound_1,upper_bound_2), 1.0)
                                         ^
sklearn/metrics/cluster/tests/test_common.py:149:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:150:5: E265 block comment should start with '# '
    #For symmetric metrics the lower bound is defined
    ^
sklearn/metrics/cluster/tests/test_common.py:154:15: E225 missing whitespace around operator
        metric=SUPERVISED_METRICS_DICT[name]
              ^
sklearn/metrics/cluster/tests/test_common.py:155:42: E231 missing whitespace after ','
        assert_equal(metric(lower_bound_1,lower_bound_2), 0.0)
                                         ^
sklearn/metrics/cluster/tests/test_common.py:158:36: W291 trailing whitespace
# that is when 0 and 1 exchchanged.
                                   ^
sklearn/metrics/cluster/tests/test_common.py:159:1: E302 expected 2 blank lines, found 1
def test_permute_labels():
^
sklearn/metrics/cluster/tests/test_common.py:164:52: E231 missing whitespace after ','
        assert_almost_equal(metric(y_pred, y_label),metric(y_pred, 1-y_label))
                                                   ^
sklearn/metrics/cluster/tests/test_common.py:165:52: E231 missing whitespace after ','
        assert_almost_equal(metric(y_pred, y_label),metric(1-y_pred, y_label))
                                                   ^
sklearn/metrics/cluster/tests/test_common.py:166:52: E231 missing whitespace after ','
        assert_almost_equal(metric(y_pred, y_label),metric(1-y_pred, 1-y_label))
                                                   ^
sklearn/metrics/cluster/tests/test_common.py:166:80: E501 line too long (80 > 79 characters)
        assert_almost_equal(metric(y_pred, y_label),metric(1-y_pred, 1-y_label))
                                                                               ^
sklearn/metrics/cluster/tests/test_common.py:167:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:168:5: E265 block comment should start with '# '
    #Test for Silhouette_score
    ^
sklearn/metrics/cluster/tests/test_common.py:176:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:177:5: E265 block comment should start with '# '
    #Test for calinski_harabaz_score
    ^
sklearn/metrics/cluster/tests/test_common.py:189:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:190:1: W293 blank line contains whitespace

^
sklearn/metrics/cluster/tests/test_common.py:191:58: W291 trailing whitespace
# For ALL clustering metrics Input parameters can be both
                                                         ^
sklearn/metrics/cluster/tests/test_common.py:200:35: E231 missing whitespace after ','
        score_list = metric(list_a,list_b)
                                  ^
sklearn/metrics/cluster/tests/test_common.py:201:35: E231 missing whitespace after ','
        score_array = metric(arr_a,arr_b)
                                  ^
sklearn/metrics/cluster/tests/test_common.py:202:32: E231 missing whitespace after ','
        assert_equal(score_list,score_array)
                               ^
sklearn/metrics/cluster/tests/test_common.py:202:45: W292 no newline at end of file
        assert_equal(score_list,score_array)
                                            ^

@jnothman
Member

jnothman commented Jan 4, 2017

maybe autopep8 would help on this occasion, though usually I'd recommend against it.

@jnothman
Member

jnothman commented Jan 8, 2017

Thank you for that. Much appreciated.

If you feel like this is basically what you hope to have merged (with some likely changes under review), please change the [WIP] to [MRG]

@anki08 anki08 changed the title [WIP] Invariance tests for clustering metrics #8102 [MRG] Invariance tests for clustering metrics #8102 Jan 8, 2017
@anki08
Author

anki08 commented Jan 8, 2017

Okay. Thanks a lot for your help.

@anki08
Author

anki08 commented Jan 17, 2017

I can't figure out why the code is failing in CircleCI. Could you please help me, @tguillemot?

@tguillemot
Contributor

@anki08 Don't waste your time, the problem appears in several PRs. It is not related to your work.

@tguillemot
Contributor

@anki08 CI problems seem solved now, can you try a rebase?

@anki08
Author

anki08 commented Jan 18, 2017

It's failing even now:

@jnothman
Member

Don't worry about the CI.

@jnothman jnothman added this to the 0.20 milestone Jun 18, 2017
@jnothman
Member

I think this may be mergeable, though we should merge in master to check that CI runs fine, and ideally another core dev should give it a quick look.

@amueller
Member

what's with the weird sphinx error?

remove outdated comment
@jnothman
Member

We might need to merge in master to get updated circle script

@jnothman
Member

jnothman commented Feb 8, 2018

@glemaitre, want to review this?

@glemaitre
Member

I'll look at it.

@glemaitre glemaitre (Member) left a comment

I have some pytest-related suggestions, in fact. The tests themselves seem OK.

for name in SYMMETRIC_METRICS:
metric = SUPERVISED_METRICS[name]
assert_almost_equal(metric(y1, y2), metric(y2, y1),
err_msg="%s is not symmetric" % name)
Member

shall we add:

... %s was expected to be symmetric.

for name in NON_SYMMETRIC_METRICS:
metric = SUPERVISED_METRICS[name]
assert_not_equal(metric(y1, y2), metric(y2, y1),
msg="%s is symmetric" % name)
Member

same suggestion as before but non symmetric

assert_not_equal(metric(y1, y2), metric(y2, y1),
msg="%s is symmetric" % name)

assert_equal(sorted(SYMMETRIC_METRICS + NON_SYMMETRIC_METRICS),
Member

assert sorted(SYMMETRIC_METRICS + NON_SYMMETRIC_METRICS) == sorted(SUPERVISED_METRICS)


for name in NON_SYMMETRIC_METRICS:
metric = SUPERVISED_METRICS[name]
assert_not_equal(metric(y1, y2), metric(y2, y1),
Member

Question: could we get surprises if some numerical error does not trigger the equality?
In this regard, I would write:

assert metric(y1, y2) != pytest.approx(metric(y2, y1))

If we go in this direction, I would suggest replacing assert_almost_equal with

assert x == pytest.approx(y)
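For instance (names purely illustrative):

import pytest
from sklearn.metrics.cluster import adjusted_rand_score

y1 = [0, 0, 1, 1]
y2 = [1, 1, 0, 0]
# pytest.approx plays the role of assert_almost_equal's tolerance.
assert adjusted_rand_score(y1, y2) == pytest.approx(adjusted_rand_score(y2, y1))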

upper_bound_2 = [0, 0, 0, 1, 1, 1]
for name in NORMALIZED_METRICS:
metric = SUPERVISED_METRICS[name]
assert_greater(metric([0, 0, 0, 1, 1], [0, 0, 0, 1, 2]), 0.0)
Member

replace:

  • assert_greater(x, y) by assert x > y
  • assert_less(x, y) by assert x < y
  • assert_equal(x, y) by assert x == y

score_1 = metric(y_pred, y_label)
assert_almost_equal(score_1, metric(1 - y_pred, y_label),
err_msg="%s failed labels permutation" % name)
assert_almost_equal(score_1, metric(1 - y_pred, 1 - y_label),
Member

we can use approx

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7]

def generate_formats(y):
Member

we can parametrize using pytest
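A rough sketch of what a parametrized version could look like (the metric choice and test body are illustrative only):

import numpy as np
import pytest
from sklearn.metrics.cluster import adjusted_rand_score, v_measure_score


@pytest.mark.parametrize("metric", [adjusted_rand_score, v_measure_score])
def test_format_invariance(metric):
    y_true = [0, 0, 0, 0, 1, 1, 1, 1]
    y_pred = [0, 1, 2, 3, 4, 5, 6, 7]
    expected = metric(y_true, y_pred)
    # Lists and arrays should give the same score.
    assert metric(np.array(y_true), np.array(y_pred)) == pytest.approx(expected)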


for name in UNSUPERVISED_METRICS:
metric = UNSUPERVISED_METRICS[name]
X = np.random.randint(10, size=(8, 10))
Member

randint(..., dtype=np.float64) would avoid an astype afterwards

Member

Oops, my comment is wrong.

@jnothman
Member

@glemaitre aside from cosmetics does this look good to you? I'd like to merge this so we can be more confident about PRs like #10827

@glemaitre glemaitre (Member) left a comment

@jnothman The only thing that I can see is that the upper bound is not tested while the lower bound is.

Otherwise, I am fine with the principle. Checking the other common tests for metrics, I see that we don't test:

  • infinite or nan input
  • single sample

Should it be considered in the tests?

assert_equal(metric(upper_bound_1, upper_bound_2), 1.0,
msg="%s has upper_bound greater than 1" % name)

lower_bound_1 = [0, 0, 0, 0, 0, 0]
Member

Just out of curiosity, shall we test the upper bound as well?

Member

My bad, it was just above.

metric = SUPERVISED_METRICS[name]
score = [metric(lower_bound_1, lower_bound_2),
metric(lower_bound_2, lower_bound_1)]
assert_true(0.0 in score,
Member

assert_allclose



@jnothman
Member

jnothman commented Mar 18, 2018 via email
