[MRG] Add unicode support to sklearn.metrics.classification_report #2462

Merged
3 commits merged into scikit-learn:master on Sep 22, 2013

Conversation

@kmike (Contributor) commented Sep 20, 2013

sklearn.metrics.classification_report with unicode labels was broken in Python 2.x because of how '{0}'.format works.

'{0}'.format(arg) doesn't promote the whole string to unicode if arg is unicode - it tries to encode arg using sys.getdefaultencoding() instead.
"%s" doesn't have this gotcha.
@GaelVaroquaux (Member):

Looks good to me. Merging. Thanks a lot.

GaelVaroquaux added a commit that referenced this pull request Sep 22, 2013
[MRG] Add unicode support to sklearn.metrics.classification_report
GaelVaroquaux merged commit 3b8c0a1 into scikit-learn:master on Sep 22, 2013
@glouppe (Contributor) commented Sep 25, 2013

Jenkins has been failing since this PR was merged.

FAIL: sklearn.metrics.tests.test_metrics.test_classification_report_multiclass_with_unicode_label
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/slave/virtualenvs/cpython-2.6/lib/python2.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "<https://jenkins.shiningpanda-ci.com/scikit-learn/job/python-2.6-numpy-1.3.0-scipy-0.7.2/ws/sklearn/metrics/tests/test_metrics.py",> line 856, in test_classification_report_multiclass_with_unicode_label
    assert_equal(report, expected_report)
AssertionError: u'             precision    recall  f1-score   support\n\n      blue\xa2       0.78      0.44      0.56        16\n     green\xa2       0.52      0.31      0.39        39\n       red\xa2       0.42      0.90      0.57        20\n\navg / total       0.55      0.49      0.47        75\n' != u'             precision    recall  f1-score   support\n\n      blue\xa2       0.83      0.79      0.81        24\n     green\xa2       0.33      0.10      0.15        31\n       red\xa2       0.42      0.90      0.57        20\n\navg / total       0.51      0.53      0.47        75\n'
>>  raise self.failureException, \
          (None or '%r != %r' % (u'             precision    recall  f1-score   support\n\n      blue\xa2       0.78      0.44      0.56        16\n     green\xa2       0.52      0.31      0.39        39\n       red\xa2       0.42      0.90      0.57        20\n\navg / total       0.55      0.49      0.47        75\n', u'             precision    recall  f1-score   support\n\n      blue\xa2       0.83      0.79      0.81        24\n     green\xa2       0.33      0.10      0.15        31\n       red\xa2       0.42      0.90      0.57        20\n\navg / total       0.51      0.53      0.47        75\n'))

Any idea? CC: @kmike @arjoly @larsmans @jnothman

@@ -1138,6 +1159,10 @@ def test_invariance_string_vs_numbers_labels():
    labels_str = ["eggs", "spam"]

    for name, metric in CLASSIFICATION_METRICS.items():
        if isinstance(metric, partial) and 'labels' in metric.keywords:
Member:

By the way, why do that?

@kmike (Author):

There are 3 tests that run on ALL_METRICS; the two other tests support classification metrics without the 'labels' argument, but this one doesn't.

Member:

Am I wrong? I wrote some code to support this; see line 1168 and beyond. But it is true that your solution is smarter.

@kmike (Author):

What are you proposing? The check that fails is:

        assert_array_equal(measure_with_number, measure_with_str,
                           err_msg="{0} failed string vs number invariance "
                                   "test".format(name))

@kmike (Author):

Smarter test code is usually worse test code :)

Member:

At the moment, we assume that the user puts the appropriate labels in the labels argument.
If labels is None, we try to infer them as best we can.
(This can lead to some problems; see #2029 and the discussion in #2094.)

So "I'm not sure what is correct behavior here. y values are "egg" and "spam" and labels are [1, 2, 3]"
is undefined behavior at the moment.

@kmike (Author):

So it was correct to skip that test, at least for now?

Member:

The correct action would be to delete "confusion_matrix_with_labels" from CLASSIFICATION_METRICS.

By adding the line if isinstance(metric, partial) and 'labels' in metric.keywords:, you skip any metric defined
in CLASSIFICATION_METRICS with a partial and a labels keyword. That means you also skip the tests for macro/micro/weighted precision/recall/f-score.

@kmike (Author):

I don't think it skips those tests: keywords are the keyword arguments bound in the functools.partial call, not the arguments of the wrapped function.
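
A small sketch of that point (the metric entries here are illustrative, not the actual CLASSIFICATION_METRICS contents): partial.keywords only holds the keyword arguments bound when the partial was created, not every keyword the wrapped function accepts.

    from functools import partial
    from sklearn.metrics import f1_score

    macro_f1 = partial(f1_score, average="macro")
    print(macro_f1.keywords)                     # {'average': 'macro'}
    print('labels' in macro_f1.keywords)         # False -> not skipped by the check

    f1_with_labels = partial(f1_score, labels=[0, 1, 2])
    print('labels' in f1_with_labels.keywords)   # True -> skipped by the check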

Member:

You are right on this. Sorry.

@kmike (Contributor, Author) commented Sep 25, 2013

Interesting: this is not just a formatting issue; the numbers in the classification report are different. And I had just copied the first part of test_classification_report_multiclass_with_string_label into that test and made one seemingly innocent tweak.

It seems that the only difference that can cause this failure is

    y_true = np.array(["blue", "green", "red"])[y_true]
    y_pred = np.array(["blue", "green", "red"])[y_pred]

vs

    labels = np.array([u("blue\xa2"), u("green\xa2"), u("red\xa2")])
    y_true = labels[y_true]
    y_pred = labels[y_pred]

@@ -710,9 +711,9 @@ def test_precision_recall_f1_score_multiclass_pos_label_none():
def test_zero_precision_recall():
    """Check that pathological cases do not bring NaNs"""

    try:
        old_error_settings = np.seterr(all='raise')
Member:

Here, we could have used a with np.errstate(...) block instead.
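
A minimal sketch of that suggestion: np.errstate used as a context manager restores the previous floating-point error settings on exit, so the explicit try/finally around np.seterr is not needed.

    import numpy as np

    with np.errstate(all='raise'):
        try:
            np.zeros(1) / np.zeros(1)   # 0/0 raises FloatingPointError inside the block
        except FloatingPointError:
            pass
    # outside the block, the previous error settings are back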

@kmike (Contributor, Author) commented Sep 28, 2013

I think what causes the failure is the bytes vs. unicode change itself: it turns out that LabelEncoder works incorrectly for unicode values under Python 2.6 + numpy 1.3, and this error propagates up to classification_report.
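
A hypothetical reproduction sketch of that diagnosis (the label values are taken from the test above; on the affected setup the round trip below reportedly does not hold):

    import numpy as np
    from sklearn.preprocessing import LabelEncoder

    labels = np.array([u"blue\xa2", u"green\xa2", u"red\xa2"])
    y = labels[[0, 1, 2, 0, 1]]

    le = LabelEncoder()
    encoded = le.fit_transform(y)
    # expected to round-trip; reported to fail under Python 2.6 + numpy 1.3
    assert list(le.inverse_transform(encoded)) == list(y)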

@arjoly (Member) commented Sep 28, 2013

Thanks @kmike for investigating!!!
