Testing log_loss and hinge_loss under THRESHOLDED_METRICS #2717
Conversation
@arjoly, would you be able to review this?
for name, metrics in THRESHOLDED_METRICS.items():
# Hinge_loss raises TypeError
THRESHOLDED_METRICS_ITEMS = THRESHOLDED_METRICS.items()
THRESHOLDED_METRICS_ITEMS.remove(('hinge_loss', hinge_loss))
If it doesn't raise a ValueError, then it should give proper results with string input. Can you add a test for that?
edit: see below
A comment like that ("raises TypeError") needs to be a bit more explanatory.
Also, exceptions in invariance tests are usually coded as:

for ...:
    if exceptional condition:
        continue
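
For concreteness, a minimal sketch of that skip-via-continue pattern, assuming fixtures analogous to the THRESHOLDED_METRICS dict and the y1_str / y2_str label arrays in test_metrics.py (the stand-ins below are illustrative, not the actual test fixtures):

import numpy as np
from numpy.testing import assert_raises
from sklearn.metrics import hinge_loss, log_loss

# Illustrative stand-ins for the fixtures defined in test_metrics.py.
THRESHOLDED_METRICS = {"log_loss": log_loss, "hinge_loss": hinge_loss}
y1_str = np.array(["spam", "ham", "ham", "spam"])
y2_str = np.array(["ham", "spam", "ham", "spam"])

for name, metric in THRESHOLDED_METRICS.items():
    if name == "hinge_loss":
        # Exceptional condition: hinge_loss multiplies y_true by the
        # decision values, so string scores fail with a TypeError
        # instead of the ValueError checked below.
        continue
    assert_raises(ValueError, metric, y1_str, y2_str)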
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test to check that it raises TypeError? It's strange, I thought that log_loss supported string labels.
Thanks @jnothman
Can you add a test to check that it raises TypeError? It's strange, I thought that log_loss supported string labels.
The docstring of log_loss has the following example:
>>> log_loss(["spam", "ham", "ham", "spam"], # doctest: +ELLIPSIS
... [[.1, .9], [.9, .1], [.8, .2], [.35, .65]])
Thus this test should pass for log_loss.
Note that a bug has slipped into the test: it shouldn't test with y2_str, but with y2.
Since y2_str has non-float/int values, computing y_true * pred_decision raises a TypeError.
EDIT: something similar to this:
>>> hinge_loss(["spam", "ham", "ham", "spam"], ["ham", "spam", "ham", "spam"])
TypeError: unsupported operand type(s) for *: 'numpy.ndarray' and 'numpy.ndarray'
It is hinge_loss, by the way; log_loss does raise a ValueError.
Why not? It's consistent with the documentation. This feature can be considered not implemented (not tested means not implemented, in my opinion).
I'm confused about what you are asking for, @arjoly. I think the test is correct at master, and hinge_loss should also raise a ValueError.
What I mean is that

for name, metrics in THRESHOLDED_METRICS_ITEMS:
    assert_raises(ValueError, metrics, y1_str, y2_str)

should be (or at least the following lines of code should be added)

for name, metrics in THRESHOLDED_METRICS_ITEMS:
    assert_raises(ValueError, metrics, y1_str, y2)

because y2_str can't be a valid input for thresholded metrics. The goal of test_invariance_string_vs_numbers_labels, in my opinion, is to check that metrics can handle string labels correctly.
If log_loss supports string input for labels, it should also be tested with y1_str.astype('O'). Either the invariance test could be enhanced, or one more assertion could be added in test_log_loss.
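
A rough sketch of what that extra assertion in test_log_loss could look like (variable names are illustrative; the expected value is the one from the log_loss docstring example quoted above):

import numpy as np
from sklearn.metrics import log_loss

# String labels as an object array, probabilities as floats.
y_true_str = np.array(["spam", "ham", "ham", "spam"], dtype=object)
y_prob = np.array([[.1, .9], [.9, .1], [.8, .2], [.35, .65]])

loss = log_loss(y_true_str, y_prob)
np.testing.assert_almost_equal(loss, 0.21616, decimal=5)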
Is it clearer?
I think the test is correct at master, and hinge_loss should also raise a ValueError.

Sorry, I wasn't clear. I wasn't talking about which exception to raise.
Yes, that's clearer. Though I don't understand whether or not you're suggesting it should actually raise an error when the first argument is strings and the second is thresholds.
Though I don't understand whether or not you're suggesting it should actually raise an error when the first argument is strings and the second is thresholds.

If it doesn't work with strings, it should raise a helpful error message, as in #2723. Otherwise, it should be tested that it gives the correct answer. :-)
@@ -265,7 +271,12 @@ def hinge_loss(y_true, pred_decision, pos_label=None, neg_label=None):
    else:
        y_true = LabelBinarizer(neg_label=-1).fit_transform(y_true)[:, 0]

    margin = y_true * np.asarray(pred_decision)
    # Done to be homogeneous with the output of LabelBinarizer.
I don't think this comment is clear, and the code speaks well enough for itself, given the error message.
@jnothman This was actually meant for the line

pred_decision = np.ravel(pred_decision)

below. I wrote this comment because y_true is a 1-D array after

y_true = LabelBinarizer(neg_label=-1).fit_transform(y_true)[:, 0]

so pred_decision should also be 1-D for the multiplication to be correct.
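
To illustrate the point (this is just a sketch, not the library code): without the ravel, a column-shaped pred_decision would broadcast against the 1-D y_true into a square matrix instead of an element-wise product.

import numpy as np
from sklearn.preprocessing import LabelBinarizer

y_true = LabelBinarizer(neg_label=-1).fit_transform(
    ["spam", "ham", "ham", "spam"])[:, 0]                  # shape (4,)
pred_decision = np.array([[2.0], [-1.5], [-0.5], [1.0]])   # shape (4, 1)

print((y_true * pred_decision).shape)             # (4, 4): wrong broadcast
print((y_true * np.ravel(pred_decision)).shape)   # (4,): intended margins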
+1 for merge, preferably with the two comments about
Changes Unknown when pulling 9d1276b on Manoj-Kumar-S:test_log_loss into scikit-learn:master.
@jnothman: Any more comments, or are you +1 with this?
Yes, LGTM.
    T, Y = check_arrays(T, Y)
    if T.shape[1] != Y.shape[1]:
        raise ValueError("y_true and y_pred have different number of classes "
                         "%d, %d" % (T.shape[1], Y.shape[1]))
This line is missed in the line coverage.
@arjoly You mean a test for this?
Yes, it would be nice :-) to have an invariance test for that!
I would also add a test with a string object array for the log_loss function.
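
As a sketch, the requested check for the new branch could look something like this (the values below are illustrative: the predictions have three probability columns while y_true only has two classes, so the shape check is hit):

from numpy.testing import assert_raises
from sklearn.metrics import log_loss

y_true = ["spam", "ham", "ham", "spam"]            # two classes
y_bad = [[.2, .7, .1], [.6, .2, .2],               # three probability columns
         [.6, .1, .3], [.4, .3, .3]]

# The mismatch between 2 classes and 3 columns should raise the new ValueError.
assert_raises(ValueError, log_loss, y_true, y_bad)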
Except for the minor comment, LGTM.
@arjoly: A string object array is already tested.
Sorry for the noise, can you point to the test to refresh my memory?
@arjoly Doesn't L1500-1501 in test_metrics.py (in master) test the case when y_true and y_pred are both strings? Also, the new invariance test can be added under
If you remember my previous comment, the assertion assert_raises(ValueError, metrics, y1_str, y2_str) will always raise an exception since y_score must be an array of floats. I think this is a bug / mistake that I made. It should instead be assert_raises(ValueError, metrics, y1_str, y2), where y2 corresponds to numerical scores.
Yes, if you plan to test only log_loss.
Ah okay, sorry, I misinterpreted.
@arjoly: Please check the latest commit, and can you merge if tests pass?
Can you replace lines 1501-1503 in test_invariance_string_vs_numbers_labels?
Ah okay, now I understand what you mean.
@arjoly: Done.
@arjoly: Tests pass.
Cool, thanks!
Testing log_loss and hinge_loss under THRESHOLDED_METRICS
Fixes Issue #2637