[MRG] Multilabel accuracy with the Jaccard index #1795
Conversation
Had a quick look... looks cool so far. Nice work
Thanks for your encouragement! :-)
score = np.array([len(set(true) ^ set(pred)) == 0
                  for pred, true in zip(y_pred, y_true)])

elif similarity == "jaccard":
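The subset-accuracy expression from the diff above can be exercised on its own. A minimal sketch (the sample data is illustrative, not from the PR): a sample scores 1 only when its predicted and true label sets match exactly, i.e. their symmetric difference is empty.

```python
import numpy as np

# Standalone version of the subset-accuracy expression quoted above:
# a sample counts as correct only on an exact label-set match
# (empty symmetric difference between true and predicted labels).
y_true = [[0, 1], [2], [3]]
y_pred = [[0], [2], [3]]

score = np.array([len(set(true) ^ set(pred)) == 0
                  for pred, true in zip(y_pred, y_true)])
print(score.mean())  # fraction of exactly-matched samples (here 2 of 3)
```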
Why don't you create a new dedicated function?
In the multilabel literature, subset accuracy and Jaccard index accuracy are often simply called accuracy.
They will give the same score in the multiclass case.
And I think that it would reduce the amount of redundant code.
Note: the hamming loss metric could be integrated into the accuracy score function with the hamming similarity.
Note 2: I got inspired to do this by the design of the precision, recall, f-score function.
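To illustrate the point about the two notions of accuracy coinciding in the multiclass case, here is a hedged sketch (not the PR's actual code; the function names are illustrative) of the per-sample similarities being discussed:

```python
def subset_similarity(true, pred):
    # 1.0 only on an exact label-set match, else 0.0
    return 1.0 if set(true) == set(pred) else 0.0

def jaccard_similarity(true, pred):
    # |intersection| / |union|; two empty sets are treated as identical
    t, p = set(true), set(pred)
    union = t | p
    return 1.0 if not union else len(t & p) / len(union)

# Multilabel: the two notions differ on partial matches.
print(subset_similarity({0, 1}, {0}))   # 0.0 (partial match scores nothing)
print(jaccard_similarity({0, 1}, {0}))  # 0.5

# Multiclass (singleton label sets): both reduce to exact match,
# so they give the same score, as noted above.
print(subset_similarity({1}, {1}) == jaccard_similarity({1}, {1}))  # True
```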
I too would prefer a new function, say jaccard_accuracy. Redundant code isn't the first worry: it's consistent and easy-to-use APIs. The implementation can always be changed later.
Thanks a lot for the feedback! I will do as you propose.
What do you think of the interface? Given the new interface, I would like to merge the hamming loss function into it.
I have written the narrative doc and finished implementing the new Jaccard metric.
I am not able to retrieve the Travis log :-(
Travis should be happy now.
I've rebased on top of master.
Reviews are welcome.
Nice! I'll have a read through this evening. Thanks
Thanks :-)
Read through this. Very nice PR.
Thanks for the review! :-)
>>> y_pred = [0, 2, 1, 3]
>>> y_true = [0, 1, 2, 3]
>>> jaccard_similarity_score(y_true, y_pred)
0.5
If y_pred and y_true are sets, shouldn't the score be 1.0 in this case? (Both y_pred and y_true contain the same labels, modulo their order, but this shouldn't be of importance since we are talking about sets, should it?)
Okay, I get it: y_true[i] and y_pred[i] are the sets to be compared. I think this should be made clearer then: jaccard_similarity_score computes the average (?) Jaccard similarity (as you define it above) over several pairs of sets.
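A minimal sketch of the sample-wise averaging described in this comment (illustrative only, not scikit-learn's implementation): the score is the mean of the Jaccard similarity between y_true[i] and y_pred[i], taken pair by pair.

```python
def mean_jaccard(y_true, y_pred):
    # Average the per-sample Jaccard similarity |t & p| / |t | p|
    # over the paired label sets; empty/empty pairs count as 1.0.
    total = 0.0
    for t, p in zip(y_true, y_pred):
        t, p = set(t), set(p)
        union = t | p
        total += 1.0 if not union else len(t & p) / len(union)
    return total / len(y_true)

# The multiclass doctest from the diff: each pair is a singleton set,
# so each per-sample similarity is 1 or 0, and the mean is 2/4.
print(mean_jaccard([[0], [1], [2], [3]], [[0], [2], [1], [3]]))  # 0.5
```

This also makes the reviewer's earlier confusion concrete: the comparison is element-wise between paired sets, not between y_true and y_pred as two flat sets.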
Improved the doc in 7267e3c.
Reads easier now 👍
Besides my comments above, this looks good to me. +1 after those are fixed.
If everything is ok, I can squash and push.
👍
Thanks a lot for the review!!!
I'm looking for the bug.
Oddly, with Python 2.6 and NumPy 1.3, I got:
instead of
A patch is coming.
For reference, this is a new step towards solving #558.
This PR intends to bring multilabel accuracy and zero-one loss based on the Jaccard index.
For reference, see section 7.1.1 of Mining Multi-label Data and the Wikipedia entry on the Jaccard index.
TODO list:
This was removed from the PR scope: