[MRG] average parameter for jaccard_similarity_score #10083
Conversation
sklearn/metrics/classification.py (outdated)

    score = y_true == y_pred
    C = confusion_matrix(y_true, y_pred, sample_weight=sample_weight)
    den = C.sum(0) + C.sum(1) - C.diagonal()
    score = C.diagonal()/den
Leave spaces around / please
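As a side note on what this formulation computes, here is a minimal sketch of my own (not part of the diff): the per-class Jaccard score is the confusion-matrix diagonal divided by each class's union, i.e. column total plus row total minus the diagonal.

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # Toy multiclass example (illustrative values only).
    y_true = [0, 1, 2, 2, 1]
    y_pred = [0, 2, 2, 2, 1]

    C = confusion_matrix(y_true, y_pred)
    # For class k: TP = C[k, k]; union = (predicted as k) + (truly k) - TP,
    # i.e. column sum + row sum - diagonal.
    den = C.sum(axis=0) + C.sum(axis=1) - C.diagonal()
    per_class_jaccard = C.diagonal() / den
    print(per_class_jaccard)  # one value per class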
sklearn/metrics/classification.py (outdated)

    score = y_true == y_pred
    C = confusion_matrix(y_true, y_pred, sample_weight=sample_weight)
    den = C.sum(0) + C.sum(1) - C.diagonal()
    score = C.diagonal()/den

    return _weighted_sum(score, sample_weight, normalize)
Can this still work??
I don't quite understand why this might be a problem.
Well, in the previous code, score was an array of length n_samples. Now score is an array of length n_classes. It clearly cannot be weighted by sample_weight, which is an array of length n_samples.
I can see that I am getting an error like:

    ======================================================================
    ERROR: /home/gxyd/foss/scikit-learn/sklearn/metrics/tests/test_common.py.test_sample_weight_invariance:check_sample_weight_invariance(jaccard_similarity_score)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/gxyd/anaconda3/lib/python3.6/site-packages/nose/case.py", line 197, in runTest
        self.test(*self.arg)
      File "/home/gxyd/foss/scikit-learn/sklearn/utils/testing.py", line 774, in __call__
        return self.check(*args, **kwargs)
      File "/home/gxyd/foss/scikit-learn/sklearn/utils/testing.py", line 309, in wrapper
        return fn(*args, **kwargs)
      File "/home/gxyd/foss/scikit-learn/sklearn/metrics/tests/test_common.py", line 957, in check_sample_weight_invariance
        metric(y1, y2, sample_weight=np.ones(shape=len(y1))),
      File "/home/gxyd/foss/scikit-learn/sklearn/metrics/classification.py", line 468, in jaccard_similarity_score
        return _weighted_sum(score, sample_weight, normalize)
      File "/home/gxyd/foss/scikit-learn/sklearn/metrics/classification.py", line 108, in _weighted_sum
        return np.average(sample_score, weights=sample_weight)
      File "/home/gxyd/anaconda3/lib/python3.6/site-packages/numpy/lib/function_base.py", line 944, in average
        "Axis must be specified when shapes of a and weights "
    TypeError: Axis must be specified when shapes of a and weights differ.
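A minimal standalone reproduction of that failure mode (hypothetical shapes, not taken from the PR): np.average refuses to pair a length-n_classes score array with length-n_samples weights.

    import numpy as np

    # Hypothetical shapes: 3 classes, 5 samples.
    per_class_score = np.array([1.0, 0.5, 0.75])  # length n_classes
    sample_weight = np.ones(5)                    # length n_samples

    try:
        np.average(per_class_score, weights=sample_weight)
    except TypeError as exc:
        # "Axis must be specified when shapes of a and weights differ."
        print(exc)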
I hope it wouldn't be a problem if I get back to this after my exams are over; I'll have more time then.
Sure, just let us know if time runs away and you'd like someone else to complete it.
I had some time and tried to fix the issue, but there still seem to be some errors:

    FAIL: sklearn.metrics.tests.test_common.test_normalize_option_binary_classification
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/gxyd/anaconda3/lib/python3.6/site-packages/nose/case.py", line 197, in runTest
        self.test(*self.arg)
      File "/home/gxyd/foss/scikit-learn/sklearn/metrics/tests/test_common.py", line 776, in test_normalize_option_binary_classification
        / n_samples, measure)
      File "/home/gxyd/anaconda3/lib/python3.6/site-packages/numpy/testing/utils.py", line 539, in assert_almost_equal
        raise AssertionError(_build_err_msg())
    AssertionError:
    Arrays are not almost equal to 7 decimals
     ACTUAL: 0.037259615384615384
     DESIRED: 0.37259615384615385

    ======================================================================
    FAIL: sklearn.metrics.tests.test_common.test_normalize_option_multiclass_classification
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/gxyd/anaconda3/lib/python3.6/site-packages/nose/case.py", line 197, in runTest
        self.test(*self.arg)
      File "/home/gxyd/foss/scikit-learn/sklearn/metrics/tests/test_common.py", line 792, in test_normalize_option_multiclass_classification
        / n_samples, measure)
      File "/home/gxyd/anaconda3/lib/python3.6/site-packages/numpy/testing/utils.py", line 539, in assert_almost_equal
        raise AssertionError(_build_err_msg())
    AssertionError:
    Arrays are not almost equal to 7 decimals
     ACTUAL: 0.016666666666666666
     DESIRED: 0.083333333333333329

I am guessing I need to modify the 'other' tests. I'll give it some more time and see.
There shouldn't be a normalize option on jaccard, as far as I'm concerned... But I think you will need to change test_common to make sure these tests only run for metrics that normalize over the sample size.
So I should remove `normalize` from the API of `jaccard_similarity_score`? If so, I have seen that most of the metric scores do have `normalize` in their API.
I think we need to reconsider how we're going about this. Even in the multilabel case it's appropriate to do this averaging across classes (as discussed in the issue), rather than the sample-wise measure we currently report. This is equivalent to the average=macro/samples distinction in P/R/F. The current implementation is arguably faithfully average=samples, even in the multiclass case. As in P/R/F with average=micro, it is only an interesting metric if a subset of classes is selected with a labels parameter, and as there, the binary case, where one class is important and the other disregarded, needs to be distinguished from the all-classes average. Normalize in the usual sense is only applicable if average=samples.

Basically we need to make the interface the same as for precision_recall_fscore_support, but perhaps retaining normalize in the average=samples case. Does that all make sense? Sorry it makes the task much larger. I hope you are able to nonetheless complete it!
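To make the macro vs. samples distinction concrete, here is a rough sketch of my own (illustrative arrays, not the PR's code) for multilabel indicator input: macro averages the per-class (column-wise) Jaccard scores, while samples averages the per-sample (row-wise) Jaccard scores.

    import numpy as np

    # Hypothetical multilabel indicator matrices: rows are samples, columns are classes.
    y_true = np.array([[1, 1, 0],
                       [1, 0, 0],
                       [0, 1, 1]])
    y_pred = np.array([[1, 0, 0],
                       [1, 0, 0],
                       [0, 1, 0]])

    tp = y_true & y_pred      # intersection per cell
    union = y_true | y_pred   # union per cell

    # average='macro': Jaccard per class (column), then unweighted mean over classes.
    macro = np.mean(tp.sum(axis=0) / union.sum(axis=0))

    # average='samples': Jaccard per sample (row), then mean over samples.
    samples = np.mean(tp.sum(axis=1) / union.sum(axis=1))

    print(macro, samples)  # 0.5 vs. ~0.667 for these toy arrays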
BTW, are you using P/R/F for 'Precision/Recall/F1'? Also, is this generally used this way in machine learning? (I searched on the web but still couldn't find any reference to the usage of P/R/F.)
Yes, that is my abbreviation, I suppose. But you will often see results reported with columns named P, R and F, at least in NLP...
There is probably an issue with a docstring example (it is not printed correctly, maybe not even tested), here: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html (look at the bottom). I tried building docs locally using
Okay. For the last message I got some help on gitter, and I have addressed my own last comment. I'll now try to address your comment.
I've tried to move in that direction. I'll add tests and make documentation changes; it's getting a little late for today.
This pull request introduces 1 alert - view on lgtm.com new alerts:

Comment posted by lgtm.com
I'll need to think about the multi-label case, about whether we can have all three types of 'averages' (i.e. macro, micro, weighted). Am I doing things right?
Should I mention any references to point to the current definitions of the changes made in the Jaccard index? I remember reading on the issue that the current references don't suffice for the 'correct definition' of the Jaccard index.
I think your doc building issue is just that you've not got the dev version of scikit-learn on your path (e.g. pip install --editable ~/path/to/scikit-learn).
I fixed the issue of doc building, though. I don't seem to be having a problem with it now; I got help on gitter.
@jnothman I am not able to figure out how to handle the multi-label case. Things like, what should happen if the user passes
Micro is easy in the multilabel case. I'm not sure it makes much sense in the multiclass case. In the multilabel case, it's identical to treating it all as one binary problem: you divide total true positives over total (tp + fp + fn).
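A minimal sketch of that micro computation for multilabel indicator input (illustrative arrays of my own, not the PR's implementation): pool true positives, false positives and false negatives over all classes before dividing.

    import numpy as np

    # Hypothetical multilabel indicator matrices (samples x classes).
    y_true = np.array([[1, 0, 1],
                       [0, 1, 0],
                       [1, 1, 0]])
    y_pred = np.array([[1, 0, 0],
                       [0, 1, 1],
                       [1, 0, 0]])

    tp = np.sum(y_true & y_pred)        # total true positives over all cells
    fp = np.sum((1 - y_true) & y_pred)  # predicted 1 but truly 0
    fn = np.sum(y_true & (1 - y_pred))  # truly 1 but predicted 0

    micro_jaccard = tp / (tp + fp + fn)
    print(micro_jaccard)  # 0.5 for these toy arrays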
Can you please explain the meaning of a multi-label problem in scikit-learn? I am looking at an example, for which I don't understand the meaning of. I tried searching but haven't found an understandable explanation for this in the docs. If you could direct me to a docs link, that would be great.
Think of each column as a category or a set that the instance is either in or not. See, for instance, the Reuters rcv1 data.
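For illustration (a toy example of my own, not from the thread), a multilabel indicator target looks like this: each row is one sample, and each column says whether that sample belongs to the corresponding category.

    import numpy as np

    # Columns could be, say, "sports", "politics", "tech" for news documents.
    # A document may belong to any number of categories at once.
    y = np.array([
        [1, 0, 1],   # sample 0 is in categories 0 and 2
        [0, 1, 0],   # sample 1 is only in category 1
        [0, 0, 0],   # sample 2 is in no category
    ])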
Doesn't seem ready, and not a regression from what I can tell. Untagging for 0.20. Complain if you disagree.
This PR is ready, and it fixes a bad implementation (i.e. not a regression, but against expectations), but it is not pretty. Using the multilabel confusion matrix in #11179 is a much cleaner way of coding the same.
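For reference, a rough sketch of how per-class Jaccard could be derived from multilabel_confusion_matrix (available in scikit-learn 0.21 and later); this is my own illustration, not the code from #11179 or from this PR.

    import numpy as np
    from sklearn.metrics import multilabel_confusion_matrix

    # Hypothetical multilabel indicator input (samples x classes).
    y_true = np.array([[1, 0, 1],
                       [0, 1, 0],
                       [1, 1, 0]])
    y_pred = np.array([[1, 0, 0],
                       [0, 1, 1],
                       [1, 0, 0]])

    # One 2x2 confusion matrix per class: [[tn, fp], [fn, tp]].
    mcm = multilabel_confusion_matrix(y_true, y_pred)
    tp = mcm[:, 1, 1]
    fp = mcm[:, 0, 1]
    fn = mcm[:, 1, 0]

    per_class_jaccard = tp / (tp + fp + fn)
    macro_jaccard = per_class_jaccard.mean()
    print(per_class_jaccard, macro_jaccard)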
We can now make use of

(Also, the changelog entry will need to move to
Yay! Tests pass. I might yet make a few tweaks myself, but I'd be keen on others' review here. I consider this a bug fix as well as a substantial enhancement to Jaccard.
Is this related to the cron failure in master? I hadn't seen that one before :-/
I don't see how this can relate to that travis failure. That's jaccard as a distance metric, this is jaccard as an evaluation metric. Seeing as that failure is on Cron, and it's related to Cython code, perhaps it's a change there.
I'm going to pull this into a new PR, just to clear out cobwebs and make it clear that I'm championing it and awaiting review.
Superseded by #13092
Reference Issues/PRs
Fixes #7332

What does this implement/fix? Explain your changes.
Fixes the issue of 'jaccard similarity' being the same as accuracy score for multi-class classification problems.

Any other comments?