
Conversation

@gxyd (Contributor) commented Nov 7, 2017

Reference Issues/PRs

Fixes #7332

What does this implement/fix? Explain your changes.

Fixes the issue of jaccard_similarity_score being the same as accuracy_score for multi-class classification problems.

Any other comments?

@gxyd changed the title from "multiclass jaccard similarity not equal to accurary_score" to "[MRG] multiclass jaccard similarity not equal to accurary_score" on Nov 7, 2017
score = y_true == y_pred
C = confusion_matrix(y_true, y_pred, sample_weight=sample_weight)
den = C.sum(0) + C.sum(1) - C.diagonal()
score = C.diagonal()/den
Member: Leave spaces around / please

score = y_true == y_pred
C = confusion_matrix(y_true, y_pred, sample_weight=sample_weight)
den = C.sum(0) + C.sum(1) - C.diagonal()
score = C.diagonal()/den

return _weighted_sum(score, sample_weight, normalize)
Member: Can this still work??

Contributor Author: I don't understand why this might be a problem.

Member: Well, in the previous code, score was an array of length n_samples. Now score is an array of length n_classes, so it clearly cannot be weighted by sample_weight, which is an array of length n_samples.
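
A minimal sketch of the mismatch (toy data, invented for illustration): the confusion-matrix approach yields one score per class, while sample_weight has one entry per sample.

import numpy as np
from sklearn.metrics import confusion_matrix

# Toy 3-class data: 6 samples, 3 classes.
y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 2, 1, 0, 0, 2])
sample_weight = np.ones(6)  # one weight per sample

C = confusion_matrix(y_true, y_pred)
den = C.sum(axis=0) + C.sum(axis=1) - C.diagonal()
score = C.diagonal() / den  # per-class Jaccard, shape (3,)

print(score.shape, sample_weight.shape)  # (3,) vs (6,)
# np.average(score, weights=sample_weight) raises the TypeError reported below.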

@gxyd (Contributor Author) commented Nov 8, 2017

I can see that I am getting an error like:

======================================================================
ERROR: /home/gxyd/foss/scikit-learn/sklearn/metrics/tests/test_common.py.test_sample_weight_invariance:check_sample_weight_invariance(jaccard_similarity_score)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gxyd/anaconda3/lib/python3.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/gxyd/foss/scikit-learn/sklearn/utils/testing.py", line 774, in __call__
    return self.check(*args, **kwargs)
  File "/home/gxyd/foss/scikit-learn/sklearn/utils/testing.py", line 309, in wrapper
    return fn(*args, **kwargs)
  File "/home/gxyd/foss/scikit-learn/sklearn/metrics/tests/test_common.py", line 957, in check_sample_weight_invariance
    metric(y1, y2, sample_weight=np.ones(shape=len(y1))),
  File "/home/gxyd/foss/scikit-learn/sklearn/metrics/classification.py", line 468, in jaccard_similarity_score
    return _weighted_sum(score, sample_weight, normalize)
  File "/home/gxyd/foss/scikit-learn/sklearn/metrics/classification.py", line 108, in _weighted_sum
    return np.average(sample_score, weights=sample_weight)
  File "/home/gxyd/anaconda3/lib/python3.6/site-packages/numpy/lib/function_base.py", line 944, in average
    "Axis must be specified when shapes of a and weights "
TypeError: Axis must be specified when shapes of a and weights differ.

@gxyd (Contributor Author) commented Nov 9, 2017

I hope it won't be a problem if I get back to this after my exams are over; I'll have more time then.

@jnothman (Member) commented Nov 9, 2017 via email

@gxyd (Contributor Author) commented Nov 10, 2017

I had some time and tried to fix the issue, but there still seems to be an error:

FAIL: sklearn.metrics.tests.test_common.test_normalize_option_binary_classification
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gxyd/anaconda3/lib/python3.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/gxyd/foss/scikit-learn/sklearn/metrics/tests/test_common.py", line 776, in test_normalize_option_binary_classification
    / n_samples, measure)
  File "/home/gxyd/anaconda3/lib/python3.6/site-packages/numpy/testing/utils.py", line 539, in assert_almost_equal
    raise AssertionError(_build_err_msg())
AssertionError: 
Arrays are not almost equal to 7 decimals
 ACTUAL: 0.037259615384615384
 DESIRED: 0.37259615384615385
>>  raise AssertionError(_build_err_msg())
    

======================================================================
FAIL: sklearn.metrics.tests.test_common.test_normalize_option_multiclass_classification
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gxyd/anaconda3/lib/python3.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/gxyd/foss/scikit-learn/sklearn/metrics/tests/test_common.py", line 792, in test_normalize_option_multiclass_classification
    / n_samples, measure)
  File "/home/gxyd/anaconda3/lib/python3.6/site-packages/numpy/testing/utils.py", line 539, in assert_almost_equal
    raise AssertionError(_build_err_msg())
AssertionError: 
Arrays are not almost equal to 7 decimals
 ACTUAL: 0.016666666666666666
 DESIRED: 0.083333333333333329
>>  raise AssertionError(_build_err_msg())

I am guessing I need to modify the other tests. I'll give it some more time and see.

@jnothman (Member) commented Nov 11, 2017 via email

@gxyd (Contributor Author) commented Nov 19, 2017 via email

@jnothman (Member) left a comment

I think we need to reconsider how we're going about this. Even in the multilabel case it's appropriate to do this averaging across classes (as discussed in the issue), rather than the sample-wise measure we currently report. This is equivalent to the average=macro/samples distinction in P/R/F; the current implementation is arguably a faithful average=samples, even in the multiclass case.

As in P/R/F, average=micro is only an interesting metric if a subset of classes is selected with a labels parameter, and as there, the binary case, where one class is important and the other disregarded, needs to be distinguished from the all-classes average. Normalize in the usual sense is only applicable if average=samples.

Basically, we need to make the interface the same as for precision_recall_fscore_support, but perhaps retain normalize in the average=samples case. Does that all make sense? Sorry it makes the task much larger. I hope you are nonetheless able to complete it!
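
For concreteness, a minimal numpy sketch of the samples-vs-macro distinction on a toy multilabel problem (data invented for illustration; zero-union rows or columns would need a convention and are avoided here):

import numpy as np

# Multilabel indicator data: rows are samples, columns are labels.
y_true = np.array([[1, 1], [1, 0], [0, 1]])
y_pred = np.array([[1, 0], [1, 0], [1, 1]])

inter = y_true & y_pred  # label-wise intersections
union = y_true | y_pred  # label-wise unions

# average='samples': one Jaccard per row (sample), then the mean.
print((inter.sum(axis=1) / union.sum(axis=1)).mean())  # (1/2 + 1 + 1/2) / 3 = 2/3

# average='macro': one Jaccard per column (class), then the mean.
print((inter.sum(axis=0) / union.sum(axis=0)).mean())  # (2/3 + 1/2) / 2 ~= 0.583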

@gxyd (Contributor Author) commented Nov 23, 2017

BTW, are you using P/R/F for 'Precision/Recall/F1'? Also, is this abbreviation generally used in machine learning? (I searched the web, but still couldn't find any reference to the usage of P/R/F.)

@jnothman (Member) commented Nov 23, 2017 via email

@gxyd (Contributor Author) commented Nov 24, 2017

There is probably an issue with a docstring example (it is not printed correctly, maybe even not tested); see the bottom of http://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html.

I tried building the docs locally using make html in the doc directory, but I get an error:

Exception occurred:
  File "/home/gxyd/anaconda3/lib/python3.6/site-packages/sphinx_gallery/gen_gallery.py", line 322, in sumarize_failing_examples
    "\n" + "-" * 79)
ValueError: Here is a summary of the problems encountered when running the examples

Unexpected failing examples:
/home/gxyd/foss/scikit-learn/examples/ensemble/plot_feature_transformation.py failed leaving traceback:
Traceback (most recent call last):
  File "/home/gxyd/anaconda3/lib/python3.6/site-packages/sphinx_gallery/gen_rst.py", line 450, in execute_code_block
    exec(code_block, example_globals)
  File "<string>", line 15, in <module>
ImportError: cannot import name 'CategoricalEncoder'


/home/gxyd/foss/scikit-learn/examples/ensemble/plot_gradient_boosting_early_stopping.py failed leaving traceback:
Traceback (most recent call last):
  File "/home/gxyd/anaconda3/lib/python3.6/site-packages/sphinx_gallery/gen_rst.py", line 450, in execute_code_block
    exec(code_block, example_globals)
  File "<string>", line 40, in <module>
TypeError: __init__() got an unexpected keyword argument 'validation_fraction'


-------------------------------------------------------------------------------
The full traceback has been saved in /tmp/sphinx-err-nnff1ycd.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!
make: *** [html] Error 1

@gxyd (Contributor Author) commented Nov 24, 2017

Okay, I got some help on Gitter for the last message, and I have addressed my own last comment. I'll now try to address your comment.

@gxyd (Contributor Author) commented Nov 24, 2017

I've tried to move in that direction. I'll add tests and make documentation changes; it's getting a little late for today.


This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for Variable defined multiple times

Comment posted by lgtm.com

@gxyd (Contributor Author) commented Nov 25, 2017

I'll need to think about the multi-label case, in particular whether we can have all three types of 'averages' (i.e. macro, micro, weighted).

Am I doing things right?

@gxyd (Contributor Author) commented Nov 25, 2017

Should I add any references for the definitions behind the changes made to the Jaccard index? I remember reading on the issue that the current references don't suffice for the 'correct definition' of the Jaccard index.

@jnothman (Member) commented Nov 26, 2017 via email

@gxyd (Contributor Author) commented Nov 26, 2017 via email

@gxyd (Contributor Author) commented Nov 26, 2017

@jnothman I am not able to figure out how to handle the multi-label case. For instance, what should happen if the user passes average='micro' and the problem is a multi-label problem?

@jnothman (Member)
micro is easy in the multilabel case. I'm not sure it makes much sense in the multiclass case. In the multilabel case, it's identical to treating it all as one binary problem: you divide total true positives over total (tp + fp + fn).
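
A minimal sketch of that micro computation on toy indicator data (invented for illustration):

import numpy as np

y_true = np.array([[1, 1], [1, 0], [0, 1]])
y_pred = np.array([[1, 0], [1, 0], [1, 1]])

tp = (y_true & y_pred).sum()        # total true positives over all cells
fp = ((1 - y_true) & y_pred).sum()  # predicted 1 where truth is 0
fn = (y_true & (1 - y_pred)).sum()  # truth 1 where prediction is 0

# average='micro': one global Jaccard over the flattened problem.
print(tp / (tp + fp + fn))  # 3 / (3 + 1 + 1) = 0.6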

@gxyd (Contributor Author) commented Nov 26, 2017

Can you please explain the meaning of a multi-label problem in scikit-learn? I am looking at an example with y_true = np.array([[0, 1], [1, 1]]) and y_pred = np.array([[1, 1], [1, 1]]), and I don't understand what this input represents in a real machine learning problem.

I tried searching but haven't found an understandable explanation in the docs. If you could direct me to a docs link, that would be great.
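
A minimal sketch of what such an indicator matrix means: in scikit-learn's multilabel indicator format, each row is a sample, each column is a label, and a 1 marks that the label applies to that sample (the label names below are invented, purely illustrative):

import numpy as np

# Two labels for news articles, say 'politics' and 'sports'.
y_true = np.array([[0, 1],   # article 1 is truly about sports only
                   [1, 1]])  # article 2 is truly about both
y_pred = np.array([[1, 1],   # the model tagged article 1 with both labels
                   [1, 1]])  # and article 2 with both labels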

@jnothman (Member) commented Nov 26, 2017 via email

@jnothman jnothman added the Bug label May 30, 2018
@glemaitre glemaitre added this to the 0.20 milestone Jun 8, 2018
@amueller (Member)

Doesn't seem ready, and not a regression from what I can tell. Untagging for 0.20. Complain if you disagree.

@amueller amueller modified the milestones: 0.20, 0.21 Jul 20, 2018
@jnothman (Member) commented Jul 22, 2018 via email

@jnothman (Member)
We can now make use of multilabel_confusion_matrix to make the implementation here very small (see precision_recall_fscore_support now). Are you interested in changing the implementation, @gxyd, or should I get someone else to?
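
A sketch of how small the per-class computation becomes with multilabel_confusion_matrix (this mirrors the suggested approach, not necessarily the exact final code):

from sklearn.metrics import multilabel_confusion_matrix

def jaccard_per_class(y_true, y_pred, sample_weight=None):
    # MCM has shape (n_classes, 2, 2); MCM[k] = [[tn, fp], [fn, tp]] for class k.
    MCM = multilabel_confusion_matrix(y_true, y_pred, sample_weight=sample_weight)
    tp = MCM[:, 1, 1]
    fp = MCM[:, 0, 1]
    fn = MCM[:, 1, 0]
    return tp / (tp + fp + fn)  # zero denominators would need special handling

print(jaccard_per_class([0, 1, 2, 0, 1, 2], [0, 2, 1, 0, 0, 2]))  # approx. [0.667, 0.0, 0.333]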

@jnothman (Member)
(Also, the changelog entry will need to move to doc/whats_new/v0.21.rst)

@jnothman (Member)

Yay! Tests pass. I might yet make a few tweaks myself, but I'd be keen on others' review here. I consider this a bug fix as well as a substantial enhancement to Jaccard.

@amueller (Member)
Is this related to the cron failure in master?
https://travis-ci.org/scikit-learn/scikit-learn/builds/457399104?utm_source=github_status&utm_medium=notification

I hadn't seen that one before :-/

@jnothman (Member)

I don't see how this can relate to that Travis failure. That's Jaccard as a distance metric; this is Jaccard as an evaluation metric. Seeing as that failure is on cron and related to Cython code, perhaps it's caused by a change there.

@jnothman (Member) commented Feb 4, 2019

I'm going to pull this into a new PR, just to clear out cobwebs and make it clear that I'm championing it and awaiting review.

@jnothman (Member) commented Feb 4, 2019

Superseded by #13092

@jnothman jnothman closed this Feb 4, 2019
Successfully merging this pull request may close these issues:

multiclass jaccard_similarity_score should not be equal to accuracy_score