[WIP] ENH Multilabel confusion matrix #10628
Conversation
Comment by lgtm.com: This pull request fixes 2 alerts when merging 542ec86 into e78263f (view fixed alerts on lgtm.com).
Hi @jnothman, I am continuing your work on this, but I am not familiar with codecov and this check seems to be failing. Do I need to fix it?
Benchmarking means seeing whether this is as fast as, or faster than, the existing precision/recall implementation.

Codecov tells you whether there are tests that run every line of new code; there should be.
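As a rough illustration of what such a benchmark might look like (a hypothetical sketch, not the PR's actual benchmark), one could use `timeit` to compare two NumPy strategies for computing the per-label true-positive counts that both the old and new code need:

```python
import timeit

import numpy as np

rng = np.random.RandomState(0)
# Hypothetical benchmark data: 10000 samples, 50 binary indicator labels.
y_true = rng.randint(0, 2, size=(10000, 50))
y_pred = rng.randint(0, 2, size=(10000, 50))


def tp_logical(y_true, y_pred):
    # Per-label true positives via elementwise AND, summed over samples.
    return np.logical_and(y_true, y_pred).sum(axis=0)


def tp_einsum(y_true, y_pred):
    # The same counts via a sum-product over the sample axis.
    return np.einsum("ij,ij->j", y_true, y_pred)


# Both strategies must agree before comparing their speed.
assert np.array_equal(tp_logical(y_true, y_pred), tp_einsum(y_true, y_pred))

for fn in (tp_logical, tp_einsum):
    t = timeit.timeit(lambda: fn(y_true, y_pred), number=20)
    print(f"{fn.__name__}: {t:.4f}s for 20 runs")
```

Profiling (e.g. with `cProfile` or line_profiler) would then show which of these inner computations dominates the slowdown.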
Yes, it looks a lot slower, at least in some cases. Can you profile and work out where it is much slower?

`sample_weight` should be 1d; a 2d array should raise an exception.
This PR considers a helper for multilabel/set-wise evaluation metrics such as precision, recall, fbeta, jaccard (#10083), fall-out, miss rate and specificity (#5516). It also incorporates suggestions from #8126 regarding efficiency of the multilabel true positives calculation (but does not optimise for micro-average, perhaps unfortunately). Unlike `confusion_matrix`, it is optimised for the multilabel case, but also handles multiclass problems the way they are handled in `precision_recall_fscore_support`: as binarised OvR problems.

It benefits us by greatly simplifying the `precision_recall_fscore_support` and future jaccard implementations, and allows for further refactors between them. It also benefits us by making a clear calculation of sufficient statistics (although perhaps more statistics than necessary) from which standard metrics are a simple calculation: it makes the code less mystifying. In that sense, this is mostly a cosmetic change, but it provides users with the ability to easily generalise the P/R/F/S implementation to related metrics.

TODO:

- implement `multilabel_confusion_matrix` and use it in `precision_recall_fscore_support` as an indirect form of testing `multilabel_confusion_matrix`
- directly test `multilabel_confusion_matrix`
- document `multilabel_confusion_matrix`
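To make the "sufficient statistics" idea concrete, here is a simplified NumPy sketch of what such a helper could compute for binary indicator input: one 2x2 matrix per label, from which precision and recall follow by simple arithmetic. This is an illustrative sketch only; the PR's actual implementation also has to handle multiclass input and sample weights, and its exact layout may differ.

```python
import numpy as np


def multilabel_confusion_matrix_sketch(y_true, y_pred):
    """Per-label 2x2 confusion matrices, shape (n_labels, 2, 2).

    Each label's matrix is laid out as [[tn, fp], [fn, tp]].
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    # Sufficient statistics, one count per label.
    tp = np.logical_and(y_true, y_pred).sum(axis=0)
    fp = np.logical_and(~y_true, y_pred).sum(axis=0)
    fn = np.logical_and(y_true, ~y_pred).sum(axis=0)
    tn = np.logical_and(~y_true, ~y_pred).sum(axis=0)
    return np.stack([tn, fp, fn, tp], axis=-1).reshape(-1, 2, 2)


y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1]])
mcm = multilabel_confusion_matrix_sketch(y_true, y_pred)

# Standard metrics become simple calculations on the statistics.
tp = mcm[:, 1, 1]
precision = tp / (tp + mcm[:, 0, 1])  # tp / (tp + fp)
recall = tp / (tp + mcm[:, 1, 0])     # tp / (tp + fn)
```

From the same per-label matrices, fall-out, miss rate, specificity and jaccard are equally short expressions, which is the refactoring benefit the description argues for.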
If another contributor would like to take this on, I would welcome it. I have marked this as Easy because the code and technical knowledge involved are not hard, but it will take a bit of work and clarity of understanding.