FEA Confusion matrix derived metrics #19556
@@ -227,6 +227,8 @@ Scoring string name Function

'precision' etc.      :func:`metrics.precision_score`      suffixes apply as with 'f1'
'recall' etc.         :func:`metrics.recall_score`         suffixes apply as with 'f1'
'jaccard' etc.        :func:`metrics.jaccard_score`        suffixes apply as with 'f1'
'specificity' etc.    :func:`metrics.specificity_score`    suffixes apply as with 'f1'
'npv' etc.            :func:`metrics.npv_score`            suffixes apply as with 'f1'
'roc_auc'             :func:`metrics.roc_auc_score`
'roc_auc_ovr'         :func:`metrics.roc_auc_score`
'roc_auc_ovo'         :func:`metrics.roc_auc_score`
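As with 'precision' and 'recall', the new strings accept the usual averaging suffixes
('specificity_macro', 'npv_weighted', and so on). A minimal sketch of how such a scorer
would be used, assuming the 'specificity_macro' string proposed in this PR is registered::

    >>> from sklearn.datasets import load_iris
    >>> from sklearn.model_selection import cross_val_score
    >>> from sklearn.svm import SVC
    >>> X, y = load_iris(return_X_y=True)
    >>> clf = SVC(random_state=0)
    >>> # 'specificity_macro' only exists once this PR's scorers are registered
    >>> cross_val_score(clf, X, y, cv=5, scoring='specificity_macro')  # doctest: +SKIP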
@@ -536,6 +538,8 @@ Some also work in the multilabel case:

   precision_recall_fscore_support
   precision_score
   recall_score
   specificity_score
   npv_score
   roc_auc_score
   zero_one_loss
   d2_log_loss_score
@@ -603,7 +607,6 @@ The :func:`accuracy_score` function computes the

`accuracy <https://en.wikipedia.org/wiki/Accuracy_and_precision>`_, either the fraction
(default) or the count (``normalize=False``) of correct predictions.

In multilabel classification, the function returns the subset accuracy. If
the entire set of predicted labels for a sample strictly matches the true
set of labels, then the subset accuracy is 1.0; otherwise it is 0.0.
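For instance, with binary label indicator matrices only the second sample below matches
its true label set exactly, so the subset accuracy is 0.5::

    >>> import numpy as np
    >>> from sklearn.metrics import accuracy_score
    >>> # only the second row of the predictions matches the true labels exactly
    >>> accuracy_score(np.array([[0, 1], [1, 1]]), np.ones((2, 2)))
    0.5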
@@ -742,7 +745,7 @@ or *informedness*.

* Our definition: [Mosley2013]_, [Kelleher2015]_ and [Guyon2015]_, where
  [Guyon2015]_ adopts the adjusted version to ensure that random predictions
  have a score of :math:`0` and perfect predictions have a score of :math:`1`.
* Class balanced accuracy as described in [Mosley2013]_: the minimum between the precision
  and the recall for each class is computed. Those values are then averaged over the total
  number of classes to get the balanced accuracy (see the sketch below).
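The [Mosley2013]_ class balanced accuracy is not implemented in scikit-learn; a minimal
sketch of it, assuming the per-class precision and recall returned by
:func:`precision_recall_fscore_support`, could look like::

    >>> import numpy as np
    >>> from sklearn.metrics import precision_recall_fscore_support
    >>> y_true = [0, 1, 0, 0, 1, 0]
    >>> y_pred = [0, 1, 0, 0, 0, 1]
    >>> # per-class precision and recall, then the minimum of the two per class
    >>> precision, recall, _, _ = precision_recall_fscore_support(y_true, y_pred)
    >>> float(np.mean(np.minimum(precision, recall)))
    0.625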
@@ -855,6 +858,42 @@ false negatives and true positives as follows::

  for an example of using a confusion matrix to classify text
  documents.

.. _tpr_fpr_tnr_fnr_score:

TPR FPR TNR FNR score
---------------------

The :func:`tpr_fpr_tnr_fnr_score` function computes the true positive rate (TPR),
false positive rate (FPR), true negative rate (TNR) and false negative rate (FNR)
of predictions, based on the `confusion matrix <https://en.wikipedia.org/wiki/Confusion_matrix>`_.
The rates are defined as

.. math::

   \texttt{TPR} = \frac{TP}{P} = \frac{TP}{TP + FN} = 1 - \texttt{FNR}

   \texttt{FPR} = \frac{FP}{N} = \frac{FP}{TN + FP} = 1 - \texttt{TNR}

   \texttt{TNR} = \frac{TN}{N} = \frac{TN}{TN + FP} = 1 - \texttt{FPR}

   \texttt{FNR} = \frac{FN}{P} = \frac{FN}{TP + FN} = 1 - \texttt{TPR}

Here is a small multiclass example::

    >>> from sklearn.metrics import tpr_fpr_tnr_fnr_score
    >>> y_true = [2, 0, 2, 2, 0, 1]
    >>> y_pred = [0, 0, 2, 2, 0, 2]
    >>> tpr_fpr_tnr_fnr_score(y_true, y_pred)
    (array([1.        , 0.        , 0.66666667]),
     array([0.25      , 0.        , 0.33333333]),
     array([0.75      , 1.        , 0.66666667]),
     array([0.        , 1.        , 0.33333333]))
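For a binary problem, the same rates can be read off the existing :func:`confusion_matrix`
directly; here is a short sketch that mirrors the definitions above (only
``confusion_matrix`` from the current API is used)::

    >>> from sklearn.metrics import confusion_matrix
    >>> y_true = [0, 1, 0, 1, 0, 1, 0, 0]
    >>> y_pred = [0, 1, 1, 1, 0, 0, 0, 1]
    >>> tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    >>> # TPR, FPR, TNR, FNR computed from the counts, as in the formulas above
    >>> tpr, fpr = tp / (tp + fn), fp / (fp + tn)
    >>> tnr, fnr = tn / (tn + fp), fn / (fn + tp)
    >>> [round(float(r), 3) for r in (tpr, fpr, tnr, fnr)]
    [0.667, 0.4, 0.6, 0.333]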
.. note::

   * True positive rate (TPR) is also called recall, sensitivity, or hit rate.
   * False positive rate (FPR) is also called fall-out.
   * True negative rate (TNR) is also called specificity or selectivity.
   * False negative rate (FNR) is also called miss rate.

.. _classification_report:

Classification report
@@ -1006,6 +1045,18 @@ precision-recall curve as follows.

   :scale: 75
   :align: center

Precision can also be referred to as the `positive predictive value (PPV)
<https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values>`_,
e.g. in the context of bioscience. A closely related metric is the
`negative predictive value (NPV)
<https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values>`_,
implemented by :func:`npv_score`.

Review comment (on the paragraph above): I am thinking that we could isolate these two metrics
in a new section where we can provide more details regarding the binary and multiclass cases
and the effect of the averaging, similarly to the precision-recall section.

Reply: Attempted.

Recall can also be called the hit rate or true positive rate (TPR). Especially
in biostatistics, it is also known as `sensitivity <https://en.wikipedia.org/wiki/Sensitivity_and_specificity>`_,
which is related to `specificity <https://en.wikipedia.org/wiki/Sensitivity_and_specificity>`_.
In turn, specificity is also referred to as selectivity or true negative rate (TNR),
and is implemented by :func:`specificity_score`.
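In the binary case these relations can already be checked with the existing metrics:
specificity is the recall of the negative class and NPV is its precision. A small sketch,
using ``pos_label=0`` to score the negative class::

    >>> from sklearn.metrics import precision_score, recall_score
    >>> y_true = [0, 1, 0, 1, 0, 1, 0, 0]
    >>> y_pred = [0, 1, 1, 1, 0, 0, 0, 1]
    >>> # specificity (TNR) is the recall of the negative class
    >>> float(recall_score(y_true, y_pred, pos_label=0))
    0.6
    >>> # NPV is the precision of the negative class
    >>> float(precision_score(y_true, y_pred, pos_label=0))
    0.75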
.. rubric:: Examples

* See :ref:`sphx_glr_auto_examples_model_selection_plot_grid_search_digits.py`
@@ -1044,10 +1095,10 @@ following table:

+-------------------+------------------------------------------------+
|                   |           Actual class (observation)          |
+-------------------+---------------------+--------------------------+
| Predicted class   | TP (true positive)  | FP (false positive)      |
| (expectation)     | Correct result      | Unexpected result        |
|                   +---------------------+--------------------------+
|                   | FN (false negative) | TN (true negative)       |
|                   | Missing result      | Correct absence of result|
+-------------------+---------------------+--------------------------+
@@ -1117,10 +1168,9 @@ Here are some small examples in binary classification::

    >>> average_precision_score(y_true, y_scores)
    0.83...

Multiclass and multilabel classification
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In a multiclass and multilabel classification task, the notions of precision,
recall, and F-measures can be applied to each label independently.
There are a few ways to combine results across labels,
@@ -1994,6 +2044,59 @@ the same does a lower Brier score loss always mean better calibration"

  and probability estimation." <https://drops.dagstuhl.de/opus/volltexte/2008/1382/>`_
  Dagstuhl Seminar Proceedings. Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2008).

.. _true_negatives_metrics:

Specificity and negative predictive value (NPV)
-----------------------------------------------

`Specificity <https://en.wikipedia.org/wiki/Sensitivity_and_specificity>`_
(also called selectivity or true negative rate) and
`NPV <https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values>`_
are both ratios of true negatives to, respectively, actual negatives and
predicted negatives in a classification task.

Binary classification
^^^^^^^^^^^^^^^^^^^^^

In a binary classification task, specificity and NPV are defined simply as

.. math::

   \text{specificity} = \frac{TN}{N} = \frac{TN}{TN + FP}

   \text{NPV} = \frac{TN}{TN + FN}
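A small binary sketch of these two definitions, computed from the existing
:func:`confusion_matrix`; the calls to the :func:`specificity_score` and :func:`npv_score`
functions proposed in this PR are shown for comparison only and skipped::

    >>> from sklearn.metrics import confusion_matrix
    >>> y_true = [0, 0, 0, 0, 1, 1, 1]
    >>> y_pred = [0, 0, 1, 1, 0, 1, 1]
    >>> tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    >>> float(tn / (tn + fp))            # specificity = TN / (TN + FP)
    0.5
    >>> round(float(tn / (tn + fn)), 3)  # NPV = TN / (TN + FN)
    0.667
    >>> from sklearn.metrics import specificity_score, npv_score  # doctest: +SKIP
    >>> specificity_score(y_true, y_pred)  # doctest: +SKIP
    >>> npv_score(y_true, y_pred)  # doctest: +SKIP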
Multiclass and multilabel classification
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In a multiclass or multilabel classification task, the notions of specificity
and NPV can be applied to each label independently. There are a few ways
to combine results across labels, specified by the ``average`` argument
to the :func:`specificity_score` and :func:`npv_score` functions, as described
:ref:`above <average>`.

To make this more explicit, consider the following examples::

    >>> from sklearn.metrics import specificity_score
    >>> from sklearn.metrics import npv_score
    >>> y_true = [2, 0, 2, 2, 0, 1]
    >>> y_pred = [0, 0, 2, 2, 0, 2]
    >>> specificity_score(y_true, y_pred, average=None)
    array([0.75      , 1.        , 0.66666667])
    >>> npv_score(y_true, y_pred, average=None)
    array([1.        , 0.83333333, 0.66666667])
    >>> specificity_score(y_true, y_pred, average='macro')
    0.805...
    >>> npv_score(y_true, y_pred, average='macro')
    0.83...
    >>> specificity_score(y_true, y_pred, average='micro')
    0.83...
    >>> npv_score(y_true, y_pred, average='micro')
    0.83...
    >>> specificity_score(y_true, y_pred, average='weighted')
    0.75
    >>> npv_score(y_true, y_pred, average='weighted')
    0.805...
.. _class_likelihood_ratios:

Class likelihood ratios

Review comment: I am thinking that we should document the function returning the ratios.
I think that we should move the confusion matrix presentation before the accuracy score,
which already uses TP, FP, TN and FN. This way we could document the ratio function just
after the confusion matrix.

Reply: Attempted.