ENH add pos_label to confusion_matrix #26839
Conversation
LGTM.

And the current CI failures show that we currently handle the ambiguity badly ;)

We should probably make a similar change to the
Thanks for the PR @glemaitre. Just a few minor changes mostly related to the docs otherwise looks good overall.
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Could we also have a test to check that passing `pos_label` correctly inverts the output matrix?

This test is already integrated with some parametrization in others, e.g.
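To make the property under discussion concrete, here is a minimal NumPy sketch of the invariant such a test would assert. `confusion_2x2` is a hypothetical reference implementation written for illustration, not sklearn's code; the point is that choosing the other class as positive is equivalent to flipping the matrix along both axes.

```python
import numpy as np

# Minimal reference implementation of a binary confusion matrix,
# written only to illustrate the property under test.
def confusion_2x2(y_true, y_pred, labels):
    idx = {lab: i for i, lab in enumerate(labels)}
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    return cm

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 0]

cm = confusion_2x2(y_true, y_pred, labels=[0, 1])
cm_flipped = confusion_2x2(y_true, y_pred, labels=[1, 0])

# Treating the other class as positive flips the matrix on both axes.
assert np.array_equal(cm_flipped, cm[::-1, ::-1])
```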
Overall, I am concerned with all the extra calls to `_check_targets`, which adds more calls to `unique`.
doc/whats_new/v1.4.rst (Outdated)

    :mod:`sklearn.pipeline`
    .......................

    - |Feature| :class:`pipeline.Pipeline` now supports metadata routing according
Looks like a merge error.
    @@ -670,7 +698,17 @@ class labels [2]_.
        .. [3] `Wikipedia entry for the Cohen's kappa
               <https://en.wikipedia.org/wiki/Cohen%27s_kappa>`_.
        """
        confusion = confusion_matrix(y1, y2, labels=labels, sample_weight=sample_weight)
        y_type, y1, y2 = _check_targets(y1, y2)
Since `_check_targets` calls `unique`, this adds more overhead to the computation.
REF: #26820
I don't know how to work around that. Having a private function to call and passing the unique labels could help, but I would not block this PR for it.

pinging @OmarManzoor, @thomasjpfan, and @lucyleeow on this one.
Sorry, mis-click. In light of #10010 (comment), do we want to set the default to
    if y_type == "binary":
        pos_label = _check_pos_label_consistency(pos_label, y_true)
    elif pos_label is not None:
I am wondering if we are not shooting ourselves in the foot with this `ValueError`.

If we change the default to `pos_label=1`, then we will by default raise this error whenever `y_type != "binary"`. If we choose `pos_label=1` as the default, then we need the same strategy as precision/recall, meaning that we need to ignore `pos_label`.

Am I right @lucyleeow, or did I miss something?
Another advantage of implicitly ignoring `pos_label` is that we no longer need to call `_check_targets` (and thus the `unique` function), because we can pass the default value most of the time.
> then we will by default raise this error whenever `y_type != "binary"`

I think we could set the default to 1 and change this:

    elif pos_label is not None:

to:

    elif pos_label != 1:
`_check_set_wise_labels` does similar checking:

scikit-learn/sklearn/metrics/_classification.py, lines 1549 to 1556 in e5178c5:

    elif pos_label not in (None, 1):
        warnings.warn(
            "Note that pos_label (set to %r) is ignored when "
            "average != 'binary' (got %r). You may use "
            "labels=[pos_label] to specify a single positive class."
            % (pos_label, average),
            UserWarning,
        )
The `precision_recall_fscore_support` family of functions use it, and the default is `pos_label=1` (the `None` option is allowed but not documented).

(I may have gotten something wrong, this stuff gets confusing.)
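The lenient "ignore with a warning" pattern discussed above can be sketched in isolation. `check_pos_label` below is a hypothetical helper written for illustration, not actual sklearn code; it mirrors the `_check_set_wise_labels` behaviour of warning rather than raising when a non-default `pos_label` is passed outside the binary case.

```python
import warnings

# Hypothetical helper sketching the lenient pattern: outside the binary
# case, a non-default pos_label is ignored with a UserWarning, not an error.
def check_pos_label(pos_label, average):
    if average != "binary" and pos_label not in (None, 1):
        warnings.warn(
            f"Note that pos_label (set to {pos_label!r}) is ignored when "
            f"average != 'binary' (got {average!r}). You may use "
            "labels=[pos_label] to specify a single positive class.",
            UserWarning,
        )
        return None  # pos_label is ignored outside the binary case
    return pos_label

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ignored = check_pos_label(pos_label=2, average="macro")

assert ignored is None
assert len(caught) == 1 and issubclass(caught[0].category, UserWarning)
```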
I realise you could technically just flip the labels by using the 'reorder' functionality of `labels`, though it is less clear and not consistent with other places where we use `pos_label` for the binary case.
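For reference, the 'reorder' behaviour mentioned above can be checked with the released `confusion_matrix` API: passing `labels=[1, 0]` reverses the row/column order, which is exactly the flip a `pos_label` option would produce.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 0]

cm_default = confusion_matrix(y_true, y_pred)                   # rows/cols ordered [0, 1]
cm_reordered = confusion_matrix(y_true, y_pred, labels=[1, 0])  # rows/cols ordered [1, 0]

# Reordering the labels flips the matrix along both axes.
assert np.array_equal(cm_reordered, cm_default[::-1, ::-1])
```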
> 'reorder' functionality of labels

Indeed, but once I found this out I was surprised, because the API is somewhat inconsistent. I find it a bit better to have consistency across metrics if possible.

> I think we could set default to 1 and change this to:

The problem is that this lets through the case where someone passes `pos_label=1` explicitly in a case other than "binary". So if we skip this test, I would rather be lenient, as in other metrics.
> do you mean you would not raise an error if pos_label is not the default value and it is not the binary case?

Yes, not raising any error.
Options:

1. default is `None`, and mention that it is equivalent to 1 in the binary case
2. default is `1`, and completely ignore it in cases other than binary classification (do not raise an error/warning)
3. default is `1`, and in cases other than binary classification raise a warning if `pos_label` is not the default

Now I do not know what my preference would be. I would think it is okay to raise a warning when `pos_label` is not the default but the target is not binary. We do miss the case where the user wants `pos_label` to be the default. Option (1) has the problem of `pos_label=None` being ill-defined.

But maybe more opinions are needed here. I will try to summarise in #10010
Thanks. I will try to ping more devs in #10010 so that we can settle on a solution.
I would be in favor of defaulting to `None` everywhere, to make it easy to spot cases where users provide a non-default value in case we need to raise an informative error message.

For the specific case of `confusion_matrix`, I think we should not pass a `pos_label`, as the matrix is always defined irrespective of what we consider positive (it does not care). What matters is to pass `labels=class_labels` when calling `confusion_matrix` indirectly from a public function such as `f1_score`, which itself asks the user to specify a `pos_label`, so that there is no ambiguity about the meaning of the rows and columns of the CM.

To make this more efficient, I think `type_of_target` could be extended to also have an alternative `type_of_target_and_class_labels`, to avoid computing `xp.unique` many times.
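The first part of that suggestion can be sketched with the existing public API: compute the class labels once and pass them explicitly via `labels`, so `confusion_matrix` does not have to re-infer them (via a unique/sort pass) on each call. The data here is illustrative only.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 2, 2, 1, 0, 2])
y_pred = np.array([0, 1, 2, 1, 0, 2])

# Compute the class labels once...
class_labels = np.unique(np.concatenate([y_true, y_pred]))

# ...and reuse them, removing the ambiguity about row/column order
# and sparing confusion_matrix a redundant unique() call.
cm = confusion_matrix(y_true, y_pred, labels=class_labels)

assert cm.shape == (len(class_labels), len(class_labels))
```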
Also, when `y` is a dataframe or series, we should use `y.unique` to make this more efficient (pandas does not need to sort the data to compute the unique values).
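The difference is easy to see in a quick comparison: `np.unique` sorts its result, while pandas' hash-based `Series.unique` returns values in order of first appearance without sorting.

```python
import numpy as np
import pandas as pd

y = pd.Series(["spam", "ham", "spam", "eggs"])

sorted_classes = np.unique(y)  # lexicographically sorted output
seen_order = y.unique()        # order of first appearance, no sort needed
```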
    @@ -1921,11 +1968,18 @@ class after being classified as negative. This is the case when the
                f"problems, got targets of type: {y_type}"
            )

        if labels is None:
            classes = np.unique(y_true)
Here, we should probably call `_unique` instead, just to be sure.

I assume that we could introduce a `pos_label` as well.
While working on #26120, we figured out that we don't have a `pos_label` parameter in the `confusion_matrix` function. To follow the documentation, and since we don't report the column/index names, we should not change the positions of the TP/TN/FP/FN. `pos_label` allows flipping the matrix in case this is not the expected positive label.

I also added the `pos_label` to the `ConfusionMatrixDisplay` to be complete.