Different (wrong?) meaning of pos_label=None #10010


Open

qinhanmin2014 opened this issue Oct 26, 2017 · 24 comments

@qinhanmin2014 (Member) commented Oct 26, 2017

Currently, in scikit-learn, we have multiple definitions of pos_label=None, and the docs are not clear about them, which might confuse users.
(1) precision/recall/f1/fbeta: do not support pos_label=None
(2) brier_score_loss: default pos_label=None, meaning the greater of the two values (the implementation is currently under fix by me in #9562)
(3) precision_recall_curve/roc_curve: default pos_label=None, meaning pos_label=1
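For illustration, a minimal sketch of the three behaviours, assuming a 2017-era scikit-learn and {0, 1} targets (exact errors and defaults vary across versions):

```python
import numpy as np
from sklearn.metrics import brier_score_loss, precision_score, roc_curve

y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# (1) not supported: raises ValueError
# precision_score(y_true, y_pred, pos_label=None)

# (2) None means the greater of the two labels (here 1)
brier_score_loss(y_true, y_score, pos_label=None)

# (3) None means pos_label=1
fpr, tpr, thresholds = roc_curve(y_true, y_score, pos_label=None)

# For {0, 1} targets the interpretations in (2) and (3) coincide; they
# diverge for string labels, where "greater" means lexicographic ordering.
```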

According to @jnothman:

Actually, I suspect None should mean "the greater of two values" (unless there is only one value present).

Possible solutions:
(1) Deprecate pos_label=None and change the default value to 1
(2) Correct the behaviour of pos_label=None in precision_recall_curve/roc_curve
(3) Ignore the difference and simply update the docs

Personally, I advocate for (1), at least in precision_recall_curve/roc_curve, because:
(1) It will not influence users' code much, unless you are using something like y_true = ['a', 'b'] and don't provide a pos_label. In this case, it does not seem reasonable to let scikit-learn set pos_label='b' for you.
(2) We don't have a correct implementation of pos_label=None in scikit-learn currently. The meaning of pos_label=None is hard for users to understand and hard to define. If there are two labels, fine, we pick the greater one. But what about the situation where there is only one label? Pick that one? I'm afraid that is not always right, e.g., when there are only negative samples.

cc @jnothman

@jnothman (Member) commented Nov 1, 2017

It will not influence users' code much, unless you are using something like y_true = ['a', 'b'] and don't provide a pos_label. In this case, it does not seem reasonable to let scikit-learn set pos_label='b' for you.

Well, this implicit choice of "which is positive" is also made by LabelBinarizer (and label_binarize) and by binary classifiers' decision_function. I don't think this is unreasonable, but I do think it is a bit obscure/magical.

On the basis of "explicit is better than implicit" we should perhaps deprecate its effect on scoring. On the other hand, raising an error when a model is being scored is not a very friendly reaction to a user changing their classification problem from a 1/-1 one to a 'yes'/'no' one. In the case of the PR curve, there is rotational symmetry, so there's not much of a problem if you get it the wrong way around, as long as you understand which axis is which.

I think overall I'm in support of deprecating pos_label=None for removal.

@qinhanmin2014 (Member Author)

@jnothman Thanks a lot for the reply :)
I'll start with precision_recall_curve/roc_curve to hear more suggestions. This will not break users' code unless they explicitly set pos_label to None (in that case, we will have a deprecation cycle).
I think I need some time to consider how to handle pos_label in brier_score_loss, so I'll temporarily close #9562 (ENH and BUG for brier_score_loss) and split/reopen the PR if necessary.

amueller pushed a commit that referenced this issue Jul 16, 2018
…ision_score (#9980)


#### Reference Issues/PRs
part of #9829

#### What does this implement/fix? Explain your changes.
(1) Add a pos_label parameter to average_precision_score. (Although we finally decided not to introduce pos_label in roc_auc_score, I think we need pos_label here, because there is no relationship between the results when the true labels are reversed, and precision/recall already support pos_label.)
(2) Fix a bug where average_precision_score sometimes returns nan when sample_weight contains 0:
```python
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.85, 0.35, 0.8, 0.9])
average_precision_score(y_true, y_score, sample_weight=[1, 1, 0, 1, 1, 0])
# output: nan
```
I fix it here because of (3).
(3) Move the average_precision scores out of METRIC_UNDEFINED_BINARY (this should contain the regression tests for (1) and (2)).

Some comments:
(1) For the underlying function (precision_recall_curve), the default value of pos_label is None, but I chose to set the default value of pos_label to 1 because this is what P/R/F does. What's more, the meaning of pos_label=None is not clear even within scikit-learn itself (see #10010).
(2) I slightly modified the common test. Currently, the part I modified is only designed for brier_score_loss (I'm doing the same thing in #9562). I think this is right because, as a common test, it seems wrong to force metrics to accept str y_true without a pos_label.

#### Any other comments?
cc @jnothman Could you please take some time to review, or at least judge whether this is the right way to go? Thanks a lot :)

@amueller (Member)

I've recently come to feel that having the user do y == pos_label is easier and more explicit in many cases than us providing a lot of machinery. I think of the scorers as the primary API (because I like grid search and CV), and defining the pos_label there requires jumping through some extra hoops. We could either try to guess pos_label, which is risky (because we have to guess consistently on the training and test set!), or we could encourage the user to just provide obvious labels...
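A hedged sketch of that approach: binarize once with y == pos_label so every downstream metric sees boolean targets (pos_label here is just an illustrative variable, not an API parameter):

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array(['yes', 'no', 'yes', 'no'])
y_pred = np.array(['yes', 'yes', 'yes', 'no'])

pos_label = 'yes'
# Boolean targets: True is unambiguously the positive class (True == 1),
# so no pos_label needs to be threaded through the metric.
precision_score(y_true == pos_label, y_pred == pos_label)
```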

@qinhanmin2014 (Member Author)

I used to like pos_label, but now I've changed my mind.
pos_label has introduced lots of problems. It's hard to define things like pos_label=None, and if we use pos_label=1, then users will need to use make_scorer; maybe y == pos_label is the easier solution.
I think I need to set aside some time to read through all the related issues and think about this carefully. I hope we can figure out a solution for 0.21 (or some of you can make the decision here :)).

@jnothman (Member) commented Oct 14, 2018 via email

@amueller (Member)

Should this go on the roadmap? I feel like it's a major API issue that isn't reflected there very clearly.

@amueller amueller added the API label Mar 18, 2019
@amueller amueller added this to the 1.0 milestone Mar 18, 2019
@qinhanmin2014 (Member Author)

Should this go on the roadmap? I feel like it's a major API issue that isn't reflected there very clearly.

Shall we deprecate pos_label=None, @amueller? If so, it won't be difficult to solve this issue.

I think it's impossible to define pos_label=None; see my previous comment:
The meaning of pos_label=None is hard for users to understand and hard to define. If there are two labels, fine, we pick the greater one. But what about the situation where there is only one label? Pick that one? I'm afraid that is not always right, e.g., when there are only negative samples.

@jnothman (Member) commented Mar 19, 2019 via email

@ogrisel (Member) commented Oct 30, 2019

I think we should only do pos_label inference for binary classification with non-ambiguous integer classes, that is {-1, 1} or {0, 1}.

For instance, if the labels in y_true are string/object valued, we should require the caller to pass an explicit value for pos_label.

In particular, roc_auc_score should not internally label-encode y_true, as that hides the issue: the positive label then depends on the lexicographical ordering of the class label names, which is arbitrary and can cause silent, and therefore hard-to-debug, statistical problems.
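A small illustration of that hazard, assuming the positive label is inferred as classes[-1] after np.unique sorting:

```python
import numpy as np

classes = np.unique(['spam', 'ham', 'spam'])  # -> ['ham', 'spam']
positive = classes[-1]                        # 'spam', only because 's' > 'h'

# Rename 'spam' to 'bad' and the inferred positive class silently flips:
np.unique(['bad', 'ham', 'bad'])[-1]          # -> 'ham'
```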

@adrinjalali (Member)

Moving to the 2.0 milestone. Pinging in case you want to get this in faster.

@lucyleeow (Member) commented Oct 24, 2023

Updated summary of pos_label=None meanings:

  1. Does not support pos_label=None:
    • average_precision_score
    • label_binarize / LabelBinarizer
  2. Support for None is not documented, but it technically works when average != 'binary', where it means the average score over the two binary labels
    • jaccard_score
    • precision_recall_fscore_support / f1_score / fbeta_score / precision_score / recall_score
  3. means pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised
    • det_curve
    • calibration_curve
    • CalibrationDisplay.from_predictions
    • precision_recall_curve
    • roc_curve
    • DetCurveDisplay.from_predictions
    • PrecisionRecallDisplay.from_predictions
    • RocCurveDisplay.from_predictions
  4. means the greater of the two values
    • CalibrationDisplay.from_estimator (can be string; uses classes[-1], sorted via np.unique)
    • brier_score_loss (can't be string)
    • DetCurveDisplay.from_estimator (can be string; documented incorrectly)
    • PrecisionRecallDisplay.from_estimator (can be string)
    • RocCurveDisplay.from_estimator (can be string)

It would be good to make pos_label more consistent.

(2) - the meaning does sort of make sense, as the user needs to set average != 'binary'. I think it would be nice to drop support for this, especially as it isn't documented anyway, unless it could cause backwards-compatibility problems?
(4) - the concern about lexicographical ordering when labels are str has been raised before: #10010 (comment) and #18307. But since the Display functions are new, maybe there is a good reason to use this meaning? Edit: maybe we could change the name of the parameter in these cases, as suggested in #18101?

cc @glemaitre @ogrisel ?

@glemaitre (Member)

Yes, I think we should make the parameter consistent whenever possible. I also note that we don't have pos_label in confusion_matrix, while it could be good to have it there for the same reason.

@lucyleeow (Member) commented Oct 25, 2023

I propose:

  • Amend items in (4) to mean the same as (3) (pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised)
  • Drop support for items in (2) (will this need a deprecation cycle?)

@glemaitre WDYT? I'm happy to implement, or we could wait for another dev to weigh in if you want.

@lucyleeow (Member)

For completeness, here is a summary of private pos_label functions and where they are used:

  • _binary_clf_curve - means pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised
    • det_curve / precision_recall_curve / roc_curve
  • _check_pos_label_consistency - means pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised
    • _binary_clf_curve
    • calibration_curve
    • brier_score_loss
    • _BinaryClassifierCurveDisplayMixin._validate_from_predictions_params
  • _process_predict_proba / _process_decision_function - support not documented but not prevented
    • _get_response_values
  • _get_response_values (/ _get_response_values_binary) - means classes[-1]
    • CalibrationDisplay
    • DecisionBoundaryDisplay.from_estimator
    • _BaseScorer / _MultimetricScorer
    • DetCurveDisplay.from_estimator / PrecisionRecallDisplay.from_estimator / RocCurveDisplay.from_estimator
    • _BinaryClassifierCurveDisplayMixin._validate_and_get_response_values

Hopefully I didn't miss anything.

@lucyleeow (Member)

After discussion with @glemaitre, we have a plan forward (referring to the numbered items in the comments above):

For (2):

  • Improve documentation and remove all mention of pos_label=None.
  • Remove support for pos_label=None, with a deprecation cycle. This would not remove any functionality:
    • pos_label is ignored if average != 'binary'. When average = 'binary' pos_label=None would give an error.

For (3):

  • Change default to be pos_label=1 (raising error if targets not binary):
    • this would not change any functionality, as pos_label=None already results in 1 when targets are binary
    • matches functions in (2), where default is also pos_label=1
  • Remove support for pos_label=None with deprecation cycle

For (4):

  • As above, change the default to pos_label=1 (raising an error if targets are not binary). This would prevent positive-label inference from depending on the arbitrary lexicographical ordering of the class labels.
  • Remove support for pos_label=None with deprecation cycle.

With regard to @ogrisel's comment:

In particular roc_auc_score should not internally label encode y_true

For roc_auc_score, which does not support pos_label param and internally infers the positive label:

  • Add pos_label param, with default of 1 (raising error if targets not binary)

And for completeness, the PR that adds pos_label to confusion_matrix: #26839

Thoughts welcome, happy to amend the plan.

@lucyleeow (Member) commented Nov 24, 2023

A more detailed summary of the above: #10010 (comment)

Binary input only

The following accepts binary input only:

  1. None means pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised
    • det_curve
    • calibration_curve
    • CalibrationDisplay.from_predictions
    • precision_recall_curve
    • roc_curve
    • DetCurveDisplay.from_predictions
    • PrecisionRecallDisplay.from_predictions
    • RocCurveDisplay.from_predictions
  2. None means the greater of two values
    • CalibrationDisplay.from_estimator (can be string; uses classes[-1], sorted via np.unique)
    • brier_score_loss (can't be string)
    • DetCurveDisplay.from_estimator (can be string; documented incorrectly)
    • PrecisionRecallDisplay.from_estimator (can be string)
    • RocCurveDisplay.from_estimator (can be string)

Binary and Multi input

The following accepts multiclass and multilabel input.

The default for the below functions is pos_label=1.

  1. Does not support pos_label=None. Does not have a label parameter.

    • average_precision_score
    • label_binarize / LabelBinarizer
  2. Support for None is not documented. Prior to v0.18 you needed to set pos_label=None if targets were binary but you wanted average != 'binary'. Now pos_label is simply ignored if average != 'binary', so you no longer need to set pos_label to any specific value. Currently pos_label=None is allowed, but pos_label is only used when average='binary', and in that case pos_label=None raises an error.
    pos_label is a convenience; you can use labels=[pos_label] and average=None to get the same result (see the sketch after this list).

    • jaccard_score
    • precision_recall_fscore_support / f1_score / fbeta_score / precision_score / recall_score
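A hedged sketch of that equivalence (labels and values chosen arbitrarily for illustration):

```python
from sklearn.metrics import recall_score

y_true = ['a', 'b', 'a', 'b']
y_pred = ['a', 'a', 'a', 'b']

# The pos_label convenience...
recall_score(y_true, y_pred, pos_label='b')                  # 0.5

# ...is equivalent to restricting labels and disabling averaging:
recall_score(y_true, y_pred, labels=['b'], average=None)[0]  # 0.5
```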

@lucyleeow (Member)

From #26839 (comment)

I would be in favor of defaulting to None everywhere to make it easy to spot cases where users provide a non-default value in case we need to raise an informative error message.

Looking at the summary above, @ogrisel, what about defaulting to pos_label=1 when only binary input is allowed, and using pos_label=None when multi input is also allowed? I think pos_label=1 would help reduce confusion with the ill-defined None option.

cc also @glemaitre

@glemaitre (Member)

Looking at the summary above, @ogrisel, what about defaulting to pos_label=1 when only binary input is allowed, and using pos_label=None when multi input is also allowed? I think pos_label=1 would help reduce confusion with the ill-defined None option.

To check I understand: you mean doing the same as what we do with None (None means 1 when binary, otherwise it stays None), but the other way around (1 means 1 for binary input but behaves like None for multi input)?

@lucyleeow (Member) commented Nov 27, 2023

No, I don't think so; I was not clear.

When input is binary only:

    pos_label : int, float, bool or str, default=1

When input is multi or binary:

    pos_label : int, float, bool or str, default=None
       The positive class when target type is binary. If `None`, `pos_label` defaults to 1,
       but only if `y_true` is in {-1, 1} or {0, 1}.
       When target type is not binary this parameter is ignored.

@ogrisel (Member) commented Nov 27, 2023

I think I like #10010 (comment).

I'm not sure what to do about methods that only accept binary and multilabel inputs but not multiclass inputs (are there any?).

@lucyleeow (Member) commented Nov 28, 2023

I'm not sure what to do about methods that only accept binary and multilabel inputs but not multiclass inputs

AFAICT these functions all extend to multilabel and multiclass by doing one-vs-rest (ref: user guide):

  • average_precision_score
  • jaccard_score
  • precision_recall_fscore_support / f1_score / fbeta_score / precision_score / recall_score

And label_binarize / LabelBinarizer accept for y: "Sequence of integer labels or multilabel data to encode."

So I think all will accept multilabel and multiclass.

@lucyleeow (Member)

Thinking about this more, I am not sure about the proposal in #10010 (comment).

In the case of binary input only (1), where currently:

None means pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised.

If we were to change the default to pos_label=1 and raise an error if pos_label is not in classes_, we could not know whether the user changed the default, so we could not raise an error when y_true is NOT in {-1, 1} or {0, 1}. This is not ideal and is an annoying 'silent' change (though we could give a warning).

I now think we could change all behaviour to:

None means pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised

It is consistent and works for both binary and multi inputs.
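For concreteness, a minimal sketch of that rule; resolve_pos_label is a hypothetical helper written for this comment, not scikit-learn API:

```python
import numpy as np

def resolve_pos_label(pos_label, y_true):
    # Hypothetical helper: None means pos_label=1 only when y_true takes
    # values in {-1, 1} or {0, 1} (or a single one of those); otherwise raise.
    if pos_label is not None:
        return pos_label
    classes = np.unique(y_true)
    if classes.dtype.kind not in "OUS" and any(
        np.array_equal(classes, allowed)
        for allowed in ([0, 1], [-1, 1], [0], [1], [-1])
    ):
        return 1
    raise ValueError(
        f"y_true takes values in {classes.tolist()} and pos_label is None; "
        "pass pos_label explicitly."
    )

resolve_pos_label(None, [0, 1, 1])     # -> 1
resolve_pos_label('b', ['a', 'b'])     # -> 'b'
# resolve_pos_label(None, ['a', 'b'])  # -> ValueError
```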

@lucyleeow (Member)

ping @glemaitre @ogrisel (and anyone else)
I bumped into this while working on #30399. I think it would be great to push this forward and get pos_label consistency.

I think my last comment stands, and the behaviour:

None means pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised

is what I would hope for (because we cannot tell whether pos_label=1 was left as the default or explicitly chosen by the user, see #10010 (comment)), but I am also happy with any form of consistency.

@glemaitre (Member)

I agree with this consistent behaviour for pos_label=None.
