Different (wrong?) meaning of pos_label=None #10010


Open

qinhanmin2014 opened this issue Oct 26, 2017 · 24 comments

@qinhanmin2014 (Member) commented Oct 26, 2017

Currently, in scikit-learn, we have multiple definitions of pos_label=None, and the docs are not clear about them, which might confuse users.
(1) precision/recall/f1/fbeta: do not support pos_label=None
(2) brier_score_loss: default pos_label=None, meaning the greater of the two values (the implementation is currently under fix by me in #9562)
(3) precision_recall_curve/roc_curve: default pos_label=None, meaning pos_label=1
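For illustration, a minimal sketch of the three behaviours, assuming a 2017-era scikit-learn and {0, 1} targets (exact errors and defaults vary across versions):

```python
import numpy as np
from sklearn.metrics import brier_score_loss, precision_score, roc_curve

y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# (1) not supported: raises ValueError
# precision_score(y_true, y_pred, pos_label=None)

# (2) None means the greater of the two labels (here 1)
brier_score_loss(y_true, y_score, pos_label=None)

# (3) None means pos_label=1
fpr, tpr, thresholds = roc_curve(y_true, y_score, pos_label=None)

# For {0, 1} targets the interpretations in (2) and (3) coincide; they
# diverge for string labels, where "greater" means lexicographic ordering.
```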

According to @jnothman:

Actually, I suspect None should mean "the greater of two values" (unless there is only one value present).

Possible solutions:
(1) Deprecate pos_label=None and change the default value to 1
(2) Correct the behaviour of pos_label=None in precision_recall_curve/roc_curve
(3) Ignore the difference and simply update the docs

Personally, I advocate for (1), at least in precision_recall_curve/roc_curve, because:
(1) It will not influence users' code much, unless you are using something like y_true = ['a', 'b'] and don't provide a pos_label. In this case, it does not seem reasonable to let scikit-learn set pos_label='b' for you.
(2) We don't have a correct implementation of pos_label=None in scikit-learn currently. The meaning of pos_label=None is hard for users to understand and hard to define. If there are two labels, fine, we pick the greater one. But what about the situation where there is only one label? Pick that one? I'm afraid that is not always right, e.g., when there are only negative samples.

cc @jnothman

@jnothman (Member) commented Nov 1, 2017

It will not influence users' code much, unless you are using something like y_true = ['a', 'b'] and don't provide a pos_label. In this case, it does not seem reasonable to let scikit-learn set pos_label='b' for you.

Well, this implicit choice of "which is positive" is also made by LabelBinarizer (and label_binarize) and by binary classifiers' decision_function. I don't think this is unreasonable, but I do think it is a bit obscure/magical.

On the basis of "explicit is better than implicit" we should perhaps deprecate its effect on scoring. On the other hand, raising an error when a model is being scored is not a very friendly reaction to a user changing their classification problem from a 1/-1 one to a 'yes'/'no' one. In the case of the PR curve, there is rotational symmetry, so there's not much of a problem if you get it the wrong way around, as long as you understand which axis is which.

I think overall I'm in support of deprecating pos_label=None for removal.

@qinhanmin2014 (Member Author)

@jnothman Thanks a lot for the reply :)
I'll start with precision_recall_curve/roc_curve to hear more suggestions. This will not break users' code unless they explicitly set pos_label to None (in that case, we will have a deprecation cycle).
I think I need some time to consider how to handle pos_label in brier_score_loss, so I'll temporarily close #9562 (ENH and BUG for brier_score_loss) and split/reopen the PR if necessary.

amueller pushed a commit that referenced this issue Jul 16, 2018
…ision_score (#9980)


#### Reference Issues/PRs
part of #9829

#### What does this implement/fix? Explain your changes.
(1) Add a pos_label parameter to average_precision_score. (Although we finally decided not to introduce pos_label in roc_auc_score, I think we need pos_label here, because there is no relationship between the results when the true labels are reversed, and precision/recall already support pos_label.)
(2) Fix a bug where average_precision_score sometimes returns nan when sample_weight contains 0:
```python
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.85, 0.35, 0.8, 0.9])
average_precision_score(y_true, y_score, sample_weight=[1, 1, 0, 1, 1, 0])
# output: nan
```
I fix it here because of (3).
(3) Move the average_precision scores out of METRIC_UNDEFINED_BINARY (this should contain the regression tests for (1) and (2)).

Some comments:
(1) For the underlying function (precision_recall_curve), the default value of pos_label is None, but I chose to set the default value of pos_label to 1 because this is what P/R/F does. What's more, the meaning of pos_label=None is not clear even within scikit-learn itself (see #10010).
(2) I slightly modified the common test. Currently, the part I modified is only designed for brier_score_loss (I'm doing the same thing in #9562). I think this is right because, as a common test, it seems wrong to force metrics to accept str y_true without a pos_label.

#### Any other comments?
cc @jnothman Could you please take some time to review, or at least judge whether this is the right way to go? Thanks a lot :)

@amueller (Member)

I've recently come to feel that having the user do y == pos_label is easier and more explicit in many cases than us providing a lot of machinery. I think of the scorers as the primary API (because I like grid search and CV), and defining the pos_label there requires jumping through some extra hoops. We could either try to guess pos_label, which is risky (because we have to guess consistently on the training and test set!), or we could encourage the user to just provide obvious labels...
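A hedged sketch of that approach: binarize once with y == pos_label so every downstream metric sees boolean targets (pos_label here is just an illustrative variable, not an API parameter):

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array(['yes', 'no', 'yes', 'no'])
y_pred = np.array(['yes', 'yes', 'yes', 'no'])

pos_label = 'yes'
# Boolean targets: True is unambiguously the positive class (True == 1),
# so no pos_label needs to be threaded through the metric.
precision_score(y_true == pos_label, y_pred == pos_label)
```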

@qinhanmin2014 (Member Author)

I used to like pos_label, but now I've changed my mind.
pos_label has introduced lots of problems. It's hard to define things like pos_label=None, and if we use pos_label=1, then users will need to use make_scorer; maybe y == pos_label is the easier solution.
I think I need to set aside some time to read through all the related issues and think about this carefully. I hope we can figure out a solution for 0.21 (or some of you can make the decision here :)).

@jnothman (Member) commented Oct 14, 2018 via email

@amueller (Member)

Should this go on the roadmap? I feel like it's a major API issue that isn't reflected there very clearly.

@amueller amueller added the API label Mar 18, 2019
@amueller amueller added this to the 1.0 milestone Mar 18, 2019
@qinhanmin2014 (Member Author)

Should this go on the roadmap? I feel like it's a major API issue that isn't reflected there very clearly.

Shall we deprecate pos_label=None, @amueller? If so, it won't be difficult to solve this issue.

I think it's impossible to define pos_label=None; see my previous comment:
The meaning of pos_label=None is hard for users to understand and hard to define. If there are two labels, fine, we pick the greater one. But what about the situation where there is only one label? Pick that one? I'm afraid that is not always right, e.g., when there are only negative samples.

@jnothman (Member) commented Mar 19, 2019 via email

@ogrisel (Member) commented Oct 30, 2019

I think we should only do pos_label inference for binary classification with non-ambiguous integer classes, that is {-1, 1} or {0, 1}.

For instance, if the labels in y_true are string/object valued, we should require the caller to pass an explicit value for pos_label.

In particular, roc_auc_score should not internally label-encode y_true, as that hides the issue: the positive label then depends on the lexicographical ordering of the class label names, which is arbitrary and can cause silent, and therefore hard-to-debug, statistical problems.
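A small illustration of that hazard, assuming the positive label is inferred as classes[-1] after np.unique sorting:

```python
import numpy as np

classes = np.unique(['spam', 'ham', 'spam'])  # -> ['ham', 'spam']
positive = classes[-1]                        # 'spam', only because 's' > 'h'

# Rename 'spam' to 'bad' and the inferred positive class silently flips:
np.unique(['bad', 'ham', 'bad'])[-1]          # -> 'ham'
```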

@adrinjalali (Member)

Moving to the 2.0 milestone. Pinging in case you want to get this in faster.

@lucyleeow (Member) commented Oct 24, 2023

Updated summary of pos_label=None meanings:

  1. Does not support pos_label=None:
    • average_precision_score
    • label_binarize / LabelBinarizer
  2. Support for None is not documented, but it technically works when average != 'binary', where it means the average score over the two binary labels
    • jaccard_score
    • precision_recall_fscore_support / f1_score / fbeta_score / precision_score / recall_score
  3. means pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised
    • det_curve
    • calibration_curve
    • CalibrationDisplay.from_predictions
    • precision_recall_curve
    • roc_curve
    • DetCurveDisplay.from_predictions
    • PrecisionRecallDisplay.from_predictions
    • RocCurveDisplay.from_predictions
  4. means the greater of the two values
    • CalibrationDisplay.from_estimator (can be string; uses classes[-1], sorted via np.unique)
    • brier_score_loss (can't be string)
    • DetCurveDisplay.from_estimator (can be string; documented incorrectly)
    • PrecisionRecallDisplay.from_estimator (can be string)
    • RocCurveDisplay.from_estimator (can be string)

It would be good to make pos_label more consistent.

(2) - the meaning does sort of make sense, as the user needs to set average != 'binary'. I think it would be nice to drop support for this, especially as it isn't documented anyway, unless it could cause backwards-compatibility problems?
(4) - the concern about lexicographical ordering when labels are str has been raised before: #10010 (comment) and #18307. But since the Display functions are new, maybe there is a good reason to use this meaning? Edit: maybe we could change the name of the parameter in these cases, as suggested in #18101?

cc @glemaitre @ogrisel ?

@glemaitre (Member)

Yes, I think we should make the parameter consistent whenever possible. I also note that we don't have pos_label in confusion_matrix, while it could be good to have it there for the same reason.

@lucyleeow (Member) commented Oct 25, 2023

I propose:

  • Amend items in (4) to mean the same as (3) (pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised)
  • Drop support for items in (2) (will this need a deprecation cycle?)

@glemaitre WDYT? I'm happy to implement, or we could wait for another dev to weigh in if you want.

@lucyleeow (Member)

For completeness, here is a summary of private pos_label functions and where they are used:

  • _binary_clf_curve - means pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised
    • det_curve / precision_recall_curve / roc_curve
  • _check_pos_label_consistency - means pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised
    • _binary_clf_curve
    • calibration_curve
    • brier_score_loss
    • _BinaryClassifierCurveDisplayMixin._validate_from_predictions_params
  • _process_predict_proba / _process_decision_function - support not documented but not prevented
    • _get_response_values
  • _get_response_values (/ _get_response_values_binary) - means classes[-1]
    • CalibrationDisplay
    • DecisionBoundaryDisplay.from_estimator
    • _BaseScorer / _MultimetricScorer
    • DetCurveDisplay.from_estimator / PrecisionRecallDisplay.from_estimator / RocCurveDisplay.from_estimator
    • _BinaryClassifierCurveDisplayMixin._validate_and_get_response_values

Hopefully I didn't miss anything.

@lucyleeow (Member)

After discussion with @glemaitre, we have a plan forward (referring to the numbered items in the comments above):

For (2):

  • Improve documentation and remove all mention of pos_label=None.
  • Remove support for pos_label=None, with a deprecation cycle. This would not remove any functionality:
    • pos_label is ignored if average != 'binary'. When average = 'binary' pos_label=None would give an error.

For (3):

  • Change default to be pos_label=1 (raising error if targets not binary):
    • this would not change any functionality, as pos_label=None already results in 1 when targets are binary
    • matches functions in (2), where default is also pos_label=1
  • Remove support for pos_label=None with deprecation cycle

For (4):

  • As above, change the default to pos_label=1 (raising an error if targets are not binary). This would prevent positive-label inference from depending on the arbitrary lexicographical ordering of the class labels.
  • Remove support for pos_label=None with deprecation cycle.

With regard to @ogrisel's comment:

In particular roc_auc_score should not internally label encode y_true

For roc_auc_score, which does not support pos_label param and internally infers the positive label:

  • Add pos_label param, with default of 1 (raising error if targets not binary)

And for completeness, the PR that adds pos_label to confusion_matrix: #26839

Thoughts welcome, happy to amend the plan.

@lucyleeow (Member) commented Nov 24, 2023

A more detailed summary of the above: #10010 (comment)

Binary input only

The following accepts binary input only:

  1. None means pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised
    • det_curve
    • calibration_curve
    • CalibrationDisplay.from_predictions
    • precision_recall_curve
    • roc_curve
    • DetCurveDisplay.from_predictions
    • PrecisionRecallDisplay.from_predictions
    • RocCurveDisplay.from_predictions
  2. None means the greater of two values
    • CalibrationDisplay.from_estimator (can be string; uses classes[-1], sorted via np.unique)
    • brier_score_loss (can't be string)
    • DetCurveDisplay.from_estimator (can be string; documented incorrectly)
    • PrecisionRecallDisplay.from_estimator (can be string)
    • RocCurveDisplay.from_estimator (can be string)

Binary and Multi input

The following accepts multiclass and multilabel input.

The default for the below functions is pos_label=1.

  1. Does not support pos_label=None. Does not have a label parameter.

    • average_precision_score
    • label_binarize / LabelBinarizer
  2. Support for None is not documented. Prior to v0.18 you needed to set pos_label=None if targets were binary but you wanted average != 'binary'. Now pos_label is simply ignored if average != 'binary', so you no longer need to set pos_label to any specific value. Currently pos_label=None is allowed, but pos_label is only used when average='binary', and in that case pos_label=None raises an error.
    pos_label is a convenience; you can use labels=[pos_label] and average=None to get the same result (see the sketch after this list).

    • jaccard_score
    • precision_recall_fscore_support / f1_score / fbeta_score / precision_score / recall_score
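A hedged sketch of that equivalence (labels and values chosen arbitrarily for illustration):

```python
from sklearn.metrics import recall_score

y_true = ['a', 'b', 'a', 'b']
y_pred = ['a', 'a', 'a', 'b']

# The pos_label convenience...
recall_score(y_true, y_pred, pos_label='b')                  # 0.5

# ...is equivalent to restricting labels and disabling averaging:
recall_score(y_true, y_pred, labels=['b'], average=None)[0]  # 0.5
```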

@lucyleeow (Member)

From #26839 (comment)

I would be in favor of defaulting to None everywhere to make it easy to spot cases where users provide a non-default value in case we need to raise an informative error message.

Looking at the summary above, @ogrisel, what about defaulting to pos_label=1 when only binary input is allowed, and using pos_label=None when multi input is also allowed? I think pos_label=1 would help reduce confusion with the ill-defined None option.

cc also @glemaitre

@glemaitre (Member)

Looking at the summary above, @ogrisel, what about defaulting to pos_label=1 when only binary input is allowed, and using pos_label=None when multi input is also allowed? I think pos_label=1 would help reduce confusion with the ill-defined None option.

To check I understand: you mean doing the same as what we do with None (None means 1 when binary, otherwise it stays None), but the other way around (1 means 1 for binary input but behaves like None for multi input)?

@lucyleeow (Member) commented Nov 27, 2023

No, I don't think so; I was not clear.

When input is binary only:

    pos_label : int, float, bool or str, default=1

When input is multi or binary:

    pos_label : int, float, bool or str, default=None
       The positive class when target type is binary. If `None`, `pos_label` defaults to 1,
       but only if `y_true` is in {-1, 1} or {0, 1}.
       When target type is not binary this parameter is ignored.

@ogrisel (Member) commented Nov 27, 2023

I think I like #10010 (comment).

I'm not sure what to do about methods that only accept binary and multilabel inputs but not multiclass inputs (are there any?).

@lucyleeow (Member) commented Nov 28, 2023

I'm not sure what to do about methods that only accept binary and multilabel inputs but not multiclass inputs

AFAICT these functions all extend to multilabel and multiclass by doing one-vs-rest (ref: user guide):

  • average_precision_score
  • jaccard_score
  • precision_recall_fscore_support / f1_score / fbeta_score / precision_score / recall_score

And label_binarize / LabelBinarizer accept for y: "Sequence of integer labels or multilabel data to encode."

So I think all will accept multilabel and multiclass.

@lucyleeow (Member)

Thinking about this more, I am not sure about the proposal in #10010 (comment).

In the case of binary input only (1), where currently:

None means pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised.

If we were to change the default to pos_label=1 and raise an error if pos_label is not in classes_, we could not know whether the user changed the default, so we could not raise an error when y_true is NOT in {-1, 1} or {0, 1}. This is not ideal and is an annoying 'silent' change (though we could give a warning).

I now think we could change all behaviour to:

None means pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised

It is consistent and works for both binary and multi inputs.
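For concreteness, a minimal sketch of that rule; resolve_pos_label is a hypothetical helper written for this comment, not scikit-learn API:

```python
import numpy as np

def resolve_pos_label(pos_label, y_true):
    # Hypothetical helper: None means pos_label=1 only when y_true takes
    # values in {-1, 1} or {0, 1} (or a single one of those); otherwise raise.
    if pos_label is not None:
        return pos_label
    classes = np.unique(y_true)
    if classes.dtype.kind not in "OUS" and any(
        np.array_equal(classes, allowed)
        for allowed in ([0, 1], [-1, 1], [0], [1], [-1])
    ):
        return 1
    raise ValueError(
        f"y_true takes values in {classes.tolist()} and pos_label is None; "
        "pass pos_label explicitly."
    )

resolve_pos_label(None, [0, 1, 1])     # -> 1
resolve_pos_label('b', ['a', 'b'])     # -> 'b'
# resolve_pos_label(None, ['a', 'b'])  # -> ValueError
```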

@lucyleeow (Member)

ping @glemaitre @ogrisel (and anyone else)
I bumped into this while working on #30399. I think it would be great to push this forward and get pos_label consistency.

I think my last comment stands, and the behaviour:

None means pos_label=1 if y_true is in {-1, 1} or {0, 1}, otherwise error raised

is what I would hope for (because we cannot tell whether pos_label=1 was left as the default or explicitly chosen by the user, see #10010 (comment)), but I am also happy with any form of consistency.

@glemaitre (Member)

I agree with this consistent behaviour for pos_label=None.
