
[MRG] Improvement and bug fix for brier_score_loss #9562


Closed
wants to merge 12 commits into from

Conversation

qinhanmin2014
Member

@qinhanmin2014 qinhanmin2014 commented Aug 16, 2017

Reference Issue

Finish up #9521

What does this implement/fix? Explain your changes.

(1) raise an error for multiclass y_true
(2) raise an error for invalid pos_label
(3) fix an old bug when y_true only has one label (closes #8459, closes #9300, closes #9301)

Any other comments?

@qinhanmin2014 qinhanmin2014 changed the title [WIP] Improvement and bug fix for brier_score_loss [MRG] Improvement and bug fix for brier_score_loss Aug 16, 2017
if (np.array_equal(classes, [0]) or
        np.array_equal(classes, [-1]) or
        np.array_equal(classes, [1])):
    pos_label = 1.
Member

float?

Member

Is this treatment from log_loss? It seems a bit magical.

Member Author

@amueller Thanks.
Currently, log_loss doesn't support pos_label, so the code is taken from _binary_clf_curve. But in the tests we have to support situations where y_pred = ['egg', 'spam'] without a pos_label (see test_invariance_string_vs_numbers_labels). In order not to remove that test, I went with this awkward solution.
Btw, it might be OK not to use a float like _binary_clf_curve does, because np.array_equal([1.0], [1]) is True.
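
A quick check of that claim (just an illustration, not part of the PR):

```python
import numpy as np

# np.array_equal compares element values, not dtypes, so the float and
# integer representations of the same label compare equal.
print(np.array_equal([1.0], [1]))  # True
print(np.array_equal([0.0], [0]))  # True
```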

Member

I don't understand. That test uses pos_label, right?

@qinhanmin2014
Member Author

ping @jnothman @amueller Thanks. :)

@qinhanmin2014
Member Author

ping @jnothman @amueller This pull request is based on your suggestions in #9521; could you please review it? Thanks :)

@jnothman
Member

Patience... I'm mostly doing this on volunteer time, and Andy's had some presentations to give. Thanks @qinhanmin2014 for your very good work.

@qinhanmin2014
Member Author

@jnothman Thanks. Kindly give me some suggestions when you have time and I'll improve accordingly as soon as possible.

if len(classes) > 2:
    raise ValueError("Only binary classification is supported.")

# ensure valid y_true if pos_label is not specified
if pos_label is None:
Member

Why don't you instead put this logic in _check_binary_probabilistic_predictions? Surely it (if not pos_label, at least the handling of a single class) applies there too.

@qinhanmin2014
Member Author

@jnothman Thanks. _check_binary_probabilistic_predictions already has enough logic, so I think you're asking me to use the function directly. The previous implementation used the function after y_true was binarized (y_true = np.array(y_true == pos_label, int)). This is why we previously accepted multiclass classification. Now the PR fixes two bugs along with an improvement. I've updated the main description to provide more information.
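
A minimal sketch of why binarizing before the check silently accepted multiclass y_true (values are illustrative, not from the PR):

```python
import numpy as np

# Old order of operations: y_true is collapsed to {0, 1} *before* any
# validation, so a three-class target never reaches the binary check.
y_true = np.array([0, 1, 2, 2])
pos_label = 2
y_true_binarized = np.array(y_true == pos_label, int)
print(y_true_binarized)  # [0 0 1 1] -- looks binary to the downstream check
```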

_check_binary_probabilistic_predictions(y_true, y_prob)

# ensure valid y_true if pos_label is not specified
classes = np.unique(y_true)
Member

I meant that this logic could move to the helper, so that other binary probabilistic metrics would similarly identify -1 alone as the negative class, and to avoid running unique multiple times.

Member Author

@jnothman Sorry, but I don't quite understand. This part is about giving pos_label the right value and raising an error when needed. Currently, _check_binary_probabilistic_predictions doesn't accept pos_label as a parameter, so it might be hard to move the logic into _check_binary_probabilistic_predictions.

Member

After determining pos_label's appropriate value, it is used once: to binarise y_true into {0,1}. This is also what _check_binary_probabilistic_predictions does, only without the ability to configure pos_label. If _check_binary_probabilistic_predictions took an optional pos_label you could handle all cases.

Member Author

@jnothman Sorry to disturb again, but I have some questions:
(1) It seems that _check_binary_probabilistic_predictions is only used in brier_score_loss (metrics.classification) and calibration_curve (calibration). The latter does not support pos_label.
(2) I am wondering how we should take the optional pos_label, since even if we support pos_label, there are cases where pos_label is not provided by the user (it defaults to None). It seems that I cannot use def(..., pos_label=None), and it seems ugly to use something like def(..., **dict).
(3) From my perspective, since _check_binary_probabilistic_predictions is only used in two functions, and in each of them some part of it is meaningless (e.g., label_binarize in brier_score_loss and the check of y_prob in calibration_curve), how about removing the function (no test uses it directly)?

Member

And why can't you use def _check_binary_probabilistic_predictions(y_true, y_prob, pos_label=None)?

Member Author

@jnothman For example, in the function, if pos_label=None, there are two situations:
(1) I don't pass pos_label (used in calibration)
(2) I pass pos_label=None (used in brier_score_loss when pos_label is not provided)
How can I distinguish them? Thanks. :)
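
For reference, the usual Python idiom for telling "argument omitted" apart from "explicitly passed None" is a sentinel default. This is purely illustrative: the signature below is hypothetical, and the reply that follows questions whether the distinction is needed at all.

```python
# Hypothetical sketch, not the PR's code: a module-level sentinel lets a
# helper distinguish "caller never passed pos_label" from "pos_label=None".
_NOT_GIVEN = object()

def _check_binary_probabilistic_predictions(y_true, y_prob,
                                            pos_label=_NOT_GIVEN):
    if pos_label is _NOT_GIVEN:
        pass  # caller (e.g. calibration_curve) does not expose pos_label
    elif pos_label is None:
        pass  # caller exposes pos_label but the user left it at the default
```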

Member

Why do you need to distinguish them? Should they not be treated the same?

@qinhanmin2014 qinhanmin2014 changed the title [MRG] Improvement and bug fix for brier_score_loss [MRG] Raise an error for multiclass y_true in brier_score_loss Aug 30, 2017
@qinhanmin2014
Member Author

@jnothman @amueller
Sorry for wasting your time, but I think fixing too many bugs in one pull request made things a bit confusing. I searched scikit-learn and found other pull requests (#8459, #9301) addressing the bug related to a single class in brier_score_loss, so I have made this pull request more focused. If needed, I'll open more pull requests for the rest of my work.

@@ -1912,8 +1912,10 @@ def brier_score_loss(y_true, y_prob, sample_weight=None, pos_label=None):
    assert_all_finite(y_true)
    assert_all_finite(y_prob)

    # currently, we only support binary classification
    _check_binary_probabilistic_predictions(y_true, y_prob)

    if pos_label is None:
Member
@jnothman jnothman Sep 4, 2017

I think these two lines are unnecessary if you just set y_true to the output of _check_binary_probabilistic_predictions

Member Author

@jnothman Sorry, but I don't understand what you mean. Which two lines? The core issue I want to solve here is that the previous implementation used the function after y_true was binarized. Could you provide more details please? Thanks.

Member

Right, I realise again that you're not using _check_binary_probabilistic_predictions because it does not support pos_label. But it probably should.

My issue is that _check_binary_probabilistic_predictions does processing and returns something. It's weird not to use that return value.

I think I would add to _check_binary_probabilistic_predictions:

out = label_binarize...
if pos_label is not None:
    if pos_label != labels[-1]:
        if pos_label != labels[0]:
            raise ValueError...
        out = 1 - out
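
For concreteness, a fleshed-out reading of that sketch (assuming labels = np.unique(y_true) and scikit-learn's label_binarize; single-class handling and the y_prob range checks are omitted, and this is illustrative rather than the code the PR adopted):

```python
import numpy as np
from sklearn.preprocessing import label_binarize

def _check_binary_probabilistic_predictions(y_true, y_prob, pos_label=None):
    labels = np.unique(y_true)
    if len(labels) > 2:
        raise ValueError("Only binary classification is supported. "
                         "Provided labels %s." % labels)
    # label_binarize maps labels[0] to 0 and labels[-1] to 1
    out = label_binarize(y_true, classes=labels)[:, 0]
    if pos_label is not None:
        if pos_label != labels[-1]:
            if pos_label != labels[0]:
                raise ValueError("pos_label=%r is not a valid label: %s"
                                 % (pos_label, labels))
            # the requested positive class is labels[0]: flip the encoding
            out = 1 - out
    return out
```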

@qinhanmin2014
Member Author

@jnothman Thanks a lot for your detailed suggestions :)
Now the pull request
(1) raise an error for multiclass y_true
(2) raise an error for invalid pos_label
(3) fix an old bug when y_true only has one label (#8459, #9300, #9301)

@qinhanmin2014
Member Author

ping @jnothman only for the existence of the old PR


@qinhanmin2014 qinhanmin2014 changed the title [MRG] Raise an error for multiclass y_true in brier_score_loss [MRG] Improve and bug fix for brier_score_loss Oct 15, 2017
@qinhanmin2014 qinhanmin2014 changed the title [MRG] Improve and bug fix for brier_score_loss [MRG] Improvement and bug fix for brier_score_loss Oct 15, 2017
@qinhanmin2014
Member Author

ping @jnothman, it's been over a month since the last reply. Could you please give it a further review? Thanks a lot :)

@jnothman
Member

jnothman commented Oct 17, 2017 via email

@qinhanmin2014
Member Author

@jnothman Thanks a lot :) I'll wait.

@qinhanmin2014
Member Author

Temporarily closing to consider the solution. Will split/reopen the PR if necessary.

amueller pushed a commit that referenced this pull request Jul 16, 2018
…ision_score (#9980)


#### Reference Issues/PRs
part of #9829

#### What does this implement/fix? Explain your changes.
(1) add a pos_label parameter to average_precision_score (although we finally decided not to introduce pos_label in roc_auc_score, I think we need pos_label here, because there is no relationship between the results if we reverse the true labels; also, precision/recall all support pos_label)
(2) fix a bug where average_precision_score sometimes returns nan when sample_weight contains 0
```python
y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.85, 0.35, 0.8, 0.9])
average_precision_score(y_true, y_score, sample_weight=[1, 1, 0, 1, 1, 0])
# output: nan
```
I fix it here because of (3).
(3) move the average_precision scores out of METRIC_UNDEFINED_BINARY (this should contain the regression tests for (1) and (2))

Some comments:
(1) For the underlying function (precision_recall_curve), the default value of pos_label is None, but I chose to set the default value of pos_label to 1 because this is what P/R/F does. What's more, the meaning of pos_label=None is not clear even in scikit-learn itself (see #10010); see the short illustration after these comments.
(2) I slightly modified the common test. Currently, the part I modified is only designed for brier_score_loss (I'm doing the same thing in #9562). I think this is right because, as a common test, it seems wrong to force metrics to accept str y_true without pos_label.
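
For context, a small usage illustration of the pos_label parameter this commit adds to average_precision_score (illustrative values, not code from the PR):

```python
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array(['egg', 'spam', 'egg', 'spam'])
y_score = np.array([0.2, 0.9, 0.4, 0.6])

# With string labels the positive class has to be named explicitly,
# since the default pos_label=1 does not appear in y_true.
print(average_precision_score(y_true, y_score, pos_label='spam'))
```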

#### Any other comments?
cc @jnothman Could you please take some time to review or at least judge whether this is the right way to go? Thanks a lot :) 

@qinhanmin2014 qinhanmin2014 deleted the my-feature-2 branch May 5, 2019 05:08
Development

Successfully merging this pull request may close these issues.

brier_score_loss returns incorrect value when all y_true values are True/1
3 participants