[MRG] Improvement and bug fix for brier_score_loss #9562
Conversation
sklearn/metrics/classification.py
Outdated
```python
if (np.array_equal(classes, [0]) or
        np.array_equal(classes, [-1]) or
        np.array_equal(classes, [1])):
    pos_label = 1.
```
float?
Is this treatment from log_loss? It seems a bit magical.
@amueller Thanks.
Currently, log_loss doesn't support pos_label, so the code is taken from _binary_clf_curve. But in the tests we have to support situations where y_pred = ['egg', 'spam'] without a pos_label (see test_invariance_string_vs_numbers_labels). In order not to remove that test, I went with this awkward solution.
Btw, it might be OK not to use a float, unlike _binary_clf_curve, because np.array_equal([1.0], [1]) is True.
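A quick illustration of those two points (the loop below is my own sketch mirroring the diff above, not code from the PR):

```python
import numpy as np

# np.array_equal compares element values, not dtypes, so the
# float literal 1. behaves the same as the integer 1:
print(np.array_equal([1.0], [1]))  # True

# The branch in the diff above only fires when y_true contains a
# single class drawn from {-1, 0, 1}:
for y_true in ([0, 0, 0], [-1, -1], [1, 1, 1]):
    classes = np.unique(y_true)
    if (np.array_equal(classes, [0]) or
            np.array_equal(classes, [-1]) or
            np.array_equal(classes, [1])):
        pos_label = 1.  # same default the diff assigns
        print(y_true, "-> pos_label =", pos_label)
```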
I don't understand. That test uses pos_label, right?
Patience... I'm mostly doing this on volunteer time, and Andy's had some presentations to give. Thanks @qinhanmin2014 for your very good work.
@jnothman Thanks. Kindly give me some suggestions when you have time and I'll improve accordingly as soon as possible.
sklearn/metrics/classification.py
Outdated
```python
if len(classes) > 2:
    raise ValueError("Only binary classification is supported.")

# ensure valid y_true if pos_label is not specified
if pos_label is None:
```
Why don't you instead put this logic in _check_binary_probabilistic_predictions? Surely it (if not pos_label, at least the handling of a single class) applies there too.
@jnothman Thanks.
sklearn/metrics/classification.py
Outdated
```python
_check_binary_probabilistic_predictions(y_true, y_prob)

# ensure valid y_true if pos_label is not specified
classes = np.unique(y_true)
```
I meant that this logic could move to the helper, so that other binary probabilistic metrics would similarly identify -1 alone as the negative class, and to avoid running unique multiple times.
@jnothman Sorry, but I don't quite understand. This part is about giving pos_label the right value and raising an error when needed. Currently, _check_binary_probabilistic_predictions doesn't accept pos_label as a parameter, so it might be hard to move the logic into _check_binary_probabilistic_predictions.
After determining pos_label's appropriate value, it is used once: to binarise y_true into {0, 1}. This is also what _check_binary_probabilistic_predictions does, only without the ability to configure pos_label. If _check_binary_probabilistic_predictions took an optional pos_label you could handle all cases.
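(For readers following the thread: binarising against a chosen pos_label is a one-liner; the snippet below is just an illustration, not the PR code.)

```python
import numpy as np

y_true = np.array(['spam', 'egg', 'spam'])
pos_label = 'spam'

# 1 marks the positive class, 0 everything else.
y_true_binary = np.array(y_true == pos_label, int)  # array([1, 0, 1])
```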
@jnothman Sorry to disturb you again, but I have some questions:
(1) It seems that _check_binary_probabilistic_predictions is only used in brier_score_loss (metrics.classification) and calibration_curve (calibration). The latter does not support pos_label.
(2) I am wondering how we should take the optional pos_label, since even if we support pos_label, there are cases when pos_label is not provided by users (it defaults to None). It seems that I cannot use def(..., pos_label=None), and it seems ugly to use something like def(..., **dict).
(3) From my perspective, since _check_binary_probabilistic_predictions is only used in two functions, and in each of them part of it is meaningless (e.g., label_binarize in brier_score_loss and the check of y_prob in calibration_curve), how about removing the function (no test uses it directly)?
label_binarize isn't meaningless in brier: it's basically identical to what's happening in https://github.com/scikit-learn/scikit-learn/pull/9562/files/adfbbec50f326fe0a3678ea29c630269ef258fe1#diff-b04acd877dd793f28ae7be13a999ed88R1931
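(A small demonstration of that equivalence for the binary case, using string labels as in the tests; this snippet is illustrative only.)

```python
import numpy as np
from sklearn.preprocessing import label_binarize

y_true = np.array(['egg', 'spam', 'egg'])
labels = np.unique(y_true)                    # ['egg', 'spam']

# label_binarize maps the last class (labels[-1]) to 1 ...
binarized = label_binarize(y_true, classes=labels)[:, 0]

# ... which matches comparing against labels[-1] directly.
manual = np.array(y_true == labels[-1], int)

assert np.array_equal(binarized, manual)      # both are [0, 1, 0]
```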
And why can't you use def _check_binary_probabilistic_predictions(y_true, y_prob, pos_label=None)?
@jnothman For example, in the function, if pos_label=None, there are two situations:
(1) I don't pass pos_label (used in calibration)
(2) I pass pos_label=None (used in brier_score_loss when pos_label is not provided)
How can I distinguish them? Thanks. :)
Why do you need to distinguish them? Should they not be treated the same?
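(Aside: if the two cases ever did need to be told apart, the usual Python idiom is a module-level sentinel default rather than None. The names below are hypothetical, not proposed API.)

```python
_NO_POS_LABEL = object()  # hypothetical private sentinel


def _which_case(pos_label=_NO_POS_LABEL):
    # Distinguishes "argument omitted" from "explicitly passed None".
    if pos_label is _NO_POS_LABEL:
        return "omitted"
    if pos_label is None:
        return "explicit None"
    return "explicit value"


print(_which_case())                # omitted
print(_which_case(pos_label=None))  # explicit None
print(_which_case(pos_label=1))     # explicit value
```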
@jnothman @amueller
sklearn/metrics/classification.py
Outdated
```
@@ -1912,8 +1912,10 @@ def brier_score_loss(y_true, y_prob, sample_weight=None, pos_label=None):
assert_all_finite(y_true)
assert_all_finite(y_prob)

# currently, we only support binary classification
_check_binary_probabilistic_predictions(y_true, y_prob)

if pos_label is None:
```
I think these two lines are unnecessary if you just set y_true to the output of _check_binary_probabilistic_predictions.
@jnothman Sorry, but I don't understand your meaning. Which two lines? The core issue I want to solve here is that the previous implementation used the function after y_true was binarized. Kindly provide more details please. Thanks.
Right, I realise again that you're not using _check_binary_probabilistic_predictions because it does not support pos_label. But it probably should.
My issue is that _check_binary_probabilistic_predictions does processing and returns something. It's weird not to use that return value.
I think I would add to _check_binary_probabilistic_predictions:

```python
out = label_binarize...
if pos_label is not None:
    if pos_label != labels[-1]:
        if pos_label != labels[0]:
            raise ValueError...
        out = 1 - out
```
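For concreteness, here is a hedged sketch of how that suggestion might look in full. This is my reading of the comment above, restricted to the two-class case; it is not the code that was merged.

```python
import numpy as np
from sklearn.preprocessing import label_binarize


def _check_binary_probabilistic_predictions(y_true, y_prob, pos_label=None):
    """Sketch: validate binary y_true/y_prob and return y_true binarised
    so that 1 marks pos_label (default: the larger of the two labels,
    matching label_binarize). The single-class corner case this PR is
    about would still need the extra handling discussed earlier."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    labels = np.unique(y_true)
    if len(labels) > 2:
        raise ValueError("Only binary classification is supported. "
                         "Provided labels %s." % labels)
    if y_prob.max() > 1 or y_prob.min() < 0:
        raise ValueError("y_prob contains values outside [0, 1].")

    out = label_binarize(y_true, classes=labels)[:, 0]
    if pos_label is not None and pos_label != labels[-1]:
        if pos_label not in labels:
            raise ValueError("pos_label=%r is not a valid label." % pos_label)
        out = 1 - out  # flip so that 1 marks pos_label
    return out
```

With something like this, brier_score_loss could simply do `y_true = _check_binary_probabilistic_predictions(y_true, y_prob, pos_label)` and then return `np.average((y_true - y_prob) ** 2, weights=sample_weight)`.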
ping @jnothman, over a month since the last reply. Could you please give it a further review? Thanks a lot :)
Thanks for your great contributions, but my availability has been reduced over the last month, with 0.19.1 the major priority.
@jnothman Thanks a lot :) I'll wait.
Temporarily closing to consider the solution. Will split/reopen the PR if necessary.
…ision_score (#9980)

#### Reference Issues/PRs
Part of #9829.

#### What does this implement/fix? Explain your changes.
(1) Add a pos_label parameter to average_precision_score. (Although we finally decided not to introduce pos_label in roc_auc_score, I think we need pos_label here, because the results before and after reversing the true labels are unrelated; also, precision/recall both support pos_label.)
(2) Fix a bug where average_precision_score will sometimes return nan when sample_weight contains 0:
```python
y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.85, 0.35, 0.8, 0.9])
average_precision_score(y_true, y_score, sample_weight=[1, 1, 0, 1, 1, 0])
# output: nan
```
I do it here because of (3).
(3) Move the average_precision scores out of METRIC_UNDEFINED_BINARY (this should contain the regression tests for (1) and (2)).

Some comments:
(1) For the underlying method (precision_recall_curve), the default value of pos_label is None, but I chose to set the default value of pos_label to 1 because this is what P/R/F do. What's more, the meaning of pos_label=None is not clear even within scikit-learn itself (see #10010).
(2) I slightly modified the common test. Currently, the part I modified is only designed for brier_score_loss (I'm doing the same thing in #9562). I think this is right because, as a common test, it does not seem right to force metrics to accept str y_true without a pos_label.

#### Any other comments?
cc @jnothman Could you please take some time to review, or at least judge whether this is the right way to go? Thanks a lot :)
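(A hedged note on the regression test one might expect for bug (2) above: with the fix, weighting a sample by zero should behave like dropping it; the exact behaviour of course depends on the merged implementation.)

```python
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.85, 0.35, 0.8, 0.9])
sample_weight = np.array([1, 1, 0, 1, 1, 0])

# Zero-weight samples should contribute nothing, i.e. match dropping them.
weighted = average_precision_score(y_true, y_score,
                                   sample_weight=sample_weight)
dropped = average_precision_score(y_true[sample_weight != 0],
                                  y_score[sample_weight != 0])
print(weighted, dropped)  # expected to be equal (and not nan) after the fix
```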
Reference Issue
Finish up #9521
What does this implement/fix? Explain your changes.
(1) raise an error for multiclass y_true
(2) raise an error for an invalid pos_label
(3) fix an old bug when y_true only has one label (closes #8459, closes #9300, closes #9301); a short illustration of all three follows below
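A hedged illustration of the intended behaviour (my reading of the summary above and the linked issues; what actually raises depends on the scikit-learn version):

```python
from sklearn.metrics import brier_score_loss

# (3) single-label y_true: with an explicit pos_label the score should be
# computed against that label rather than silently binarising it to 0.
print(brier_score_loss([1, 1, 1], [1.0, 1.0, 1.0], pos_label=1))  # expect 0.0

# (1) multiclass y_true should raise rather than return a meaningless score.
try:
    print(brier_score_loss([0, 1, 2], [0.1, 0.8, 0.9]))
except ValueError as exc:
    print(exc)

# (2) a pos_label that does not appear in y_true should also raise.
try:
    print(brier_score_loss([0, 1], [0.2, 0.9], pos_label=3))
except ValueError as exc:
    print(exc)
```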
Any other comments?