[MRG+1] Add scorer based on brier_score_loss #9521
Conversation
sklearn/metrics/scorer.py (outdated diff)

```diff
@@ -514,6 +514,7 @@ def make_scorer(score_func, greater_is_better=True, needs_proba=False,
 log_loss_scorer = make_scorer(log_loss, greater_is_better=False,
                               needs_proba=True)
 log_loss_scorer._deprecation_msg = deprecation_msg
+brier_score_loss_scorer = make_scorer(brier_score_loss)
```
Hey @qinhanmin2014, I think that the greater_is_better parameter in make_scorer should be set to False, because brier_score_loss is a loss function. It looks like it is True by default (see https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/scorer.py#L400).
@jnothman Sorry to disturb, but I think we currently can't implement a scorer based on brier_score_loss: like other similar metrics (e.g., log_loss, which has log_loss_scorer), it depends on predicted probabilities, but it simply can't handle the result returned by predict_proba (shape = (n_samples, n_classes)).
You just need to pass the second column, maybe using:

```python
make_scorer(lambda y_true, y_pred: brier_score_loss(y_true, y_pred[:, 1]),
            needs_proba=True, greater_is_better=False)
```

(untested)
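For context, a runnable sketch of that workaround with the scorer API of the time (the toy data and LogisticRegression estimator are illustrative, not part of the PR):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, make_scorer
from sklearn.model_selection import cross_val_score

# Wrap brier_score_loss so the score function only sees the positive-class
# column of predict_proba; greater_is_better=False because it is a loss.
brier_scorer = make_scorer(
    lambda y_true, y_prob: brier_score_loss(y_true, y_prob[:, 1]),
    needs_proba=True, greater_is_better=False)

X, y = make_classification(n_samples=100, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, scoring=brier_scorer)
print(scores)  # negated Brier scores, one per CV fold
```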
@amueller Thanks a lot. It works! I also added a comment to explain the reason for (maybe the limitation of) the current implementation.
sklearn/metrics/scorer.py (outdated diff)

```diff
@@ -514,6 +514,13 @@ def make_scorer(score_func, greater_is_better=True, needs_proba=False,
 log_loss_scorer = make_scorer(log_loss, greater_is_better=False,
                               needs_proba=True)
 log_loss_scorer._deprecation_msg = deprecation_msg
+# Currently brier_score_loss don't support the shape of result
```
But surely we need the same for roc_auc_score (although that can fall back on decision_function when available). What's going on in that case?
@jnothman Thanks. I checked the documentation. roc_auc_score supports shape = [n_samples] or [n_samples, n_classes]. (Note that roc_auc_scorer belongs to _ThresholdScorer, while log_loss_scorer belongs to _ProbaScorer.)
If brier_score is the odd one out in not supporting the standard predict_proba shape, we might want to fix it there instead?
Hmm, it's not altogether the fault of the metric function. I think something like what you have here is okay, except that it needs some validation to make sure the input is of the right shape... I'm struggling to see where roc selects one column from the input. Can you trace what's going on there?
@jnothman Thanks.
@jnothman Here's my observation:

```python
for c in range(n_classes):
    y_true_c = y_true.take([c], axis=not_average_axis).ravel()
    y_score_c = y_score.take([c], axis=not_average_axis).ravel()
    score[c] = binary_metric(y_true_c, y_score_c,
                             sample_weight=score_weight)
```

Note that inside the loop, the shape of y_true_c and y_score_c has become [n_samples]. That's why it works: roc_auc_score is based on roc_curve, and roc_curve only supports shape = [n_samples].
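For illustration only (a small made-up array, not from the PR), the take(...).ravel() column selection is what reduces an (n_samples, n_classes) score matrix to the 1-D shape roc_curve expects; here axis=1 plays the role of not_average_axis:

```python
import numpy as np

y_score = np.array([[0.1, 0.9],
                    [0.8, 0.2],
                    [0.3, 0.7]])               # shape (n_samples=3, n_classes=2)
y_score_c = y_score.take([1], axis=1).ravel()  # scores for class c = 1
print(y_score_c)         # [0.9 0.2 0.7]
print(y_score_c.shape)   # (3,)  -- i.e. shape [n_samples]
```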
But if someone does cross_val_score(DecisionTreeClassifier(), X, y, scoring="roc_auc"), that works with binary even though predict_proba is (n_samples, 2) and y is not multilabel.
Yes, the binary case is what I was wondering about. How is that working? Is it working?
@jnothman @amueller (please forgive the long comment and possible typos :))

```python
y_true_1 = np.array([[0, 1], [1, 0], [1, 0], [0, 1]] * 10)  # multilabel-indicator
y_true_2 = np.array([1, 0, 0, 1] * 10)                      # binary
y_prob_1 = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]] * 10)
y_prob_2 = np.array([0.9, 0.2, 0.7, 0.6] * 10)

roc_auc_score(y_true_1, y_prob_1)  # no error, can extend to multiclass
roc_auc_score(y_true_1, y_prob_2)  # error, shapes not consistent
roc_auc_score(y_true_2, y_prob_1)  # error, shapes not consistent
roc_auc_score(y_true_2, y_prob_2)  # no error, same result as the first call,
                                   # only for binary classification
```

However, since roc_auc_scorer is wrapped by _ThresholdScorer, it can support more complicated situations:

```python
if y_type == "binary":
    y_pred = y_pred[:, 1]
```

So roc_auc_scorer can actually support binary y_true (shape = [n_samples]) together with the output of predict_proba (shape = [n_samples, n_classes]):

```python
X = y_prob_1  # only for convenience
cross_val_score(DecisionTreeClassifier(), X, y_true_1, scoring="roc_auc")
# no problem, same as roc_auc_score
cross_val_score(DecisionTreeClassifier(), X, y_true_2, scoring="roc_auc")
# no problem thanks to the wrapper
```

When implementing a scorer based on brier_score_loss, we need to use _ProbaScorer. But unlike _ThresholdScorer, it is naive, since the only scorer based on it (log_loss_scorer) has done everything by itself:

```python
log_loss(y_true_1, y_prob_1)  # no error
log_loss(y_true_1, y_prob_2)  # no error
log_loss(y_true_2, y_prob_1)  # no error
log_loss(y_true_2, y_prob_2)  # no error
# they all give the same result
```
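To make the connection explicit, a small check (restating the toy arrays above; not from the PR): once the positive-class column is sliced out of the 2-D probabilities, brier_score_loss already handles it, which is exactly what the proposed scorer relies on.

```python
import numpy as np
from sklearn.metrics import brier_score_loss

y_true_2 = np.array([1, 0, 0, 1] * 10)
y_prob_1 = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]] * 10)
y_prob_2 = np.array([0.9, 0.2, 0.7, 0.6] * 10)

# y_prob_1[:, 1] is exactly y_prob_2, so both calls return 0.175
print(brier_score_loss(y_true_2, y_prob_1[:, 1]))
print(brier_score_loss(y_true_2, y_prob_2))
```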
The question was, for binary classification, "why does it work at all", and the answer is the y_pred = y_pred[:, 1] step in _ThresholdScorer. I think we can be flexible and do the same in _ProbaScorer. Btw, it looks like ...
Yes, I think this approach is fine, copying the ThresholdScorer behaviour to ProbaScorer. LGTM
Alright, I'm fine with merging this as well. Note, though, that it changes what is passed for ... Anyway, I'm ok to merge.
@jnothman @amueller Thanks. I updated what's new, along with some minor fixes of my own (things in 0.19.X but not in master, and a strange duplicate entry).
Slight ping @jnothman, thanks :)
```python
y_pred = clf.predict_proba(X)
if y_type == "binary":
    y_pred = y_pred[:, 1]
```
I think this change needs a non-regression test.
Otherwise LGTM.
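A sketch of what such a non-regression test might look like (the test name, data, and estimator are hypothetical, not the test actually committed): it asserts that a needs_proba scorer built on brier_score_loss agrees with calling the metric on predict_proba(X)[:, 1] for a binary problem.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, make_scorer


def test_brier_scorer_passes_positive_class_column():
    # With the _ProbaScorer change, binary targets should lead to the
    # positive-class column being passed to the metric (not the full
    # (n_samples, 2) array, which brier_score_loss cannot handle).
    X, y = make_classification(n_samples=200, random_state=0)
    clf = LogisticRegression().fit(X, y)
    scorer = make_scorer(brier_score_loss, greater_is_better=False,
                         needs_proba=True)
    expected = -brier_score_loss(y, clf.predict_proba(X)[:, 1])
    assert np.isclose(scorer(clf, X, y), expected)
```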
@jnothman Sorry, but I don't quite understand what you mean. Currently, it seems that we don't test the wrapper of the scorer. Instead, we directly test the scorer to ensure reasonable behaviour, and I have added the new scorer to that test.
Oh. I think I got confused about what we were looking at. LGTM.
Reference Issue
Fixes #9512
What does this implement/fix? Explain your changes.
Add scorer based on brier_score_loss.
Any other comments?