
[MRG+1] Add scorer based on brier_score_loss #9521


Merged · 11 commits merged into scikit-learn:master on Aug 21, 2017

Conversation

qinhanmin2014 (Member):

Reference Issue

Fixes #9512

What does this implement/fix? Explain your changes.

Add scorer based on brier_score_loss.

Any other comments?
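For context, once merged, the scorer should be usable by name in model selection. A minimal sketch, assuming the PR registers the scorer under the metric's name, "brier_score_loss":

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(random_state=0)
# "brier_score_loss" is the scoring name assumed to be registered by this PR.
# Returned values are negated losses, so they are <= 0 and closer to 0 is better.
print(cross_val_score(LogisticRegression(), X, y, scoring="brier_score_loss"))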

@@ -514,6 +514,7 @@ def make_scorer(score_func, greater_is_better=True, needs_proba=False,
log_loss_scorer = make_scorer(log_loss, greater_is_better=False,
needs_proba=True)
log_loss_scorer._deprecation_msg = deprecation_msg
brier_score_loss_scorer = make_scorer(brier_score_loss)
Contributor:

Hey @qinhanmin2014, I think that the greater_is_better parameter in make_scorer should be set to False because brier_score_loss is a loss function. It looks like it is True by default (see https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/scorer.py#L400)
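A minimal sketch of the suggested fix (not the final PR code; needs_proba=True is also required since the metric consumes predicted probabilities):

from sklearn.metrics import brier_score_loss, make_scorer

# greater_is_better=False makes make_scorer negate the loss, so that model
# selection, which maximizes the score, still prefers lower Brier loss.
brier_score_loss_scorer = make_scorer(brier_score_loss,
                                      greater_is_better=False,
                                      needs_proba=True)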

@qinhanmin2014 qinhanmin2014 changed the title [MRG] Add scorer based on brier_score_loss [WIP] Add scorer based on brier_score_loss Aug 11, 2017
qinhanmin2014 (Member Author):

@jnothman Sorry to disturb you, but I think we currently can't implement a scorer based on brier_score_loss: like similar metrics (e.g. log_loss, which has log_loss_scorer), it depends on predicted probabilities, but it simply can't handle the result returned by predict_proba (shape = (n_samples, n_classes)).
From log_loss (which already has a scorer):
[screenshot: log_loss accepting (n_samples, n_classes) probabilities]
From brier_score_loss:
[screenshot: brier_score_loss failing on (n_samples, n_classes) probabilities]
Am I making a mistake? Thanks.
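A rough reconstruction of what the screenshots showed (a sketch, assuming the 2017 implementations):

import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

y_true = np.array([0, 1, 1, 0])
proba = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.6, 0.4]])  # predict_proba shape

log_loss(y_true, proba)                # fine: accepts (n_samples, n_classes)
brier_score_loss(y_true, proba[:, 1])  # fine: 1-d positive-class probabilities
# brier_score_loss(y_true, proba)      # raises: 2-d probabilities not supported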


amueller commented Aug 11, 2017

you just need to pass the second column, maybe using

make_scorer(lambda y_true, y_pred: brier_score_loss(y_true, y_pred[:, 1]),
            needs_proba=True, greater_is_better=False)

(untested)
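A runnable version of this suggestion, sketched against the make_scorer API of the time (needs_proba was later replaced by response_method); the ndim check is a defensive addition of mine, since later scorer code slices binary probabilities to 1-d before calling the metric:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, make_scorer
from sklearn.model_selection import cross_val_score

brier_scorer = make_scorer(
    lambda y_true, y_pred: brier_score_loss(
        y_true, y_pred[:, 1] if y_pred.ndim == 2 else y_pred),
    needs_proba=True, greater_is_better=False)

X, y = make_classification(random_state=0)
# Values are negated Brier losses: always <= 0, closer to 0 is better.
print(cross_val_score(LogisticRegression(), X, y, scoring=brier_scorer))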

@qinhanmin2014 qinhanmin2014 changed the title [WIP] Add scorer based on brier_score_loss [MRG] Add scorer based on brier_score_loss Aug 12, 2017
qinhanmin2014 (Member Author):

@amueller Thanks a lot. It works! I also added a comment explaining the reason for (and a limitation of) the current implementation.

@@ -514,6 +514,13 @@ def make_scorer(score_func, greater_is_better=True, needs_proba=False,
log_loss_scorer = make_scorer(log_loss, greater_is_better=False,
needs_proba=True)
log_loss_scorer._deprecation_msg = deprecation_msg
# Currently brier_score_loss doesn't support the result shape returned by predict_proba
jnothman (Member):

But surely we need the same for roc_auc_score (although that can fall back on decision_function when available). What's going on in that case?

qinhanmin2014 (Member Author):

@jnothman Thanks. I checked the documentation. roc_auc_score supports shape = [n_samples] or [n_samples, n_classes]. (Note that roc_auc_scorer is based on _ThresholdScorer, while log_loss_scorer is based on _ProbaScorer.)

amueller (Member), Aug 13, 2017:

If brier_score_loss is the odd one out in not supporting the standard predict_proba shape, we might want to fix it there instead?


jnothman commented Aug 13, 2017 via email

qinhanmin2014 (Member Author):

@jnothman Thanks.
(1) According to the docstring, brier_score_loss is only meant for binary cases. If that requirement is satisfied, the implementation of the scorers seems correct.
(2) However, brier_score_loss currently has no such check; e.g. if we change y_true from [0, 1, 1, 0] to [0, 2, 2, 1], we get the same result, because the largest value is treated as 1 and the rest as 0 (see the snippet below).
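A tiny illustration of point (2), assuming the implementation at the time (where pos_label defaults to the largest label; later versions may reject such y_true outright):

import numpy as np
from sklearn.metrics import brier_score_loss

y_prob = np.array([0.1, 0.9, 0.8, 0.3])
print(brier_score_loss(np.array([0, 1, 1, 0]), y_prob))
print(brier_score_loss(np.array([0, 2, 2, 1]), y_prob))  # same result:
# only the largest label (2) counts as positive, the rest as negative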
I'm inspecting roc_auc_score and will reply soon :)

qinhanmin2014 (Member Author):

@jnothman Here's my observation.
For roc_auc_score, when the shape of y_score is [n_samples, n_classes], the shape of y_true also needs to be [n_samples, n_classes] (multilabel-indicator). Each corresponding column is then selected, and the scores are calculated and averaged.
Core function: _average_binary_score (in sklearn/metrics/base.py):

for c in range(n_classes):
    # take column c of y_true / y_score and flatten it to shape [n_samples]
    y_true_c = y_true.take([c], axis=not_average_axis).ravel()
    y_score_c = y_score.take([c], axis=not_average_axis).ravel()
    # score each class with the plain binary metric
    score[c] = binary_metric(y_true_c, y_score_c,
                             sample_weight=score_weight)

Note that inside the loop, the shapes of y_true_c and y_score_c have become [n_samples]. That's why roc_auc_score can be built on roc_curve, which only supports shape = [n_samples].

amueller (Member):

But if someone does cross_val_score(DecisionTreeClassifier(), X, y, scoring="roc_auc") that works with binary even though predict_proba is (n_samples, 2) and y is not multi-label.


jnothman commented Aug 14, 2017 via email


qinhanmin2014 commented Aug 15, 2017

@jnothman @amueller (kindly forgive the long comment and any typos :))
From my perspective, both @amueller and I are right. The core issue here is that each scorer has a wrapper (_PredictScorer/_ProbaScorer/_ThresholdScorer), which lets it support more complicated situations.
roc_auc_score supports binary y_true (shape=[n_samples]) or multilabel-indicator y_true (shape=[n_samples, n_classes]). For binary classification, we can use either form of y_true, but for multiclass classification, we can only use multilabel-indicator y_true. The shape of y_pred has to be consistent with y_true.

import numpy as np
from sklearn.metrics import roc_auc_score

y_true_1 = np.array([[0, 1], [1, 0], [1, 0], [0, 1]] * 10)  # multilabel-indicator
y_true_2 = np.array([1, 0, 0, 1] * 10)  # binary
y_prob_1 = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]] * 10)
y_prob_2 = np.array([0.9, 0.2, 0.7, 0.6] * 10)

roc_auc_score(y_true_1, y_prob_1)  # no error; can extend to multiclass
roc_auc_score(y_true_1, y_prob_2)  # error: shapes not consistent
roc_auc_score(y_true_2, y_prob_1)  # error: shapes not consistent
roc_auc_score(y_true_2, y_prob_2)
# no error; same result as the first call, but only for binary classification

However, since roc_auc_scorer is wrapped by _ThresholdScorer, it can support more complicated situations.
For example, through the following code in _ThresholdScorer:

if y_type == "binary":
    y_pred = y_pred[:, 1]

roc_auc_scorer can actually support binary y_true(shape=[n_samples]) and the output of predict_proba (shape=[n_samples, n_classes]).

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X = y_prob_1  # only for convenience
cross_val_score(DecisionTreeClassifier(), X, y_true_1, scoring="roc_auc")
# no problem, just like roc_auc_score
cross_val_score(DecisionTreeClassifier(), X, y_true_2, scoring="roc_auc")
# no problem, thanks to the wrapper

When implementing a scorer based on brier_score_loss, we need to use _ProbaScorer. But unlike _ThresholdScorer, _ProbaScorer is naive here, since the only scorer based on it (log_loss_scorer) has done everything by itself.

from sklearn.metrics import log_loss

log_loss(y_true_1, y_prob_1)  # no error
log_loss(y_true_1, y_prob_2)  # no error
log_loss(y_true_2, y_prob_1)  # no error
log_loss(y_true_2, y_prob_2)  # no error
# all four give the same result

qinhanmin2014 (Member Author):

@jnothman @amueller
I have come up with another way (which might be better) for the scorer. Could you please have a look? Thanks.

amueller (Member):

The question was, for binary classification, "why does it work at all?", and the answer is: _ThresholdScorer has an if y_type == 'binary' check, and when we're using predict_proba we slice the probabilities appropriately.
However, _ProbaScorer doesn't do that. log_loss, the only other probabilistic score that we have right now, accepts either (n_samples,), which it assumes to be the positive class (the larger one), or (n_samples, 2).

I think we can be flexible and do the same in brier_score_loss as we do in log_loss. Making the interface more flexible there also benefits people that directly use the function.

Btw, it looks like brier_score_loss might want to error if only one class is present and pos_label=None as we do in log_loss.
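A sketch of the check being suggested here (hypothetical code mirroring log_loss's behaviour, not what this PR ended up doing):

import numpy as np

def _validate_brier_targets(y_true, pos_label=None):
    # Mirror log_loss: refuse to guess the positive class when only one
    # label is present and the caller did not pass pos_label explicitly.
    labels = np.unique(y_true)
    if labels.size > 2:
        raise ValueError("Only binary classification is supported.")
    if labels.size < 2 and pos_label is None:
        raise ValueError("y_true contains a single label; "
                         "please pass pos_label explicitly.")
    return labels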

jnothman (Member) left a review:

Yes, I think this approach is fine, copying the ThresholdScorer behaviour to ProbaScorer. LGTM

@jnothman jnothman changed the title [MRG] Add scorer based on brier_score_loss [MRG+1] Add scorer based on brier_score_loss Aug 15, 2017
qinhanmin2014 (Member Author):

@jnothman @amueller
I opened a new pull request #9562 to address your concerns about brier_score_loss along with a bug fix. Could you please have a look? Thanks:)

amueller (Member):

Alright, I'm fine with merging this as well. Note, though, that it changes what is passed for log_loss, but that's inconsequential (unless someone provided a malformed probability that doesn't sum to 1, in which case this patch will change results).

Anyway, I'm ok to merge.


qinhanmin2014 commented Aug 17, 2017

@jnothman @amueller Thanks. I updated what's new, along with some minor fixes of my own (things in 0.19.X but not in master, and a strange duplicate entry).
@jnothman, could you please check @amueller's previous comment? The current implementation slightly changes the behaviour of the log_loss-based scorer when users pass probabilities that don't sum to 1 in the binary case.
e.g.
y_pred = np.array([[0.1, 0.1], [0.9, 0.9], [0.2, 0.2], [0.8, 0.8]])
was previously treated as equal to y_pred = np.array([0.5, 0.5, 0.5, 0.5]) (normalized),
but is now treated as equal to y_pred = np.array([0.1, 0.9, 0.2, 0.8]) (we only take the second column, which is considered the probability of the positive class).
Note that the docs state that we only support predicted probabilities for log_loss. We could go back to the previous solution to keep everything the same, but that might seem awkward for the scorer.
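A numeric illustration of this difference (a sketch, relying on log_loss renormalizing 2-d rows, as it did at the time):

import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 1, 0, 1])
y_pred = np.array([[0.1, 0.1], [0.9, 0.9], [0.2, 0.2], [0.8, 0.8]])  # rows don't sum to 1

# Old scorer path: the full array is passed and log_loss renormalizes each
# row to [0.5, 0.5], giving a loss of log(2), roughly 0.693.
print(log_loss(y_true, y_pred))
# New scorer path: the second column is sliced off first and treated as the
# positive-class probability; no renormalization happens (roughly 0.164 here).
print(log_loss(y_true, y_pred[:, 1]))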

qinhanmin2014 (Member Author):

Gentle ping @jnothman. Thanks :)

y_pred = clf.predict_proba(X)
if y_type == "binary":
    y_pred = y_pred[:, 1]
jnothman (Member):

I think this change needs a non-regression test.

Otherwise LGTM.
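For reference, a sketch of the kind of non-regression test being requested (the test name is hypothetical, and "brier_score_loss" is assumed to be the registered scoring string):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, get_scorer

def test_brier_scorer_slices_positive_class():
    X, y = make_classification(random_state=0)
    clf = LogisticRegression().fit(X, y)
    scorer = get_scorer("brier_score_loss")
    # The scorer should equal the negated loss on the positive-class column.
    expected = -brier_score_loss(y, clf.predict_proba(X)[:, 1])
    assert np.isclose(scorer(clf, X, y), expected)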

qinhanmin2014 (Member Author):

@jnothman Sorry, but I don't quite understand what you mean. Currently it seems we don't test the scorer's wrapper directly; instead, we test the scorer itself to ensure reasonable behaviour, and I have added the new scorer to those tests.

jnothman (Member):

Oh. I think I got confused about what we were looking at. LGTM.

@jnothman jnothman merged commit ee2025f into scikit-learn:master Aug 21, 2017
@qinhanmin2014 qinhanmin2014 deleted the my-feature-1 branch August 22, 2017 00:40
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
alien3211 pushed a commit to alien3211/scikit-learn that referenced this pull request Dec 21, 2018
Merging this pull request closes: No scorer for Brier (#9512) · 4 participants