
FIX select the probability estimates or transform the decision values when pos_label is provided #18114


Merged
merged 32 commits into scikit-learn:master on Oct 9, 2020

Conversation

Member

@glemaitre glemaitre commented Aug 7, 2020

Solve a bug where the appropriate column of the probability estimates was not selected, or the decision values were not inverted, when pos_label is passed and does not correspond to clf.classes_[1].
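A minimal sketch of the intended behavior (illustrative only; the helper name _select_scores is hypothetical, not the actual scorer code):

import numpy as np

def _select_scores(clf, X, pos_label, response_method):
    """Return the scores of the class `pos_label` for a fitted binary clf."""
    if response_method == "predict_proba":
        y_pred = clf.predict_proba(X)
        # pick the column of predict_proba that matches pos_label
        col_idx = np.flatnonzero(clf.classes_ == pos_label)[0]
        return y_pred[:, col_idx]
    # decision_function scores are relative to clf.classes_[1],
    # so they must be negated when pos_label is clf.classes_[0]
    y_pred = clf.decision_function(X)
    if pos_label == clf.classes_[0]:
        y_pred *= -1
    return y_pred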

@glemaitre
Member Author

@jnothman @thomasjpfan OK so I think these are the only fixes required.
The tests fail on master. It means that someone creating a scorer based on average precision with pos_label and string labels, with an ordering where the positive label ends up as clf.classes_[0], would have gotten something wrong.
In a GridSearchCV, you would potentially have optimized the opposite of what you expected.

Most probably, nobody encountered this issue :)
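For reference, a hedged sketch of the failure mode described above (dataset and label names made up for illustration): with string labels ordered so that the positive label is clf.classes_[0], a threshold-based scorer built with pos_label would previously have used the scores of the wrong class.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, make_scorer

X, y = make_classification(random_state=0)
# "cancer" sorts before "not cancer", so clf.classes_[0] == "cancer"
y = np.array(["cancer", "not cancer"])[y]
clf = LogisticRegression().fit(X, y)

scorer = make_scorer(average_precision_score, needs_threshold=True,
                     pos_label="cancer")
# with the fix, the scorer uses the decision values of the "cancer" class
scorer(clf, X, y)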

@glemaitre
Member Author

Once this PR is merged, I plan to make another one with either an entry in the user guide or an example showing how to pass pos_label and what the drawbacks can be if you don't.

@glemaitre
Member Author

ping @ogrisel as well.

Member

@thomasjpfan thomasjpfan left a comment

This PR would handle the case where there is a mismatch between pos_label and estimator.classes_[1].

I keep on wondering if this behavior of selecting the column would be surprising to a user.

Member

@ogrisel ogrisel left a comment

We might also have a similar problem with non-symmetric scorers that take hard class predictions such as f1 score for instance, no?

@glemaitre
Member Author

We might also have a similar problem with non-symmetric scorers that take hard class predictions such as f1 score for instance, no?

Since we don't need to select a column or invert the decision values, we should be safe. But I will add a test.
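For instance (a quick sanity check with made-up labels): pos_label goes straight into the metric computed on hard predictions, so there is no column or sign handling involved.

from sklearn.metrics import f1_score

y_true = ["cancer", "not cancer", "cancer", "cancer"]
y_pred = ["cancer", "cancer", "not cancer", "cancer"]
# pos_label is consumed by the metric itself; nothing to select or invert
f1_score(y_true, y_pred, pos_label="cancer")
# 0.666...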

Member

@ogrisel ogrisel left a comment

Some more review comments:

    # provided.
    raise ValueError(err_msg)
elif not support_multi_class:
    raise ValueError(err_msg)
Member

I do not understand this: both branches raise the same message.

Also, what happens when support_multi_class is True and y_pred.shape[1] != 1?

Member Author

This is what was intended in the past: https://github.com/scikit-learn/scikit-learn/pull/18114/files#diff-e907207584273858caf088b432e38d04L243-L247

Also, what happens when support_multi_class is True and y_pred.shape[1] != 1?

It is a case where y_true is encoded as binary but y_pred is multiclass. It is apparently a case supported by ROC-AUC, and I needed to keep it like this for backward compatibility.

I am not sure that there is a better way of inferring the exact encoding in this case.

Member Author

I added an example to be more specific. I think that we should investigate whether we can pass labels to type_of_target, as an optional argument provided when we use it in the metrics.
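To illustrate the ambiguity (note that the labels argument discussed above is hypothetical; type_of_target does not currently accept it):

from sklearn.utils.multiclass import type_of_target

# looks binary...
type_of_target(["cancer", "not cancer"])  # 'binary'
# ...but the problem may really be multiclass if a third label such as
# "benign" simply does not occur in this particular y_true. Passing the
# full set of labels would remove the guesswork.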

Member

@thomasjpfan thomasjpfan left a comment

Another pass

Member

@thomasjpfan thomasjpfan left a comment

LGTM

Member

@ogrisel ogrisel left a comment

I am still very much confused by the multiclass case. Could you please add tests for a multiclass classification problem with proba and threshold scorers with string labels? Maybe that would clear up the confusion and help me complete this review.

# [0. , 0.2 , 0.8 ]])
# roc_auc_score(
#     y_true, y_score, labels=[0, 1, 2], multi_class='ovo'
# )
Member

I still do not understand this condition. This comment refers to the metric function outside of the scorer API.

I assume that this is related to roc_auc_ovo_scorer, which is a _ProbaScorer instance, and that this condition is about raising a ValueError when y_pred.shape[1] == 1 for some reason. But I really do not see how this relates to the example you give here: y_pred has 3 columns in this example, so it does not match the case of the condition.

Member

I think this is trying to keep the logic from:

elif y_pred.shape[1] == 1: # not multiclass

which I added in #15274. This was added because we can have a binary y_true with an estimator trained on > 2 classes.

y_pred.shape[1] == 1 was used to mean that y_pred came from a classifier with only one class. The check for the shape of y_pred was added here: 94db3d9
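A sketch of that situation (synthetic data for illustration): an estimator trained on three classes but evaluated on a subset of the data containing only two of them, so y_pred has three columns while y_true is binary.

from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier

X, y = make_blobs(random_state=0, centers=3)
clf = DecisionTreeClassifier().fit(X, y)

mask = y != 2                        # keep only classes 0 and 1
clf.predict_proba(X[mask]).shape     # (n_samples, 3), yet y[mask] is binary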

Member Author

But in scikit-learn we cannot have a classifier trained with a single class, can we?

Member

It is strange, but it can happen:

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_blobs
import numpy as np

X, y = make_blobs(random_state=0, centers=2)
clf = DecisionTreeClassifier()
# fit on a constant target: only one class is ever seen
clf.fit(X, np.zeros_like(y))

clf.classes_
# array([0])

Member Author

Uhm. Then, I am confused by check_classifiers_one_label :) I have to check. Anyway, I think that the last change makes everything more explicit.

Member Author

Oh I see, a must_pass in the raises. It would be much more explicit to have a tag for that I think: accept_single_label for instance.

@rth rth self-requested a review September 23, 2020 09:45
Member

@thomasjpfan thomasjpfan left a comment

Otherwise LGTM

Member

@ogrisel ogrisel left a comment

The code is much better organized than previous versions of this PR and the tests are great. Thanks very much for the fix @glemaitre.

@ogrisel ogrisel merged commit 193670c into scikit-learn:master Oct 9, 2020
amrcode pushed a commit to amrcode/scikit-learn that referenced this pull request Oct 19, 2020
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020