[WIP] FEA New meta-estimator to post-tune the decision_function/predict_proba threshold for binary classifiers #16525


Closed. Wants to merge 49 commits.
Commits (changes from all 49 commits):
d3705d6
FEA add CutoffCalibration estimator
glemaitre Feb 23, 2020
b99218b
add example
glemaitre Feb 23, 2020
ce66427
PEP8
glemaitre Feb 23, 2020
3ee3b5a
add whats new entry
glemaitre Feb 23, 2020
c5a51eb
mark as only working with binary data
glemaitre Feb 23, 2020
0aab70c
common tests fixes
glemaitre Feb 23, 2020
6e12f8a
iter
glemaitre Feb 24, 2020
420df8d
xxx
glemaitre Feb 24, 2020
6b36f68
iter
glemaitre Feb 24, 2020
00da5f7
pep8
glemaitre Feb 24, 2020
e8837e0
iter
glemaitre Feb 25, 2020
1a03fad
move to model_selection
glemaitre Feb 25, 2020
63b285a
add the missing files
glemaitre Feb 25, 2020
65e8329
Remove current documentation
glemaitre Feb 25, 2020
d924684
DOC add docstring examples
glemaitre Feb 25, 2020
237c919
PEP8
glemaitre Feb 25, 2020
4210537
fix for predict proba
glemaitre Feb 25, 2020
255dfe8
add support for cost-sensitive
glemaitre Feb 25, 2020
f3a372d
revert calibration changes
glemaitre Feb 25, 2020
f998625
start doc
glemaitre Feb 26, 2020
ca4c50a
iter
glemaitre Mar 11, 2020
3eb0289
Merge remote-tracking branch 'origin/master' into is/10117
glemaitre Mar 19, 2020
c0acab0
skip test
glemaitre Mar 19, 2020
3edc421
iter
glemaitre Mar 20, 2020
982918a
remove unsued code
glemaitre Mar 20, 2020
a426131
docstring
glemaitre Mar 20, 2020
470a09c
pep9
glemaitre Mar 20, 2020
a565589
TST wip
glemaitre Aug 7, 2020
6c0db49
TST wip
glemaitre Aug 7, 2020
c41e999
TST wip
glemaitre Aug 7, 2020
f53f833
TST PEP8 + comments
glemaitre Aug 7, 2020
dd4e9fe
TST force to use predict_proba as well
glemaitre Aug 7, 2020
e32cfa7
DOC add whats + PEP8
glemaitre Aug 7, 2020
07915e9
TST add some tolerance since the average of squared in diff ordered
glemaitre Aug 7, 2020
fc1c422
STY add better error message and refactor code
glemaitre Aug 10, 2020
aa5cd16
fix
glemaitre Aug 10, 2020
a669ecf
fix
glemaitre Aug 10, 2020
7fadbfb
Merge remote-tracking branch 'origin/master' into is/10117
glemaitre Aug 10, 2020
a477e7b
Update sklearn/metrics/tests/test_score_objects.py
glemaitre Aug 11, 2020
09b47bb
add test for PredictScorer
glemaitre Aug 11, 2020
42e7f00
apply olivier suggestions
glemaitre Aug 18, 2020
e9d7873
use list
glemaitre Aug 18, 2020
add8320
Merge remote-tracking branch 'origin/master' into is/10117
glemaitre Aug 19, 2020
7025768
fix
glemaitre Aug 19, 2020
0520f96
Merge remote-tracking branch 'glemaitre/is/scorer_pos_label' into is/…
glemaitre Aug 19, 2020
536753f
fix
glemaitre Aug 19, 2020
6a12a1f
PEP8
glemaitre Aug 19, 2020
b8cfd34
Merge branch 'is/scorer_pos_label' into is/10117
glemaitre Aug 19, 2020
0713232
finally passing?
glemaitre Aug 19, 2020
1 change: 1 addition & 0 deletions doc/model_selection.rst
@@ -14,6 +14,7 @@ Model selection and evaluation

modules/cross_validation
modules/grid_search
modules/prediction
modules/model_evaluation
modules/model_persistence
modules/learning_curve
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -1153,6 +1153,7 @@ Splitter Classes
:toctree: generated/
:template: class.rst

model_selection.CutoffClassifier
model_selection.GroupKFold
model_selection.GroupShuffleSplit
model_selection.KFold
34 changes: 34 additions & 0 deletions doc/modules/prediction.rst
@@ -0,0 +1,34 @@
.. currentmodule:: sklearn.model_selection

.. _prediction_tuning:
Reviewer comment (Member):

Nit, is this more about threshold tuning than prediction tuning?


================================================
Tuning of the decision threshold of an estimator
Suggested change (Member):
- Tuning of the decision threshold of an estimator
+ Tuning of the decision threshold of a classifier

================================================

The real-valued decision functions, i.e. `decision_function` and
`predict_proba`, of machine-learning classifiers carry the inherited biases of
Comment on lines +9 to +10
Suggested change (Member):
- The real-valued decision functions, i.e. `decision_function` and
- `predict_proba`, of machine-learning classifiers carry the inherited biases of
+ The real-valued decision functions, i.e. :term:`decision_function` and
+ :term:`predict_proba`, of machine-learning classifiers carry the inherited biases of

the fitted model; e.g, in a class imbalanced setting, a classifier
Suggested change (Member):
- the fitted model; e.g, in a class imbalanced setting, a classifier
+ the fitted model; e.g, in a class-imbalance setting, a classifier

will naturally lean toward the most frequent class. In some other cases, the
generic objective function used to train a model is generally unaware of the
evaluation criteria used to evaluate the model; e.g., one might want to
penalized differently a false-positive and false-negative ---it will be less
Suggested change (Member):
- penalized differently a false-positive and false-negative ---it will be less
+ penalize differently a false-positive and false-negative: it will be less

detrimental to show an MR image without a cancer (i.e., false-positive) to a
radiologist than hidding one with a cancer (i.e, false-negtative) when
Suggested change (Member):
- radiologist than hidding one with a cancer (i.e, false-negtative) when
+ radiologist than to hide one with a cancer (i.e, false-negtative) when

developing some computer-aided diagnosis system.

In a binary classification scenario, the hard-prediction, i.e. `predict`, for a
classifier most commonly use the `predict_proba` and apply a decision threshold
at 0.5 to output a positive or negative label. Thus, this hard-prediction
suffers from the same drawbacks than the one raised in the above paragraph.
Comment on lines +20 to +23
Reviewer comment (Member):
May I suggest:

In scikit-learn, classifiers apply a hard threshold to the output of `predict_proba` or `decision_function` in order to predict a class with :term:`predict`. This threshold is typically at 0.5 for probabilities as e.g. in :class:`~sklearn.linear_model.LogisticRegression`, and at 0 for decision functions as in the SVM classifiers. These hard-coded thresholds typically suffer from the class-imbalance issue and the false positive / false negative biases mentioned above.
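
To make the quoted claim concrete, a small editorial check (not part of this PR's diff) that binary `predict` is a hard cut at 0.5 on `predict_proba` for :class:`~sklearn.linear_model.LogisticRegression` and at 0 on `decision_function` for a linear SVM:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(random_state=0)

lr = LogisticRegression().fit(X, y)
# Binary predict is a hard cut at probability 0.5 ...
assert np.array_equal(lr.predict(X),
                      (lr.predict_proba(X)[:, 1] > 0.5).astype(int))

svc = LinearSVC(random_state=0).fit(X, y)
# ... and at 0 on the decision function for SVM-style classifiers.
assert np.array_equal(svc.predict(X),
                      (svc.decision_function(X) > 0).astype(int))
```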


Post-tuning of the decision threshold
=====================================

:class:`CutoffClassifier` allows for post-tuning the decision threshold using
either `decision_function` or `predict_proba` and an objective metric for which
we want our threshold to be optimized.

Fine-tune using a single objective metric
-----------------------------------------
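
This section is still a stub in this WIP pull request. As a rough illustration of the intended workflow, here is a minimal hand-rolled sketch of post-tuning a threshold for a single metric; it deliberately avoids the proposed `CutoffClassifier` API (whose exact signature is not settled in this diff) and uses only established scikit-learn functions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Imbalanced problem where the default 0.5 cutoff is usually suboptimal.
X, y = make_classification(n_samples=1_000, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_tune, y_train, y_tune = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
proba = clf.predict_proba(X_tune)[:, 1]

# Scan candidate thresholds on held-out data and keep the best one.
thresholds = np.linspace(0.01, 0.99, 99)
scores = [balanced_accuracy_score(y_tune, (proba >= t).astype(int))
          for t in thresholds]
best_threshold = thresholds[np.argmax(scores)]
print(f"tuned threshold: {best_threshold:.2f} (default: 0.5)")
```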

6 changes: 6 additions & 0 deletions doc/whats_new/v0.23.rst
@@ -611,6 +611,12 @@ Changelog
be removed in 0.25. :pr:`16401` by
:user:`Arie Pratama Sutiono <ariepratama>`

- |MajorFeature| :class:`model_selection.CutoffClassifier` calibrates the
decision threshold function of a classifier by maximizing a binary
Suggested change (Member):
- decision threshold function of a classifier by maximizing a binary
+ decision threshold function of a binary classifier by maximizing a

classification metric through cross-validation.
Suggested change (Member):
- classification metric through cross-validation.
+ classification metric through cross-validation.
+ That decision threshold is then used to convert a probability or the output of the decision function into a predicted class.

:pr:`16525` by :user:`Guillaume Lemaitre <glemaitre>` and
:user:`Prokopis Gryllos <PGryllos>`.

:mod:`sklearn.multioutput`
..........................

7 changes: 7 additions & 0 deletions doc/whats_new/v0.24.rst
@@ -314,6 +314,13 @@ Changelog
class to be used when computing the roc auc statistics.
:pr:`17651` by :user:`Clara Matos <claramatos>`.

- |Fix| Fix scorers that accept a pos_label parameter and compute their metrics
from values returned by `decision_function` or `predict_proba`. Previously,
they would return erroneous values when pos_label was not corresponding to
`classifier.classes_[1]`. This is especially important when training
classifiers directly with string labeled target classes.
:pr:`#18114` by :user:`Guillaume Lemaitre <glemaitre>`.

:mod:`sklearn.model_selection`
..............................

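A minimal sketch of the scenario the |Fix| entry above addresses, using the scorer API of that era (`needs_threshold=True`); the dataset and string labels are illustrative assumptions, not taken from the PR:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, make_scorer

X, y = load_breast_cancer(return_X_y=True)
# String targets so that the class of interest, "cancer", sorts first and
# becomes classes_[0] rather than the implicit positive class classes_[1].
y = np.array(["cancer", "not cancer"], dtype=object)[y]

clf = LogisticRegression(max_iter=10_000).fit(X, y)
scorer = make_scorer(average_precision_score, needs_threshold=True,
                     pos_label="cancer")
# With this fix the scorer flips decision_function so that larger values
# mean "cancer"; previously it silently scored the wrong class.
print(scorer(clf, X, y))
```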
2 changes: 1 addition & 1 deletion sklearn/metrics/_classification.py
@@ -1251,7 +1251,7 @@ def _check_set_wise_labels(y_true, y_pred, average, labels, pos_label):
str(average_options))

y_type, y_true, y_pred = _check_targets(y_true, y_pred)
present_labels = unique_labels(y_true, y_pred)
present_labels = unique_labels(y_true, y_pred).tolist()
if average == 'binary':
if y_type == 'binary':
if pos_label not in present_labels:
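The `.tolist()` change above is cosmetic: assuming the message is built with `%r` formatting (which the test expectations in this diff suggest), it makes the labels interpolate as a plain list rather than a numpy `array(...)` repr, matching the updated regex in `test_jaccard_score_validation` below. A quick sketch of the difference:

```python
from sklearn.utils.multiclass import unique_labels

present_labels = unique_labels([0, 1, 0], [0, 1, 1])
print("pos_label=%r is not a valid label: %r" % (2, present_labels))
# -> pos_label=2 is not a valid label: array([0, 1])
print("pos_label=%r is not a valid label: %r" % (2, present_labels.tolist()))
# -> pos_label=2 is not a valid label: [0, 1]
```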
69 changes: 53 additions & 16 deletions sklearn/metrics/_scorer.py
@@ -127,6 +127,48 @@ def __init__(self, score_func, sign, kwargs):
self._score_func = score_func
self._sign = sign

@staticmethod
def _check_pos_label(pos_label, classes):
if pos_label not in list(classes):
raise ValueError(
f"pos_label={pos_label} is not a valid label: {classes}"
)

def _check_decision_function(self, y_pred, classes):
"""Reverse the decision function depending of pos_label."""
pos_label = self._kwargs.get("pos_label", classes[1])
self._check_pos_label(pos_label, classes)
if pos_label == classes[0]:
# The implicit positive class of the binary classifier
# does not match `pos_label`: we need to invert the
# predictions
y_pred *= -1

return y_pred

def _select_proba(self, y_pred, classes, support_multi_class):
"""Select the column of y_pred when probabilities are provided."""
if y_pred.shape[1] == 2:
pos_label = self._kwargs.get("pos_label", classes[1])
self._check_pos_label(pos_label, classes)
col_idx = np.flatnonzero(classes == pos_label)[0]
y_pred = y_pred[:, col_idx]
else:
err_msg = (
f"Got predict_proba of shape {y_pred.shape}, but need "
f"classifier with two classes for {self._score_func.__name__} "
f"scoring"
)
if support_multi_class and y_pred.shape[1] == 1:
# In _ProbaScorer, y_true can be tagged as binary while the
# y_pred is multi_class. This case is supported when label is
# provided.
raise ValueError(err_msg)
elif not support_multi_class:
raise ValueError(err_msg)

return y_pred

def __repr__(self):
kwargs_string = "".join([", %s=%s" % (str(k), str(v))
for k, v in self._kwargs.items()])
@@ -238,13 +280,9 @@ def _score(self, method_caller, clf, X, y, sample_weight=None):
y_type = type_of_target(y)
y_pred = method_caller(clf, "predict_proba", X)
if y_type == "binary":
if y_pred.shape[1] == 2:
y_pred = y_pred[:, 1]
elif y_pred.shape[1] == 1: # not multiclass
raise ValueError('got predict_proba of shape {},'
' but need classifier with two'
' classes for {} scoring'.format(
y_pred.shape, self._score_func.__name__))
y_pred = self._select_proba(
y_pred, clf.classes_, support_multi_class=True
)
if sample_weight is not None:
return self._sign * self._score_func(y, y_pred,
sample_weight=sample_weight,
@@ -298,22 +336,21 @@ def _score(self, method_caller, clf, X, y, sample_weight=None):
try:
y_pred = method_caller(clf, "decision_function", X)

# For multi-output multi-class estimator
if isinstance(y_pred, list):
# For multi-output multi-class estimator
y_pred = np.vstack([p for p in y_pred]).T
elif y_type == "binary":
y_pred = self._check_decision_function(
y_pred, clf.classes_
)

except (NotImplementedError, AttributeError):
y_pred = method_caller(clf, "predict_proba", X)

if y_type == "binary":
if y_pred.shape[1] == 2:
y_pred = y_pred[:, 1]
else:
raise ValueError('got predict_proba of shape {},'
' but need classifier with two'
' classes for {} scoring'.format(
y_pred.shape,
self._score_func.__name__))
y_pred = self._select_proba(
y_pred, clf.classes_, support_multi_class=False,
)
elif isinstance(y_pred, list):
y_pred = np.vstack([p[:, -1] for p in y_pred]).T

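For reference, a standalone sketch (editorial, with invented values) of the column selection `_select_proba` performs when `pos_label` is not the implicit positive class `classes_[1]`:

```python
import numpy as np

classes = np.array(["cancer", "not cancer"])   # clf.classes_ (sorted)
proba = np.array([[0.7, 0.3], [0.2, 0.8]])     # output of predict_proba
pos_label = "cancer"                           # requested positive class

# pos_label is classes_[0] here, so the default column 1 would be wrong.
col_idx = np.flatnonzero(classes == pos_label)[0]  # -> 0
print(proba[:, col_idx])                           # [0.7 0.2]
```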
12 changes: 7 additions & 5 deletions sklearn/metrics/tests/test_classification.py
@@ -4,7 +4,6 @@
from itertools import chain
from itertools import permutations
import warnings
import re

import numpy as np
from scipy import linalg
@@ -1247,7 +1246,7 @@ def test_multilabel_hamming_loss():
def test_jaccard_score_validation():
y_true = np.array([0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 1, 1])
err_msg = r"pos_label=2 is not a valid label: array\(\[0, 1\]\)"
err_msg = r"pos_label=2 is not a valid label: \[0, 1\]"
with pytest.raises(ValueError, match=err_msg):
jaccard_score(y_true, y_pred, average='binary', pos_label=2)

@@ -2262,9 +2261,12 @@ def test_brier_score_loss():
# ensure to raise an error for multiclass y_true
y_true = np.array([0, 1, 2, 0])
y_pred = np.array([0.8, 0.6, 0.4, 0.2])
error_message = ("Only binary classification is supported. Labels "
"in y_true: {}".format(np.array([0, 1, 2])))
with pytest.raises(ValueError, match=re.escape(error_message)):
error_message = (
r"Only binary classification is supported. Labels in y_true: "
r"\[0 1 2\]"
)

with pytest.raises(ValueError, match=error_message):
brier_score_loss(y_true, y_pred)

# calculate correctly when there's only one class in y_true
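The switch above from `re.escape` to a hand-escaped raw string works because `pytest.raises(match=...)` applies `re.search` with the given pattern, so the literal brackets must be escaped in the pattern itself. A quick sketch:

```python
import re

msg = "Only binary classification is supported. Labels in y_true: [0 1 2]"
# pytest.raises(match=...) performs re.search with the pattern, so the
# literal brackets need escaping (previously done via re.escape).
assert re.search(r"Labels in y_true: \[0 1 2\]", msg)
```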