
[MRG] Add decision threshold calibration wrapper #10117


Status: Closed
110 commits
9f5a360
add rough implementation of threshold calibrator
Nov 12, 2017
132864b
fit base estimator after calibration
Nov 12, 2017
6477392
add rough implementation of threshold calibrator
Nov 12, 2017
f1f9112
fit base estimator after calibration
Nov 12, 2017
97f1fa7
change name to OptimalCutoffClassifier
Dec 17, 2017
13ee903
support arbitrary target values
Dec 17, 2017
6b8fd24
add methods max_sp and max_se
Dec 17, 2017
c95df08
Merge branch 'master' into feat/8614_add_threshold_calibration_wrapper
Dec 17, 2017
904483f
Merge branch 'feat/8614_add_threshold_calibration_wrapper' of github.…
Dec 17, 2017
60f641f
rename to CutoffClassifier
Dec 22, 2017
4e5a018
rename sensitivity / specificity to tpr / tnr
Dec 22, 2017
cc2b163
Merge branch 'master' into feat/8614_add_threshold_calibration_wrapper
Dec 22, 2017
125a9b2
remove attribute set in __init__
Dec 22, 2017
6a822c6
remove target check for binary values
Dec 22, 2017
c42564d
fix pep8
Dec 22, 2017
31da085
fix input to label encoder
Dec 23, 2017
19285fc
use LinearSVC if base estimator not provided
Dec 23, 2017
84ed3bc
add check_is_fitted check
Dec 23, 2017
2e5a9bb
add input validation checks
Dec 23, 2017
e63657c
Not allow None base estimator
Dec 23, 2017
a145a16
readd target validation check for binary values
Dec 23, 2017
3abd1fe
add trailing underscores to attributes
Dec 23, 2017
2f778fb
make cutoffclassifier meta
Dec 26, 2017
971c0ee
fix pep8
Dec 27, 2017
e3d6a15
fix docstring
Jan 7, 2018
7a99622
add value validation for min_val_tpr and min_val_tnr
Jan 7, 2018
773a78e
add test for cutoff classifier
Jan 7, 2018
e7506ca
fix inverse transforming of predictions
Jan 8, 2018
bba7003
extend cutoff_prefit test
Jan 8, 2018
ba56ed8
ignore warning from train_test_split
Jan 8, 2018
a5807fa
add cutoff cv test
Jan 8, 2018
3058bc0
change affiliation
Jan 12, 2018
24d074c
add citation
Jan 15, 2018
6f9ce4a
fix docstring
Jan 15, 2018
8fda0c4
re-fix affiliation
Jan 17, 2018
cf0e800
add example for roc decision threshold calibration method
Jan 17, 2018
071ed09
fix flake8
Jan 17, 2018
fd648a5
update plot title
Jan 28, 2018
5110b3e
add decision threshold calibration example on breast cancer dataset
Jan 28, 2018
091dd37
fix docstring
Jan 30, 2018
dc94b52
rename min_val_tpr/tnr to min_tpr/tnr
Jan 30, 2018
1b743d4
Merge branch 'master' of github.com:scikit-learn/scikit-learn into fe…
Jan 30, 2018
c3df7ae
update example doc
Feb 12, 2018
4f1b936
rm redundant example
Feb 12, 2018
09af8ae
remove @ignore_warning from tests
Feb 12, 2018
cb35eb9
simplify min distance point calculation
Feb 12, 2018
9fa322f
Merge branch 'master' into feat/8614_add_threshold_calibration_wrapper
Feb 12, 2018
8214fcd
add docstring for predict
Feb 12, 2018
73ab4c9
remove unused import
Feb 12, 2018
11697e9
move validation in the beginning of fit
Feb 13, 2018
d91aa35
enable cutoff point estimation on decision_function
Feb 17, 2018
4d36f4e
fix docstring
Feb 17, 2018
45c2d4f
make naming consistent
Feb 17, 2018
d4d406b
fix flake8
Feb 17, 2018
329ce49
Merge branch 'master' into feat/8614_add_threshold_calibration_wrapper
Feb 17, 2018
c862de4
fix lgtm
Feb 17, 2018
6937714
extend validation checks for scoring param
Feb 18, 2018
59888aa
fix docstring
Feb 18, 2018
f9eaa66
change signature of _get_binary_score to be consistent
Feb 18, 2018
eff199b
Merge branch 'master' into feat/8614_add_threshold_calibration_wrapper
Mar 18, 2018
47d8d2c
fix docstring
Mar 18, 2018
59e97cf
rename threshold_ to decision_threshold_
Mar 18, 2018
93ee091
replace params min_tnr, min_tpr with param threshold
Mar 18, 2018
26b934c
fix example
Mar 18, 2018
7c1326c
add support for f_beta
Apr 8, 2018
6ff615e
update docstring
Apr 8, 2018
c658af4
Merge branch 'master' into feat/8614_add_threshold_calibration_wrapper
Apr 8, 2018
52fdb91
update example
Apr 29, 2018
02b61ea
fix docstring
Apr 29, 2018
73a3609
add user guide
Apr 29, 2018
9df6297
Merge branch 'master' into feat/8614_add_threshold_calibration_wrapper
Apr 29, 2018
e274c74
fix doc
Apr 29, 2018
193d7c7
fix example image
Apr 29, 2018
ef37050
fix doc example
Apr 29, 2018
1896a22
fix image
Apr 29, 2018
ac0bf60
fix example
Apr 29, 2018
00e110b
improve docs and example
Apr 30, 2018
549b186
rename scoring param to strategy
Apr 30, 2018
bb6afee
change order of parameters
Apr 30, 2018
404ac21
use positional args
Apr 30, 2018
595247e
avoid backslash
Apr 30, 2018
ec19891
use np.mean instead of sum / n
Apr 30, 2018
9eb3f16
fix flake
Apr 30, 2018
0aa70d4
remove backslash
Apr 30, 2018
9d7f861
fix beta check
Apr 30, 2018
2d5aca5
extend fbeta asserts
Apr 30, 2018
9218c0b
fix negative label case
May 1, 2018
fab9022
use string formatting
May 1, 2018
e2bf019
get rid of backslashes
May 1, 2018
febf970
update docstrings
May 1, 2018
d7c5943
minor update in error handlings
May 1, 2018
9601e2c
add standard deviation as diagnostic
May 1, 2018
3916f01
dump commit to re-trigger build
May 1, 2018
d61977e
fix docs
May 1, 2018
72c954e
update docs & example & rename param
May 2, 2018
9260a5f
fix typo
May 2, 2018
7692711
fix doctest
May 2, 2018
d3a3f3b
fix typo
May 10, 2018
d44ae62
fix docstring
May 10, 2018
17b9eec
check decision threshold and std
May 10, 2018
a0f00ec
replace assert helpers
May 10, 2018
f8a07a2
update docstring
May 10, 2018
ab6e3fd
add classes_
May 10, 2018
67af28a
make std None in prefit case
May 10, 2018
4e1f70a
test for std not None
May 10, 2018
3b6d9fb
Merge branch 'master' into feat/8614_add_threshold_calibration_wrapper
Oct 7, 2018
d1228dd
use test_size in train_test_split
Oct 7, 2018
8315204
change solver to liblinear
Oct 8, 2018
a995da6
remove unused import
Oct 8, 2018
e36a892
fix n_estimators for RF
Oct 8, 2018
123 changes: 122 additions & 1 deletion doc/modules/calibration.rst
@@ -1,4 +1,4 @@
.. _calibration:
.. _probability_calibration:

=======================
Probability calibration
@@ -208,3 +208,124 @@ a similar decrease in log-loss.
.. [5] On the combination of forecast probabilities for
consecutive precipitation periods. Wea. Forecasting, 5, 640–650.,
Wilks, D. S., 1990a


.. _decision_threshold_calibration:

==============================
Member:
This will mean that the page has two top-level headings. Rather, at the top of the page, create a heading of this level called "Prediction calibration" and then change the heading level of "Probability Calibration" and "Decision Threshold calibration" to fall under it.

Decision Threshold calibration
==============================

.. currentmodule:: sklearn.calibration

Often Machine Learning classifiers base their
predictions on real-valued decision functions or probability estimates that
Member:
"that" -> ". These"

Although I think we might land up rewriting some of this. I think these three paragraphs would be clearer with something like:
"Binary and multilabel classifiers often choose the class by thresholding a real-valued decision function. The default threshold may not be well calibrated for maximizing some specific evaluation metric. :class:`CutoffClassifier` can wrap a binary classifier to calibrate a task-appropriate decision threshold."

carry the inherited biases of their models. Additionally, when using a machine
learning model, the evaluation criteria can differ from the optimisation
objectives used by the model during training.

When predicting between two classes it is commonly advised that an appropriate
decision threshold is estimated based on some cutoff criteria rather than
Member:
I don't know what "cutoff criteria" are as distinct from "decision threshold".

arbitrarily using the midpoint of the space of possible values. Estimating a
Member:
Well, it's not entirely arbitrary. I don't really think that first sentence says a lot.

decision threshold for a specific use case can help to increase the overall
accuracy of the model and provide better handling for sensitive classes.

.. currentmodule:: sklearn.calibration
Member:
this is redundant


:class:`CutoffClassifier` can be used as a wrapper around a model for binary
classification to help obtain a more appropriate decision threshold and use it
for predicting new samples.

Usage
-----

To use the :class:`CutoffClassifier` you need to provide an estimator that has
a ``decision_function`` or a ``predict_proba`` method. The ``method``
parameter controls whether the first will be preferred over the second if both
are available.

The wrapped estimator can be pre-trained, in which case ``cv = 'prefit'``, or
not. If the classifier is not trained then a cross-validation loop specified by
the parameter ``cv`` can be used to obtain a decision threshold by averaging
Member:
Is this the right thing / a sensible thing to do? Do we have references for this? For CalibratedClassifierCV we just keep all the models and average them. We could do the same here. That might make more sense, but I don't know of any literature. Have you done any experiments?

Contributor Author:
What experiments do you have in mind? Nothing more than evaluating the classifiers' prediction accuracy, tpr and tnr given the input parameters of the CutoffClassifier. Literature-wise I didn't find anything related to cross-validation and cutoff points, but maybe I haven't searched enough.

> For CalibratedClassifierCV we just keep all the models and average them

Do you suggest that, instead of keeping the decision threshold, we keep all the underlying trained models and combine / average their predictions? What would the combining criterion for the predictions be in this case? Just a mean of the predictions?

Member:
Right, it would be less obvious how to combine, but voting would be possible. Maybe some experiments that confirm that the current implementation is a sensible thing to do and works in practice? I.e. averaging the thresholds makes it better, not worse, than a single hold-out?

Contributor Author:
> I.e. averaging the thresholds makes it better, not worse, than a single hold-out?

I've seen this in practice. My understanding was that the cv approach is also a way to obtain a threshold on your training data without worrying that it is overfit. I didn't expect it to necessarily improve the threshold, at least not significantly.

So what would be the purpose of keeping the underlying models? Allowing the user to combine them whatever way they want?

Member:
The point is that I haven't seen it in practice and I don't know of any write-up that says it's happening in practice, so I'd like to be convinced ;)
And yes, it's a way to avoid overfitting. But another way would be to use a single split, and using a single split "obviously works" in some sense, but it's not entirely clear to me that averaging leads to a meaningful result.
all decision thresholds calculated on the hold-out parts of each cross
validation iteration. Finally the model is trained on all the provided data.
When using ``cv = 'prefit'`` you need to make sure to use a hold-out part of
your data for calibration.
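
When the base estimator is already fitted, such a hold-out calibration set can
be obtained with :func:`~sklearn.model_selection.train_test_split`. Below is a
minimal sketch of the ``cv='prefit'`` workflow, assuming the
:class:`CutoffClassifier` API proposed in this pull request::

    from sklearn.calibration import CutoffClassifier  # proposed in this PR
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = load_breast_cancer(return_X_y=True)
    # keep a separate hold-out part of the data for threshold calibration
    X_train, X_calib, y_train, y_calib = train_test_split(X, y, test_size=0.25,
                                                          random_state=0)
    base = GaussianNB().fit(X_train, y_train)
    # with cv='prefit' only the decision threshold is estimated, on the
    # hold-out data; the already fitted base estimator is reused as-is
    clf = CutoffClassifier(base, strategy='roc', method='predict_proba',
                           cv='prefit').fit(X_calib, y_calib)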

The strategies for finding an appropriate decision threshold, controlled by the
parameter ``strategy``, are based either on precision-recall estimates or on
true positive and true negative rates. Specifically:

.. currentmodule:: sklearn.metrics

* ``f_beta``
selects a decision threshold that maximizes the :func:`fbeta_score`. The
value of beta is specified by the parameter ``beta``, which determines the
relative weight of precision and recall. When ``beta = 1`` both precision and
recall get the same weight, so the maximization target in this case is the
:func:`f1_score`. If ``beta < 1`` more weight is given to precision, whereas
if ``beta > 1`` more weight is given to recall.

* ``roc``
selects the decision threshold for the point on the :func:`roc_curve` that
is closest to the ideal corner (0, 1); a manual sketch of this criterion is
shown after this list

* ``max_tpr``
Member:
We might want to think about how to avoid confusion with the roc_auc_score parameter max_fpr, which means "only get the area under a section of the curve below a specified maximum false positive rate", whereas here max_* means "maximise * while constraining its counterpart".

Contributor Author:
Hmm, confusing indeed. What if we renamed it to maximise_tpr?

Member:
Yes, maximize_tpr, or just tpr?

selects the decision threshold for the point that yields the highest true
positive rate while maintaining a minimum true negative rate, specified by
the parameter ``threshold``

* ``max_tnr``
selects the decision threshold for the point that yields the highest true
negative rate while maintaining a minimum true positive rate, specified by
the parameter ``threshold``
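
To make the ``roc`` criterion concrete, the closest-to-corner point can be
computed by hand from :func:`roc_curve`. The following sketch, which does not
use the wrapper itself, shows the distance that this strategy minimises::

    import numpy as np

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_calib, y_train, y_calib = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(solver='liblinear').fit(X_train, y_train)

    fpr, tpr, thresholds = roc_curve(y_calib, clf.decision_function(X_calib))
    # Euclidean distance of every operating point to the ideal corner (0, 1)
    distances = np.sqrt(fpr ** 2 + (1 - tpr) ** 2)
    decision_threshold = thresholds[np.argmin(distances)]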

Here is a simple usage example::

>>> from sklearn.calibration import CutoffClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.metrics import precision_score
>>> from sklearn.model_selection import train_test_split

>>> X, y = load_breast_cancer(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(
... X, y, train_size=0.6, random_state=42)
>>> clf = CutoffClassifier(GaussianNB(), strategy='f_beta', beta=0.6,
... cv=3).fit(X_train, y_train)
>>> y_pred = clf.predict(X_test)
>>> precision_score(y_test, y_pred) # doctest: +ELLIPSIS
0.959...

.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_calibration_plot_decision_threshold_calibration.py`: Decision
threshold calibration on the breast cancer dataset

.. currentmodule:: sklearn.calibration

The following image shows the results of using the :class:`CutoffClassifier`
for finding a decision threshold for a :class:`LogisticRegression` classifier
and an :class:`AdaBoostClassifier` for two use cases.

.. figure:: ../auto_examples/calibration/images/sphx_glr_plot_decision_threshold_calibration_001.png
:target: ../auto_examples/calibration/plot_decision_threshold_calibration.html
:align: center

In the first case we want to increase the f1 score of the classifiers on
Member:
I think if this is described inside the example file, we do not need to repeat it here.

the breast cancer dataset. In the second case we want to find a decision
threshold that yields maximum true positive rate while maintaining a minimum
value for the true negative rate.

.. topic:: References:

* Receiver-operating characteristic (ROC) plots: a fundamental
evaluation tool in clinical medicine, MH Zweig, G Campbell -
Clinical chemistry, 1993

Notes
-----

Calibrating the decision threshold of a classifier does not guarantee increased
performance. The generalisation ability of the obtained decision threshold has
to be evaluated.
23 changes: 21 additions & 2 deletions doc/modules/classes.rst
@@ -50,7 +50,7 @@ Functions
set_config
show_versions

.. _calibration_ref:
.. _probability_calibration_ref:

:mod:`sklearn.calibration`: Probability Calibration
===================================================
@@ -59,7 +59,7 @@ Functions
:no-members:
:no-inherited-members:

**User guide:** See the :ref:`calibration` section for further details.
**User guide:** See the :ref:`probability_calibration` section for further details.

.. currentmodule:: sklearn

@@ -76,6 +76,25 @@ Functions

calibration.calibration_curve

.. _decision_threshold_calibration_ref:

:mod:`sklearn.calibration`: Decision Threshold Calibration
==========================================================

.. automodule:: sklearn.calibration
:no-members:
:no-inherited-members:

**User guide:** See the :ref:`decision_threshold_calibration` section for further details.

.. currentmodule:: sklearn

.. autosummary::
:toctree: generated/
:template: class.rst

calibration.CutoffClassifier

.. _cluster_ref:

:mod:`sklearn.cluster`: Clustering
2 changes: 1 addition & 1 deletion examples/calibration/README.txt
@@ -3,4 +3,4 @@
Calibration
-----------------------

Examples illustrating the calibration of predicted probabilities of classifiers.
Examples concerning the :mod:`sklearn.calibration` module.
167 changes: 167 additions & 0 deletions examples/calibration/plot_decision_threshold_calibration.py
@@ -0,0 +1,167 @@
"""
======================================================================
Decision threshold (cutoff point) calibration on breast cancer dataset
======================================================================

Machine learning classifiers often base their predictions on real-valued
Member:
This can be more succinct. The user guide says most of this.

decision functions that don't always have accuracy as their objective. Moreover,
the learning objective of a model can differ from the user's needs; hence, using
the decision threshold defined by the model may not be ideal.

The CutoffClassifier can be used to calibrate the decision threshold of a model
in order to increase the classifier's trustworthiness. Optimization objectives
during the decision threshold calibration can be the true positive and / or
the true negative rate as well as the f beta score.

In this example the decision threshold calibration is applied on two
classifiers trained on the breast cancer dataset. The goal in the first case is
to maximize the f1 score of the classifiers whereas in the second the goal is
to maximize the true positive rate while maintaining a minimum true negative
rate.

As you can see, after calibration the f1 score of the LogisticRegression
classifier has increased slightly, whereas that of the AdaBoostClassifier has
stayed the same.

For the second goal, after calibration both classifiers achieve a higher true
positive rate, while their respective true negative rates have decreased
slightly or remained stable.
"""

# Author: Prokopios Gryllos <prokopis.gryllos@sentiance.com>
#
# License: BSD 3 clause

from __future__ import division

import numpy as np

from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.calibration import CutoffClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split


print(__doc__)

# percentage of the training set that will be used for calibration
calibration_samples_percentage = 0.2

X, y = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.6,
random_state=42)

calibration_samples = int(len(X_train) * calibration_samples_percentage)
Member:
Why not use train_test_split again to get the calibration samples?


lr = LogisticRegression().fit(
Member:
please use lists and loops rather than this repetitive code and hard-to-read variable names like "f_one_lr_f_beta". This should rather be in some record-based structure (dicts? arrays? I don't mind) with strategy='f1', estimator='logistic', f1=value.

X_train[:-calibration_samples], y_train[:-calibration_samples])

y_pred_lr = lr.predict(X_test)
tn_lr, fp_lr, fn_lr, tp_lr = confusion_matrix(y_test, y_pred_lr).ravel()
tpr_lr = tp_lr / (tp_lr + fn_lr)
tnr_lr = tn_lr / (tn_lr + fp_lr)
f_one_lr = f1_score(y_test, y_pred_lr)

ada = AdaBoostClassifier().fit(
X_train[:-calibration_samples], y_train[:-calibration_samples])

y_pred_ada = ada.predict(X_test)
tn_ada, fp_ada, fn_ada, tp_ada = confusion_matrix(y_test, y_pred_ada).ravel()
tpr_ada = tp_ada / (tp_ada + fn_ada)
tnr_ada = tn_ada / (tn_ada + fp_ada)
f_one_ada = f1_score(y_test, y_pred_ada)

# objective 1: we want to calibrate the decision threshold in order to achieve
# better f1 score
lr_f_beta = CutoffClassifier(
lr, strategy='f_beta', method='predict_proba', beta=1, cv='prefit').fit(
X_train[calibration_samples:], y_train[calibration_samples:])

y_pred_lr_f_beta = lr_f_beta.predict(X_test)
f_one_lr_f_beta = f1_score(y_test, y_pred_lr_f_beta)

ada_f_beta = CutoffClassifier(
ada, strategy='f_beta', method='predict_proba', beta=1, cv='prefit'
).fit(X_train[calibration_samples:], y_train[calibration_samples:])

y_pred_ada_f_beta = ada_f_beta.predict(X_test)
f_one_ada_f_beta = f1_score(y_test, y_pred_ada_f_beta)

# objective 2: we want to maximize the true positive rate while the true
# negative rate is at least 0.7
lr_max_tpr = CutoffClassifier(
lr, strategy='max_tpr', method='predict_proba', threshold=0.7, cv='prefit'
).fit(X_train[calibration_samples:], y_train[calibration_samples:])

y_pred_lr_max_tpr = lr_max_tpr.predict(X_test)
tn_lr_max_tpr, fp_lr_max_tpr, fn_lr_max_tpr, tp_lr_max_tpr = \
confusion_matrix(y_test, y_pred_lr_max_tpr).ravel()
tpr_lr_max_tpr = tp_lr_max_tpr / (tp_lr_max_tpr + fn_lr_max_tpr)
tnr_lr_max_tpr = tn_lr_max_tpr / (tn_lr_max_tpr + fp_lr_max_tpr)

ada_max_tpr = CutoffClassifier(
ada, strategy='max_tpr', method='predict_proba', threshold=0.7, cv='prefit'
).fit(X_train[calibration_samples:], y_train[calibration_samples:])

y_pred_ada_max_tpr = ada_max_tpr.predict(X_test)
tn_ada_max_tpr, fp_ada_max_tpr, fn_ada_max_tpr, tp_ada_max_tpr = \
confusion_matrix(y_test, y_pred_ada_max_tpr).ravel()
tpr_ada_max_tpr = tp_ada_max_tpr / (tp_ada_max_tpr + fn_ada_max_tpr)
tnr_ada_max_tpr = tn_ada_max_tpr / (tn_ada_max_tpr + fp_ada_max_tpr)

print('Calibrated threshold')
print('Logistic Regression classifier: {}'.format(
lr_max_tpr.decision_threshold_))
print('AdaBoost classifier: {}'.format(ada_max_tpr.decision_threshold_))
print('before calibration')
print('Logistic Regression classifier: tpr = {}, tnr = {}, f1 = {}'.format(
tpr_lr, tnr_lr, f_one_lr))
print('AdaBoost classifier: tpr = {}, tnr = {}, f1 = {}'.format(
tpr_ada, tnr_ada, f_one_ada))

print('true positive and true negative rates after calibration')
print('Logistic Regression classifier: tpr = {}, tnr = {}, f1 = {}'.format(
tpr_lr_max_tpr, tnr_lr_max_tpr, f_one_lr_f_beta))
print('AdaBoost classifier: tpr = {}, tnr = {}, f1 = {}'.format(
tpr_ada_max_tpr, tnr_ada_max_tpr, f_one_ada_f_beta))

#########
# plots #
#########
bar_width = 0.2

plt.subplot(2, 1, 1)
index = np.asarray([1, 2])
plt.bar(index, [f_one_lr, f_one_ada], bar_width, color='r',
label='Before calibration')

plt.bar(index + bar_width, [f_one_lr_f_beta, f_one_ada_f_beta], bar_width,
color='b', label='After calibration')

plt.xticks(index + bar_width / 2, ('f1 logistic', 'f1 adaboost'))

plt.ylabel('scores')
plt.title('f1 score')
plt.legend(bbox_to_anchor=(.5, -.2), loc='center', borderaxespad=0.)

plt.subplot(2, 1, 2)
index = np.asarray([1, 2, 3, 4])
plt.bar(index, [tpr_lr, tnr_lr, tpr_ada, tnr_ada],
bar_width, color='r', label='Before calibration')

plt.bar(index + bar_width,
[tpr_lr_max_tpr, tnr_lr_max_tpr, tpr_ada_max_tpr, tnr_ada_max_tpr],
bar_width, color='b', label='After calibration')

plt.xticks(
index + bar_width / 2,
('tpr logistic', 'tnr logistic', 'tpr adaboost', 'tnr adaboost'))
plt.ylabel('scores')
plt.title('true positive & true negative rate')

plt.subplots_adjust(hspace=0.6)
plt.show()
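
The review comment on the repetitive per-model code above asks for a loop-based,
record-oriented structure. A minimal sketch of such a restructuring, assuming
the ``CutoffClassifier`` API proposed in this pull request and using
illustrative names for the data splits::

    from __future__ import division

    from sklearn.calibration import CutoffClassifier  # proposed in this PR
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix, f1_score
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.6,
                                                        random_state=42)
    # split the training part once more: one piece to fit the base estimators,
    # one hold-out piece to calibrate the decision threshold
    X_fit, X_calib, y_fit, y_calib = train_test_split(X_train, y_train,
                                                      test_size=0.2,
                                                      random_state=42)

    estimators = {'logistic': LogisticRegression(),
                  'adaboost': AdaBoostClassifier()}
    strategies = {'f_beta': dict(strategy='f_beta', beta=1),
                  'max_tpr': dict(strategy='max_tpr', threshold=0.7)}

    results = []
    for est_name, est in estimators.items():
        est.fit(X_fit, y_fit)
        for strat_name, params in strategies.items():
            wrapped = CutoffClassifier(est, method='predict_proba',
                                       cv='prefit', **params)
            y_pred = wrapped.fit(X_calib, y_calib).predict(X_test)
            tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
            results.append({'estimator': est_name, 'strategy': strat_name,
                            'f1': f1_score(y_test, y_pred),
                            'tpr': tp / (tp + fn), 'tnr': tn / (tn + fp)})

    for row in results:
        print(row)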