[MRG] Add decision threshold calibration wrapper #10117
Changes from all commits
@@ -1,4 +1,4 @@
.. _calibration:
.. _probability_calibration:

=======================
Probability calibration

@@ -208,3 +208,124 @@ a similar decrease in log-loss.
    .. [5] On the combination of forecast probabilities for
           consecutive precipitation periods. Wea. Forecasting, 5, 640–650.,
           Wilks, D. S., 1990a

.. _decision_threshold_calibration:

==============================
Decision Threshold calibration
==============================

.. currentmodule:: sklearn.calibration

Often Machine Learning classifiers base their
predictions on real-valued decision functions or probability estimates that
carry the inherited biases of their models. Additionally when using a machine
learning model the evaluation criteria can differ from the optimisation
objectives used by the model during training.

    [Reviewer] "that" -> ". These"
    Although I think we might land up rewriting some of this. I think these
    three paragraphs would be clearer with something like:

When predicting between two classes it is commonly advised that an appropriate
decision threshold is estimated based on some cutoff criteria rather than
arbitrarily using the midpoint of the space of possible values. Estimating a
decision threshold for a specific use case can help to increase the overall
accuracy of the model and provide better handling for sensitive classes.

    [Reviewer] I don't know what "cutoff criteria" are as distinct from
    "decision threshold".

    [Reviewer] Well, it's not entirely arbitrary. I don't really think that
    first sentence says a lot.

.. currentmodule:: sklearn.calibration

    [Reviewer] this is redundant

:class:`CutoffClassifier` can be used as a wrapper around a model for binary
classification to help obtain a more appropriate decision threshold and use it
for predicting new samples.

Usage
-----

To use the :class:`CutoffClassifier` you need to provide an estimator that has
a ``decision_function`` or a ``predict_proba`` method. The ``method``
parameter controls whether the first will be preferred over the second if both
are available.
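For illustration, the preference described above amounts to something like the
following (a minimal sketch; the helper name ``_get_scores`` and the fallback
to the positive-class probability are assumptions of this sketch, not the PR's
actual code)::

    def _get_scores(estimator, X, method='decision_function'):
        # Prefer the requested method when the estimator provides it...
        if method == 'decision_function' and hasattr(estimator,
                                                     'decision_function'):
            return estimator.decision_function(X)
        # ...otherwise fall back to the probability of the positive class.
        return estimator.predict_proba(X)[:, 1]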

The wrapped estimator can be pre-trained, in which case ``cv = 'prefit'``, or
not. If the classifier is not trained then a cross-validation loop specified by
the parameter ``cv`` can be used to obtain a decision threshold by averaging
all decision thresholds calculated on the hold-out parts of each cross
validation iteration. Finally the model is trained on all the provided data.
When using ``cv = 'prefit'`` you need to make sure to use a hold-out part of
your data for calibration.

    [Reviewer] Is this the right thing / a sensible thing to do? Do we have
    references for this? For CalibratedClassifierCV we just keep all the
    models and average them. We could do the same here. That might make more
    sense, but I don't know of any literature. Have you done any experiments?

    [Author] What experiments do you have in mind? Not more than evaluating
    the classifiers' prediction accuracy, tpr and tnr given the input
    parameters of the CutoffClassifier. Literature-wise I didn't find anything
    related to cross validation and cutoff points, but maybe I haven't
    searched enough. Do you suggest that instead of keeping the decision
    threshold we keep all the underlying trained models and combine / average
    their predictions? What would the combining criteria for the predictions
    be in this case? Just a mean of the predictions?

    [Reviewer] Right, it would be less obvious how to combine, but voting
    would be possible. Maybe some experiments that confirm that the current
    implementation is a sensible thing to do and works in practice? I.e. that
    averaging the thresholds makes it better, not worse, than a single
    hold-out?

    [Author] I've seen this in practice. My understanding was that the cv
    approach is also a way to obtain a threshold on your training data without
    worrying that it is overfit. I didn't expect it to necessarily improve the
    threshold, or significantly so. So what would be the purpose of keeping
    the underlying models? Allowing the user to combine them whatever way they
    want?

    [Reviewer] The point is that I haven't seen it in practice and I don't
    know of any write-up that says it's happening in practice, so I'd like to
    be convinced ;)
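To make the cross-validated behaviour discussed above concrete, here is a
rough sketch of the scheme described in the paragraph: find one threshold per
hold-out fold, average them, and refit on all the data. This is illustrative
only (``find_threshold`` stands in for whichever strategy is used, and
``X``/``y`` are assumed to be NumPy arrays); it is not the PR's
implementation::

    import numpy as np

    from sklearn.base import clone
    from sklearn.model_selection import StratifiedKFold


    def cv_threshold(estimator, X, y, find_threshold, cv=3):
        """Average the thresholds found on each hold-out fold."""
        thresholds = []
        for train_idx, holdout_idx in StratifiedKFold(n_splits=cv).split(X, y):
            fold_est = clone(estimator).fit(X[train_idx], y[train_idx])
            scores = fold_est.predict_proba(X[holdout_idx])[:, 1]
            thresholds.append(find_threshold(y[holdout_idx], scores))
        # the final model is trained on all the provided data
        return clone(estimator).fit(X, y), np.mean(thresholds)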

The strategies, controlled by the parameter ``strategy``, for finding
appropriate decision thresholds are based either on precision-recall estimates
or true positive and true negative rates. Specifically (an illustrative sketch
of these rules follows the list):

.. currentmodule:: sklearn.metrics

* ``f_beta``
  selects a decision threshold that maximizes the :func:`fbeta_score`. The
  value of beta is specified by the parameter ``beta``. The ``beta`` parameter
  determines the weight of precision. When ``beta = 1`` both precision and
  recall get the same weight, therefore the maximization target in this case
  is the :func:`f1_score`. If ``beta < 1`` more weight is given to precision
  whereas if ``beta > 1`` more weight is given to recall.

* ``roc``
  selects the decision threshold for the point on the :func:`roc_curve` that
  is closest to the ideal corner (0, 1)

* ``max_tpr``
  selects the decision threshold for the point that yields the highest true
  positive rate while maintaining a minimum true negative rate, specified by
  the parameter ``threshold``

    [Reviewer] We might want to think about how to avoid confusion with the

    [Author] hmm.. confusing indeed, what if we renamed it to

    [Reviewer] Yes,

* ``max_tnr``
  selects the decision threshold for the point that yields the highest true
  negative rate while maintaining a minimum true positive rate, specified by
  the parameter ``threshold``
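As referenced above, the selection rules for two of these strategies can be
sketched as follows (illustrative helpers, not the PR's code; ``y_score`` is
assumed to come from ``predict_proba`` or ``decision_function``)::

    import numpy as np

    from sklearn.metrics import precision_recall_curve, roc_curve


    def fbeta_threshold(y_true, y_score, beta=1.0):
        """Threshold maximizing F-beta, as in the ``f_beta`` strategy."""
        precision, recall, thresholds = precision_recall_curve(y_true, y_score)
        # precision and recall have one entry more than thresholds; drop it
        precision, recall = precision[:-1], recall[:-1]
        beta2 = beta ** 2
        fbeta = ((1 + beta2) * precision * recall
                 / (beta2 * precision + recall + 1e-12))
        return thresholds[np.argmax(fbeta)]


    def roc_threshold(y_true, y_score):
        """Threshold of the ROC point closest to the ideal corner (0, 1)."""
        fpr, tpr, thresholds = roc_curve(y_true, y_score)
        return thresholds[np.argmin(fpr ** 2 + (1 - tpr) ** 2)]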

Here is a simple usage example::

    >>> from sklearn.calibration import CutoffClassifier
    >>> from sklearn.datasets import load_breast_cancer
    >>> from sklearn.naive_bayes import GaussianNB
    >>> from sklearn.metrics import precision_score
    >>> from sklearn.model_selection import train_test_split

    >>> X, y = load_breast_cancer(return_X_y=True)
    >>> X_train, X_test, y_train, y_test = train_test_split(
    ...     X, y, train_size=0.6, random_state=42)
    >>> clf = CutoffClassifier(GaussianNB(), strategy='f_beta', beta=0.6,
    ...                        cv=3).fit(X_train, y_train)
    >>> y_pred = clf.predict(X_test)
    >>> precision_score(y_test, y_pred)  # doctest: +ELLIPSIS
    0.959...
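After fitting, the chosen cut-off is stored in the ``decision_threshold_``
attribute (the example script below prints it). A possible continuation of the
snippet above, reusing the names already defined there, compares against an
unwrapped ``GaussianNB`` with its default 0.5 probability threshold
(illustrative; exact scores depend on the split)::

    # The cut-off selected on the cross-validation hold-outs:
    threshold = clf.decision_threshold_

    # For comparison, the precision of an unwrapped GaussianNB that predicts
    # with its default 0.5 probability threshold on the same split:
    baseline = GaussianNB().fit(X_train, y_train)
    baseline_precision = precision_score(y_test, baseline.predict(X_test))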

.. topic:: Examples:

  * :ref:`sphx_glr_auto_examples_calibration_plot_decision_threshold_calibration.py`: Decision
    threshold calibration on the breast cancer dataset

.. currentmodule:: sklearn.calibration

The following image shows the results of using the :class:`CutoffClassifier`
for finding a decision threshold for a :class:`LogisticRegression` classifier
and an :class:`AdaBoostClassifier` for two use cases.

.. figure:: ../auto_examples/calibration/images/sphx_glr_plot_decision_threshold_calibration_001.png
   :target: ../auto_examples/calibration/plot_decision_threshold_calibration.html
   :align: center

In the first case we want to increase the overall accuracy of the classifier on
the breast cancer dataset. In the second case we want to find a decision
threshold that yields maximum true positive rate while maintaining a minimum
value for the true negative rate.

    [Reviewer] I think if this is described inside the example file, we do not
    need to repeat it here.

.. topic:: References:

  * Receiver-operating characteristic (ROC) plots: a fundamental
    evaluation tool in clinical medicine, MH Zweig, G Campbell -
    Clinical chemistry, 1993

Notes
-----

Calibrating the decision threshold of a classifier does not guarantee increased
performance. The generalisation ability of the obtained decision threshold has
to be evaluated.
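One simple way to carry out such an evaluation (a sketch, not part of the PR)
is to compare the metric of interest on a held-out test set for a wrapped and
an unwrapped classifier::

    from sklearn.calibration import CutoffClassifier  # added by this PR
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    base = LogisticRegression().fit(X_train, y_train)
    wrapped = CutoffClassifier(LogisticRegression(), strategy='f_beta', beta=1,
                               method='predict_proba',
                               cv=3).fit(X_train, y_train)

    # If the tuned threshold generalises, the wrapped classifier should not do
    # worse than the default threshold on unseen data.
    print('default    f1: %.3f' % f1_score(y_test, base.predict(X_test)))
    print('calibrated f1: %.3f' % f1_score(y_test, wrapped.predict(X_test)))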

@@ -0,0 +1,167 @@
"""
======================================================================
Decision threshold (cutoff point) calibration on breast cancer dataset
======================================================================

Machine learning classifiers often base their predictions on real-valued
decision functions that don't always have accuracy as their objective. Moreover
the learning objective of a model can differ from the user's needs, hence using
an arbitrary decision threshold as defined by the model may not be ideal.

    [Reviewer] This can be more succinct. The user guide says most of this.

The CutoffClassifier can be used to calibrate the decision threshold of a model
in order to increase the classifier's trustworthiness. Optimization objectives
during the decision threshold calibration can be the true positive and / or
the true negative rate as well as the f beta score.

In this example the decision threshold calibration is applied on two
classifiers trained on the breast cancer dataset. The goal in the first case is
to maximize the f1 score of the classifiers whereas in the second the goal is
to maximize the true positive rate while maintaining a minimum true negative
rate.

As you can see, after calibration the f1 score of the LogisticRegression
classifier has increased slightly whereas that of the AdaBoostClassifier has
stayed the same.

For the second goal, as seen after calibration, both classifiers achieve a
better true positive rate while their respective true negative rates have
decreased slightly or remained stable.
"""

# Author: Prokopios Gryllos <prokopis.gryllos@sentiance.com>
#
# License: BSD 3 clause

from __future__ import division

import numpy as np

from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.calibration import CutoffClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split


print(__doc__)

# percentage of the training set that will be used for calibration
calibration_samples_percentage = 0.2

X, y = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.6,
                                                    random_state=42)

calibration_samples = int(len(X_train) * calibration_samples_percentage)

    [Reviewer] Why not use train_test_split again to get the calibration
    samples?

lr = LogisticRegression().fit(
    X_train[:-calibration_samples], y_train[:-calibration_samples])

    [Reviewer] please use lists and loops rather than this repetitive code and
    hard-to-read variable names like "f_one_lr_f_beta". This should rather be
    in some record-based structure (dicts? arrays? I don't mind) with
    strategy='f1', estimator='logistic', f1=value.

y_pred_lr = lr.predict(X_test)
tn_lr, fp_lr, fn_lr, tp_lr = confusion_matrix(y_test, y_pred_lr).ravel()
tpr_lr = tp_lr / (tp_lr + fn_lr)
tnr_lr = tn_lr / (tn_lr + fp_lr)
f_one_lr = f1_score(y_test, y_pred_lr)

ada = AdaBoostClassifier().fit(
    X_train[:-calibration_samples], y_train[:-calibration_samples])

y_pred_ada = ada.predict(X_test)
tn_ada, fp_ada, fn_ada, tp_ada = confusion_matrix(y_test, y_pred_ada).ravel()
tpr_ada = tp_ada / (tp_ada + fn_ada)
tnr_ada = tn_ada / (tn_ada + fp_ada)
f_one_ada = f1_score(y_test, y_pred_ada)

# objective 1: we want to calibrate the decision threshold in order to achieve
# better f1 score
# (calibration uses the held-out last part of the training set, which was not
# seen by the pre-fitted classifiers above)
lr_f_beta = CutoffClassifier(
    lr, strategy='f_beta', method='predict_proba', beta=1, cv='prefit').fit(
    X_train[-calibration_samples:], y_train[-calibration_samples:])

y_pred_lr_f_beta = lr_f_beta.predict(X_test)
f_one_lr_f_beta = f1_score(y_test, y_pred_lr_f_beta)

ada_f_beta = CutoffClassifier(
    ada, strategy='f_beta', method='predict_proba', beta=1, cv='prefit'
).fit(X_train[-calibration_samples:], y_train[-calibration_samples:])

y_pred_ada_f_beta = ada_f_beta.predict(X_test)
f_one_ada_f_beta = f1_score(y_test, y_pred_ada_f_beta)

# objective 2: we want to maximize the true positive rate while the true
# negative rate is at least 0.7
lr_max_tpr = CutoffClassifier(
    lr, strategy='max_tpr', method='predict_proba', threshold=0.7, cv='prefit'
).fit(X_train[-calibration_samples:], y_train[-calibration_samples:])

y_pred_lr_max_tpr = lr_max_tpr.predict(X_test)
tn_lr_max_tpr, fp_lr_max_tpr, fn_lr_max_tpr, tp_lr_max_tpr = \
    confusion_matrix(y_test, y_pred_lr_max_tpr).ravel()
tpr_lr_max_tpr = tp_lr_max_tpr / (tp_lr_max_tpr + fn_lr_max_tpr)
tnr_lr_max_tpr = tn_lr_max_tpr / (tn_lr_max_tpr + fp_lr_max_tpr)

ada_max_tpr = CutoffClassifier(
    ada, strategy='max_tpr', method='predict_proba', threshold=0.7, cv='prefit'
).fit(X_train[-calibration_samples:], y_train[-calibration_samples:])

y_pred_ada_max_tpr = ada_max_tpr.predict(X_test)
tn_ada_max_tpr, fp_ada_max_tpr, fn_ada_max_tpr, tp_ada_max_tpr = \
    confusion_matrix(y_test, y_pred_ada_max_tpr).ravel()
tpr_ada_max_tpr = tp_ada_max_tpr / (tp_ada_max_tpr + fn_ada_max_tpr)
tnr_ada_max_tpr = tn_ada_max_tpr / (tn_ada_max_tpr + fp_ada_max_tpr)

print('Calibrated threshold')
print('Logistic Regression classifier: {}'.format(
    lr_max_tpr.decision_threshold_))
print('AdaBoost classifier: {}'.format(ada_max_tpr.decision_threshold_))
print('Before calibration')
print('Logistic Regression classifier: tpr = {}, tnr = {}, f1 = {}'.format(
    tpr_lr, tnr_lr, f_one_lr))
print('AdaBoost classifier: tpr = {}, tnr = {}, f1 = {}'.format(
    tpr_ada, tnr_ada, f_one_ada))

print('True positive and true negative rates after calibration')
print('Logistic Regression classifier: tpr = {}, tnr = {}, f1 = {}'.format(
    tpr_lr_max_tpr, tnr_lr_max_tpr, f_one_lr_f_beta))
print('AdaBoost classifier: tpr = {}, tnr = {}, f1 = {}'.format(
    tpr_ada_max_tpr, tnr_ada_max_tpr, f_one_ada_f_beta))

#########
# plots #
#########
bar_width = 0.2

plt.subplot(2, 1, 1)
index = np.asarray([1, 2])
plt.bar(index, [f_one_lr, f_one_ada], bar_width, color='r',
        label='Before calibration')

plt.bar(index + bar_width, [f_one_lr_f_beta, f_one_ada_f_beta], bar_width,
        color='b', label='After calibration')

plt.xticks(index + bar_width / 2, ('f1 logistic', 'f1 adaboost'))

plt.ylabel('scores')
plt.title('f1 score')
plt.legend(bbox_to_anchor=(.5, -.2), loc='center', borderaxespad=0.)

plt.subplot(2, 1, 2)
index = np.asarray([1, 2, 3, 4])
plt.bar(index, [tpr_lr, tnr_lr, tpr_ada, tnr_ada],
        bar_width, color='r', label='Before calibration')

plt.bar(index + bar_width,
        [tpr_lr_max_tpr, tnr_lr_max_tpr, tpr_ada_max_tpr, tnr_ada_max_tpr],
        bar_width, color='b', label='After calibration')

plt.xticks(
    index + bar_width / 2,
    ('tpr logistic', 'tnr logistic', 'tpr adaboost', 'tnr adaboost'))
plt.ylabel('scores')
plt.title('true positive & true negative rate')

plt.subplots_adjust(hspace=0.6)
plt.show()
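
For reference, the reviewer's request above for lists, loops and a
record-based structure could be addressed with something roughly like the
following sketch (the ``evaluate`` helper, the dict layout and the reuse of
``train_test_split`` for the calibration hold-out are illustrative choices,
not the PR's code)::

    from __future__ import division

    from sklearn.calibration import CutoffClassifier
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix, f1_score
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.6,
                                                        random_state=42)
    # reuse train_test_split to hold out 20% of the training set for
    # threshold calibration
    X_fit, X_calib, y_fit, y_calib = train_test_split(X_train, y_train,
                                                      test_size=0.2,
                                                      random_state=42)


    def evaluate(clf):
        """Record tpr, tnr and f1 of a fitted classifier on the test set."""
        y_pred = clf.predict(X_test)
        tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
        return {'tpr': tp / (tp + fn), 'tnr': tn / (tn + fp),
                'f1': f1_score(y_test, y_pred)}


    results = []
    for name, estimator in [('logistic', LogisticRegression()),
                            ('adaboost', AdaBoostClassifier())]:
        estimator.fit(X_fit, y_fit)
        results.append(dict(estimator=name, strategy='uncalibrated',
                            **evaluate(estimator)))
        for strategy, kwargs in [('f_beta', {'beta': 1}),
                                 ('max_tpr', {'threshold': 0.7})]:
            wrapped = CutoffClassifier(estimator, strategy=strategy,
                                       method='predict_proba', cv='prefit',
                                       **kwargs).fit(X_calib, y_calib)
            results.append(dict(estimator=name, strategy=strategy,
                                **evaluate(wrapped)))

    for record in results:
        print(record)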

    [Reviewer] This will mean that the page has two top-level headings.
    Rather, at the top of the page, create a heading of this level called
    "Prediction calibration" and then change the heading level of "Probability
    Calibration" and "Decision Threshold calibration" to fall under it.