FEA Add strategy isotonic to calibration curve #23824


Open · wants to merge 7 commits into main

Conversation

lorentzenchr (Member)

Reference Issues/PRs

Fixes #23132.

What does this implement/fix? Explain your changes.

This PR adds strategy="isotonic" to calibration_curve and CalibrationDisplay.

Any other comments?

Reliability diagrams with isotonic regression (via the PAV algorithm) are the CORP approach of https://doi.org/10.1073/pnas.2016191118.

@lorentzenchr (Member Author)

Results

From the example of CalibrationDisplay:
[Figure: calibration curves for LogisticRegression with the uniform and isotonic strategies]

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibrationDisplay


X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(random_state=0)
clf.fit(X_train, y_train)

fig, ax = plt.subplots()
# Default uniform binning vs. the new isotonic (PAV) strategy.
CalibrationDisplay.from_estimator(clf, X_test, y_test, ax=ax)
CalibrationDisplay.from_estimator(clf, X_test, y_test, ax=ax, strategy="isotonic")
ax.get_legend().get_texts()[1].set_text("LogisticRegression uniform")
ax.get_legend().get_texts()[2].set_text("LogisticRegression isotonic")

From https://scikit-learn.org/stable/auto_examples/calibration/plot_compare_calibration.html#calibration-curves.
[Figure: calibration plots for Logistic, Naive Bayes, SVC, and Random forest with strategy="isotonic"]

import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np
from sklearn.calibration import CalibrationDisplay
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC


X, y = make_classification(
    n_samples=100_000, n_features=20, n_informative=2, n_redundant=2, random_state=42
)

train_samples = 100  # Samples used for training the models
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    shuffle=False,
    test_size=100_000 - train_samples,
)


class NaivelyCalibratedLinearSVC(LinearSVC):
    """LinearSVC with `predict_proba` method that naively scales
    `decision_function` output."""

    def fit(self, X, y):
        super().fit(X, y)
        df = self.decision_function(X)
        self.df_min_ = df.min()
        self.df_max_ = df.max()
        return self

    def predict_proba(self, X):
        """Min-max scale output of `decision_function` to [0,1]."""
        df = self.decision_function(X)
        calibrated_df = (df - self.df_min_) / (self.df_max_ - self.df_min_)
        proba_pos_class = np.clip(calibrated_df, 0, 1)
        proba_neg_class = 1 - proba_pos_class
        proba = np.c_[proba_neg_class, proba_pos_class]
        return proba


# Create classifiers
lr = LogisticRegression()
gnb = GaussianNB()
svc = NaivelyCalibratedLinearSVC(C=1.0)
rfc = RandomForestClassifier()

clf_list = [
    (lr, "Logistic"),
    (gnb, "Naive Bayes"),
    (svc, "SVC"),
    (rfc, "Random forest"),
]


fig = plt.figure(figsize=(10, 10))
gs = GridSpec(4, 2)
colors = plt.cm.get_cmap("Dark2")

ax_calibration_curve = fig.add_subplot(gs[:2, :2])
calibration_displays = {}
for i, (clf, name) in enumerate(clf_list):
    clf.fit(X_train, y_train)
    display = CalibrationDisplay.from_estimator(
        clf,
        X_test,
        y_test,
        n_bins=10,
        strategy="isotonic",
        name=name,
        ax=ax_calibration_curve,
        color=colors(i),
    )
    calibration_displays[name] = display

ax_calibration_curve.grid()
ax_calibration_curve.set_title("Calibration plots")
plt.show()

@lorentzenchr (Member Author)

@ogrisel @glemaitre You might be interested.

@thomasjpfan (Member) left a comment

As you noted in #23132 (comment), the CORP paper does not meet our inclusion criterion: according to Google Scholar it has been cited 13 times.

If we cannot include the method based on the inclusion criterion, then an alternative is to accept a callable here so that it is simple to implement CORP:

def calibration_curve(...):
    ...
    elif callable(strategy):
        # n_bins to be flexible
        return strategy(y_prob, y_true, n_bins)

and strategy is:

def strategy(y_prob, y_true, n_bins):
    iso = IsotonicRegression(y_min=0, y_max=1).fit(y_prob, y_true)
    prob_true = iso.y_thresholds_
    prob_pred = iso.X_thresholds_
    return prob_true, prob_pred

Then we can update a calibration example to showcase passing a callable and using the CORP strategy.
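A standalone sketch of the proposed callable, runnable on released scikit-learn (the name `corp_strategy` and the toy data are illustrative, not from the PR):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


def corp_strategy(y_prob, y_true, n_bins=None):
    """CORP-style reliability curve via isotonic regression (PAV).

    n_bins is accepted for interface compatibility but unused:
    PAV chooses the number and location of the bins itself.
    """
    iso = IsotonicRegression(y_min=0, y_max=1).fit(y_prob, y_true)
    # The fitted thresholds are the vertices of the reliability curve.
    return iso.y_thresholds_, iso.X_thresholds_


rng = np.random.default_rng(0)
y_prob = rng.uniform(size=200)
# Labels drawn so that y_prob is well calibrated by construction.
y_true = (rng.uniform(size=200) < y_prob).astype(int)

prob_true, prob_pred = corp_strategy(y_prob, y_true)
```

For a well-calibrated classifier the resulting `(prob_pred, prob_true)` pairs lie close to the diagonal, and by construction `prob_true` is nondecreasing in `prob_pred`.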

@lorentzenchr (Member Author)

@thomasjpfan The point is that isotonic regression is already included in scikit-learn, so why not use it? In particular, CalibratedClassifierCV uses it and relates to the same topic.
Another way of putting it: plot CalibratedClassifierCV(clf, method="isotonic", cv="prefit").fit(X, y).predict(X) versus clf.predict(X).
I see the paper more as a theoretical foundation for why isotonic regression is a good choice for reliability diagrams.
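A minimal sketch of that relationship, using IsotonicRegression directly as a stand-in for what CalibratedClassifierCV(method="isotonic", cv="prefit") fits internally (the synthetic data and variable names are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
clf = LogisticRegression(random_state=0).fit(X, y)
y_prob = clf.predict_proba(X)[:, 1]

# Isotonic recalibration of clf's probabilities.
iso = IsotonicRegression(y_min=0, y_max=1, out_of_bounds="clip").fit(y_prob, y)
recalibrated = iso.predict(y_prob)

# The CORP reliability diagram is exactly this recalibrated-vs-raw plot:
# its vertices are the thresholds of the isotonic fit, and the fitted
# model linearly interpolates between them.
prob_pred, prob_true = iso.X_thresholds_, iso.y_thresholds_
np.testing.assert_allclose(np.interp(y_prob, prob_pred, prob_true), recalibrated)
```

So plotting the isotonic reliability curve and plotting calibrated-vs-raw probabilities are two views of the same fitted function.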

@lorentzenchr (Member Author) commented Sep 5, 2022

To give it more citation counts:

Successfully merging this pull request may close these issues:

Add PAV algorithm for calibration_curve/reliability diagrams