FEA Add strategy isotonic to calibration curve #23824
base: main
Conversation
Results
From the example of CalibrationDisplay:

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibrationDisplay

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(random_state=0)
clf.fit(X_train, y_train)

fig, ax = plt.subplots()
CalibrationDisplay.from_estimator(clf, X_test, y_test, ax=ax)
CalibrationDisplay.from_estimator(clf, X_test, y_test, ax=ax, strategy="isotonic")
ax.get_legend().get_texts()[1].set_text('LogisticRegression uniform')
ax.get_legend().get_texts()[2].set_text('LogisticRegression isotonic')

From https://scikit-learn.org/stable/auto_examples/calibration/plot_compare_calibration.html#calibration-curves:

import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np
from sklearn.calibration import CalibrationDisplay
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
X, y = make_classification(
    n_samples=100_000, n_features=20, n_informative=2, n_redundant=2, random_state=42
)

train_samples = 100  # Samples used for training the models
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    shuffle=False,
    test_size=100_000 - train_samples,
)
class NaivelyCalibratedLinearSVC(LinearSVC):
    """LinearSVC with `predict_proba` method that naively scales
    `decision_function` output."""

    def fit(self, X, y):
        super().fit(X, y)
        df = self.decision_function(X)
        self.df_min_ = df.min()
        self.df_max_ = df.max()

    def predict_proba(self, X):
        """Min-max scale output of `decision_function` to [0, 1]."""
        df = self.decision_function(X)
        calibrated_df = (df - self.df_min_) / (self.df_max_ - self.df_min_)
        proba_pos_class = np.clip(calibrated_df, 0, 1)
        proba_neg_class = 1 - proba_pos_class
        proba = np.c_[proba_neg_class, proba_pos_class]
        return proba
# Create classifiers
lr = LogisticRegression()
gnb = GaussianNB()
svc = NaivelyCalibratedLinearSVC(C=1.0)
rfc = RandomForestClassifier()
clf_list = [
    (lr, "Logistic"),
    (gnb, "Naive Bayes"),
    (svc, "SVC"),
    (rfc, "Random forest"),
]
fig = plt.figure(figsize=(10, 10))
gs = GridSpec(4, 2)
colors = plt.cm.get_cmap("Dark2")
ax_calibration_curve = fig.add_subplot(gs[:2, :2])
calibration_displays = {}
for i, (clf, name) in enumerate(clf_list):
    clf.fit(X_train, y_train)
    display = CalibrationDisplay.from_estimator(
        clf,
        X_test,
        y_test,
        n_bins=10,
        strategy="isotonic",
        name=name,
        ax=ax_calibration_curve,
        color=colors(i),
    )
    calibration_displays[name] = display

ax_calibration_curve.grid()
ax_calibration_curve.set_title("Calibration plots")

plt.show()
Force-pushed from aa0e0d6 to 34f77b7.
@ogrisel @glemaitre You might be interested.
As you noted in #23132 (comment), the CORP paper does not meet our inclusion criterion; according to Google Scholar it has been cited 13 times.
If we cannot include the method under that criterion, an alternative is to accept a callable here, which would make it simple to implement CORP:
def calibration_curve(...):
    ...
    elif callable(strategy):
        # n_bins to be flexible
        return strategy(y_prob, y_true, n_bins)
and strategy is:
from sklearn.isotonic import IsotonicRegression

def strategy(y_prob, y_true, n_bins):
    iso = IsotonicRegression(y_min=0, y_max=1).fit(y_prob, y_true)
    prob_true = iso.y_thresholds_
    prob_pred = iso.X_thresholds_
    return prob_true, prob_pred
Then we can update a calibration example to showcase passing a callable and using the CORP strategy.
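For illustration, here is a minimal self-contained sketch of what such a showcase could compute today using only the existing IsotonicRegression estimator; the dataset, classifier, and plotting choices are assumptions for the example, and the commented-out call at the end uses the proposed callable API, which does not exist in released scikit-learn:

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
y_prob = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)[:, 1]

# CORP reliability curve: isotonic (PAV) regression of the binary outcomes on
# the predicted probabilities; the fitted thresholds give the curve directly.
iso = IsotonicRegression(y_min=0, y_max=1).fit(y_prob, y_test)
prob_pred, prob_true = iso.X_thresholds_, iso.y_thresholds_

plt.plot(prob_pred, prob_true, "s-", label="CORP (isotonic)")
plt.plot([0, 1], [0, 1], "k:", label="Perfectly calibrated")
plt.xlabel("Mean predicted probability")
plt.ylabel("Fraction of positives")
plt.legend()
plt.show()

# With the proposed callable API, the two lines computing the curve would
# instead be (hypothetical, not in released scikit-learn):
# prob_true, prob_pred = calibration_curve(y_test, y_prob, strategy=strategy)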
@thomasjpfan The point is that isotonic regression is already included in scikit-learn, so why not use it? In particular, …
To give it more citation counts:
Reference Issues/PRs
Fixes #23132.
What does this implement/fix? Explain your changes.
This PR adds strategy="isotonic" to calibration_curve and CalibrationDisplay.
Any other comments?
Reliability diagrams with (PAV algorithm) isotonic regression are the CORP approach of https://doi.org/10.1073/pnas.2016191118.
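As a minimal usage sketch, assuming the strategy="isotonic" option added by this PR (released scikit-learn only accepts "uniform" and "quantile"); the toy arrays below are made up for illustration:

import numpy as np
from sklearn.calibration import calibration_curve

y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.2, 0.3, 0.4, 0.65, 0.7, 0.8, 0.9])

# Existing behavior: bin predictions into equal-width bins.
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=3, strategy="uniform")

# Proposed behavior: the curve is the isotonic (PAV) fit of y_true on y_prob,
# i.e. the CORP reliability diagram; n_bins would presumably be ignored.
prob_true_iso, prob_pred_iso = calibration_curve(y_true, y_prob, strategy="isotonic")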