Add PAV algorithm for calibration_curve/reliability diagrams #23132

Open
lorentzenchr opened this issue Apr 14, 2022 · 8 comments · May be fixed by #23824

lorentzenchr commented Apr 14, 2022

Describe the workflow you want to enable

import numpy as np
from sklearn.calibration import calibration_curve

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([0.1, 0.2, 0.3, 0.4, 0.65, 0.7, 0.8, 0.9,  1.])
prob_true, prob_pred = calibration_curve(y_true, y_pred, strategy="pav")

Describe your proposed solution

Add the strategy PAV, as in [1] and [2] (there called CORP), to calibration_curve. This basically applies isotonic regression as the binning strategy, which we already have in scikit-learn; a rough sketch follows the references below.

[1] Dimitriadis, T., Gneiting, T., & Jordan, A.I. (2021). Stable reliability diagrams for probabilistic classifiers. Proceedings of the National Academy of Sciences of the United States of America, 118. https://doi.org/10.1073/pnas.2016191118
[2] https://cran.r-project.org/package=reliabilitydiag
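
For concreteness, here is a minimal sketch of the idea using only the existing sklearn.isotonic.IsotonicRegression API (the helper name pav_calibration_curve and the block extraction below are illustrative, not a final implementation):

import numpy as np
from sklearn.isotonic import IsotonicRegression

def pav_calibration_curve(y_true, y_pred):
    # Fit isotonic regression (PAV) of the outcomes on the predictions;
    # the fit is piecewise constant and its blocks define the bins.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    order = np.argsort(y_pred)
    y_cal = IsotonicRegression(y_min=0.0, y_max=1.0).fit_transform(
        y_pred[order], y_true[order]
    )
    # One point per constant block: prob_true is the block value (the
    # mean of y_true in the block), prob_pred the mean of y_pred in it.
    block_ids = np.r_[0, np.cumsum(np.diff(y_cal) != 0)]
    blocks = np.unique(block_ids)
    prob_true = np.array([y_cal[block_ids == b].mean() for b in blocks])
    prob_pred = np.array([y_pred[order][block_ids == b].mean() for b in blocks])
    return prob_true, prob_pred

On the example above, this yields two blocks: prob_true = [0, 1] and prob_pred = [0.25, 0.81].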

Describe alternatives you've considered, if relevant

No response

Additional context

Given the recency of the paper, it clearly does not have many citations (yet). But I have the impression that this is a good strategy for reliability diagrams, with good theoretical and practical properties.

To my knowledge, this strategy is not available anywhere in the Python ecosystem as of now.

lorentzenchr added the New Feature, Needs Triage, module:calibration and Needs Decision - Include Feature labels Apr 14, 2022
lorentzenchr commented:

@aijordan For your information.

lorentzenchr removed the Needs Triage label Apr 14, 2022

ogrisel commented Apr 20, 2022

It's possible that using Centered Isotonic Regression (#21454) would make the reliability diagram look even better but might break the theoretical results of the paper you linked above.
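
For concreteness, a rough sketch of that centered variant (this is not the #21454 implementation; the helper name and details are illustrative): replace each flat PAV block by a single knot at the block's mean x, then interpolate linearly between knots.

import numpy as np
from sklearn.isotonic import IsotonicRegression

def centered_isotonic_fit(x, y):
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)
    x_sorted = x[order]
    # Standard isotonic (PAV) fit; piecewise constant in x.
    y_fit = IsotonicRegression().fit_transform(
        x_sorted, np.asarray(y, dtype=float)[order]
    )
    # One knot per constant block: (mean of x in the block, block value).
    block_ids = np.r_[0, np.cumsum(np.diff(y_fit) != 0)]
    blocks = np.unique(block_ids)
    knots_x = np.array([x_sorted[block_ids == b].mean() for b in blocks])
    knots_y = np.array([y_fit[block_ids == b][0] for b in blocks])
    # Linear interpolation removes the flat steps (mostly strictly monotonic).
    return np.interp(x, knots_x, knots_y)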

lorentzenchr commented:

@ogrisel As pointed out in #21454, centered isotonic regression seems to be an invalid option because it is not itself calibrated, which is a crucial property of (standard) isotonic regression and is particularly important when assessing the (auto-)calibration of a model, as is done in a reliability diagram.

lorentzenchr commented:

Posted by @ogrisel in #23767 (comment)

My main concern with the CORP reliability diagrams is that they are very square-looking on finite size test sets, see for instance:

[image: square-looking CORP reliability diagrams on finite test sets]

Those diagrams would probably qualitatively look very different in the asymptotic regime of large test sets.

To avoid this finite test sample artifact, we might also want to consider methods such as the one implemented in:

https://github.com/apple/ml-calibration

Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing
Jarosław Błasiok, Preetum Nakkiran
https://arxiv.org/abs/2309.12236

[image: smoothed reliability diagram produced by kernel smoothing]

lorentzenchr commented:

@ogrisel

My main concern with the CORP reliability diagrams is that they are very square-looking on finite size test sets

As Prof. Simon Wood says:

Statistics is the honest interpretation of data

The reliability diagram is a statistical diagnostic/verification tool; it does not need to be pleasing to the eye, but it should be easy to interpret.
The zigzag is a mere consequence of the underlying (small-sample) uncertainty in the estimation of $E[Y \mid \text{prediction}]$.

BTW, I never understood why the PAV (=CORP) approach is good enough for calibration of classifiers (modifies actual model predictions), but not good enough in a reliability diagram (diagnostics) - within scikit-learn 🤨


ogrisel commented Dec 10, 2023

The reliability diagram is a statistical diagnostics/verification tool and not something that needs to be pleasant for the eye, but easy to interpret.

It's not a matter of being pleasing to the eye but of being misleading about the shape of the asymptotic curve. The asymptotic curve will be smooth most of the time, and the finite-sample CORP estimate can lead the reader into thinking otherwise, which I find misleading and a potential source of confusion for our users.

BTW, I never understood why the PAV (=CORP) approach is good enough for calibration of classifiers (modifies actual model predictions), but not good enough in a reliability diagram (diagnostics) - within scikit-learn

I actually have the exact same concern with isotonic regression as a post hoc calibrator. I would much rather use centered isotonic regression as the post hoc calibrator: it is mostly strictly monotonic (and as a result would not introduce an unexpected change in pure ranking metrics such as ROC AUC / Gini index) and converges to the same solution as the step-wise constant calibrator in the large-sample limit.
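
To illustrate the ranking concern with a self-contained toy example (not from this thread; data and split sizes are arbitrary): the step-wise constant isotonic fit introduces tied scores on held-out data, which typically lowers ROC AUC slightly, whereas a strictly increasing map would leave it unchanged.

import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic scores whose ranking carries real signal: y ~ Bernoulli(p).
def sample(n):
    p = rng.uniform(size=n)
    return p, rng.binomial(1, p)

p_cal, y_cal = sample(500)
p_test, y_test = sample(5000)

# Fit the step function on a calibration split, apply it to the test split.
iso = IsotonicRegression(out_of_bounds="clip").fit(p_cal, y_cal)
p_test_iso = iso.predict(p_test)

print(roc_auc_score(y_test, p_test))      # AUC of the raw scores
print(roc_auc_score(y_test, p_test_iso))  # typically slightly lower: plateaus tie scores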

lorentzenchr commented:

The asymptotic curve will be smooth most of the time

That's not correct. For instance, tree-based models or GLMs with categorical features do not produce smooth predictions.


ogrisel commented Dec 11, 2023

That's not correct. For instance, tree-based models or GLMs with categorical features do not produce smooth predictions.

Indeed, that might be the case. Although, depending on the size of the training set, I suspect they are still much smoother than what the CORP reliability diagram suggests. To settle this debate we will need some experiments with a few large datasets where we can subsample both the training set and the test set used to estimate the reliability curve, and compare the small-test-sample CORP/smoothed reliability curves to the CORP curve on the full test set.
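
A rough sketch of such an experiment (the dataset, model, and subsample sizes are placeholders): count how many PAV blocks, i.e. CORP bins, the reliability curve has at each test size; a few wide blocks are what make the small-sample curve look square.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import train_test_split

# Placeholder data and model; any large dataset would do.
X, y = make_classification(n_samples=100_000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
y_pred = HistGradientBoostingClassifier().fit(X_train, y_train).predict_proba(X_test)[:, 1]

rng = np.random.default_rng(0)
for n in (500, 5_000, len(y_test)):
    idx = rng.choice(len(y_test), size=n, replace=False)
    y_cal = IsotonicRegression().fit_transform(y_pred[idx], y_test[idx])
    # Number of distinct PAV levels = number of CORP bins at this test size.
    print(n, np.unique(y_cal).size)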

We could also give the reliability diagram a user-settable option to choose the strategy (fixed binning as we do now, CORP-induced bin edges, or some smooth estimate). Still, comparing the methods on a few canonical datasets would help us make informed recommendations in the docstring of that parameter.
