DOC Expand on sigmoid and isotonic in calibration.rst #17725


Merged
merged 13 commits into scikit-learn:master from doc_calb on Jul 22, 2020

Conversation


@lucyleeow lucyleeow commented Jun 25, 2020

Reference Issues/PRs

Addresses: #16321 (comment)

What does this implement/fix? Explain your changes.

  • Expands on when to use sigmoid vs isotonic for calibration

(I also added the points below in this PR, but I am happy to remove/change them if they are not appropriate here.)

  • Expands on why the data used for fitting the classifier should be different from the data used for fitting the calibrator
  • Moves some sections, as I thought the section about using cv='prefit' belongs next to the section about cross-validation
  • Expands on how CalibratedClassifierCV uses one-vs-the-rest to extend to multiclass
  • Adds internal doc links
  • Adds links to papers referenced

Any other comments?

ping @NicolasHug


lucyleeow commented Jun 25, 2020

I think adding some code examples would also be useful in calibration.rst - happy to do it here if appropriate.
Edit: I see examples have been added to the CalibratedClassifierCV docstring, so code examples here are less important.
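
A minimal sketch of the kind of example meant here (not taken from the PR; the dataset, classifier and settings are arbitrary choices):

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # 3-fold internal cross-validation keeps the data used to fit each
    # classifier separate from the data used to fit its calibrator.
    clf = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=3)
    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_test)  # calibrated probabilities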

.. math::
\sum_i (y_i - f_i)^2

subject to :math:`\f_i \le f_j`. This method is more general when compared to
Member

The constraint is y_i < y_j whenever f_i < f_j

Though we should not use y_i: y_i is used before and corresponds to the true target (0 or 1).

Member Author

I was confused at this part but I think y_i here does mean the true target.

[screenshot of the formula from the paper]

from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4180410/
(sorry for funny screenshot)

Member

"subject to :math:\f_i \le f_j" is ambiguous as j is not defined. I think you should reproduce the full formula you quoted above only using f_i / f_{i+1} instead of the \tilde{s}_i / \tilde{s}_{i+1} notation.

The sigmoid method is biased in that it assumes the :ref:`calibration curve
<calibration_curve>` of the un-calibrated model has a sigmoid shape [1]_. It
is thus most effective when the un-calibrated model is over-confident.

Member

Maybe we could also mention the fact that Platt scaling assumes symmetric calibration errors, that is, it assumes that the over-confidence errors for low values of f_i have the same magnitude as for high values of f_i. This is not necessarily the case for highly imbalanced classification problems where the un-calibrated classifier can have asymmetric calibration errors.

This is just an intuition (I have not run experiments to confirm this happens in practice) though.

@ogrisel ogrisel Jun 26, 2020

Thanks for the reference, Beta calibration looks very nice, I did not know about it. Unfortunately it does not meet the criterion for inclusion in scikit-learn in terms of citations, but honestly I wouldn't mind considering a PR to add it as a third option to CalibratedClassifierCV.

\sum_{i=1}^{n} (y_i - f_i)^2 : f_i \leq f_{i+1} \quad \forall i \in \{1,..., n-1\}

where :math:`y_i` is the true label of sample :math:`i` and :math:`f_i`
is the output of the classifier for sample :math:`i`. This method is more
Member

Suggested change
is the output of the classifier for sample :math:`i`. This method is more
is the calibrated output of the classifier for sample :math:`i`. This method is more

Member Author

...are you sure? I am confused now

Member Author

this is the function that is minimized to find the isotonic function, so it should be the output of the classifier...?

Member

We're already using f_i above to define the output of the un-calibrated classifier.

The formula should be

\sum_{i=1}^{n} (y_i - \hat{f}_i)^2

where \hat{f} is as @ogrisel suggested (the calibrated probability)

And the constraint is that \hat{f}_i >= \hat{f}_j whenever f_i >= f_j
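
As an illustration of this constraint, a minimal sketch (not from the PR; toy values) using scikit-learn's IsotonicRegression, which enforces exactly this monotonicity:

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    # Un-calibrated classifier outputs f_i and true binary labels y_i (toy values).
    f = np.array([0.1, 0.35, 0.4, 0.8, 0.9])
    y = np.array([0, 1, 0, 1, 1])

    # Minimizes sum_i (y_i - f_hat_i)^2 subject to f_hat being non-decreasing
    # in f, i.e. f_hat_i >= f_hat_j whenever f_i >= f_j.
    iso = IsotonicRegression(y_min=0, y_max=1, out_of_bounds="clip")
    f_hat = iso.fit_transform(f, y)  # the calibrated probabilities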

Member

Indeed sorry for the confusion.

.. math::
\sum_{i=1}^{n} (y_i - \hat{f}_i)^2

subject to \hat{f}_i >= \hat{f}_j whenever f_i >= f_j. :math:`y_i` is the true
Member

The math formatting is missing here: https://109917-843222-gh.circle-artifacts.com/0/doc/modules/calibration.html#isotonic

Suggested change
subject to \hat{f}_i >= \hat{f}_j whenever f_i >= f_j. :math:`y_i` is the true
subject to :math:`\hat{f}_i >= \hat{f}_j` whenever :math:`f_i >= f_j`. :math:`y_i` is the true

Member

This paragraph will probably need to be wrapped to avoid going beyond 80 chars.

Member Author

whoops

Member

Under vscode, I use https://marketplace.visualstudio.com/items?itemName=stkb.rewrap with the alt+q keyboard shortcut for this.

@ogrisel ogrisel left a comment

LGTM (assuming the circle ci output will be good after the latest commit). :)


lucyleeow commented Jun 26, 2020

@ogrisel do you have any idea why my class linking (e.g., :class:`SGDClassifier`) is not appearing (as a link) in the built documentation?

[screenshot of the built documentation showing the class reference rendered as plain text]

@@ -11,16 +11,21 @@ When performing classification you often want not only to predict the class
label, but also obtain a probability of the respective label. This probability
gives you some kind of confidence on the prediction. Some models can give you
poor estimates of the class probabilities and some even do not support
probability prediction. The calibration module allows you to better calibrate
probability prediction (e.g., :class:`SGDClassifier`). The calibration
Member

you need the whole path unless there's a previous sphinx directive indicating the current module (which wouldn't be sklearn.linear_model anyway)

Suggested change
probability prediction (e.g., :class:`SGDClassifier`). The calibration
probability prediction (e.g., :class:`~sklearn.linear_model.SGDClassifier`). The calibration

Member Author

ooohhhh thanks, that is good to know.

@NicolasHug NicolasHug left a comment

thanks @lucyleeow this will be a nice addition to the UG!

some last comments from me

@@ -11,16 +11,21 @@ When performing classification you often want not only to predict the class
label, but also obtain a probability of the respective label. This probability
gives you some kind of confidence on the prediction. Some models can give you
poor estimates of the class probabilities and some even do not support
probability prediction. The calibration module allows you to better calibrate
probability prediction (e.g., :class:`~sklearn.linear_model.SGDClassifier`).
Member

Suggested change
probability prediction (e.g., :class:`~sklearn.linear_model.SGDClassifier`).
probability prediction (e.g., some instances of :class:`~sklearn.linear_model.SGDClassifier`).

@@ -85,22 +90,29 @@ Calibrating a classifier

Calibrating a classifier consists in fitting a regressor (called a
Member

consists of

(pretty sure that was from me :s)

is calibrated first for each class separately in a one-vs-rest fashion [4]_.
When predicting probabilities, the calibrated probabilities for each class
:class:`CalibratedClassifierCV` supports the use of two 'calibration'
regressors: 'sigmoid' and 'isotonic'. Both these regressors only
Member

I would put the section about multiclass (from "Both these regressors" to "normalize them") into another subsection, e.g. "Multiclass support", once the isotonic and sigmoid calibrators have been described.

Otherwise the 2 subsections detailing isotonic and sigmoid are a bit abrupt. It would be more natural if they directly followed from ":class:CalibratedClassifierCV supports the use of two 'calibration' regressors: 'sigmoid' and 'isotonic'."

We can move the sentence about brier_score just before
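
For illustration, a minimal sketch (not from the PR) of the multiclass behaviour referred to above: each class is calibrated separately in a one-vs-rest fashion and the per-class probabilities are then normalized so that each row sums to one:

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                               n_classes=3, random_state=0)

    # Each of the 3 classes is calibrated separately (one-vs-rest), then the
    # per-class probabilities are normalized so each row sums to 1.
    clf = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3)
    clf.fit(X, y)
    proba = clf.predict_proba(X)
    print(proba.shape)            # (600, 3)
    print(proba.sum(axis=1)[:5])  # ~[1. 1. 1. 1. 1.]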

Member Author

Thanks for the suggestion, it works much better. I thought it didn't flow well but I wasn't sure how best to change it.

Comment on lines 162 to 163
symmetrical [1]_. It is thus most effective when the un-calibrated model is
under-confident and has similar errors for both high and low
Member

my understanding is that we assume over-confidence for low probabilities and under-confidence for high probabilities (which are then compensated by the shape of the logistic function)?

also by "similar errors" do we mean errors with similar absolute values / magnitude?

Member Author

Oh I am confused. I thought that a 'sigmoid' shape of calibration curve meant under-confident classifier and 'transposed-sigmoid' shape meant over-confident model. This is just from the example: https://scikit-learn.org/stable/auto_examples/calibration/plot_calibration_curve.html#sphx-glr-auto-examples-calibration-plot-calibration-curve-py

I think "similar errors" is in terms of the shape of the calibration curve - the sigmoid is symmetrical in shape. Since we are dealing with the difference between predicted probability and frequency of true positives per bin, I would say similar absolute difference?

Member

my understanding is that we assume over-confidence for low probabilities and under-confidence for high probabilities (which are then compensated by the shape of the logistic function)?

Ok I see how I was wrong here

Oh I am confused. I thought that a 'sigmoid' shape of calibration curve meant under-confident classifier and 'transposed-sigmoid' shape meant over-confident model. This is just from the example: scikit-learn.org/stable/auto_examples/calibration/plot_calibration_curve.html#sphx-glr-auto-examples-calibration-plot-calibration-curve-py

Now I'm confused too: from the example, it seems to me that NB (with a transposed sigmoid shape) is
under confident, while the LinearSVC (with a sigmoid shape) is overconfident: for example for the bin at 0.8, its predictions are close to 1, so I interpret this as being over-confident about the positive class. What am I getting wrong? Maybe @ogrisel can chime in?

Member Author

From the example, it seems to me that NB (with a transposed sigmoid shape) is
under-confident, while the LinearSVC (with a sigmoid shape) is overconfident:

Yes you are right! I should have thought about this more. (I think @ogrisel will be on holiday from next week though...)

Member

What am I getting wrong?

Ok so what I was getting wrong is that I was inverting the axes: the x axis is what the classifier predicts and the y axis is the actual proportions. So indeed the sigmoid curve describes under-confidence and the example is correct.
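
One way to check this interpretation (a sketch, not from the PR): compute the reliability diagram with calibration_curve, where prob_pred (the x axis) is the mean predicted probability per bin and prob_true (the y axis) is the observed fraction of positives per bin:

    from sklearn.calibration import calibration_curve
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    proba = GaussianNB().fit(X_train, y_train).predict_proba(X_test)[:, 1]

    # prob_pred: mean predicted probability per bin (x axis)
    # prob_true: observed fraction of positives per bin (y axis)
    prob_true, prob_pred = calibration_curve(y_test, proba, n_bins=10)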

Member Author

Yes, okay. It is tricky to interpret. I guess you have to think about it from 0.5 and go up/down.

subject to :math:`\hat{f}_i >= \hat{f}_j` whenever
:math:`f_i >= f_j`. :math:`y_i` is the true
label of sample :math:`i` and :math:`\hat{f}_i` is the output of the
calibrated classifier for sample :math:`i`. This method
Member

Suggested change
calibrated classifier for sample :math:`i`. This method
calibrated classifier for sample :math:`i`, i.e. the calibrated probability. This method

@lucyleeow
Member Author

Thanks for the review @NicolasHug. I expanded on the Brier score as well, adding the definitions from #10969

I think once #11096 is done, they will amend the doc to talk about calibration loss instead of the Brier score, but for now I thought it would be useful to expand on the Brier score.
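
For context, the Brier score mentioned here can be computed with sklearn.metrics.brier_score_loss; a minimal sketch with made-up values:

    import numpy as np
    from sklearn.metrics import brier_score_loss

    y_true = np.array([0, 1, 1, 0, 1])
    y_prob = np.array([0.1, 0.9, 0.8, 0.3, 0.6])

    # Mean squared difference between predicted probabilities and outcomes;
    # lower is better, 0 is perfect.
    score = brier_score_loss(y_true, y_prob)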

Comment on lines 160 to 162
The sigmoid method is biased in that it assumes the :ref:`calibration curve
<calibration_curve>` of the un-calibrated model has a sigmoid shape and is
symmetrical [1]_. It is thus most effective when the un-calibrated model is
Member

I don't want to delay merging further if you're sure about this but there are a few things that aren't clear for me here:

  • why does sigmoid calibration assume a sigmoid calibration curve?
  • is this really discussed in [1]_, or rather in projecteuclid.org/download/pdfview_1/euclid.ejs/1513306867?

In projecteuclid.org/download/pdfview_1/euclid.ejs/1513306867 it is said that

the parametric assumption made by logistic calibration is exactly the right one if the scores output by a classifier are normally distributed within each class around class means s+ and s− with the same variance σ²

though I'm not sure yet how that relates to the comment made above
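
One way the two relate (a sketch of the standard argument, not from the PR): if the scores are normally distributed within each class with equal variance, Bayes' rule gives exactly a sigmoid posterior, which is the parametric form that Platt scaling fits:

.. math::
    p(f \mid y=1) = \mathcal{N}(f; s_+, \sigma^2), \qquad
    p(f \mid y=0) = \mathcal{N}(f; s_-, \sigma^2)

.. math::
    P(y=1 \mid f)
    = \frac{\pi_1\, p(f \mid y=1)}{\pi_1\, p(f \mid y=1) + \pi_0\, p(f \mid y=0)}
    = \frac{1}{1 + \exp(A f + B)}

with :math:`A = -(s_+ - s_-)/\sigma^2` and :math:`B = (s_+^2 - s_-^2)/(2\sigma^2) - \log(\pi_1/\pi_0)`, i.e. the same 1 / (1 + exp(A f + B)) form that sigmoid calibration fits by maximum likelihood.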

Member Author

The symmetry assumption is discussed in projecteuclid.org/download/pdfview_1/euclid.ejs/1513306867 but not in '[1]'. I can add it in.

* why does sigmoid calibration assume a sigmoid calibration curve?

I am not clear on the maths but I think this is explained better in the original Platt paper: https://www.researchgate.net/publication/2594015_Probabilistic_Outputs_for_Support_Vector_Machines_and_Comparisons_to_Regularized_Likelihood_Methods
section 2.1 (which I can reference instead)

Member

I am not clear on the maths but I think this is explained better in the original Platt paper

I don't see where this paper says such a thing. What I read is:

  • the sigmoid model is equivalent to assuming that the output of the SVM is proportional to the log odds of a positive example
  • the class-conditional densities between the margins are apparently exponential. Bayes rules on 2 exponentials suggests using a parametric form of a sigmoid

I don't understand how these two are equivalent to "using a sigmoid calibration assumes that the calibration curve has a sigmoid shape"

@lucyleeow lucyleeow Jul 9, 2020

Hmm yes that is true. (Is it fair to say that:) It was designed to calibrate the output of the SVM - which has a sigmoid shape (is this always true)?

@ogrisel ogrisel Jul 22, 2020

Maybe we can just say:

  • using the sigmoid calibration method assumes that the calibration curve can be corrected by applying a sigmoid function to the raw predictions. This assumption has been empirically justified in the case of support vector machines with common kernel functions on various benchmark datasets in section 2.1 of Platt 2000 [1] but does not necessarily hold in general.
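
As a minimal illustration of "applying a sigmoid function to the raw predictions" (a sketch; the helper name and parameter values are hypothetical, and in practice A and B are fitted by maximum likelihood on held-out data):

    import numpy as np

    def platt_sigmoid(raw_scores, a, b):
        # Hypothetical helper: Platt's parametric form p = 1 / (1 + exp(a*f + b)),
        # mapping raw decision scores f to probabilities in (0, 1).
        return 1.0 / (1.0 + np.exp(a * raw_scores + b))

    scores = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # e.g. SVM decision values
    print(platt_sigmoid(scores, a=-1.0, b=0.0))     # monotone map into (0, 1)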

@lucyleeow

Thanks for your help @ogrisel and @NicolasHug, I've made some changes and hopefully it is correct now...

@NicolasHug NicolasHug left a comment

thanks a lot @lucyleeow , will merge when green

@NicolasHug NicolasHug merged commit effc436 into scikit-learn:master Jul 22, 2020
@lucyleeow lucyleeow deleted the doc_calb branch July 30, 2020 17:35
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020