DOC more precise calibration wording #28171


Merged · 6 commits merged into scikit-learn:main on Feb 23, 2024

Conversation

@GaelVaroquaux (Member)

A logistic regression will return well-calibrated predictions only if it is well specified. Some people were interpreting our text out of context and over-generalizing the sentence saying that a log-reg returns well-calibrated predictions.

Here is a minor modification to make it more precise
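
To illustrate the "well specified" condition being discussed, here is a minimal sketch on synthetic data (not part of this PR; the data-generating mechanisms and sample size are illustrative assumptions): the same `LogisticRegression` comes out close to calibrated when the true log-odds are linear in the features, and visibly mis-calibrated when they are not.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(50_000, 1))

# True log-odds: linear in x (well specified) vs. cubic in x (misspecified).
for name, logit in [("well specified", 2 * X[:, 0]),
                    ("misspecified", 2 * X[:, 0] ** 3)]:
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    proba = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]
    prob_true, prob_pred = calibration_curve(y, proba, n_bins=10)
    # Largest gap between observed frequency and mean predicted probability.
    print(f"{name}: max calibration gap = {np.abs(prob_true - prob_pred).max():.3f}")
```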


github-actions bot commented Jan 18, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 074f08f.

@ArturoAmorQ (Member) left a comment

Just a tweak, otherwise LGTM :) Thanks @GaelVaroquaux, this certainly improves the wording!

@ogrisel (Member) left a comment

I agree with the suggested improvement.

Note that the level of regularization (and the presence of uninformative features in the low sample size regime) can also strongly impact the over-/under-confidence in predictions by logistic regression models.

I plan to expand an example in the future to demonstrate this empirically but maybe in the short term we could just expand the paragraph as in the following suggestion.

But if you think that renders the paragraph too verbose, we can leave this for later. The current state of this PR is already a net improvement.
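
For context, here is a rough sketch of the regularization effect described above, assuming a synthetic low-sample dataset with many uninformative features and an arbitrary grid of `C` values; it is not the example planned in this thread.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Low sample size with mostly uninformative features, as described above.
X, y = make_classification(n_samples=500, n_features=100, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in (0.01, 1.0, 100.0):  # strong -> weak L2 regularization
    proba = (LogisticRegression(C=C, max_iter=1_000)
             .fit(X_train, y_train).predict_proba(X_test)[:, 1])
    prob_true, prob_pred = calibration_curve(y_test, proba, n_bins=5)
    print(f"C={C}: mean calibration gap = "
          f"{np.abs(prob_true - prob_pred).mean():.3f}")
```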

@ogrisel (Member) commented Jan 18, 2024

/cc @lorentzenchr

@@ -74,10 +74,12 @@ by showing the number of samples in each predicted probability bin.

 .. currentmodule:: sklearn.linear_model

-:class:`LogisticRegression` returns well calibrated predictions by default as it has a
+:class:`LogisticRegression` is more likely to return well calibrated predictions by default as it has a
@lorentzenchr (Member)

"more likely" calls for a comparison: more likely than what?

I would also like to change the term "by default" to something better. Maybe "by itself", "without re-calibration", or "if well specified".

GaelVaroquaux and others added 3 commits January 18, 2024 22:49
Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Merge suggested wordings from @lorentzenchr and @ogrisel
@@ -74,10 +74,14 @@ by showing the number of samples in each predicted probability bin.

 .. currentmodule:: sklearn.linear_model

-:class:`LogisticRegression` returns well calibrated predictions by default as it has a
+:class:`LogisticRegression` is more likely to return well calibrated predictions by itself as it has a
Member

Maybe to address @lorentzenchr's comment:

Suggested change:

-:class:`LogisticRegression` is more likely to return well calibrated predictions by itself as it has a
+:class:`LogisticRegression` is likely to return well calibrated predictions by itself as it has a

@GaelVaroquaux (Member Author)

I'd rather not: some people are going to argue that it's not that likely in absolute terms.

@ogrisel (Member) left a comment

Another pass of suggestions after iterating on the example itself (see details below):

@ogrisel (Member) commented Jan 24, 2024

@GaelVaroquaux @lorentzenchr @ArturoAmorQ if you all agree with the suggestions in the comments above, I can take care of applying them all to this PR and rewrapping the resulting paragraphs to get a clean diff.

@glemaitre added this to the 1.4.1 milestone Feb 6, 2024
@lorentzenchr (Member)

Superseded by #28231

@ArturoAmorQ (Member) commented Feb 22, 2024

@lorentzenchr not sure we wanted to close this one as it concerns the user guide, whereas #28231 targets the Comparison of Calibration of Classifiers example.

@GaelVaroquaux reopened this Feb 22, 2024
@GaelVaroquaux (Member Author)

I agree that this addresses a complementary aspect to #28231, and I think this needs to be merged.

I must say that I got a bit tired of the back-and-forth of nitpicks on wording, which are somewhat at odds with the fact that the original wording is overall not great and that, IMHO, this was an improvement.

@ogrisel (Member) commented Feb 22, 2024

> @lorentzenchr not sure we wanted to close this one as it concerns the user guide, whereas #28231 targets the Comparison of Calibration of Classifiers example.

Exactly, my objective in opening #28231 was:

  • to fix the figure used in the user guide (because previously it did not really show a well calibrated curve for LR, contradicting the message).
  • and improve its analysis w.r.t. what I understood based on the discussion in this PR.

Still, I think we should fix the user guide by merging this PR, possibly as is (it's already a net improvement to mention well-specification, which is a very important condition to get well-calibrated LR models).

If you agree, I would be willing to put in some further effort to refine this section in a follow-up PR.

@GaelVaroquaux (Member Author)

GaelVaroquaux commented Feb 22, 2024 via email

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@lorentzenchr (Member)

Sorry for closing too prematurely. But the net result is good. This PR moves forward.

@ogrisel (Member) commented Feb 23, 2024

So shall we merge? My +1 still holds.

-This leads to the so-called **balance property**, see [8]_ and
-:ref:`Logistic_regression`.
+In the unpenalized case, this leads to the so-called **balance property**, see [8]_ and :ref:`Logistic_regression`.
 In the plot above, data is generated according to a linear mechanism, which is
Member

@ogrisel Can you comment if this is true?

Member

It's probably dependent on the random seed, but it's probably not too false; otherwise the calibration curve would be bad. I would rather keep the message simple enough in the user guide for now, refine the example(s) iteratively in follow-up PRs, and update the user guide accordingly once this is done.
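
For reference, the unpenalized balance property mentioned in the suggestion is easy to check empirically; a minimal sketch on synthetic data (not the user-guide example):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2_000, random_state=0)

# Balance property: with no penalty and an intercept, the score equations
# force the average predicted probability to match the observed positive rate.
clf = LogisticRegression(penalty=None, max_iter=1_000).fit(X, y)  # penalty=None needs scikit-learn >= 1.2
print(clf.predict_proba(X)[:, 1].mean(), y.mean())  # should agree closely
```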

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@lorentzenchr changed the title DOC: more precise calibration wording DOC more precise calibration wording Feb 23, 2024
@lorentzenchr enabled auto-merge (squash) February 23, 2024 14:21
@lorentzenchr merged commit fe718a8 into scikit-learn:main Feb 23, 2024
@ogrisel (Member) commented Feb 23, 2024

Thanks all. I will try to open a follow-up issue / PR soon.

@GaelVaroquaux (Member Author)

GaelVaroquaux commented Feb 23, 2024 via email

5 participants