ENH use cv_results in the different curve display to add confidence intervals #21211
Conversation
@@ -1067,6 +1135,22 @@ def plot(self, *, ax=None, name=None, ref_line=True, **kwargs):
            If `True`, plots a reference line representing a perfectly
            calibrated classifier.

        plot_uncertainty_style : {"errorbar", "fill_between", "lines"}, \
                default="errorbar"
I think the default should be `plot_uncertainty_style="lines"`, as it is the easiest to understand without being misled. For `plot_uncertainty_style="errorbar"` and `plot_uncertainty_style="fill_between"`, we need to know that the band is based on the raw standard deviation (as opposed to a pseudo confidence interval based on the standard error of the mean, for instance).
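For context, the two quantities mentioned above differ by a factor of sqrt(n_splits). Here is a minimal sketch with made-up fold scores (not taken from the PR) contrasting them:

```python
# Minimal sketch (not part of the PR): raw standard deviation across folds
# versus the standard error of the mean, on hypothetical 5-fold CV scores.
import numpy as np

fold_scores = np.array([0.81, 0.84, 0.79, 0.86, 0.82])

std = fold_scores.std(ddof=1)          # raw dispersion across folds
sem = std / np.sqrt(len(fold_scores))  # standard error of the mean

print(f"mean={fold_scores.mean():.3f}, std={std:.3f}, sem={sem:.3f}")
# A +/- std band is noticeably wider than a pseudo confidence interval of
# +/- sem, which is why the docstring needs to say which one is plotted.
```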
We could also accept `plot_uncertainty_style=None` to only plot the mean CV calibration curve without any uncertainty markers on the plot.
Also, `plot_uncertainty_style="shade"` or `plot_uncertainty_style="shaded_area"` might be easier to understand than `plot_uncertainty_style="fill_between"`.
default="errorbar" | ||
Style to plot the uncertainty information. Possibilities are: | ||
|
||
- "errorbar": error bars representing one standard deviation; |
These bars actually span two standard deviations in total: 1 above and 1 below.
I assume (I did not check ;)
    return_indices : bool, default=False
        Whether to return the train-test indices selected for each split.
Coming from #21664, I agree `return_indices` is useful. (I wanted to do something like this recently.)
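For illustration, here is a minimal sketch of how the proposed `return_indices` option could be consumed. The result layout (an `"indices"` entry holding per-fold `"train"`/`"test"` arrays) is an assumption about the proposed API, not something confirmed in this excerpt:

```python
# Sketch under the assumption that ``return_indices=True`` stores the exact
# per-fold train/test indices alongside the fitted estimators.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=200, random_state=0)
cv_results = cross_validate(
    LogisticRegression(), X, y, cv=5, return_estimator=True, return_indices=True
)

# One fitted estimator and one test split per fold (assumed layout).
for est, test_idx in zip(cv_results["estimator"], cv_results["indices"]["test"]):
    print(est.score(X[test_idx], y[test_idx]))
```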
@glemaitre this seems cool and worth continuing!
Yep, this is also part of the CZI proposal on inspection. This would be my next effort after the tuned threshold classifier.
This PR intends to add the capability of plotting the uncertainty of the different curves (calibration, precision-recall, ROC, etc.) by using the results of cross-validation (i.e. the output of `cross_validate`).

TODO:

- Add `return_indices` in `cross_validate` to store the train-test indices. It is the safest way to keep track of the train-test splits in the case of stochastic splitting strategies.
- Add `from_cv_results` in the plotting displays to take advantage of the CV computation:
  - `from_cv_results` in `CalibrationDisplay` / `calibration_curve`

Usage example
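The original usage example is not preserved in this excerpt. Below is a hedged reconstruction; the exact signature of `CalibrationDisplay.from_cv_results` and its arguments are assumptions based on the proposal above, not the final API:

```python
# Assumed usage of the proposed ``from_cv_results`` classmethod; the argument
# list and the ``plot_uncertainty_style`` values mirror the PR discussion.
from sklearn.calibration import CalibrationDisplay
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=1_000, random_state=0)
cv_results = cross_validate(
    LogisticRegression(), X, y, cv=5, return_estimator=True, return_indices=True
)

# Plot the mean CV calibration curve with per-fold uncertainty (assumed API).
disp = CalibrationDisplay.from_cv_results(
    cv_results, X, y, plot_uncertainty_style="fill_between"
)
```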