[MRG] Reliability curves for calibration of predict_proba #3574
Conversation
"set to False.") | ||
|
||
bin_width = 1.0 / bins | ||
bin_centers = np.linspace(0, 1.0 - bin_width, bins) + bin_width / 2 |
I suspect this algorithm is equivalent to:

bin_width = 1.0 / bins
# TODO: check boundary cases
binned = np.searchsorted(np.linspace(0, 1.0, bins), y_score)
bin_sums = np.bincount(binned, weights=y_score, minlength=bins)
bin_positives = np.bincount(binned, weights=y_true, minlength=bins)
# cast to float so that empty bins can be marked with NaN
bin_total = np.bincount(binned, minlength=bins).astype(float)
bin_total[bin_total == 0] = np.nan
y_score_bin_mean = bin_sums / bin_total
empirical_prob_pos = bin_positives / bin_total
You need a test. I think you should also follow the convention that we don't know whether
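A minimal, self-contained sketch of what such an equivalence test could look like; the helper functions, the synthetic data, and the pytest-style layout are illustrative assumptions, not code from this PR:

import numpy as np


def binned_means_loop(y_true, y_score, bins=10):
    # Reference implementation: average y_score and y_true per equal-width
    # bin using an explicit Python loop.
    edges = np.linspace(0.0, 1.0 + 1e-8, bins + 1)
    y_score_bin_mean = np.full(bins, np.nan)
    empirical_prob_pos = np.full(bins, np.nan)
    for i in range(bins):
        mask = (y_score >= edges[i]) & (y_score < edges[i + 1])
        if mask.any():
            y_score_bin_mean[i] = y_score[mask].mean()
            empirical_prob_pos[i] = y_true[mask].mean()
    return y_score_bin_mean, empirical_prob_pos


def binned_means_vectorized(y_true, y_score, bins=10):
    # Vectorized variant in the spirit of the np.bincount suggestion above,
    # using the same bin edges as the loop version.
    edges = np.linspace(0.0, 1.0 + 1e-8, bins + 1)
    binids = np.digitize(y_score, edges) - 1
    bin_sums = np.bincount(binids, weights=y_score, minlength=bins)
    bin_positives = np.bincount(binids, weights=y_true, minlength=bins)
    bin_total = np.bincount(binids, minlength=bins).astype(float)
    bin_total[bin_total == 0] = np.nan  # empty bins yield NaN means
    return bin_sums / bin_total, bin_positives / bin_total


def test_binned_means_equivalent():
    rng = np.random.RandomState(0)
    y_score = rng.uniform(0.0, 1.0, size=1000)
    y_true = (rng.uniform(0.0, 1.0, size=1000) < y_score).astype(float)
    loop = binned_means_loop(y_true, y_score)
    vec = binned_means_vectorized(y_true, y_score)
    for expected, got in zip(loop, vec):
        np.testing.assert_allclose(got, expected)

Running this under pytest (or calling test_binned_means_equivalent() directly) exercises the loop and the bincount formulations on the same bin edges and checks that the per-bin means agree.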
This was already implemented in PR #1176 as calibration_plot.
Plotting the probability histogram below the calibration plot as shown in your notebook is a good idea!
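For illustration, here is a minimal matplotlib sketch of that two-panel layout; the prob_true, prob_pred, and y_prob arrays are synthetic placeholders standing in for the output of a reliability/calibration curve and a classifier's predicted probabilities:

import numpy as np
import matplotlib.pyplot as plt

# Synthetic placeholders: replace with real predicted probabilities and the
# per-bin values returned by a reliability/calibration curve function.
rng = np.random.RandomState(0)
y_prob = rng.uniform(0, 1, size=1000)                       # predicted probabilities
prob_pred = np.linspace(0.05, 0.95, 10)                     # mean predicted prob per bin
prob_true = np.clip(prob_pred + 0.05 * rng.randn(10), 0, 1) # fraction of positives per bin

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True,
                               gridspec_kw={"height_ratios": [3, 1]})
ax1.plot([0, 1], [0, 1], "k:", label="perfectly calibrated")
ax1.plot(prob_pred, prob_true, "s-", label="classifier")
ax1.set_ylabel("Fraction of positives")
ax1.legend(loc="lower right")

# Histogram of predicted probabilities below the reliability curve.
ax2.hist(y_prob, bins=10, range=(0, 1), histtype="step")
ax2.set_xlabel("Mean predicted probability")
ax2.set_ylabel("Count")
plt.show()

The histogram makes it clear which parts of the reliability curve are estimated from many predictions and which bins are only sparsely populated.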
I'll "steal" your idea of histograms and update the PR ASAP. I'll ping you
Sure, feel free to reuse anything you find useful. I didn't know about PR #1176; it's really cool. Two things from reliability_curve of this PR might be useful for calibration_plot from the other:
Let me know if I can help out with any of these issues.
Closing as duplicate, then?
Jan, I won't find the time to work on this in the next few days. Feel free
I would keep it in classification.py
I sent you a PR with the discussed content.
@agramfort, are there any plans for adding sample_weight support to calibration_curve?
Using sample_weight would make sense, yes.
@agramfort, how would sample_weight be incorporated into calibration_curve? For reference, the current implementation is:

import numpy as np

from sklearn.metrics.classification import _check_binary_probabilistic_predictions
from sklearn.utils import column_or_1d


def calibration_curve(y_true, y_prob, normalize=False, n_bins=5):
    """Compute true and predicted probabilities for a calibration curve.

    Read more in the :ref:`User Guide <calibration>`.

    Parameters
    ----------
    y_true : array, shape (n_samples,)
        True targets.

    y_prob : array, shape (n_samples,)
        Probabilities of the positive class.

    normalize : bool, optional, default=False
        Whether y_prob needs to be normalized into the bin [0, 1], i.e. is not
        a proper probability. If True, the smallest value in y_prob is mapped
        onto 0 and the largest one onto 1.

    n_bins : int
        Number of bins. A bigger number requires more data.

    Returns
    -------
    prob_true : array, shape (n_bins,)
        The true probability in each bin (fraction of positives).

    prob_pred : array, shape (n_bins,)
        The mean predicted probability in each bin.

    References
    ----------
    Alexandru Niculescu-Mizil and Rich Caruana (2005) Predicting Good
    Probabilities With Supervised Learning, in Proceedings of the 22nd
    International Conference on Machine Learning (ICML).
    See section 4 (Qualitative Analysis of Predictions).
    """
    y_true = column_or_1d(y_true)
    y_prob = column_or_1d(y_prob)

    if normalize:  # Normalize predicted values into interval [0, 1]
        y_prob = (y_prob - y_prob.min()) / (y_prob.max() - y_prob.min())
    elif y_prob.min() < 0 or y_prob.max() > 1:
        raise ValueError("y_prob has values outside [0, 1] and normalize is "
                         "set to False.")

    y_true = _check_binary_probabilistic_predictions(y_true, y_prob)

    bins = np.linspace(0., 1. + 1e-8, n_bins + 1)
    binids = np.digitize(y_prob, bins) - 1

    bin_sums = np.bincount(binids, weights=y_prob, minlength=len(bins))
    bin_true = np.bincount(binids, weights=y_true, minlength=len(bins))
    bin_total = np.bincount(binids, minlength=len(bins))

    nonzero = bin_total != 0
    prob_true = (bin_true[nonzero] / bin_total[nonzero])
    prob_pred = (bin_sums[nonzero] / bin_total[nonzero])

    return prob_true, prob_pred

By the way, my best guess would be to make the following changes:

bin_sums = np.bincount(binids, weights=sample_weight * y_prob, minlength=len(bins))
bin_true = np.bincount(binids, weights=sample_weight * y_true, minlength=len(bins))
bin_total = np.bincount(binids, weights=sample_weight, minlength=len(bins))

But for some reason my results do not look right after such changes.
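For reference, here is a self-contained sketch of a weighted variant along those lines; weighted_calibration_curve is a hypothetical name, and this is only an illustration of the weighted-bincount idea, not scikit-learn's API:

import numpy as np


def weighted_calibration_curve(y_true, y_prob, sample_weight=None, n_bins=5):
    # Illustrative sketch of a sample_weight-aware calibration curve.
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    if sample_weight is None:
        sample_weight = np.ones_like(y_prob)
    else:
        sample_weight = np.asarray(sample_weight, dtype=float)

    bins = np.linspace(0., 1. + 1e-8, n_bins + 1)
    binids = np.digitize(y_prob, bins) - 1

    # Weighted per-bin sums; each sample contributes in proportion to its weight.
    bin_sums = np.bincount(binids, weights=sample_weight * y_prob, minlength=len(bins))
    bin_true = np.bincount(binids, weights=sample_weight * y_true, minlength=len(bins))
    bin_total = np.bincount(binids, weights=sample_weight, minlength=len(bins))

    nonzero = bin_total > 0
    prob_true = bin_true[nonzero] / bin_total[nonzero]
    prob_pred = bin_sums[nonzero] / bin_total[nonzero]
    return prob_true, prob_pred

A quick sanity check on such a variant: integer weights should give the same result as repeating each sample weight times, and sample_weight=None should reproduce the unweighted curve.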
Your idea of just multiplying the weights by sample_weight would also have been my first guess. In which sense do the results not look as expected? BTW: could you open an issue about adding sample_weight to calibration_curve?
@jmetzen, thank you for your reply. I will open up a new issue. Hopefully when I put in a new PR I can give an explanation that clarifies the motivation behind my request to add sample_weight. By the way, it was your personal blog, jmetzen.github, that got me started on calibration curves. Thanks for the great post; it helped me catch an issue in my analysis that had almost gone unnoticed.
This PR adds the reliability_curve metric to metrics/ranking.py. Reliability diagrams allow checking whether the predicted probabilities of a binary classifier are well calibrated. The PR also contains an example comparing how well the predicted probabilities of different classifiers are calibrated. A notebook with the same example can be found at http://jmetzen.github.io/2014-08-16/reliability-diagram.html
For some background on reliability diagrams, please refer to the paper "Predicting Good Probabilities with Supervised Learning".
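For completeness, a short usage sketch of the API this work eventually fed into, sklearn.calibration.calibration_curve; the dataset, classifier, and plotting choices below are arbitrary illustrations:

import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Toy binary problem; Gaussian naive Bayes is known to produce poorly
# calibrated probabilities, so its curve deviates from the diagonal.
X, y = make_classification(n_samples=10000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = GaussianNB().fit(X_train, y_train)
y_prob = clf.predict_proba(X_test)[:, 1]

prob_true, prob_pred = calibration_curve(y_test, y_prob, n_bins=10)

plt.plot([0, 1], [0, 1], "k:", label="perfectly calibrated")
plt.plot(prob_pred, prob_true, "s-", label="GaussianNB")
plt.xlabel("Mean predicted probability")
plt.ylabel("Fraction of positives")
plt.legend()
plt.show()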