
[MRG] Reliability curves for calibration of predict_proba #3574


Closed
wants to merge 2 commits

Conversation

jmetzen
Member

@jmetzen jmetzen commented Aug 18, 2014

This PR adds the reliability_curve metric to metrics/ranking.py. Reliability diagrams allow checking whether the predicted probabilities of a binary classifier are well calibrated. The PR also contains an example comparing how well the predicted probabilities of different classifiers are calibrated. A notebook with the same example can be found at http://jmetzen.github.io/2014-08-16/reliability-diagram.html

For some background on reliability diagrams, please refer to the paper "Predicting Good Probabilities with Supervised Learning".
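
(For readers unfamiliar with the idea, here is a minimal sketch of what a reliability curve computes; the function name and binning details below are illustrative, not the exact code in this PR.)

    import numpy as np

    def reliability_curve_sketch(y_true, y_score, bins=10):
        # Bin the predicted scores and compare, per bin, the mean predicted
        # probability with the observed fraction of positives. A perfectly
        # calibrated classifier lies on the diagonal of that plot.
        y_true = np.asarray(y_true)
        y_score = np.asarray(y_score)
        bin_edges = np.linspace(0.0, 1.0, bins + 1)
        # Clip so that a score of exactly 1.0 falls into the last bin.
        bin_ids = np.clip(np.digitize(y_score, bin_edges) - 1, 0, bins - 1)
        mean_predicted, fraction_positive = [], []
        for b in range(bins):
            mask = bin_ids == b
            if mask.any():
                mean_predicted.append(y_score[mask].mean())
                fraction_positive.append(y_true[mask].mean())
        return np.array(mean_predicted), np.array(fraction_positive)

Plotting mean_predicted against fraction_positive, together with the diagonal, gives the reliability diagram shown in the notebook.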

@jmetzen jmetzen changed the title Reliability curves for calibration of predict_proba [mrg] Reliability curves for calibration of predict_proba Aug 18, 2014
@jmetzen jmetzen changed the title [mrg] Reliability curves for calibration of predict_proba [MRG] Reliability curves for calibration of predict_proba Aug 18, 2014
"set to False.")

bin_width = 1.0 / bins
bin_centers = np.linspace(0, 1.0 - bin_width, bins) + bin_width / 2
Member

I suspect this algorithm is equivalent to:

    bin_width = 1.0 / bins
    # TODO: check boundary cases
    binned = np.searchsorted(np.linspace(0, 1.0, bins), y_score)
    bin_sums = np.bincount(binned, weights=y_score, minlength=bins)
    bin_positives = np.bincount(binned, weights=y_true, minlength=bins)
    bin_total = np.bincount(binned, minlength=bins).astype(float)
    bin_total[bin_total == 0] = np.nan  # mark empty bins to avoid 0 / 0
    y_score_bin_mean = bin_sums / bin_total
    empirical_prob_pos = bin_positives / bin_total

@jnothman
Member

You need a test.

I think you should also follow the convention that we don't know whether y_true is 0s and 1s (the most common alternative being that negatives are -1). Perhaps you should take a pos_label=1 parameter.
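
(For illustration only, a pos_label parameter could be handled roughly as below; the helper name is made up, and this is not the PR's code.)

    import numpy as np

    def _binarize_targets(y_true, pos_label=1):
        # Map arbitrary binary labels, e.g. {-1, 1} or {0, 1}, onto {0, 1},
        # treating `pos_label` as the positive class.
        y_true = np.asarray(y_true)
        return (y_true == pos_label).astype(float)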

@mblondel
Member

This was already implemented in PR #1176 as calibration_plot. You should join forces with @agramfort :)

@mblondel
Member

Plotting the probability histogram below the calibration plot as shown in your notebook is a good idea!
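
(As a rough illustration of that layout, here is a sketch using matplotlib, not the notebook's exact code: the reliability curve on top and a histogram of the predicted probabilities below, sharing the x-axis.)

    import matplotlib.pyplot as plt

    def plot_reliability_with_histogram(prob_pred, prob_true, y_prob, n_bins=10):
        # Reliability diagram on top, histogram of predicted probabilities below.
        fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True,
                                       gridspec_kw={"height_ratios": [3, 1]})
        ax1.plot([0, 1], [0, 1], "k:", label="Perfectly calibrated")
        ax1.plot(prob_pred, prob_true, "s-", label="Classifier")
        ax1.set_ylabel("Fraction of positives")
        ax1.legend(loc="lower right")
        ax2.hist(y_prob, range=(0, 1), bins=n_bins, histtype="step")
        ax2.set_xlabel("Mean predicted probability")
        ax2.set_ylabel("Count")
        plt.show()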

@agramfort
Member

I'll "steal" your idea of histograms and update the PR asap. I'll ping you
when done.

@jmetzen
Member Author

jmetzen commented Aug 22, 2014

Sure, feel free to reuse anything you find useful. I didn't know about PR #1176; it's really cool.

Two things from this PR's reliability_curve might be useful for the other PR's calibration_plot:

  • The normalize flag
  • Treating it as a ranking "metric". The interface is actually very similar to precision_recall_curve and roc_curve (see the usage sketch below); what do you think about moving calibration_plot to ranking.py and renaming it to calibration_curve (it's not actually plotting anything)? The "sample_weight" parameter used in the metrics curve functions could also be useful in calibration_plot.

Let me know if I can help out in any of these issues.
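
(To illustrate the parallel with the existing curve metrics, a small usage sketch; the calibration_curve call is shown only as proposed, not as something that exists in ranking.py.)

    import numpy as np
    from sklearn.metrics import roc_curve

    y_true = np.array([0, 0, 1, 1])
    y_prob = np.array([0.1, 0.4, 0.35, 0.8])

    # Existing curve-style metric: arrays in, curve coordinates out.
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)

    # The proposed calibration curve would follow the same pattern:
    # prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=5)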

@jnothman
Member

Closing as duplicate, then?

@jnothman jnothman closed this Aug 23, 2014
@agramfort
Member

> Closing as duplicate, then?

Jan, I won't find the time to work on this in the next few days. Feel free to send me a PR that:

  • renames to calibration_curve
  • adds the histogram to one example
  • adds the normalize flag

I would keep it in classification.py

@jmetzen
Member Author

jmetzen commented Aug 30, 2014

I sent you a PR with the discussed content

@jmetzen jmetzen deleted the reliability_diagram branch September 15, 2014 14:11
@ecampana

@agramfort, are there any plans for calibration_curve to support sample_weight, or are there theoretical grounds for why doing this would not make sense in the first place? I made a similar comment on #2630. Thank you in advance for any information you may shed on this matter.

@agramfort
Member

agramfort commented Mar 24, 2017 via email

@ecampana

ecampana commented Mar 25, 2017

@agramfort, how would calibration_curve need to be modified to include sample_weight? I have spent a few hours trying to figure out the logic behind the function, but I still fail to see where it would be appropriate to introduce sample_weight. I would appreciate any hints as to the best place to modify the function below. Thanks again for your help.

import numpy as np

from sklearn.metrics.classification import _check_binary_probabilistic_predictions
from sklearn.utils import column_or_1d

def calibration_curve(y_true, y_prob, normalize=False, n_bins=5):
    """Compute true and predicted probabilities for a calibration curve.
    Read more in the :ref:`User Guide <calibration>`.
    Parameters
    ----------
    y_true : array, shape (n_samples,)
        True targets.
    y_prob : array, shape (n_samples,)
        Probabilities of the positive class.
    normalize : bool, optional, default=False
        Whether y_prob needs to be normalized into the interval [0, 1], i.e. is not
        a proper probability. If True, the smallest value in y_prob is mapped
        onto 0 and the largest one onto 1.
    n_bins : int
        Number of bins. A bigger number requires more data.
    Returns
    -------
    prob_true : array, shape (n_bins,)
        The true probability in each bin (fraction of positives).
    prob_pred : array, shape (n_bins,)
        The mean predicted probability in each bin.
    References
    ----------
    Alexandru Niculescu-Mizil and Rich Caruana (2005) Predicting Good
    Probabilities With Supervised Learning, in Proceedings of the 22nd
    International Conference on Machine Learning (ICML).
    See section 4 (Qualitative Analysis of Predictions).
    """
    y_true = column_or_1d(y_true)
    y_prob = column_or_1d(y_prob)

    if normalize:  # Normalize predicted values into interval [0, 1]
        y_prob = (y_prob - y_prob.min()) / (y_prob.max() - y_prob.min())
    elif y_prob.min() < 0 or y_prob.max() > 1:
        raise ValueError("y_prob has values outside [0, 1] and normalize is "
                         "set to False.")

    y_true = _check_binary_probabilistic_predictions(y_true, y_prob)

    bins = np.linspace(0., 1. + 1e-8, n_bins + 1)
    binids = np.digitize(y_prob, bins) - 1

    bin_sums = np.bincount(binids, weights=y_prob, minlength=len(bins))
    bin_true = np.bincount(binids, weights=y_true, minlength=len(bins))
    bin_total = np.bincount(binids, minlength=len(bins))

    nonzero = bin_total != 0
    prob_true = (bin_true[nonzero] / bin_total[nonzero])
    prob_pred = (bin_sums[nonzero] / bin_total[nonzero])

    return prob_true, prob_pred

By the way, my best guess would be to make the following changes:

    bin_sums = np.bincount(binids, weights=sample_weight*y_prob, minlength=len(bins))
    bin_true = np.bincount(binids, weights=sample_weight*y_true, minlength=len(bins))
    bin_total = np.bincount(binids, weights=sample_weight, minlength=len(bins))

But for some reason my results do not look right after such changes.
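
(For concreteness, here is a self-contained sketch of that weighted variant; this is hypothetical code, not scikit-learn's implementation. The idea is that all three bincount calls are weighted, so every per-bin average becomes a weighted average.)

    import numpy as np

    def weighted_calibration_curve(y_true, y_prob, sample_weight=None, n_bins=5):
        # Hypothetical sample_weight-aware variant of calibration_curve.
        y_true = np.asarray(y_true, dtype=float)
        y_prob = np.asarray(y_prob, dtype=float)
        if sample_weight is None:
            sample_weight = np.ones_like(y_prob)
        sample_weight = np.asarray(sample_weight, dtype=float)

        bins = np.linspace(0., 1. + 1e-8, n_bins + 1)
        binids = np.digitize(y_prob, bins) - 1

        # Weight every bin sum so the per-bin means become weighted means.
        bin_sums = np.bincount(binids, weights=sample_weight * y_prob,
                               minlength=len(bins))
        bin_true = np.bincount(binids, weights=sample_weight * y_true,
                               minlength=len(bins))
        bin_total = np.bincount(binids, weights=sample_weight,
                                minlength=len(bins))

        nonzero = bin_total != 0
        prob_true = bin_true[nonzero] / bin_total[nonzero]
        prob_pred = bin_sums[nonzero] / bin_total[nonzero]
        return prob_true, prob_pred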

@jmetzen
Member Author

jmetzen commented Mar 26, 2017

Your idea of just multiplying the weights by sample_weight would also have been my first guess. In what sense do the results not look as expected?

BTW: could you open an issue about adding sample_weight to calibration_curve? Further discussion should take place there, not in this old PR.

@ecampana

ecampana commented Mar 29, 2017

@jmetzen, thank you for your reply. I will open a new issue. Hopefully, when I put in a new PR, I can explain the motivation behind my request to add sample_weight as a parameter of calibration_curve, since stephen-hoover was not convinced that a "weighted" calibration curve is meaningful. Also, please ignore my earlier comment about the weighted calibration curve results not looking as I expected; an unrelated issue was causing the problem.

By the way, it was your personal blog, jmetzen.github.io, that got me started on calibration curves. Thanks for the great post; it helped me correct an issue in my analysis that had almost gone unnoticed.
