
ENH add CAP curve #28972


Open: wants to merge 90 commits into main

Conversation

@JosephBARBIERDARNAL (Contributor) commented May 7, 2024

This PR is a duplicate of #28752. I'm recreating it because I accidentally deleted my fork with the old one.

Reference Issue

fix for #10003

What does this implement/fix?

Creation of a CumulativeAccuracyDisplay class for plotting CAP curves.

"The CAP of a model represents the cumulative number of positive outcomes along the y-axis versus the corresponding cumulative number of a classifying parameter along the x-axis. The output is called a CAP curve.[1] The CAP is distinct from the receiver operating characteristic (ROC) curve, which plots the true-positive rate against the false-positive rate." (wikipedia definition)

It's mainly inspired by the RocCurveDisplay class.

Other

It's currently a work in progress.

TODO

Binary classification

  • raise a ValueError in from_estimator if the estimator is not fitted or is a classifier that was fitted with more than 3 classes;
  • fix pos_label handling for the positive class;
  • add/update a test to check that we have the same result for response_method="decision_function" and response_method="predict_proba" for a LogisticRegression classifier fit with string labels and for all 3 possible values of pos_label;
    • note that this is different from what is already tested in test_display_from_estimator_and_from_prediction;
    • CAP curves should be invariant under order-preserving transformations of the predictions. This is why plotting the CAP from the unnormalized logits or from the sigmoid-scaled probabilistic predictions should result in the same curve: the logistic sigmoid is strictly monotonic, hence order preserving (see the sketch after this list).
      • joseph --> should be good
  • add an option (enabled by default) to draw CAP curve of the "perfect"/"oracle" model;
  • update the tests to check that the model curves always lie between the "chance level" and "perfect"/"oracle" curves.
  • add a test to check that the display array attributes y_true_cumulative and cumulative_total have the same dtype as y_pred in the test about from_predictions. We can test for y_pred passed either as np.float32 or np.float64.
    • joseph: should be good
  • test that CAPCurveDisplay.from_estimator(LinearSVC().fit(X, y), ...) works (even if it does not have a predict_proba method). This should cover one of the lines reported as uncovered by codecov.
  • leverage test_common_curve_display.py to reuse some generic tests on CAPCurveDisplay and maybe remove redundant tests on invalid inputs from test_cap_curve_display.py if any;
    • joseph: should be good
  • add despine argument?
    • olivier: I would rather not do that specifically in this PR but maybe consider a cross-display PR that does it for all *Display classes in scikit-learn. Feel free to open an issue to discuss this with screenshots, e.g. on ROC or PR curves, and your analysis of pros and cons.
    • joseph: sure. I suggested it here because the ROC and PR curves already have it (see #26367, "ENH despine keyword for ROC and PR curves"). I'm not sure it makes much sense for ConfusionMatrixDisplay (?). I'll open an issue (once this PR is merged) for CAPCurveDisplay, PredictionErrorDisplay and DetCurveDisplay because I think they're the only ones that don't have this option.
    • should be good --> I added the argument, as suggested below in the discussion
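
As a quick illustration of the invariance property mentioned above, here is a minimal numpy-only sketch (independent of the display API; expit is SciPy's logistic sigmoid):

import numpy as np
from scipy.special import expit  # logistic sigmoid: strictly monotonic

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=100)
logits = rng.normal(size=100)  # decision_function-style scores

def cap_curve(y_true, y_score):
    # Rank samples from highest to lowest score, then accumulate positives.
    order = np.argsort(-y_score)
    x = np.arange(1, len(y_true) + 1) / len(y_true)
    y = np.cumsum(y_true[order]) / y_true.sum()
    return x, y

_, curve_from_logits = cap_curve(y_true, logits)
_, curve_from_proba = cap_curve(y_true, expit(logits))  # order-preserving rescaling
assert np.allclose(curve_from_logits, curve_from_proba)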

Regression

  • update the docstrings to make it explicit that either regressors (with positive outcomes) or binary classifiers are accepted and anything else is rejected.
    • joseph: I made a first pass, but I guess it can be improved
  • raise a ValueError with an informative error message if y_true has negative values;
    • joseph: it is now tested here
  • raise a ValueError if all y_true are zeros (the plot would be degenerate and would raise a low-level divide-by-zero warning with normalize_scale=True);
    • In theory, this should not happen: if all the y_true are zeros, the target will be treated as a classification target
  • add tests with continuous outputs on positive observation data (e.g. using PoissonRegressor) and check that the regressor curve lies between the "chance level" and "perfect" curves (a rough sketch follows this list);
    • joseph: it is now tested here
  • update the insurance model examples (examples/linear_model/plot_tweedie_regression_insurance_claims.py and examples/linear_model/plot_poisson_regression_non_normal_loss.py) to use the CAPCurveDisplay class instead of manually plotting the Lorenz curves.
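
A rough sketch of the kind of check described in the list above, using a hand-rolled cumulative-gain computation rather than the display API (the data generation and tolerance are arbitrary illustrative choices):

import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.poisson(lam=np.exp(X[:, 0]))  # positive count target

y_pred = PoissonRegressor().fit(X, y).predict(X)

def cumulative_gain(y_true, ranking_scores):
    # Accumulate the true outcomes from the highest to the lowest ranked sample.
    order = np.argsort(-ranking_scores)
    return np.cumsum(y_true[order]) / y_true.sum()

model_curve = cumulative_gain(y, y_pred)
perfect_curve = cumulative_gain(y, y)  # "oracle": rank by the target itself
chance_curve = np.arange(1, len(y) + 1) / len(y)  # diagonal baseline

eps = 1e-8
assert np.all(model_curve <= perfect_curve + eps)
assert np.all(model_curve >= chance_curve - eps)  # holds for an informative model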

Other

  • document the new feature with a new entry under doc/whats_new/upcoming_changes/
  • update the user guide (doc/visualization) to reference this new tool.
    • done here.
    • joseph: One thing I find strange is that the CAP curve is the only one where all the letters of the acronym are in capitals. Is this explicitly intended? There's DetCurveDisplay, RocCurveDisplay and then CAPCurveDisplay, which seems inconsistent.

Nice to have


github-actions bot commented May 7, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: ea7b322. Link to the linter CI: here

@lorentzenchr changed the title from "add cap curve main class and tests" to "ENH add CAP curve" on May 11, 2024
@lorentzenchr (Member):

@JosephBARBIERDARNAL Could you please address all the comments from #28752?

@JosephBARBIERDARNAL (Contributor, Author):

Yes, I'm ready to do it, but I won't be able to get back to it quickly (2 to 3 months) unfortunately. Hope that's okay

@JosephBARBIERDARNAL (Contributor, Author) commented Oct 6, 2024

Working on this PR again. Here are the current updates:

Some (likely non-exhaustive) issues:

  • Should the figure be square when normalize_scale=False? Currently, the code calls self.ax_.set([...], aspect="equal") in both cases, so the figure dimensions are "random" (i.e., unpredictable):
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

from sklearn.metrics import CapCurveDisplay

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_scores = clf.decision_function(X_test)

fig, ax = plt.subplots(ncols=2, dpi=300, figsize=(12,12))

display = CapCurveDisplay.from_predictions(
    ax=ax[0],
    y_true=y_test,
    y_pred=y_scores,
    name='normalize_scale=False',
    normalize_scale=False,
    plot_chance_level=True
)

display = CapCurveDisplay.from_predictions(
    ax=ax[1],
    y_true=y_test,
    y_pred=y_scores,
    name='normalize_scale=True',
    normalize_scale=True,
    plot_chance_level=True
)

[Screenshot: the two CAP plots produced by the snippet above, with normalize_scale=False on the left and normalize_scale=True on the right]

@glemaitre self-requested a review on November 3, 2024 10:24
@glemaitre (Member):

I'll come back to this PR after the release. It will be one of my priorities to merge for 1.7.

@glemaitre added this to the 1.7 milestone on Nov 3, 2024
@JosephBARBIERDARNAL (Contributor, Author):

I'll come back to this PR after the release. It will be one of my priorities to merge for 1.7.

Ok cool. There are still a few things I need to do anyway before a review (coverage tests and adding the #30023 check).


# compute cumulative sums for true positives and all cases
y_true_cumulative = np.cumsum(y_true_sorted * sample_weight_sorted)
cumulative_total = np.cumsum(sample_weight_sorted)
@ogrisel (Member) commented Nov 25, 2024:

For information, there was a concurrent PR to fix the lack of cumsum of the sample_weight when defining the x-axis of the Lorenz curves in one of our regression examples. To verify that this was the correct fix, we ran a quick check on synthetic data with integer-valued sample weights, confirming an exact equivalence between repeating data points and reweighting them by exposure:

#30198 (comment)

Maybe this PR could be expanded to test that this property also holds for CapCurveDisplay.from_predictions with non-constant, integer-valued sample_weight.

EDIT: I am not entirely sure how to write such a test, but possibly we could use np.interp to evaluate the CAP curve computed on the repeated data at the x-axis locations of the points of the weighted CAP curve, and check that the two curves match up to a small eps at those locations.
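
For illustration, a hand-rolled version of such a check could look like this (a plain numpy sketch, deliberately not assuming the final attribute names of CapCurveDisplay):

import numpy as np

rng = np.random.default_rng(0)
n = 50
y_true = rng.integers(0, 2, size=n)
y_pred = rng.normal(size=n)
sw = rng.integers(1, 5, size=n)  # non-constant, integer-valued sample_weight

def weighted_cap(y_true, y_pred, sample_weight):
    # Sort by decreasing prediction, accumulate weighted positives (y-axis)
    # and weights (x-axis), then normalize both axes to [0, 1].
    order = np.argsort(-y_pred)
    y_cum = np.cumsum(y_true[order] * sample_weight[order])
    x_cum = np.cumsum(sample_weight[order])
    return x_cum / x_cum[-1], y_cum / y_cum[-1]

x_w, y_w = weighted_cap(y_true, y_pred, sw)  # weighted curve
x_r, y_r = weighted_cap(  # curve on data points repeated sw times each
    np.repeat(y_true, sw), np.repeat(y_pred, sw), np.ones(sw.sum())
)
# Evaluate the repeated-data curve at the weighted curve's x locations:
# the two curves should coincide there.
assert np.allclose(y_w, np.interp(x_w, x_r, y_r))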

@ogrisel (Member) left a review comment:

There are several review comments in #28752 that have not yet been addressed.

I would like to make sure that we do not forget about them when iterating on this PR.

@JosephBARBIERDARNAL to sort things out, please reply in the threads of the review of #28752 and make it explicit which comments have been addressed in #28972 and how, and then mark those threads as resolved.

Also, we need some tests, and to update an existing example where we compare ROC, DET and CAP curves on the same classifier. I suppose this example is the best candidate:

https://scikit-learn.org/stable/auto_examples/model_selection/plot_det.html

@JosephBARBIERDARNAL (Contributor, Author) commented Nov 27, 2024

@ogrisel That sounds good to me. I won't be able to work on it immediately, but I’ll definitely be able to get to it within the next few weeks.

I'll ping you and/or @glemaitre for review

@lorentzenchr (Member):

Two things are important for me:

  • Even if this PR focuses on binary classification, it should be implemented in a way such that (positive) regression can be added easily, in particular
    • without API hassle (note that it is predict_proba for classification but predict for regression)
    • with the same class/function (see the sketch after this list)
  • The plot should contain the lines for the "perfect" and "random" models, as in the screenshot from the paper in #28752 (comment) ("FEA add CumulativeAccuracyDisplay").
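
For what it's worth, a minimal sketch of such a dispatch (illustrative only: the helper name is hypothetical, and the PR may instead rely on scikit-learn's internal response-value utilities):

from sklearn.base import is_classifier

def _get_cap_response(estimator, X, pos_label=None):
    # Hypothetical helper: a single entry point for both task types.
    if is_classifier(estimator):
        if hasattr(estimator, "predict_proba"):
            classes = list(estimator.classes_)
            idx = classes.index(pos_label) if pos_label is not None else 1
            return estimator.predict_proba(X)[:, idx]
        # e.g. LinearSVC exposes decision_function but not predict_proba
        return estimator.decision_function(X)
    # regression path: use the (positive) predictions directly
    return estimator.predict(X)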

@JosephBARBIERDARNAL (Contributor, Author):

The plot should contain the lines for the "perfect" and "random" models, as in the screenshot from the paper in #28752 (comment).

  • Will this actually be visible? I'm not entirely sure how broad the possible shapes of such a line could be. In my mind, it would resemble the "perfect" line in ROC curves, which suggests that it would almost always lie along the spines of the Matplotlib Axes (except perhaps in cases with very low sample sizes?). However, we could adjust the x/y axis limits to accommodate this scenario.
  • If we decide to implement it, what should we name it, and what should its default style be?

@JosephBARBIERDARNAL (Contributor, Author) commented Jan 11, 2025

@JosephBARBIERDARNAL to sort things out, please reply in the threads of the review of #28752 and make it explicit which comments have been addressed in #28972 and how, and then mark those threads as resolved.

I resolved most of the conversations there. I didn't touch some of them if I wasn't sure. Feel free to ping me if any changes are needed.

I'm just not sure what "make it explicit which comments have been addressed in https://github.com/scikit-learn/scikit-learn/pull/28972 and how" means

@ogrisel (Member) commented Apr 16, 2025

Alright. Let's try to implement the following first:

Use binary classification path in from_predictions when y_true has 2 unique values, else regression;

and see how it goes. Also check whether we can generate meaningful error messages without too much code complexity.

Note that we will also need to generate a meaningful error message when y_true has more than 2 unique values that are not numerical (class labels).

Introduce a new parameter with options auto, classification and regression.

Note that the user can already force the classification mode by passing an explicit pos_label value.
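
A sketch of what the combined inference could look like (function name and messages are hypothetical, not the PR's actual code):

import numpy as np

def _infer_cap_mode(y_true, mode="auto", pos_label=None):
    # Hypothetical helper deciding between the classification and
    # regression code paths of from_predictions.
    if mode != "auto":
        return mode
    if pos_label is not None:
        # An explicit pos_label forces the classification interpretation.
        return "classification"
    y_true = np.asarray(y_true)
    if len(np.unique(y_true)) <= 2:
        return "classification"
    if np.issubdtype(y_true.dtype, np.number):
        return "regression"
    raise ValueError(
        "y_true has more than 2 unique non-numeric values (class labels); "
        "CAP curves support binary classification or positive regression "
        "targets only."
    )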

@JosephBARBIERDARNAL (Contributor, Author):

Note that we will also need to generate a meaningful error message when y_true has more than 2 unique values that are not numerical (class labels).

This is already the case thanks to _validate_and_get_response_values()


@JosephBARBIERDARNAL (Contributor, Author) commented Apr 23, 2025

I'm having trouble raising the error when y_true has only zeros because of this behavior:

>>> import numpy as np
>>> from sklearn.utils.multiclass import type_of_target
>>> type_of_target(np.array([0.0, 0.0, 0.0, 0.0]))
'binary'
>>> type_of_target(np.array([0, 0, 0, 0]))
'binary'

Is this expected?

@lucyleeow (Member):

Currently tests are failing due to "ImportError: CAPCurveDisplay.from_estimator requires matplotlib. You can install matplotlib with pip install matplotlib." I'm not entirely sure why this is happening; shouldn't matplotlib be installed when testing?

Add the pyplot fixture to the test to get matplotlib.
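
For example, such a test could look like the following sketch (the pyplot fixture is defined in scikit-learn's test conftest and skips the test when matplotlib is unavailable; the class name follows the import used earlier in this thread):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import CapCurveDisplay

def test_cap_curve_display_from_estimator(pyplot):
    # Requesting the `pyplot` fixture skips the test if matplotlib is not
    # installed and closes the figures created during the test.
    X, y = make_classification(n_classes=2, random_state=0)
    clf = LogisticRegression().fit(X, y)
    display = CapCurveDisplay.from_estimator(clf, X, y)
    assert display.ax_ is not None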

@ogrisel (Member) commented Apr 25, 2025

@JosephBARBIERDARNAL for information, I just pushed a commit updating the example's new section on the theoretical curves of non-informative classifier baselines.

Please feel free to ping me once you are done with the regression TODO list.

@ogrisel (Member) commented Apr 25, 2025

Is this expected?

Not sure: this edge case is ambiguous. But for the sake of not breaking backward compatibility, we can consider it the expected behavior.

@JosephBARBIERDARNAL (Contributor, Author) commented Apr 28, 2025

A few questions:

  • Should we use the same default behavior as in the other displays for chance_level (not plotting it)? The same question applies to plot_perfect.
  • Should we add Gini/AR to the plot labels instead of requiring users to call auc and compute it themselves?
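
For reference, if such a label were added for binary targets, the accuracy ratio (AR) of the CAP is known to reduce to the Gini coefficient of the ROC curve, so it could be derived from the existing auc machinery:

$$\mathrm{AR} = \frac{a_R}{a_P} = 2 \cdot \mathrm{AUC} - 1 = \mathrm{Gini},$$

where $a_R$ is the area between the model's CAP curve and the chance diagonal, and $a_P$ is the same area for the perfect model.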

@JosephBARBIERDARNAL (Contributor, Author):

@ogrisel You can review. See my comments in the TODO.

@ogrisel (Member) commented Apr 29, 2025

Should we use the same default behavior as in the other displays for chance_level (not plotting it)? The same question applies to plot_perfect.

No strong opinion, but I think I would rather update the others to plot the baseline curve by default instead (to be done later).

@ogrisel (Member) commented Apr 29, 2025

Should we add Gini/AR to the plot labels instead of requiring users to call auc and compute it themselves?

I would rather do that in a follow-up PR that introduces an official Gini scoring function instead.

@ogrisel (Member) left a review comment:

Thanks @JosephBARBIERDARNAL. Here is another batch of feedback.

title="Lorenz curves by model",
xlabel="Cumulative proportion of exposure (from safest to riskiest)",
title="Lorenz Curves",
xlabel=("Cumulative proportion of exposure (from safest to riskiest)"),
Member:

Suggested change
xlabel=("Cumulative proportion of exposure (from safest to riskiest)"),
xlabel="Cumulative proportion of exposure (from safest to riskiest)",

ax.set(
-    title="Lorenz curves by model",
-    xlabel="Cumulative proportion of exposure (from safest to riskiest)",
+    title="Lorenz Curves",
Member:

Nitpick to stay consistent with the most common casing convention for titles / headers in the scikit-learn doc.

Suggested change
title="Lorenz Curves",
title="Lorenz curves",

self.ax_.set(xlabel=xlabel, ylabel=ylabel, aspect="equal")

if despine:
_despine(self.ax_)
Member:

Could you expand one of the tests to cover this line?

Member:

Or maybe there is a common test to enable for this?

Object that stores computed values.
"""
target_type = type_of_target(y_true)
if target_type == "continuous":
Member:

Whenever the user passes an explicit (non-None) pos_label argument, we should assume that they want to treat the problem as a binary classification problem and generate error messages that are consistent with this assumption.

Suggested change
if target_type == "continuous":
if target_type == "continuous" and pos_label is None:

Comment on lines +354 to +355
weighted_y_true = y_true_sorted * sample_weight_sorted
y_true_cumulative = np.cumsum(weighted_y_true)
Member:

Suggested change
weighted_y_true = y_true_sorted * sample_weight_sorted
y_true_cumulative = np.cumsum(weighted_y_true)
weighted_y_true_sorted = y_true_sorted * sample_weight_sorted
y_true_cumulative = np.cumsum(weighted_y_true_sorted)

Comment on lines +530 to +533
match = "`y_true` contains negative values, which isn't allowed for "
"continuous targets. If your data shouldn't be treated as "
"continuous, try converting the values to integers or strings "
"instead."
Member:

I think this code is not testing what it's supposed to test: to define a multiline string you need parens, otherwise only the first line is assigned to the match variable and the remaining string literals are discarded as no-op expression statements.

Suggested change
match = "`y_true` contains negative values, which isn't allowed for "
"continuous targets. If your data shouldn't be treated as "
"continuous, try converting the values to integers or strings "
"instead."
match = (
"`y_true` contains negative values, which isn't allowed for "
"continuous targets. If your data shouldn't be treated as "
"continuous, try converting the values to integers or strings "
"instead."
)

Member:

You might also need to re.escape(...) around that string to avoid getting the punctuation interpreted as regular expression operators.
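
For example (a sketch; the inputs are placeholders):

import re
import pytest

# re.escape neutralizes regex metacharacters (such as ".") in the message.
# `y_true_with_negatives` and `y_pred` are placeholder inputs for this sketch.
with pytest.raises(ValueError, match=re.escape(match)):
    CapCurveDisplay.from_predictions(y_true_with_negatives, y_pred)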

Comment on lines +162 to +163
self.cumulative_total = self.cumulative_total / x_max
self.y_true_cumulative = self.y_true_cumulative / y_max
@ogrisel (Member) commented Apr 30, 2025:

We need to be robust to edge-cases.

Suggested change
self.cumulative_total = self.cumulative_total / x_max
self.y_true_cumulative = self.y_true_cumulative / y_max
if x_max != 0:
self.cumulative_total = self.cumulative_total / x_max
if y_max != 0:
self.y_true_cumulative = self.y_true_cumulative / y_max

and please add tests to check that no low-level warnings are raised for edge cases like:

CapCurveDisplay.from_predictions(np.zeros(3), np.asarray([0.1, 0.3, 1.]))
CapCurveDisplay.from_predictions(np.asarray([0.1, 0.3, 1.]), np.zeros(3))
CapCurveDisplay.from_predictions(np.zeros(3), np.zeros(3))

Please also test when the classification interpretation is forced by setting an explicit pos_label != None:

CapCurveDisplay.from_predictions(
    np.zeros(3), np.asarray([0.1, 0.3, 1.]), pos_label=0
)
CapCurveDisplay.from_predictions(
    np.zeros(3), np.asarray([0.1, 0.3, 1.]), pos_label=1
)

I am not 100% sure but maybe some of those edge cases should better trigger high level ValueError with an explicit message instead of plotting a meaningless chart.

else:
y_pred = estimator.predict(X)
if name is None:
name = type(estimator).__name__
Member:

Please expand the tests to make sure this line is covered.

cumulative_total = display.cumulative_total
y_true_cumulative = display.y_true_cumulative
assert np.all(cumulative_total >= y_true_cumulative - 1e-8), (
f"Lorenz curve dips above the chance line.\n"
Member:

"dips below" or "rises above" depending on the value of the prediction ordering parameter. "dips above" seems contradictory.

display.cumulative_total, display._perfect_x, display._perfect_y
)
assert np.all(y_true_cumulative >= perfect_y - 1e-8), (
"Lorenz curve rises below the perfect prediction line.\n"
Member:

"rises above" or "dips below". "rises below" seems contradictory.

# %% Analysis
# --------
#
# All curves agree that the Random Forest classifier is has more discriminative
Member:

Suggested change
# All curves agree that the Random Forest classifier is has more discriminative
# All curves agree that the Random Forest classifier has more discriminative

Projects
Status: In Progress

5 participants