ENH add CAP curve #28972
Conversation
@JosephBARBIERDARNAL Could you please address all the comments from #28752?

Yes, I'm ready to do it, but I won't be able to get back to it quickly (2 to 3 months), unfortunately. Hope that's okay.
Working on this PR again. Here are the current updates: […]

Some (likely non-exhaustive) issues:
```python
import matplotlib.pyplot as plt

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import CapCurveDisplay

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_scores = clf.decision_function(X_test)

fig, ax = plt.subplots(ncols=2, dpi=300, figsize=(12, 12))
display = CapCurveDisplay.from_predictions(
    ax=ax[0],
    y_true=y_test,
    y_pred=y_scores,
    name="normalize_scale=False",
    normalize_scale=False,
    plot_chance_level=True,
)
display = CapCurveDisplay.from_predictions(
    ax=ax[1],
    y_true=y_test,
    y_pred=y_scores,
    name="normalize_scale=True",
    normalize_scale=True,
    plot_chance_level=True,
)
```
I'll come back to this PR after the release. This will be one of my priorities to be merged for 1.7.

Ok cool. There are still a few things I need to do anyway before a review (cov test and adding the #30023 check).
sklearn/metrics/_plot/cap_curve.py (outdated)

```python
# compute cumulative sums for true positives and all cases
y_true_cumulative = np.cumsum(y_true_sorted * sample_weight_sorted)
cumulative_total = np.cumsum(sample_weight_sorted)
```
For information, there was a concurrent PR to fix the lack of cumsum of the `sample_weight` to define the x-axis of the Lorenz curves in one of our regression examples. To check that this was the correct fix, we ran a quick check on synthetic data with integer-valued sample weights to verify that there is an exact equivalence between repeating data points and reweighting them by exposure.

Maybe this PR could be expanded to test that this property also holds for `CapCurveDisplay.from_predictions` with non-constant, integer-valued `sample_weight`.

EDIT: I am not entirely sure how to write such a test, but possibly we could use `numpy.interp` to evaluate the CAP curve computed on the repeated data at the x-axis locations of the points of the weighted CAP curve, and check that the two curves match up to a small eps at those locations.
There are several review comments in #28752 that have not yet been addressed. I would like to make sure that we do not forget about them when iterating on this PR.

@JosephBARBIERDARNAL, to sort things out, please reply in the threads of the review of #28752, make it explicit which comments have been addressed in #28972 and how, and then mark those threads as resolved.

Also, we need some tests, and to update an existing example where we compare ROC, DET and CAP curves on the same classifier. I suppose this example is the best candidate:
https://scikit-learn.org/stable/auto_examples/model_selection/plot_det.html
@ogrisel That sounds good to me. I won't be able to work on it immediately, but I'll definitely be able to get to it within the next few weeks. I'll ping you and/or @glemaitre for review.

2 things are important for me: […]

I resolved most of the conversations there. I didn't touch some of them if I wasn't sure. Feel free to ping me if any changes are needed. I'm just not sure what […]
Alright. Let's try to implement the following first: […] and see how it goes. And also check if we can generate meaningful error messages without too much code complexity. Note that we will also need to generate a meaningful error message when […]

Note that the user can already force the classification mode by passing an explicit `pos_label`.

This is already the case thanks to […]
I'm having trouble raising the error when all `y_true` values are zeros:

```python
>>> import numpy as np
>>> from sklearn.utils.multiclass import type_of_target
>>> type_of_target(np.array([0.0, 0.0, 0.0, 0.0]))
'binary'
>>> type_of_target(np.array([0, 0, 0, 0]))
'binary'
```

Is this expected?
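Since `type_of_target` cannot distinguish this case, one possible workaround (just a sketch; the error message wording is hypothetical) is an explicit constant-target check before dispatching on the inferred target type:

```python
import numpy as np

y_true = np.asarray([0.0, 0.0, 0.0, 0.0])
# `type_of_target` reports 'binary' for an all-zero vector, so detect the
# degenerate case explicitly before relying on the inferred target type:
if np.all(y_true == 0):
    raise ValueError("`y_true` contains only zeros: the CAP curve is degenerate.")
```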
@JosephBARBIERDARNAL for information, I just pushed a commit that updates the example, adjusting the new section on the theoretical curves of non-informative classifier baselines. Please feel free to ping me once you are done with the regression TODO list.
Not sure: this edge case is ambiguous. But for the sake of not breaking backward compat, we can consider it the expected behavior.
A few questions: […]
@ogrisel You can review. See my comments in the TODO. |
No strong opinion, but I think I would rather update the other displays to plot the baseline curve by default instead (to be done later).
I would rather do that in a follow-up PR that introduces an official Gini scoring function instead. |
Thanks @JosephBARBIERDARNAL. Here is another batch of feedback.
title="Lorenz curves by model", | ||
xlabel="Cumulative proportion of exposure (from safest to riskiest)", | ||
title="Lorenz Curves", | ||
xlabel=("Cumulative proportion of exposure (from safest to riskiest)"), |
xlabel=("Cumulative proportion of exposure (from safest to riskiest)"), | |
xlabel="Cumulative proportion of exposure (from safest to riskiest)", |
```diff
 ax.set(
-    title="Lorenz curves by model",
+    title="Lorenz Curves",
     xlabel="Cumulative proportion of exposure (from safest to riskiest)",
```
Nitpick to stay consistent with the most common casing convention for titles / headers in the scikit-learn doc.

Suggested change:

```diff
-    title="Lorenz Curves",
+    title="Lorenz curves",
```
```python
self.ax_.set(xlabel=xlabel, ylabel=ylabel, aspect="equal")

if despine:
    _despine(self.ax_)
```
Could you expand one of the tests to cover this line?
Or maybe there is a common test to enable for this?
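For example, a sketch modeled on how `despine` is exercised for the ROC and PR displays, assuming this PR exposes a `despine` parameter on `from_predictions` (`pyplot` is scikit-learn's matplotlib test fixture):

```python
import numpy as np


def test_cap_curve_display_despine(pyplot):
    rng = np.random.RandomState(0)
    y_true = rng.randint(0, 2, size=20)
    y_pred = rng.uniform(size=20)
    display = CapCurveDisplay.from_predictions(y_true, y_pred, despine=True)
    # _despine hides the top and right spines of the axes.
    for spine in ("top", "right"):
        assert not display.ax_.spines[spine].get_visible()
```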
```python
    Object that stores computed values.
    """
    target_type = type_of_target(y_true)
    if target_type == "continuous":
```
Whenever the user passes an explicit (non-`None`) `pos_label` argument, we should assume that they want to treat the problem as a binary classification problem and generate error messages that are consistent with this assumption.

Suggested change:

```diff
-    if target_type == "continuous":
+    if target_type == "continuous" and pos_label is None:
```
```python
weighted_y_true = y_true_sorted * sample_weight_sorted
y_true_cumulative = np.cumsum(weighted_y_true)
```
Suggested change:

```diff
-weighted_y_true = y_true_sorted * sample_weight_sorted
-y_true_cumulative = np.cumsum(weighted_y_true)
+weighted_y_true_sorted = y_true_sorted * sample_weight_sorted
+y_true_cumulative = np.cumsum(weighted_y_true_sorted)
```
```python
match = "`y_true` contains negative values, which isn't allowed for "
"continuous targets. If your data shouldn't be treated as "
"continuous, try converting the values to integers or strings "
"instead."
```
I think this code is not testing what it's supposed to test: to define a multiline string you need parens, otherwise only the first line is assigned to the `match` variable and the remaining lines are bare string expressions with no effect.

Suggested change:

```diff
-match = "`y_true` contains negative values, which isn't allowed for "
-"continuous targets. If your data shouldn't be treated as "
-"continuous, try converting the values to integers or strings "
-"instead."
+match = (
+    "`y_true` contains negative values, which isn't allowed for "
+    "continuous targets. If your data shouldn't be treated as "
+    "continuous, try converting the values to integers or strings "
+    "instead."
+)
```
You might also need to wrap that string in `re.escape(...)` to avoid getting the punctuation interpreted as regular expression operators.
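For instance, a sketch combining both fixes (the `...` stands in for whatever call is expected to raise):

```python
import re

import pytest

match = re.escape(
    "`y_true` contains negative values, which isn't allowed for "
    "continuous targets. If your data shouldn't be treated as "
    "continuous, try converting the values to integers or strings "
    "instead."
)
with pytest.raises(ValueError, match=match):
    ...  # e.g. CapCurveDisplay.from_predictions with negative y_true
```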
```python
self.cumulative_total = self.cumulative_total / x_max
self.y_true_cumulative = self.y_true_cumulative / y_max
```
We need to be robust to edge cases.

Suggested change:

```diff
-self.cumulative_total = self.cumulative_total / x_max
-self.y_true_cumulative = self.y_true_cumulative / y_max
+if x_max != 0:
+    self.cumulative_total = self.cumulative_total / x_max
+if y_max != 0:
+    self.y_true_cumulative = self.y_true_cumulative / y_max
```
and please add tests to check that no low level warnings are raised for edge cases like:

```python
CapCurveDisplay.from_predictions(np.zeros(3), np.asarray([0.1, 0.3, 1.0]))
CapCurveDisplay.from_predictions(np.asarray([0.1, 0.3, 1.0]), np.zeros(3))
CapCurveDisplay.from_predictions(np.zeros(3), np.zeros(3))
```

Please also test when the classification interpretation is forced by setting an explicit `pos_label != None`:

```python
CapCurveDisplay.from_predictions(
    np.zeros(3), np.asarray([0.1, 0.3, 1.0]), pos_label=0
)
CapCurveDisplay.from_predictions(
    np.zeros(3), np.asarray([0.1, 0.3, 1.0]), pos_label=1
)
```

I am not 100% sure, but maybe some of those edge cases should rather trigger a high level `ValueError` with an explicit message instead of plotting a meaningless chart.
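A sketch of such a test, assuming these degenerate calls are meant to succeed silently (`recwarn` is pytest's built-in fixture that records every warning emitted during the test):

```python
import numpy as np


def test_cap_curve_display_degenerate_inputs(pyplot, recwarn):
    y_score = np.asarray([0.1, 0.3, 1.0])
    for y_true, y_pred in [
        (np.zeros(3), y_score),  # all-zero targets
        (y_score, np.zeros(3)),  # constant zero predictions
        (np.zeros(3), np.zeros(3)),
    ]:
        CapCurveDisplay.from_predictions(y_true, y_pred)
    # No low level (e.g. divide-by-zero) warnings should have been emitted.
    assert len(recwarn) == 0, [str(w.message) for w in recwarn]
```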
```python
else:
    y_pred = estimator.predict(X)
if name is None:
    name = type(estimator).__name__
```
Please expand the tests to make sure this line is covered.
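A possible sketch covering both branches at once: a regressor has neither `predict_proba` nor `decision_function`, so the `estimator.predict(X)` fallback is taken, and passing no `name` exercises the default. The `line_` attribute is an assumption here, by analogy with `RocCurveDisplay`; adjust to the actual API:

```python
import numpy as np

from sklearn.linear_model import PoissonRegressor


def test_cap_curve_display_default_name(pyplot):
    rng = np.random.RandomState(0)
    X = rng.normal(size=(100, 3))
    y = rng.poisson(lam=1.0, size=100)
    reg = PoissonRegressor().fit(X, y)

    # No `name` passed: it should default to the estimator's class name.
    display = CapCurveDisplay.from_estimator(reg, X, y)
    assert display.line_.get_label() == "PoissonRegressor"
```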
```python
cumulative_total = display.cumulative_total
y_true_cumulative = display.y_true_cumulative
assert np.all(cumulative_total >= y_true_cumulative - 1e-8), (
    f"Lorenz curve dips above the chance line.\n"
```
"dips below" or "rises above" depending on the value of the prediction ordering parameter. "dips above" seems contradictory.
```python
    display.cumulative_total, display._perfect_x, display._perfect_y
)
assert np.all(y_true_cumulative >= perfect_y - 1e-8), (
    "Lorenz curve rises below the perfect prediction line.\n"
```
"rises above" or "dips below". "rises below" seems contradictory.
```python
# %% Analysis
# --------
#
# All curves agree that the Random Forest classifier is has more discriminative
```
Suggested change:

```diff
-# All curves agree that the Random Forest classifier is has more discriminative
+# All curves agree that the Random Forest classifier has more discriminative
```
This PR is a duplicate of #28752. I'm recreating a PR because I accidentally deleted my fork with the old one.

Reference Issue

Fix for #10003.

What does this implement/fix?

Creation of a CumulativeAccuracyDisplay class for plots.

"The CAP of a model represents the cumulative number of positive outcomes along the y-axis versus the corresponding cumulative number of a classifying parameter along the x-axis. The output is called a CAP curve. [1] The CAP is distinct from the receiver operating characteristic (ROC) curve, which plots the true-positive rate against the false-positive rate." (Wikipedia definition)
It's mainly inspired by the RocCurveDisplay class. It's currently a work in progress.
TODO

Binary classification

- [ ] raise a `ValueError` in `from_estimator` if the estimator is not fitted or is a classifier that was fitted with more than 3 classes;
- [ ] `pos_label` handling when the positive class […];
- [ ] test `response_method="decision_function"` and `response_method="predict_proba"` for a `LogisticRegression` classifier fit with string labels and for all 3 possible values of `pos_label`;
- [ ] `test_display_from_estimator_and_from_prediction`;
- [ ] check that `y_true_cumulative` and `cumulative_total` have the same dtype as `y_pred` in the test about `from_predictions`. We can test for `y_pred` passed either as `np.float32` or `np.float64`;
- [ ] check that `CAPCurveDisplay.from_estimator(LinearSVC().fit(X, y), ...)` works (even if it does not have a `predict_proba` method). This should cover one of the lines reported as uncovered by codecov;
- [ ] update `test_common_curve_display.py` to reuse some generic tests on `CAPCurveDisplay` and maybe remove redundant tests on invalid inputs from `test_cap_curve_display.py`, if any;
- [ ] `despine` argument? […] `*Display` classes in scikit-learn. Feel free to open an issue to discuss this with screenshots, e.g. on ROC or PR curves, and your analysis of pros and cons (`despine` keyword for ROC and PR curves, #26367). I'm not sure it makes much sense for `ConfusionMatrixDisplay` (?). I'll open an issue (when this PR is merged) for `CAPCurveDisplay`, `PredictionErrorDisplay` and `DetCurveDisplay`, because I think they're the only ones that don't have this option.

Regression

- [ ] raise a `ValueError` with an informative error message if `y_true` has negative values;
- [ ] raise a `ValueError` if all `y_true` are zeros (the plot would be degenerate and would raise a low level divide by zero warning with `normalize_scale=True`);
- [ ] if all `y_true` are zeros, it will be considered a case of classification;
- [ ] test with a regressor (e.g. `PoissonRegressor`) and check that the regressor curve lies between the "chance level" and "perfect" curves;
- [ ] update the examples (`examples/linear_model/plot_tweedie_regression_insurance_claims.py` and `examples/linear_model/plot_poisson_regression_non_normal_loss.py`) to use the `CAPCurveDisplay` class instead of manually plotting the Lorenz curves.

Other

- [ ] add a changelog entry in `doc/whats_new/upcoming_changes/`;
- [ ] update the visualization docs (`doc/visualization`) to reference this new tool;
- [ ] naming: `DetCurveDisplay`, `RocCurveDisplay` and then `CAPCurveDisplay`, which seems inconsistent.

Nice to have