FEA add CumulativeAccuracyDisplay #28752

Closed · wants to merge 5 commits

Conversation

@JosephBARBIERDARNAL (Contributor)

Reference Issue

Fix for #10003.

What does this implement/fix?

Creation of a CumulativeAccuracyDisplay class for plotting CAP curves.

"The CAP of a model represents the cumulative number of positive outcomes along the y-axis versus the corresponding cumulative number of a classifying parameter along the x-axis. The output is called a CAP curve.[1] The CAP is distinct from the receiver operating characteristic (ROC) curve, which plots the true-positive rate against the false-positive rate." (wikipedia definition)

It's mainly inspired by the RocCurveDisplay class.
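
To make the construction concrete, here is a minimal sketch of how the CAP coordinates can be computed with numpy (the function name and the normalization to fractions are illustrative choices, not necessarily the final API):

import numpy as np

def cap_curve(y_true, y_score):
    # rank samples by decreasing predicted score
    order = np.argsort(y_score)[::-1]
    y_true_sorted = np.asarray(y_true)[order]
    # x: cumulative fraction of samples considered so far
    cum_samples = np.arange(1, len(y_true_sorted) + 1) / len(y_true_sorted)
    # y: cumulative fraction of positives captured so far
    cum_positives = np.cumsum(y_true_sorted) / y_true_sorted.sum()
    return cum_samples, cum_positives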

Other

It's currently a work in progress.

@JosephBARBIERDARNAL marked this pull request as draft April 2, 2024 16:56

github-actions bot commented Apr 2, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 73ca739.

@glemaitre changed the title from "create cumulative accuracy profile class" to "FEA add CumulativeAccuracyDisplay" on Apr 2, 2024
@glemaitre (Member) left a comment

Thanks for the PR.

On a first pass, could you fix the linter CI? I would advise using pre-commit directly; it corresponds to the optional step 10 of the contributing guidelines (https://scikit-learn.org/dev/developers/contributing.html#how-to-contribute). It is optional, but it makes life easier :)

@ogrisel (Member) left a comment

Thanks very much for the PR. Here is a quick first pass of review comments.


import matplotlib.pyplot as plt

# y_test, y_scores: held-out binary labels and classifier scores
CumulativeAccuracyDisplay.from_predictions(y_test, y_scores, name='Logistic Regression')
plt.title('CAP Curve using from_predictions')
plt.show()
@ogrisel (Member) commented Apr 2, 2024

For information, when running this code I get:

[screenshot: CAP curve plot for the logistic regression model]

This looks fine, but I would also like to have an option to display the curve for oracle predictions and the curve for the random predictions baseline.

EDIT: we need to check the literature to see if it's common to normalize the cumulated values to 1 instead of plotting absolute numbers. Maybe an option to select between relative and absolute scales would make sense.
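
For instance, with normalized coordinates (reusing the illustrative cap_curve helper sketched earlier, and assuming y_test and y_scores are defined), the two baselines could be overlaid like this:

import matplotlib.pyplot as plt

x_model, y_model = cap_curve(y_test, y_scores)
# oracle: ranking by the true labels puts all positives first
x_oracle, y_oracle = cap_curve(y_test, y_test)
plt.plot(x_model, y_model, label="model")
plt.plot(x_oracle, y_oracle, linestyle=":", label="oracle")
# random baseline: positives accumulate proportionally to samples
plt.plot([0, 1], [0, 1], linestyle="--", label="chance level")
plt.legend()
plt.show()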

Member

In addition, I think we should better set the x- and y-limits (as in the ROC curve display).
Also, is it common for this curve to be in a square axis? This is kind of different from the ROC and PR curves in this regard, where the x- and y-axes are not in the same unit.

Contributor Author

  • An option for relative vs. absolute scales sounds relevant to me. I don't have an opinion on which should be the default.

  • +1 for a square axis (especially with a normalized scale). This paper (Tasche 2006, "Validation of internal rating systems and PD estimates") contains an example of a CAP curve with a square axis:

[screenshot: square-axis CAP curve from Tasche 2006]

It is reminiscent of the ROC curve, but maybe there are reasons not to do so?

Member

An option for relative vs. absolute scales sounds relevant to me. I don't have an opinion on which should be the default.

Intuitively, I would use relative as the default.

@JosephBARBIERDARNAL (Contributor Author)

I've implemented some of the elements mentioned, but I don't have the time to do everything quickly.

I also have several concerns:

  • I haven't yet understood how to use this type of curve in the case of a regressor, and I don't think I've seen any article on the subject.
  • I'm not sure I understand how to use pre-commit/linters properly (I haven't managed to commit with pre-commit).
  • Being a total beginner with tests, I haven't written any for this yet, and could greatly use a simple practical explanation.

@ogrisel (Member) commented Apr 4, 2024

I haven't yet understood how to use this type of curve in the case of a regressor, and I don't think I've seen any article on the subject.

The example I linked above is precisely an example of a Lorenz curve, which, based on the linked discussion, is very closely related to the CAP curve.

EDIT: I also posted an extra reference in the parent issue: #10003 (comment).

@ogrisel (Member) commented Apr 4, 2024

Being a total beginner with tests, I haven't written any for this yet, and could greatly use a simple practical explanation.

For an example, you can have a look at the tests for RocCurveDisplay in sklearn/metrics/_plot/tests/test_roc_curve_display.py.

You can run all the tests in this module with the pytest command (pip install pytest first if needed):

$ pytest sklearn/metrics/_plot/tests/test_roc_curve_display.py -vl

You can run a specific test with:

$ pytest sklearn/metrics/_plot/tests/test_roc_curve_display.py -vl -k test_roc_curve_display_plotting
================================================================================================ test session starts =================================================================================================
platform darwin -- Python 3.11.7, pytest-8.1.1, pluggy-1.4.0 -- /Users/ogrisel/miniforge3/envs/dev/bin/python3.11
cachedir: .pytest_cache
Using --randomly-seed=2393278150
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/Users/ogrisel/code/scikit-learn/.hypothesis/examples'))
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/ogrisel/code/scikit-learn
configfile: setup.cfg
plugins: snapshot-0.9.0, httpserver-1.0.8, repeat-0.9.3, randomly-3.15.0, cov-4.1.0, clarity-1.0.1, mock-3.12.0, hypothesis-6.93.0, xdist-3.5.0, anyio-4.2.0, benchmark-4.0.0
collected 53 items / 21 deselected / 32 selected                                                                                                                                                                     

sklearn/metrics/_plot/tests/test_roc_curve_display.py::test_roc_curve_display_plotting[from_estimator-LogisticRegression-False-False-True-decision_function] PASSED                                            [  3%]
sklearn/metrics/_plot/tests/test_roc_curve_display.py::test_roc_curve_display_plotting[from_predictions-Classifier-True-False-True-decision_function] PASSED                                                   [  6%]
sklearn/metrics/_plot/tests/test_roc_curve_display.py::test_roc_curve_display_plotting[from_estimator-LogisticRegression-False-True-True-predict_proba] PASSED                                                 [  9%]
sklearn/metrics/_plot/tests/test_roc_curve_display.py::test_roc_curve_display_plotting[from_estimator-LogisticRegression-False-False-False-decision_function] PASSED                                           [ 12%]
...
sklearn/metrics/_plot/tests/test_roc_curve_display.py::test_roc_curve_display_plotting[from_estimator-LogisticRegression-True-True-False-predict_proba] PASSED                                                 [ 96%]
sklearn/metrics/_plot/tests/test_roc_curve_display.py::test_roc_curve_display_plotting[from_estimator-LogisticRegression-True-True-True-predict_proba] PASSED                                                  [100%]

========================================================================================= 32 passed, 21 deselected in 0.67s ==========================================================================================

Since this is a parametrized test (with @pytest.mark.parametrize decorators), this actually runs 32 variations of the same test function. To run a single variation you can do:

$ pytest sklearn/metrics/_plot/tests/test_roc_curve_display.py -vl -k "test_roc_curve_display_plotting and from_estimator-LogisticRegression-True-True-True-predict_proba"

So for your branch, you can create a file named sklearn/metrics/_plot/tests/test_cap_curve_display.py or similar and write some test functions that make assertions similar to what we do for the ROC curve display.

More details on pytest usage in https://scikit-learn.org/dev/developers/tips.html#pytest-tips or on the pytest website itself.

@ogrisel (Member) commented Apr 4, 2024

I'm not sure I understand how to use pre-commit/linters properly (I haven't managed to commit with pre-commit).

Some pre-commit tasks, such as the black formatter and the ruff import statement fixer, can auto-fix the code when you type git commit. If that happens, the commit command fails but the files have been fixed: you need to git add the auto-fixed files and then git commit again.
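
Concretely, the cycle typically looks like this (the commit messages are just placeholders):

$ git commit -m "WIP"   # pre-commit hooks run, auto-fix some files, and the commit fails
$ git add -u            # stage the files fixed by the hooks
$ git commit -m "WIP"   # hooks now pass and the commit succeeds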

@ogrisel (Member) left a comment

A few more comments:


# y_true_sorted / sample_weight_sorted: labels and weights sorted by decreasing score
# compute cumulative sums for true positives and all cases
cumulative_true_positives = np.cumsum(y_true_sorted * sample_weight_sorted)
cumulative_total = np.cumsum(sample_weight_sorted)
Member

This is very similar to a Lorenz curve that we manually define in this example.

def lorenz_curve(y_true, y_pred, exposure):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    exposure = np.asarray(exposure)
    # order samples by increasing predicted risk:
    ranking = np.argsort(y_pred)
    ranked_frequencies = y_true[ranking]
    ranked_exposure = exposure[ranking]
    cumulated_claims = np.cumsum(ranked_frequencies * ranked_exposure)
    cumulated_claims /= cumulated_claims[-1]
    cumulated_exposure = np.cumsum(ranked_exposure)
    cumulated_exposure /= cumulated_exposure[-1]
    return cumulated_exposure, cumulated_claims

The differences are (as far as I can see):

  • For CAP/Lift/Gain curves, y_pred is the estimated probability of positive class membership given by a classifier, while for a Lorenz curve y_pred holds the continuous prediction of a positively valued target variable from a regressor.
  • sorted_indices = np.argsort(y_pred)[::-1] is reversed in CAP, while it stays in ascending order in a Lorenz curve.

So I think we could indeed reuse the same class both for CAP/Lift/Gain curves and Lorenz curves.

To switch between both, we can probably introduce a keyword parameter x_ordering_style="cap" vs x_ordering_style="lorenz" for instance.
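
A minimal sketch of how that keyword could act internally (the parameter name is the one proposed above; the helper itself is hypothetical):

import numpy as np

def _ranking(y_pred, x_ordering_style="cap"):
    # Lorenz: ascending predicted values; CAP/Lift/Gain: descending
    order = np.argsort(y_pred)
    if x_ordering_style == "cap":
        order = order[::-1]
    return order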

Member

As discussed in the main thread of the PR, let's keep that for a follow-up PR.

@JosephBARBIERDARNAL (Contributor Author)

I'm not sure I understand how to use pre-commit/linters properly (I haven't managed to commit with pre-commit).

Some pre-commit tasks, such as the black formatter and the ruff import statement fixer, can auto-fix the code when you type git commit. If that happens, the commit command fails but the files have been fixed: you need to git add the auto-fixed files and then git commit again.

Thanks! That actually helps a lot (also for tests).

@ogrisel (Member) commented Apr 5, 2024

Great. Now that you have tests, you can see in the diff view that the test coverage analysis has resulted in a few automated inline comments that highlight sections of your code that are currently not tested:

https://github.com/scikit-learn/scikit-learn/pull/28752/files

To merge this PR we will need to have improved test coverage.

However, I would rather put the focus on upgrading the code itself to:

  • make it possible to use the display on positively valued regression tasks;
  • make it possible to swap the order of the x-axis to use a Lorenz curve style instead of a CAP/Lift curve style;
  • update the linked example to use the display instead of reimplementing a Lorenz curve from scratch.

@lorentzenchr (Member)

make it possible to swap the order of the x-axis to use a Lorenz curve style instead of a CAP/Lift curve style;

To keep this PR minimal, I would postpone this to a follow-up PR.

@ogrisel (Member) commented Apr 5, 2024

Alright, I agree to limit the scope of this PR to the case of CAP / Lift / Gain curve on classification tasks for now.

@ogrisel (Member) left a comment

As I said above, I am OK to focus this PR on the classification case and leave regression for later. However, I think we should already choose public attribute names for the cumulated statistics that are compatible with the anticipated regression/Lorenz curve case (see inline comment below).

Furthermore, I think we should update one of the existing examples that display a ROC curve for a binary classifier to also include a CAP curve. I think the following example, which already displays both a DET curve and a ROC curve, is a good candidate:

https://scikit-learn.org/stable/auto_examples/model_selection/plot_det.html

We should probably refactor it a bit to change the title and even the filename to make it explicit that it's about more than DET curves.

The alternative would be to create a new example file dedicated to CAP, but I think we all agree that the example gallery already has too many short examples and that it's better to consolidate them into larger examples that combine related components of the scikit-learn API and better contrast their commonalities and differences.

if plot_chance_level:
    assert (
        cap_display.chance_level_ is not None
    ), "Chance level line should be present"
Member

Let's also extend the tests to check for the presence, shape, and dtypes of other public attributes such as y_true_cumulative (new name suggested in https://github.com/scikit-learn/scikit-learn/pull/28752/files#r1553497449).

It might also make sense to check that the values of those attributes match (with assert_allclose) when calling CumulativeAccuracyDisplay.from_estimator or CumulativeAccuracyDisplay.from_predictions.
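
For example, a sketch of such a test (assuming the y_true_cumulative attribute name and a fitted binary classifier clf; the exact shape and dtype checks are illustrative):

from numpy.testing import assert_allclose

disp_est = CumulativeAccuracyDisplay.from_estimator(clf, X_test, y_test)
y_scores = clf.predict_proba(X_test)[:, 1]
disp_pred = CumulativeAccuracyDisplay.from_predictions(y_test, y_scores)

# one cumulated value per sample, stored as floats
assert disp_est.y_true_cumulative.shape == (len(y_test),)
assert disp_est.y_true_cumulative.dtype.kind == "f"
# both constructors should compute the same curve
assert_allclose(disp_est.y_true_cumulative, disp_pred.y_true_cumulative)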

Contributor Author

I've made the change for y_true_cumulative, but I'm not sure about shape or dtypes. Are displays supposed to have these attributes (or just this one)?

Contributor Author

I've just realised that I misread it. You meant checking the attributes of y_true_cumulative, not of the display itself.

@lorentzenchr (Member) left a comment

Review of from_estimator.

Comment on lines +233 to +234
This is also known as a Gain or Lift Curve for classification, and a Lorenz
curve for regression with a positively valued target.
Member

Suggested change:

- This is also known as a Gain or Lift Curve for classification, and a Lorenz
- curve for regression with a positively valued target.
+ This is also known as a Gain or Lift Curve, or the mirrored Lorenz
+ curve of Machine Learning for regression with a positively valued target.

Member

Since we agree to leave the Lorenz curve / regression case for a follow-up PR, I would defer this change until later. Speaking about regression in a class that is designed and documented to only accept classifier predictions would be a source of confusion.

@lorentzenchr (Member)

The alternative would be to create a new example file dedicated to CAP, but I think we all agree that the example gallery already has too many short examples

I would also not add another example. We can replace the Lorenz curve with CAP in 2 examples (Poisson and Tweedie).

@ogrisel (Member) commented Apr 8, 2024

We can replace the Lorenz curve with CAP in 2 examples (Poisson and Tweedie).

But this is only for regression. I would like to illustrate CAP in a classification context, and updating an example with a ROC and DET curve would make that possible and also improve the discoverability of the plot. Furthermore, I find it interesting to visually compare the 3 plots for the same models / data.

@ogrisel (Member) left a comment

The sklearn.metrics module should expose the CumulativeAccuracyDisplay class as part of the public API and include it in its __all__ module-level attribute, for instance as sketched below.
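
For illustration, the wiring could look like this (the submodule path is hypothetical):

# sklearn/metrics/__init__.py (hypothetical module path for the new display)
from ._plot.cap_curve import CumulativeAccuracyDisplay

__all__ = [
    "CumulativeAccuracyDisplay",
    # ... existing entries ...
]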

I think the docstrings of the 2 class methods should:

  • mention that this curve assesses the ranking power of classifiers, that is, the ability to correctly order the individual predictions by the probability of actually being labeled with the positive class;
  • make it explicit that this curve is invariant under any monotonic transformation of the predictions, and therefore cannot be used to assess the calibration of the predictions, only the ranking power;
  • have a "See also" section (at the end of the docstring) and cross-link RocCurveDisplay as an alternative curve to assess the ranking power of classifiers, as sketched below.

@lorentzenchr (Member)

I just want to avoid a new example. Therefore, either extend existing examples or postpone to a follow-up PR. But please, no new example! (Strong opinion there.)

@lorentzenchr (Member)

Alright, I agree to limit the scope of this PR to the case of CAP / Lift / Gain curve on classification tasks for now.

I'm more in favor of including regression from the start; that's where it adds additional value. For classification, it is kind of equivalent to ROC. On top of that, it helps to find good names.

@JosephBARBIERDARNAL closed this by deleting the head repository Apr 27, 2024
@lorentzenchr (Member)

@JosephBARBIERDARNAL Why did you close? This would be a valuable addition.

@JosephBARBIERDARNAL (Contributor Author)

I forgot that this PR was still open when I deleted my fork, sorry! Any solution for this type of case?

@JosephBARBIERDARNAL (Contributor Author)

I created a new PR (#28972) with the same code; I'll hopefully keep that one open. Again, sorry for the mistake!

@JosephBARBIERDARNAL mentioned this pull request May 7, 2024