FEA add binary_classification_curve #30134


Open
wants to merge 27 commits into base: main

Conversation

SuccessMoses
Contributor

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Fixes #16470

Any other comments?

  • In sklearn/metrics/_ranking.py, renamed the function _binary_clf_curve to binary_classification_curve without changing its body. I also renamed test functions such as test_binary_clf_curve_multiclass_error without changing their bodies.
  • det_curve, roc_curve and precision_recall_curve call this function, so I updated its name in their bodies.
  • I added examples to the function's docstring.


github-actions bot commented Oct 22, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 3094eca. Link to the linter CI: here

Member

@adrinjalali adrinjalali left a comment


You'd also need to add this in api_reference.py under the right section to have it rendered in the docs properly.

@glemaitre you happy with the name?

@glemaitre glemaitre changed the title Added binary_classification_curve from _binary_clf_curve FEA Added binary_classification_curve from _binary_clf_curve Nov 5, 2024
@glemaitre glemaitre changed the title FEA Added binary_classification_curve from _binary_clf_curve FEA add binary_classification_curve Nov 5, 2024
@glemaitre
Member

I think I'm fine with the name. I looked into whether we could have the word "counts" in the name of the function, but it starts to get really long. So I'm OK with the proposed name.

SuccessMoses and others added 8 commits November 6, 2024 09:45
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Member

@adrinjalali adrinjalali left a comment


Otherwise LGTM.

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
@adrinjalali adrinjalali requested a review from glemaitre November 7, 2024 09:06
@SuccessMoses
Contributor Author

@adrinjalali There is an issue with numpydoc validation in the binary_classification_curve function; the error is RT03: Return value has no description. Do you know why this is? I have already documented the function's return values.

@adrinjalali
Member

When you look at the CI log, this is the error, not the return-value one:

[gw0] linux -- Python 3.12.7 /usr/share/miniconda/envs/testvenv/bin/python
810         Decreasing score values.
811 
812     Examples
813     -------
814     >>> import numpy as np
815     >>> from sklearn.metrics import binary_classification_curve
816     >>> y_true = np.array([0, 0, 1, 1])
817     >>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
818     >>> fps, tps, thresholds = binary_classification_curve(y_true, y_scores)
819     >>> fps
Expected:
    array([0, 1, 1, 2])
Got:
    array([0., 1., 1., 2.])

You just need to fix the expected output to floats.
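For context, a minimal sketch of why the counts print as floats (this is an assumption about the internals, inferred from the CI log above): if the cumulative sums are computed with a float sample weight, the resulting array has a float dtype, so the doctest expectation must show `0., 1., ...` rather than integers.

```python
import numpy as np

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Cumulative false-positive counts in order of decreasing score.
# Multiplying by a float weight (here a hypothetical unit weight)
# makes the cumsum float, hence the trailing dots in the repr.
order = np.argsort(y_score)[::-1]
fps = np.cumsum((1 - y_true[order]) * 1.0)
print(repr(fps))  # array([0., 1., 1., 2.])
```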

@SuccessMoses
Contributor Author

You just need to fix the output to floats

I fixed that one, but this is the log I was talking about:


E           array([0.8, 0.4, 0.35, 0.1])
E           
E           # Errors
E           
E            - RT03: Return value has no description
E            - RT03: Return value has no description
E            - RT03: Return value has no description
E            - RT03: Return value has no description
E            - RT03: Return value has no description
E            - RT03: Return value has no description
E            - RT03: Return value has no description
E            - RT03: Return value has no description
E            - RT03: Return value has no description
E            - RT03: Return value has no description
E            - RT03: Return value has no description
E            - RT03: Return value has no description
E            - RT03: Return value has no description

function_name = 'sklearn.metrics._ranking.binary_classification_curve'
msg        = '\n\n/home/vsts/work/1/s/sklearn/metrics/_ranking.py\n\nTested function: sklearn.metrics._ranking.binary_classificatio...3: Return value has no description\n - RT03: Return value has no description\n - RT03: Return value has no description'
request    = <FixtureRequest for <Function test_function_docstring[sklearn.metrics._ranking.binary_classification_curve]>>
res        = {'deprecated': False, 'docstring': 'Calculate true and false positives per binary classification threshold.\n\nParamet...n'), ('RT03', 'Return value has no description'), ...], 'file': '/home/vsts/work/1/s/sklearn/metrics/_ranking.py', ...}

This test, tests/test_docstrings.py::test_function_docstring[sklearn.metrics._ranking.binary_classification_curve], is failing.

@adrinjalali
Member

That's indeed very odd. @StefanieSenger @lucyleeow would you have an idea here?

Member

@glemaitre glemaitre left a comment


Since this is a new functionality, we also need to:

  • amend an example where it makes sense to use this feature. These examples are part of the gallery.
  • add a reference in the "See Also" sections of related metrics, at least the other curve functions and the confusion matrix function.
  • the function should also be discussed in the metrics section of the user guide documentation.

Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
@glemaitre glemaitre added this to the 1.7 milestone Nov 18, 2024
@glemaitre
Member

glemaitre commented Nov 25, 2024

By looking at the example change, I went back to the issue to understand the real use case here. It seems that users are interested in getting the entries of the confusion matrix so that they can compute different metrics of interest.

I'm wondering whether what we have here is enough, since we only return the TP and FP and don't provide any information about the negatives. E.g. computing the two-sample Kolmogorov-Smirnov test (https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test) could require these data.

@adrinjalali do you think that if we make this function public, we should also return the TN and FN for completeness?

@glemaitre glemaitre requested review from adrinjalali and removed request for glemaitre November 25, 2024 17:26
@adrinjalali
Member

Yeah I think that makes sense @glemaitre .

Also related, we probably want to get #15522 merged

@SuccessMoses
Contributor Author

@glemaitre it is possible to compute the negatives from the positives. Since the tps array is non-decreasing, the false negatives are given by tps[-1] - tps. Similarly, the true negatives are given by fps[-1] - fps. _binary_clf_curve does not return the negatives because they are easy to derive.
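The relationship described above can be sketched with plain NumPy (a hypothetical stand-in for what _binary_clf_curve computes; the duplicate-threshold handling of the real function is ignored here):

```python
import numpy as np

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Cumulative counts in order of decreasing score, as the curve
# functions build them (one entry per candidate threshold).
order = np.argsort(y_score)[::-1]
tps = np.cumsum(y_true[order])        # true positives at each threshold
fps = np.cumsum(1 - y_true[order])    # false positives at each threshold

# tps[-1] is the total number of positives and fps[-1] the total number
# of negatives, so the remaining confusion-matrix entries follow directly:
fns = tps[-1] - tps                   # false negatives
tns = fps[-1] - fps                   # true negatives
```

At every threshold, tps + fns equals the total positive count and fps + tns the total negative count, which is the identity the comment relies on.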

@glemaitre
Member

@SuccessMoses You would most probably need an additional statistic to infer all four counts. More importantly, from a user's perspective it is really annoying to have to carry around extra code to compute the negative part. I would like to just call the function, get my counts, and then work with them. It is a nicer user experience.

Development

Successfully merging this pull request may close these issues.

Make _binary_clf_curve a "public" method?
4 participants