FEA add binary_classification_curve #30134
base: main
Conversation
You'd also need to add this in api_reference.py
under the right section to have it rendered in the docs properly.
@glemaitre you happy with the name?
I think I'm fine with the name. I was looking whether we can have the word |
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Otherwise LGTM.
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
@adrinjalali There is an issue with numpydoc validation in the |
When you look at the CI log, this is the error, not the return: [gw0] linux -- Python 3.12.7 /usr/share/miniconda/envs/testvenv/bin/python
810 Decreasing score values.
811
812 Examples
813 -------
814 >>> import numpy as np
815 >>> from sklearn.metrics import binary_classification_curve
816 >>> y_true = np.array([0, 0, 1, 1])
817 >>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
818 >>> fps, tps, thresholds = binary_classification_curve(y_true, y_scores)
819 >>> fps
Expected:
array([0, 1, 1, 2])
Got:
array([0., 1., 1., 2.])
You just need to fix the expected output to floats.
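For context, this is a minimal sketch of how such cumulative per-threshold counts are typically computed (illustrative only, not the actual scikit-learn implementation; in the real function the counts can come out as floats because sample weights may be accumulated instead of plain 0/1 indicators):

```python
import numpy as np

# Same toy data as in the doctest above.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Sort by score, highest first.
order = np.argsort(y_score)[::-1]
y_sorted = y_true[order]
scores_sorted = y_score[order]

# Indices where the score changes, i.e. the distinct thresholds.
distinct = np.where(np.diff(scores_sorted))[0]
threshold_idx = np.r_[distinct, y_sorted.size - 1]

# Cumulative true/false positive counts at each distinct threshold.
tps = np.cumsum(y_sorted)[threshold_idx]
fps = 1 + threshold_idx - tps
thresholds = scores_sorted[threshold_idx]
```

With integer labels this yields the integer counts the doctest originally expected, `fps == array([0, 1, 1, 2])`; passing float sample weights into a cumulative sum is what turns them into floats.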
I fixed this one. But this is the log I was talking about
This test |
That's indeed very odd. @StefanieSenger @lucyleeow would you have an idea here?
Since this is a new functionality, we also need to:
- amend an example where it makes sense to use this feature. These examples are part of the gallery.
- add the reference into the "See Also" sections of other metrics such as the other curves function and the confusion matrix function at least.
- discuss the function in the user guide documentation, in the metrics section.
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
By looking at the example change, I went back to the issue to understand the real use case here. It seems that users are interested in getting the entries of the confusion matrix so that they can compute different metrics of interest. I'm wondering whether what we have here is enough, since we only return the TP and FP and we don't provide any information about the negatives. E.g. computing the two-sample Kolmogorov-Smirnov test (https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test) could require these data. @adrinjalali do you think that if we make this function public, we should also return the TN and FN for completeness?
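To illustrate that use case, here is a sketch of how the two-sample Kolmogorov-Smirnov statistic could be derived from cumulative tps/fps counts (the arrays are written by hand to match the toy data above; this is not an API proposal, just the arithmetic):

```python
import numpy as np

# Cumulative counts as a binary classification curve would return them.
tps = np.array([1.0, 1.0, 2.0, 2.0])
fps = np.array([0.0, 1.0, 1.0, 2.0])

# tps[-1] is the total number of positives, fps[-1] the total negatives,
# so these ratios are the empirical CDFs of the scores in each class.
tpr = tps / tps[-1]
fpr = fps / fps[-1]

# The KS statistic is the maximum gap between the two CDFs.
ks_stat = np.max(np.abs(tpr - fpr))
```

For this toy data the statistic is 0.5, reached at the first and third thresholds.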
Yeah I think that makes sense @glemaitre. Also related, we probably want to get #15522 merged.
@glemaitre it is possible to compute the negatives from the positives. Since the tps array is increasing, false negatives are given by tps[-1] - tps. Similarly, true negatives are given by fps[-1] - fps.
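The identity in the comment above can be checked directly: since tps[-1] is the total number of positives and fps[-1] the total number of negatives, the two remaining confusion-matrix entries fall out by subtraction (toy arrays below, matching the doctest data earlier in the thread):

```python
import numpy as np

# Cumulative counts at each threshold, as returned by a binary
# classification curve on the toy data above.
tps = np.array([1, 1, 2, 2])
fps = np.array([0, 1, 1, 2])

# Positives not yet counted as true positives are false negatives;
# negatives not yet counted as false positives are true negatives.
fns = tps[-1] - tps
tns = fps[-1] - fps

# Sanity check: the four entries always sum to the total sample count.
totals = tps + fps + fns + tns
```

So the four statistics are recoverable from tps and fps alone, which is the basis of the discussion about whether returning all four is worth it for convenience.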
@SuccessMoses You would most probably need an additional statistic to infer the four statistics. However, from a user's perspective, it is really annoying to have to carry around some code to compute the negative part. I would just like to call the function, get my counts and then deal with them. It is a nicer user experience.
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Fixes #16470
Any other comments?
In sklearn/metrics/_ranking.py, I changed the name of the function _binary_clf_curve to binary_classification_curve without changing the body. I also renamed test functions like test_binary_clf_curve_multiclass_error without changing their bodies. det_curve, roc_curve and precision_recall_curve call this function, so I updated the name of the function in their bodies.