Enhance ROC Curve Display Tests for Improved Clarity and Maintainability #31264
Replaced the `data_binary` fixture that filtered classes from a multiclass dataset with a new fixture generating a synthetic binary classification dataset using `make_classification`. This ensures consistent data characteristics, introduces label noise, and better simulates real-world classification challenges.
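For reference, a minimal sketch of what such a dataset could look like. Only `flip_y=0.1` and `class_sep=0.8` come from this PR; the helper name `make_binary_data`, the sample and feature counts, and the `LogisticRegression` sanity check are assumptions made for illustration, not the code in the diff.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def make_binary_data(random_state=0):
    """Hypothetical helper: synthetic binary data with label noise."""
    X, y = make_classification(
        n_samples=200,       # assumed size, not taken from the PR
        n_features=20,       # assumed, not taken from the PR
        n_informative=5,     # assumed, not taken from the PR
        n_classes=2,
        flip_y=0.1,          # label noise, as described in the PR
        class_sep=0.8,       # reduced class separation, as described in the PR
        random_state=random_state,
    )
    return X, y


X, y = make_binary_data()

# Sanity check: fit a simple classifier and compute the training AUC.
clf = LogisticRegression().fit(X, y)
auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
print(f"training AUC: {auc:.3f}")
```

In the test module this helper would typically be wrapped in a `pytest` fixture (for example with `@pytest.fixture(scope="module")`) so that it is built once and shared across tests.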
@lucyleeow I removed data() as suggested, but I'm still unsure why the Linux check is failing.
@@ -1,11 +1,11 @@
 import numpy as np
 import pytest
 from numpy.testing import assert_allclose
-from scipy.integrate import trapezoid
+from scipy.integrate import trapz as trapezoid
Why this change?
The CI failure seems to be due to this:
E ImportError: cannot import name 'trapz' from 'scipy.integrate' (/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/scipy/integrate/__init__.py)
You can see the test failure details by clicking through 'details', e.g. https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=76020&view=logs&j=dde5042c-7464-5d47-9507-31bdd2ee0a3a&t=4bd2dad8-62b3-5bf9-08a5-a9880c530c94&l=918
Thanks a lot, I'll fix it.
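For context on the `ImportError` in this thread: recent SciPy releases no longer ship `scipy.integrate.trapz`, while `trapezoid` is the current name. The simplest fix is to keep the original `from scipy.integrate import trapezoid` line. The fallback below is only a sketch for environments that might still run a SciPy too old to have `trapezoid`; whether that is needed here is an assumption, not something stated in this PR.

```python
import numpy as np

try:
    # Current name, available in recent SciPy releases.
    from scipy.integrate import trapezoid
except ImportError:
    # Fallback for older SciPy versions that only provide trapz.
    from scipy.integrate import trapz as trapezoid

# Integrate y = x over [0, 1]; the trapezoidal rule is exact for a line.
x = np.linspace(0.0, 1.0, 5)
area = trapezoid(x, x)
print(area)
```

The first argument is the sampled function values and the second the sample points, which matches how an area under a ROC curve would be computed from `tpr` and `fpr` arrays.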
PR Description:
Summary of Changes:
This PR refactors the `data_binary` fixture in the `test_roc_curve_display.py` file. The previous fixture filtered a multiclass dataset (Iris) to create a binary classification task. However, this approach resulted in AUC values consistently reaching 1.0, which does not reflect real-world challenges.

The new fixture uses `make_classification` from `sklearn.datasets` to generate a synthetic binary classification dataset with the following characteristics:
- Label noise (`flip_y=0.1`) to simulate real-world imperfections in the data.
- Class separation (`class_sep=0.8`) set to avoid perfect separation.

These changes provide a more complex and representative dataset for testing the `roc_curve_display` function and other related metrics, thereby improving the robustness of tests.

Reference Issues/PRs:
- `test_roc_curve_display.py` #31243
- `from_cv_results` in `RocCurveDisplay` (single `RocCurveDisplay`) #30399 (comment)

For Reviewers: