Enhance ROC Curve Display Tests for Improved Clarity and Maintainability #31254

NEREUScode · 2025-04-25T19:09:47Z

Commit Description:

Replaced the data_binary fixture that filtered classes from a multiclass dataset with a new fixture generating a synthetic binary classification dataset using make_classification. This ensures consistent data characteristics, introduces label noise, and better simulates real-world classification challenges.

PR Description:

Summary of Changes:

This PR refactors the data_binary fixture in the test_roc_curve_display.py file. The previous fixture filtered a multiclass dataset (Iris) to create a binary classification task. However, this approach resulted in AUC values consistently reaching 1.0, which does not reflect real-world challenges.
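To illustrate the saturation problem described above, here is a minimal sketch that rebuilds the filtered Iris data and scores a simple classifier on it. The choice of `LogisticRegression` and the variable names are assumptions made for illustration; they are not taken from the test file.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Old approach: keep only classes 0 and 1 of Iris (setosa vs. versicolor).
X, y = load_iris(return_X_y=True)
X_bin, y_bin = X[y < 2], y[y < 2]

# LogisticRegression is an arbitrary choice here (assumption for illustration).
clf = LogisticRegression().fit(X_bin, y_bin)
scores = clf.predict_proba(X_bin)[:, 1]

# The two classes are linearly separable, so the ranking is perfect.
iris_auc = roc_auc_score(y_bin, scores)
print(f"AUC on filtered Iris: {iris_auc:.3f}")
```

Because every positive sample is ranked above every negative one, the ROC curve collapses to the top-left corner and tests that compare curve values have little to distinguish.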

The new fixture utilizes make_classification from sklearn.datasets to generate a synthetic binary classification dataset with the following characteristics:

- 200 samples and 20 features.
- 5 informative features and 2 redundant features.
- 10% label noise (flip_y=0.1) to simulate real-world imperfections in the data.
- Class separation (class_sep=0.8) set to avoid perfect separation.

These changes provide a more complex and representative dataset for testing RocCurveDisplay and related metrics, making the tests more robust.
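A rough sketch of how such a dataset could be generated with the parameters listed above. The helper name `make_binary_data` and the `random_state` value are hypothetical; in the test module this logic would presumably live in the `data_binary` pytest fixture rather than in a plain function.

```python
from sklearn.datasets import make_classification


def make_binary_data(random_state=0):
    """Build a noisy synthetic binary dataset (hypothetical helper)."""
    X, y = make_classification(
        n_samples=200,
        n_features=20,
        n_informative=5,
        n_redundant=2,
        flip_y=0.1,      # fraction of samples whose label is randomly reassigned
        class_sep=0.8,   # below the default of 1.0, so classes overlap more
        random_state=random_state,  # assumption: fixed seed for reproducibility
    )
    return X, y


X, y = make_binary_data()
print(X.shape, sorted(set(y.tolist())))
```

Fixing the seed keeps the data identical across test runs, which matters when tests compare plotted curve values against recomputed ones.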

Reference Issues/PRs:

Fixes #31243 (Use more complex data in test_roc_curve_display.py)
See also #30399 (comment) (ENH add from_cv_results in RocCurveDisplay (single RocCurveDisplay))

For Reviewers:

This change ensures that the dataset used for testing is more reflective of real-world data, particularly in classification tasks that may involve noise and less clear separation between classes.

github-actions · 2025-04-25T19:10:46Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_Generated for commit: e8b1e45. Link to the linter CI: here_

lucyleeow

Thanks for the PR!

There is a lint problem, see: #31254 (comment)

Just 2 items, otherwise looks good.

lucyleeow · 2025-04-28T04:04:30Z

sklearn/metrics/_plot/tests/test_roc_curve_display.py

@@ -26,8 +26,16 @@ def data():



I think the data fixture above can be removed as it is now no longer used (please double check).

lucyleeow · 2025-04-28T04:06:14Z

sklearn/metrics/_plot/tests/test_roc_curve_display.py

+        n_features=20,
+        n_informative=5,


Not sure if we need that many features (and so many uninformative ones), but I will leave it to another maintainer to decide.

NEREUScode added 4 commits April 25, 2025 19:33

Update test_roc_curve_display.py

e299bf6

Update test_roc_curve_display.py

7ab9430

Replace filtered data fixture with synthetic binary dataset

4cfe688

github-actions bot added the module:metrics label Apr 25, 2025

lucyleeow reviewed Apr 28, 2025

lucyleeow added the No Changelog Needed label Apr 28, 2025

update the data_binary and delete the data()

e8b1e45

NEREUScode closed this Apr 28, 2025
