
Enhance ROC Curve Display Tests for Improved Clarity and Maintainability #31266


Merged
10 commits merged into scikit-learn:main on Apr 30, 2025

Conversation

NEREUScode
Contributor

PR Description:

Summary of Changes:

This PR refactors the `data_binary` fixture in `test_roc_curve_display.py`. The previous fixture filtered a multiclass dataset (Iris) down to a binary classification task, but that approach resulted in AUC values consistently reaching 1.0, which does not reflect real-world challenges.

The new fixture uses `make_classification` from `sklearn.datasets` to generate a synthetic binary classification dataset with the following characteristics:

  • 200 samples and 20 features.
  • 5 informative features and 2 redundant features.
  • 10% label noise (flip_y=0.1) to simulate real-world imperfections in the data.
  • Reduced class separation (class_sep=0.8) to avoid perfectly separable classes.

These changes provide a more complex and representative dataset for testing `RocCurveDisplay` and other related metrics, thereby improving the robustness of the tests.
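For concreteness, here is a minimal sketch of what such a fixture looks like, assuming a plain pytest fixture; the exact seed and fixture scope used in the PR may differ:

```python
# Sketch of the new fixture; random_state is illustrative, not from the PR.
import pytest
from sklearn.datasets import make_classification


@pytest.fixture
def data_binary():
    # Synthetic binary task: 200 samples, 20 features (5 informative,
    # 2 redundant), 10% label noise, and reduced class separation so
    # the ROC curve is not trivially perfect.
    X, y = make_classification(
        n_samples=200,
        n_features=20,
        n_informative=5,
        n_redundant=2,
        flip_y=0.1,
        class_sep=0.8,
        random_state=0,
    )
    return X, y
```

Tests then request `data_binary` as an argument and receive the same `(X, y)` pair each time, keeping data characteristics consistent across the test module.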

Reference Issues/PRs:


For Reviewers:

  • This change ensures that the dataset used for testing is more reflective of real-world data, particularly in classification tasks that may involve noise and less clear separation between classes.

Replaced the `data_binary` fixture that filtered classes from a multiclass dataset 
with a new fixture generating a synthetic binary classification dataset using 
`make_classification`. This ensures consistent data characteristics, introduces 
label noise, and better simulates real-world classification challenges.

github-actions bot commented Apr 28, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 90077ae.

@NEREUScode
Contributor Author

@lucyleeow I guess everything is good now.

@lucyleeow
Member

LGTM! My only nit is that I am not sure we need 20 features but I'll let the 2nd reviewer decide that.

@lucyleeow added the "Waiting for Second Reviewer" label (First reviewer is done, need a second one!) on Apr 29, 2025
@NEREUScode
Contributor Author

@lucyleeow, I added 20 features to keep the task from being trivially easy: without the extra uninformative features we'd likely get a perfect ROC AUC of 1.0. Despite the added features, training time remains fast, so performance isn't a concern.

@marcin-okon

LGTM

@lucyleeow
Member

@NEREUScode thanks for explaining. What AUC do you get with 20 features? And what AUC do you get with, e.g., 10?
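One way to answer this empirically (not part of the PR; the classifier choice and seed below are illustrative assumptions) is to regenerate the data at each feature count and measure the held-out AUC:

```python
# Hypothetical check, not from the PR: compare test AUC for 10 vs. 20
# features using the same make_classification settings as the fixture.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

for n_features in (10, 20):
    X, y = make_classification(
        n_samples=200,
        n_features=n_features,
        n_informative=5,
        n_redundant=2,
        flip_y=0.1,
        class_sep=0.8,
        random_state=0,
    )
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    auc = roc_auc_score(y_test, clf.decision_function(X_test))
    print(f"n_features={n_features}: test AUC = {auc:.3f}")
```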

@adrinjalali
Member

@lucyleeow's question stands, but looks good anyway.

@adrinjalali merged commit d51f17b into scikit-learn:main on Apr 30, 2025 (36 checks passed)
@NEREUScode
Contributor Author

@lucyleeow I'll run more tests to see if the feature count needs adjusting.

Successfully merging this pull request may close these issues.

Use more complex data in test_roc_curve_display.py