Skip to content

Conversation

lucyleeow
Copy link
Member

@lucyleeow lucyleeow commented Dec 3, 2024

Reference Issues/PRs

Reference Issues/PRs

Supercedes #25939

This is part of a group of draft PRs to determine best API for adding plots for cv results to our displays.

For all 3 options we take the output of cross_validate, and use the fitted estimator and test indicies. No fitting is done in the display.

We do recalculate the predictions (which would have already been done in cross_validate), which could be avoided if we decided to change cross_validate to optionally return the predictions as well (note this would make cross_val_predict redundant).
See more thread: #25939 (comment)). I think should be outside of the scope of this body of work though.

What does this implement/fix? Explain your changes.

Not 100% I've implemented this optimally.

  • RocCurveDisplay object may contain data (fpr/tpf etc) for single or multi curves
  • RocCurveDisplay returns single mpl Artist object, or list of objects for multi curves
  • RocCurveDisplay.plot handles both single and multi-curve plotting, this has meant a lot more checking is required (c.f. the other 2 implementations, as this is the only case where you can use plot directly to plot multi-curves)

More specific concerns detailed in review comments

Plot looks like:

image

TODO

We should update visualization.rst after this PR is in to add a section about from_cv_results.

Copy link

github-actions bot commented Dec 3, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: ec04011. Link to the linter CI: here

Comment on lines 30 to 33
if n_multi is None:
name = self.estimator_name if name is None else name
else:
name = [f"{curve_type} fold {curve_idx}:" for curve_idx in range(n_multi)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with this implementation, I thought this change could be used for other multi cv displays. Not 100% sure on this change though.

@jeremiedbb
Copy link
Member

As discussed in today's meeting, this is my favorite solution because it's the simplest and least surprising one from a user point of view, even though it adds a bit more internal complexity than the others. And I think we can mitigate some of it by extracting parts of the plot code into dedicated _plot_single and _plot_multiple methods. Or just into small helpers, that would already help readability.

It also looks like a good portion of the added complexity will be exactly the same for other displays like PRCurveDisplay, so there might be a chance that we'll be able to factorize some parts to be used by several displays.

@lucyleeow
Copy link
Member Author

The changes in f0908e1 and 7e77d4c factorizes out common code (compared to #30508), adding helper function to either _BinaryClassifierCurveDisplayMixin (if function relevant to other binary displays) or sklearn/utils/_plotting.py (if function more generally applicable to more diplays - these potentially could be a parent class method?)

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a couple of nitpicks. It looks great on my side and almost ready to be merged.

@lucyleeow
Copy link
Member Author

I think I have addressed everything, thanks @glemaitre !

@glemaitre glemaitre mentioned this pull request May 5, 2025
17 tasks
@lucyleeow
Copy link
Member Author

@jeremiedbb gentle ping on this, thanks!

@lucyleeow lucyleeow added the Waiting for Second Reviewer First reviewer is done, need a second one! label May 19, 2025
Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed tiny nitpicks.

LGTM. Thanks !

@lucyleeow
Copy link
Member Author

Thanks @jeremiedbb !

@jeremiedbb jeremiedbb merged commit 1b05e8f into scikit-learn:main May 26, 2025
36 checks passed
jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request May 30, 2025
Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
elhambbi pushed a commit to elhambbi/scikit-learn that referenced this pull request Jun 1, 2025
Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
jeremiedbb added a commit that referenced this pull request Jun 5, 2025
Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
@lucyleeow lucyleeow deleted the cv_results3 branch June 13, 2025 02:39
JosephBARBIERDARNAL added a commit to JosephBARBIERDARNAL/scikit-learn that referenced this pull request Aug 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:metrics module:utils Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants