ENH add from_cv_results in RocCurveDisplay #25939
Conversation
A few comments. My main concern is about the API.
In `sklearn/metrics/_plot/roc_curve.py` (outdated diff):
```python
for fold_id, (estimator, test_indices, name) in enumerate(
    zip(cv_results["estimator"], cv_results["indices"]["test"], fold_name_)
):
    # Recover each fold's test split from the indices stored in cv_results,
    # then compute predictions with the fitted estimator of that fold.
    y_true = _safe_indexing(y, test_indices)
    y_pred = _get_response(
        _safe_indexing(X, test_indices),
        estimator,
        response_method=response_method,
        pos_label=pos_label,
    )[0]
```
This makes me wonder if it's the right approach. The user has to provide exactly the same X as the one used to compute the cross-validation, and we have no way to ensure that. For instance, if the dataset is shuffled in between, the results will be overconfident due to leakage.
Would it make sense to not accept cv_results, but instead compute the cross-validation as part of this method?
You mean to add a `return_predictions` option to `cross_validate` and directly get the predictions instead of the indices?
No, I meant running the cross-validation inside the display, but giving it more thought, it's probably not a good idea: we don't want the display to fit the estimators.
I remember that we talked about returning the predictions from `cross_validate`, but it was not necessarily easy. I can't remember why; can you remind me?
> I remember that we talked about returning the predictions from `cross_validate`, but it was not necessarily easy. I can't remember why; can you remind me?
It's just that this is not implemented, but we could make `cross_validate` return the predictions. However, we would also need to know which method provides the predictions.
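For context, here is a minimal sketch of how the API under review in this PR is meant to be used. The exact signature of `from_cv_results` is still being discussed here, and the dataset and estimator are illustrative; the only requirement visible in the diff above is that `cv_results` contains the fitted estimators and the test indices:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=500, random_state=0)

# from_cv_results needs the fitted estimators and the test indices,
# hence return_estimator=True and return_indices=True.
cv_results = cross_validate(
    LogisticRegression(),
    X,
    y,
    cv=5,
    return_estimator=True,
    return_indices=True,
)

# X and y must be exactly the same (and in the same order) as above;
# otherwise the stored test indices leak training samples into the curves.
display = RocCurveDisplay.from_cv_results(cv_results, X, y)
```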
```rst
It is recommended to use
:meth:`~sklearn.metrics.RocCurveDisplay.from_cv_results` to create a
:class:`~sklearn.metrics.MultiRocCurveDisplay`. All parameters are stored as
attributes.
```
If we don't want users to use it directly, we can make it private instead, no?
We make the same mention in the other displays. That said, I am not against making the class private. However, I am thinking that it could be a handy display for plotting several ROC curves even if they don't come from cross-validation.
For the other displays we discourage calling `plot` directly. Here it's the whole class. If we find it useful on its own, maybe the message should be something like "if you want to plot multiple ROC curves from cv results, you should use ... instead".
I'm guessing that if we want to add `from_cv_results` to other displays, we'll have to make a `Multi...Display` for each. Do you think we could at least factor some code into a `Multi...Mixin`?
> For the other displays we discourage calling `plot` directly. Here it's the whole class. If we find it useful on its own, maybe the message should be something like "if you want to plot multiple ROC curves from cv results, you should use ... instead".
I see. I can alter the code.
> I'm guessing that if we want to add `from_cv_results` to other displays, we'll have to make a `Multi...Display` for each. Do you think we could at least factor some code into a `Multi...Mixin`?
Indeed. I would postpone that until we introduce the next such display, to see how much the displays will have in common.
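To make the mixin idea concrete, here is a purely hypothetical sketch; none of these names exist in scikit-learn, and the actual factoring is deliberately postponed as discussed above:

```python
class _MultiCurveDisplayMixin:
    """Hypothetical shared plumbing for displays built from cv results."""

    @staticmethod
    def _check_cv_results(cv_results):
        # from_cv_results needs the fitted estimators and the test indices
        # stored by cross_validate(..., return_estimator=True,
        # return_indices=True).
        missing = {"estimator", "indices"} - cv_results.keys()
        if missing:
            raise ValueError(
                f"cv_results is missing {sorted(missing)}; call "
                "cross_validate with return_estimator=True and "
                "return_indices=True."
            )
```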
I also think that we can improve the visualization. For now, the curves for each fold have a different color, the mean is in black, and the fill-between is grey. My proposal is to use a single color (probably blue) for all curves and the ±1 std fill-between, with different line styles and alphas. The drawback is that we can't distinguish the different folds in the legend, but is that bad?
I assume there are cases where you want to inspect a particular fold later, but it might not be the main use case. So having the style you mention as the default could be fine, while providing a way (which is already the case) to set an individual style per CV fold. I don't know whether a string option for the kwargs would be useful.
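A rough matplotlib sketch of the styling proposed above (one color for everything, translucent dashed per-fold curves, a solid mean curve, and a ±1 std band); the per-fold TPR arrays are stand-in data:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
mean_fpr = np.linspace(0, 1, 101)
# stand-in for per-fold TPR curves interpolated on a common FPR grid
tprs = [np.clip(mean_fpr ** rng.uniform(0.3, 0.6), 0, 1) for _ in range(5)]

fig, ax = plt.subplots()
for tpr in tprs:
    # per-fold curves: same color, distinguished only by style/alpha
    ax.plot(mean_fpr, tpr, color="tab:blue", linestyle="--", alpha=0.3)

mean_tpr = np.mean(tprs, axis=0)
std_tpr = np.std(tprs, axis=0)
ax.plot(mean_fpr, mean_tpr, color="tab:blue", label="Mean ROC")
ax.fill_between(
    mean_fpr,
    np.maximum(mean_tpr - std_tpr, 0),
    np.minimum(mean_tpr + std_tpr, 1),
    color="tab:blue",
    alpha=0.15,
    label=r"$\pm$ 1 std",
)
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.legend()
plt.show()
```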
I also need to make the improvements proposed here: #25929
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
There is a very related paper recently published at TMLR:
I have not read it yet, but I think we should before considering adding aggregation methods to this PR.
I haven't read this PR and only skimmed through the paper. My first reaction is to not pursue averaging of ROC curves. I also think that a ROC curve is a decision instrument and not a model selection instrument.
I have had a look at the paper and done a bit of digging into what R does.

**R**

Picked these three popular packages mostly based on this blog (which summarised a few ROC curve plotting packages after narrowing search results via orphaned status, relevance and downloads).

**scikit-learn**

We have an example, Receiver Operating Characteristic (ROC) with cross validation, which uses vertical averaging.

**Hogan & Adams**

The paper does not give any warnings against averaging of ROC curves, but compares the different methods and advises on how to select the most appropriate one. The paper introduces 3 'main' methods, each supported by a paper:

- pooling
- threshold averaging
- averaging in ROC space (vertical or horizontal)

Pooling and threshold averaging produce averages at a fixed threshold, but assume that the "classifier outputs across folds of cross-validation are comparable and thus can be globally ordered", which has been shown to be generally invalid and can lead to large pessimistic biases. Averaging in ROC space does not suffer from such problems. Ultimately, they recommend threshold averaging if the classifier is intended to operate on all datasets using a common threshold; otherwise, averaging should be done in ROC space (a sketch follows below), with the user deciding the direction depending on the use case.

**Going forward**

I think that if we were to offer an averaging method, we should offer more than one, as no single method is suitable for every use case (and it's what R does, and I just assume the R community knows their stats 😅).
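For reference, a minimal sketch of vertical averaging in ROC space, along the lines of the scikit-learn cross-validation example mentioned above: each fold's TPR is interpolated on a common FPR grid, then averaged at each fixed FPR:

```python
import numpy as np

def vertical_average(fprs, tprs, n_points=101):
    """Average per-fold (fpr, tpr) curves at fixed FPR values."""
    mean_fpr = np.linspace(0, 1, n_points)
    interp_tprs = []
    for fpr, tpr in zip(fprs, tprs):
        # resample each fold's curve on the shared FPR grid
        interp_tpr = np.interp(mean_fpr, fpr, tpr)
        interp_tpr[0] = 0.0  # force the averaged curve through (0, 0)
        interp_tprs.append(interp_tpr)
    return mean_fpr, np.mean(interp_tprs, axis=0), np.std(interp_tprs, axis=0)
```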
@lorentzenchr do you mean decision instrument, as in deciding on the best threshold? Can you expand on why it shouldn't be used as a model selection instrument, or point me to a paper to read? I know (IRL and in papers) that it is used as a model selection instrument...
Add a method `from_cv_results` in `RocCurveDisplay` to plot several ROC curves at once, obtained from cross-validation.

**Discussions**

In terms of design, I created a class handling multiple single-curve displays. An alternative is to alter `RocCurveDisplay` to accept a list of `ndarray` for the `fpr`, `tpr`, and `thresholds`. However, that makes for a relatively complex change in the `plot` method, as well as adding new attributes such as `lines_` in conjunction with `line_`. The code becomes difficult to read, similar to `PartialDependenceDisplay`, and the reason is having to handle multiple and single curves in the same object.

The approach here is to expose `from_cv_results` directly from `RocCurveDisplay` but return a `MultiRocCurveDisplay` that can use the information stored in each underlying display to do the plotting. It is the best design that @ogrisel and I have come up with so far.
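To illustrate the design described above, a hypothetical sketch (attribute and parameter names are assumptions, not the PR's actual code): `MultiRocCurveDisplay` holds one single-curve `RocCurveDisplay` per fold and delegates plotting to them:

```python
import matplotlib.pyplot as plt

class MultiRocCurveDisplay:
    """Hypothetical container for several single-curve displays."""

    def __init__(self, displays):
        # each item is a RocCurveDisplay that already stores its own
        # fpr/tpr arrays and curve name
        self.displays = displays

    def plot(self, ax=None, **kwargs):
        if ax is None:
            _, ax = plt.subplots()
        # plotting is just a loop over the underlying displays, so the
        # single-curve plot logic stays simple and is reused as-is
        for display in self.displays:
            display.plot(ax=ax, **kwargs)
        self.ax_ = ax
        self.figure_ = ax.figure
        return self
```

This keeps each single-curve display self-contained while letting the multi-curve object own only the composition logic, which is the stated motivation for not teaching `RocCurveDisplay` itself to handle lists of arrays.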