ENH/FIX add drop_intermediate to DET curve and add threshold at infinity #29151
Conversation
❌ Linting issues: this PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling `pre-commit` hooks. You can see the details of the linting issues under the `lint` job.
@ArturoAmorQ Could you also place the written-out name "Detection error tradeoff" (DET) in the docstrings of …?
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Thanks for the PR, here is some feedback.
Should I add a changelog?
Here is another pass. Please include a changelog entry to document the change in the array attributes of the display object.
On top of the changelog entry, we might want to also include a … EDIT: actually the attributes are class constructor parameters, so maybe it's a bit weird, even though the typical user will not pass the constructor parameters but instead use a factory method.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
A final comment but otherwise still LGTM.
sklearn/metrics/_ranking.py (outdated)

    -----
    .. versionchanged:: 1.6
        An arbitrary threshold at infinity is added to represent a classifier
        that always predicts the negative class, i.e. `fpr=0` and `fnr=1`.
I would rather move this below the `.. versionadded:: 0.24` after the main paragraph of the docstring instead of adding a new "Notes" section to this docstring.
I think that we should make this information consistent in both `det_curve` and `roc_curve`. I might change the `roc_curve` in this PR as well.
I'd say a new PR (this one is taking long enough already).
Uhm it seems I did not send this review.
sklearn/metrics/_ranking.py (outdated)

    -----
    .. versionchanged:: 1.6
        An arbitrary threshold at infinity is added to represent a classifier
        that always predicts the negative class, i.e. `fpr=0` and `fnr=1`.
I think that we should make this information consistent in both `det_curve` and `roc_curve`. I might change the `roc_curve` in this PR as well.
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
I added support to …
> Edit: Adding a threshold at infinity led to redundant thresholds, so we decided to include a `drop_intermediate` option as implemented in `precision_recall_curve` and `roc_curve`.
It seems that adding the `drop_intermediate` option is not strictly required for this PR: currently it is disabled in the display class and the example still renders fine.
I am not opposed to adding this option as part of this PR, but then it should be exposed in the `.from_estimator` and `.from_predictions` methods of the display class (and probably enabled by default, both for the sake of matplotlib rendering efficiency and consistency with other displays).
This new `drop_intermediate` option should also be documented as an `enh` in a new changelog entry and then tested when called from the display class methods.
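For illustration, here is a minimal sketch of what forwarding the option through a factory method could look like. The `drop_intermediate` keyword on `from_predictions` is an assumption about the eventual API being discussed above, not the merged signature.

```python
# Sketch only: assumes `DetCurveDisplay.from_predictions` exposes a
# `drop_intermediate` keyword that is forwarded to `det_curve`.
import numpy as np
from sklearn.metrics import DetCurveDisplay

rng = np.random.RandomState(0)
y_true = rng.randint(0, 2, size=200)
# Noisy scores that are, on average, higher for the positive class.
y_score = rng.uniform(size=200) + 0.5 * y_true

# Dropping collinear intermediate points keeps the matplotlib artist small
# without changing the shape of the plotted curve.
display = DetCurveDisplay.from_predictions(y_true, y_score, drop_intermediate=True)
```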
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
    import scipy as sp

    from ...utils._plotting import _BinaryClassifierCurveDisplayMixin
    from .._ranking import det_curve


    class DetCurveDisplay(_BinaryClassifierCurveDisplayMixin):
    -    """DET curve visualization.
    +    """Detection Error Tradeoff (DET) curve visualization.
Same as for `det_curve`: could you fix the link to the user guide below so it points to https://scikit-learn.org/stable/modules/model_evaluation.html#detection-error-tradeoff-det?
Done for `DetCurveDisplay` and its methods. The cross-references to the user guide in `RocCurveDisplay` and in `PrecisionRecallDisplay` point to https://scikit-learn.org/stable/visualizations.html; shall we change those as well in another PR?
Another PR.
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
2 follow-up PRs should be opened: …
    drop_intermediate : bool, default=True
        Whether to drop thresholds where true positives (tp) do not change
        from the previous or subsequent threshold. All points with the same
        tp value have the same `fnr` and thus same y coordinate.
@ArturoAmorQ could you please open a follow-up PR to add the missing `.. versionadded::` markers for the new public parameters?
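To illustrate the documented behaviour, a small sketch (assuming a scikit-learn version in which `det_curve` already accepts the `drop_intermediate` keyword added by this PR): thresholds whose tp count matches a neighbour add no new y coordinate, so dropping them only shrinks the returned arrays.

```python
# Sketch of the documented behaviour; assumes a scikit-learn version in which
# `det_curve` accepts the `drop_intermediate` keyword added by this PR.
import numpy as np
from sklearn.metrics import det_curve

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.7, 0.8])

_, _, thr_all = det_curve(y_true, y_score, drop_intermediate=False)
_, _, thr_drop = det_curve(y_true, y_score, drop_intermediate=True)

# Dropping intermediate thresholds can only remove points, never add them.
assert len(thr_drop) <= len(thr_all)
print(len(thr_all), len(thr_drop))
```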
    -        Decreasing score values.
    +        Decreasing thresholds on the decision function (either `predict_proba`
    +        or `decision_function`) used to compute FPR and FNR. An arbitrary
    +        threshold at infinity is added for the case `fpr=0` and `fnr=1`.
Maybe this is worth adding a `versionchanged` directive.
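As a quick sketch of the endpoint this docstring describes (again assuming a version that includes the change), the first returned threshold is infinite and corresponds to the point `fpr=0`, `fnr=1`:

```python
# Sketch of the documented endpoint; assumes a version including this change.
import numpy as np
from sklearn.metrics import det_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

fpr, fnr, thresholds = det_curve(y_true, y_score)
# Thresholds are decreasing, so the arbitrary threshold at infinity comes
# first: a classifier that always predicts the negative class has no false
# positives (fpr=0) and misses every positive (fnr=1).
print(thresholds[0])   # inf
print(fpr[0], fnr[0])  # 0.0 1.0
```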
ENH/FIX add drop_intermediate to DET curve and add threshold at infinity (scikit-learn#29151)
Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Reference Issues/PRs
Follow-up from #24818. See also #30352.
What does this implement/fix? Explain your changes.
This PR implements the same logic used in #26194 for setting a threshold at infinity. It then uses it to show the chance level in our current example on DET curves.
Edit: Adding a threshold at infinity led to redundant thresholds, so we decided to include a `drop_intermediate` option as implemented in `precision_recall_curve` and `roc_curve`.
It also takes the opportunity to set a `random_state` in the `RandomForestClassifier` for deterministic plots.
Any other comments?
Maybe in a follow-up we can implement the `plot_chance_level` option as introduced in #25929.
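As a rough sketch of that example setup (the dataset and hyperparameters below are placeholders, not the ones used in the actual scikit-learn example), fixing `random_state` on the forest is what makes the rendered DET curve reproducible:

```python
# Rough sketch of the example setup described above; dataset and
# hyperparameters are placeholders, not those of the actual example.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import DetCurveDisplay
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A fixed random_state makes the fitted forest, and hence the plotted DET
# curve, deterministic from run to run.
clf = RandomForestClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
DetCurveDisplay.from_estimator(clf, X_test, y_test)
```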