ENH Add replace_undefined_by param to class_likelihood_ratios #29288
Conversation
…ow done in confusion_matrix
I think that we should add more tests to check all the possibilities for `zero_division != "warn"`. We should also check that we raise the proper deprecation warning for `raise_warning`.
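(For illustration only, a minimal sketch of such a deprecation test; it assumes the PR deprecates `raise_warning` with a `FutureWarning`, and the match string is an assumption rather than the PR's actual message:)

```python
# Hypothetical test sketch, not taken from the PR; assumes `raise_warning`
# triggers a FutureWarning once deprecated.
import numpy as np
import pytest

from sklearn.metrics import class_likelihood_ratios


def test_raise_warning_is_deprecated():
    # well-defined confusion matrix, so no UndefinedMetricWarning interferes
    y_true = np.array([0, 0, 1, 1])
    y_pred = np.array([0, 1, 0, 1])
    with pytest.warns(FutureWarning, match="raise_warning"):
        class_likelihood_ratios(y_true, y_pred, raise_warning=True)
```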
sklearn/metrics/_classification.py (outdated)

    "zero_division": [
        Hidden(StrOptions({"default"})),
        StrOptions({"warn"}),
        dict,  # this needs to be further defined, but Options only takes unmutable
I think `dict` is enough. Further validation will be handled in the function itself. The parameter validation is not doing advanced checks.
I think there currently is no validation in place in the function itself, because I was using open `else` statements for any other input. An option would be to use `elif`s with well-defined conditions for the `zero_division` args and raise an input error in the `else` branch if the user passes something invalid.
But my impression had been that the param validation was meant to handle this rather than the function code, and that the goal of having it was to unburden the actual function code from validations like these. Which way should we proceed?
> The parameter validation is not doing advanced checks.

Oh, did you mean: `parameter_validation` is not used for advanced checks? And by advanced checks, do you mean checks that go further than checking types and maybe immutable arguments? Then I think I understand now, and the input validation needs to happen in the function itself.
Edit: Done
sklearn/metrics/_classification.py (outdated)

    To define the return values in case of a division by zero, use
    `zero_division` instead.

    zero_division : str or dict, default="warn"
Since we know exactly the string, we can state it directly. I'm also thinking of adding `np.nan` as a parameter value that would be a shorthand for `{"LR+": np.nan, "LR-": np.nan}`. It is a bit lighter.
Suggested change:
- zero_division : str or dict, default="warn"
+ zero_division : "warn", np.nan or dict, default="warn"
Okay, I think we also keep the `{"LR+": np.nan, "LR-": np.nan}` option and add the `np.nan` option additionally. I slightly deviated and made it the string `"nan"`, because a) this is how it is used in the other `zero_division` param cases and b) `np.nan` turned out to be difficult to work with (in the validation and in other places).
Edit: this is done
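(For illustration, a hypothetical call showing the two forms discussed here. This is only a sketch: the parameter was still named `zero_division` at this point and was later renamed, see the PR title, so this exact signature is an assumption.)

```python
import numpy as np
from sklearn.metrics import class_likelihood_ratios

y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1])  # fp == 0, so LR+ is undefined

# explicit per-ratio replacement values (hypothetical parameter under discussion)
class_likelihood_ratios(y_true, y_pred, zero_division={"LR+": np.nan, "LR-": np.nan})

# string shorthand discussed above, meant to be equivalent to the dict form
class_likelihood_ratios(y_true, y_pred, zero_division="nan")
```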
sklearn/metrics/_classification.py (outdated)

    # zero_division="warn" does not need to be "np.nan" anymore and should be the lowest
    # score for each metric respectively (1 for LR+ and 0 for LR-) to match the other
    # functions that take a zero_division param. Return values and warning messages need
Note that this change of behaviour should also be done with a deprecation from 1.8 to 1.10.
Sorry, I don't understand. Are you saying there should be two cycles of deprecation?
After looking at this anew, I see that you meant that the change of the default value for `zero_division` from `"warn"` to the lowest possible scores needs its own deprecation cycle.
Can we also consider doing the two steps in one? Because the goal is that the functionality of `zero_division` matches the other functions that take a `zero_division` param, and the longer this is not the case, the more confusing it gets.
Edit: I have added a FutureWarning for the changing default behaviour and commented on this in the `# TODO` section.
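(As an aside, a rough sketch of the usual pattern for such a changing default, a hidden sentinel plus a `FutureWarning`; the helper name and message are assumptions, not the PR code:)

```python
import warnings


def _resolve_zero_division(zero_division):
    # "default" is a hidden sentinel meaning the user did not set the parameter
    if zero_division == "default":
        warnings.warn(
            "The default return values of `class_likelihood_ratios` for "
            "undefined ratios will change in a future version. Set "
            "`zero_division` explicitly to silence this warning.",
            FutureWarning,
        )
        zero_division = "warn"  # keep the current behaviour for now
    return zero_division
```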
sklearn/metrics/_classification.py (outdated)

    negative_likelihood_ratio = neg_num / neg_denom
    elif isinstance(
        zero_division.get("LR-", None), (int, float)
    ) and zero_division.get("LR-", None) in [0, 1]:
Here, I don't think that we will raise an error if we get anything other than 0/1. I'm thinking that we could, whenever possible, always create a dictionary and check that the values in the dict are the expected ones (0/1/np.inf). This would be safer at this stage.
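(A rough sketch of the normalisation suggested here, not the PR code; the helper name and the accepted value set are assumptions. Scalar shorthands are turned into a dict first, then the values are checked once.)

```python
import numpy as np

_ALLOWED = (0, 1, np.inf)  # assumed set of accepted replacement values


def _as_zero_division_dict(zero_division):
    # scalar shorthand applies to both ratios
    if not isinstance(zero_division, dict):
        zero_division = {"LR+": zero_division, "LR-": zero_division}
    for key, value in zero_division.items():
        if key not in ("LR+", "LR-"):
            raise ValueError(f"Unknown key {key!r} in zero_division.")
        if not (value in _ALLOWED or (isinstance(value, float) and np.isnan(value))):
            raise ValueError(
                f"zero_division[{key!r}] must be 0, 1, np.inf or np.nan, got {value!r}."
            )
    return zero_division
```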
This is very interesting. Before I jump into making this, let me reformulate to clarify whether we are talking about the same idea:
- Should we accept any value that the metrics could output as a valid input in the `zero_division` dict?
- Should we allow it for `class_likelihood_ratios` only or also for the other classification metrics?
- Would it be useful to someone?

I feel I could not judge if it was useful and would rather stick with how `zero_division` works on the other metrics by implementing it in a similar way here. (Which means restricting possible values to the min and max scores.) If you or someone goes "yes, yes, yes" for the above questions, I'd be happy to learn more about this idea and to make it work this way.
After we had talked about this, I have now implemented a lenient check of the input values for the `zero_division` param at the beginning of the function, which allows all the values the ratios could take.
Thanks for the PR @StefanieSenger. I would also briefly document the `zero_division` behavior in the dedicated mathematical divergences section of the user guide.
Thanks for your review @glemaitre. I have done most of the things you suggested and commented on a few that were not that clear, where I would like to exchange some thoughts or concerns.
Thank you @ArturoAmorQ, I have mentioned it in this section. I also couldn't hold myself back from trying to define the conditions for zero divisions a bit more clearly there. Please let me know if I succeeded with my attempt.
Thanks for the PR @StefanieSenger, here are just a couple of comments regarding the documentation aspects of this PR. I'll let the others review the actual code :)
doc/modules/model_evaluation.rst (outdated)

    interpreted as the classifier never wrongly identifying negative cases as positives.
    This happens, for instance, when using a `DummyClassifier` that always predicts the
The wording "never wrongly identifying negative cases as positives" does not bring added value, as it is an equivalence to fp=0
.
The original wording "perfectly identifying positive cases" was intended to remind the user that LR+
is a "higher is better" kind of metrics (indeed, if fp=0
any sample classified as positive has to be a real positive and then LR+=inf
). The paragraph is meant to highlight that the interpretation "higher is better ergo inf is perfect" is lost when both fp=tp=0
, as that could be the case of a DummyClassifier
(the absence of positive predictions leads to fp=tp=0
).
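(For reference, the standard definition, not quoted from the PR, makes the divergence explicit:)

$$
LR_+ = \frac{\mathrm{sensitivity}}{1 - \mathrm{specificity}} = \frac{tp / (tp + fn)}{fp / (fp + tn)}
$$

so the ratio diverges whenever $fp = 0$; when additionally $tp = 0$, the numerator is zero as well and the "higher is better" reading no longer applies.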
Maybe my point would be clearer if we give some examples:

- Case 1: fp=0, tp≠0, LR_+ diverges
  - the classifier is perfect
  - in coherence with being a "higher is better" metric, e.g.:
    - y_true = np.array([0, 0, 0, 1, 1, 1])
    - y_pred = np.array([0, 0, 0, 1, 1, 1])
- Case 2: fp=0, tp=0, LR_+ diverges
  - y_pred only includes the majority class
  - can be misleading as it's the case of a `DummyClassifier` with imbalanced data, e.g.:
    - y_true = np.array([0, 0, 0, 0, 0, 1])
    - y_pred = np.array([0, 0, 0, 0, 0, 0])
- Case 3: fn=0, tp=0, LR_+ and LR_- diverge
  - the test set only includes the majority class, but the classifier still makes errors
  - can be misleading as it's a common case when cross-validating with imbalanced data, e.g.:
    - y_true = np.array([0, 0, 0, 0, 0, 0])
    - y_pred = np.array([0, 0, 0, 0, 0, 1])
- Case 4: tn=0, LR_- diverges
  - positive and negative classes are inverted
  - can be indicative of divergences in LR_+ if classes are reverted, e.g.:
    - y_true = np.array([1, 1, 1, 0, 0, 0])
    - y_pred = np.array([0, 0, 0, 1, 1, 1])

Having this in the shape of a table would be nice.
Something similar to:
| Zeros in the Confusion Matrix | Diverges | Reason | Interpretation | Example |
|-------------------------------|---------------|----------------------------------------------------------------|---------------------------------------------------------------------------------------------------|----------------------------------------------|
| fp=0, tp≠0 | LR_+ | The classifier is perfect | In coherence with being a "higher is better" metric | `y_true = [0, 0, 0, 1, 1, 1]`<br>`y_pred = [0, 0, 0, 1, 1, 1]` |
| fp=0, tp=0 | LR_+ | `y_pred` only includes the majority class | Can be misleading, as it’s the case of a `DummyClassifier` with imbalanced data | `y_true = [0, 0, 0, 0, 0, 1]`<br>`y_pred = [0, 0, 0, 0, 0, 0]` |
| fn=0, tp=0 | LR_+ and LR_- | The test set only includes the majority class, but classifier still makes errors | Can be misleading, common when cross-validating with imbalanced data | `y_true = [0, 0, 0, 0, 0, 0]`<br>`y_pred = [0, 0, 0, 0, 0, 1]` |
| tn=0 | LR_- | Positive and negative classes are inverted | Can indicate divergences in LR_+ if classes are reversed | `y_true = [1, 1, 1, 0, 0, 0]`<br>`y_pred = [0, 0, 0, 1, 1, 1]` |
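(For what it's worth, a small snippet, not part of the PR, that reproduces the four cases from the table with the released API and its pre-PR behaviour of warning and returning `nan` for undefined ratios:)

```python
import warnings

import numpy as np
from sklearn.metrics import class_likelihood_ratios

cases = {
    "fp=0, tp!=0 (perfect classifier)": (
        [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1]),
    "fp=0, tp=0 (only majority class predicted)": (
        [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0]),
    "fn=0, tp=0 (no positive samples in y_true)": (
        [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1]),
    "tn=0 (classes inverted)": (
        [1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]),
}

for name, (y_true, y_pred) in cases.items():
    with warnings.catch_warnings():
        # the undefined ratios trigger UndefinedMetricWarning; silence it here
        warnings.simplefilter("ignore")
        lr_pos, lr_neg = class_likelihood_ratios(np.array(y_true), np.array(y_pred))
    print(f"{name}: LR+={lr_pos}, LR-={lr_neg}")
```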
Thank you for clarifying this, @ArturoAmorQ. I think I understand for the `LR+` case (not so sure about the `LR-` case though, so I didn't touch it).
I have tried to point out the two cases that `fp==0` might indicate and also to stay within the scope of a side note/dropdown on mathematical divergences. I feel this could be material for an extensive example, but that is outside the scope of this PR.
Please let me know what you think about my attempt to rephrase and give advice on how to interpret the positive likelihood ratio in case of a divergence / warning shown to the user.
sklearn/metrics/_classification.py (outdated)

    Conditions under which this warning is raised in relation to the confusion matrix
    deriving from the `y_true` and `y_pred` arguments:
    When the number of false positives is 0, the positive likelihood ratio is undefined.
    When the number of true negatives is 0, the negative likelihood ratio is undefined.
    `UndefinedMetricWarning` is also (and regardles of the `zero_division` and
    `raise_warning` arguments) raised when no samples of the positive class are present
    in `y_true` (when the sum of true positives and false negatives is 0). Then, both
    the positive and the negative likelihood ratios are undefined.
Suggested change:
- Conditions under which this warning is raised in relation to the confusion matrix
- deriving from the `y_true` and `y_pred` arguments:
- When the number of false positives is 0, the positive likelihood ratio is undefined.
- When the number of true negatives is 0, the negative likelihood ratio is undefined.
- `UndefinedMetricWarning` is also (and regardles of the `zero_division` and
- `raise_warning` arguments) raised when no samples of the positive class are present
- in `y_true` (when the sum of true positives and false negatives is 0). Then, both
- the positive and the negative likelihood ratios are undefined.
+ A warning is raised given the following conditions in `y_true` and `y_pred`:
+ - False positives are 0: positive likelihood ratio is undefined.
+ - True negatives are 0: negative likelihood ratio is undefined.
+ - No positive class samples seen in `y_true`: both likelihood ratios are
+   undefined.
Thank you for your suggestion, @ArturoAmorQ. I agree that the previous text was a bit confusing and I took your suggestion to work on it in my new commit. I have also mentioned the input arguments `zero_division` and `raise_warning` that also influence whether a warning is raised.
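(A quick illustration of the third bullet with the released API, not taken from the PR: no positive samples in `y_true` leaves both ratios undefined and raises `UndefinedMetricWarning`.)

```python
import warnings

import numpy as np
from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import class_likelihood_ratios

y_true = np.array([0, 0, 0, 0])  # no positive class samples
y_pred = np.array([0, 1, 0, 1])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lr_pos, lr_neg = class_likelihood_ratios(y_true, y_pred)

print(lr_pos, lr_neg)  # nan nan under the pre-PR behaviour
print(any(issubclass(w.category, UndefinedMetricWarning) for w in caught))  # True
```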
Otherwise this now LGTM.
sklearn/metrics/_classification.py (outdated)

    "raise_warning": ["boolean", Hidden(StrOptions({"deprecated"}))],
    "replace_undefined_by": [
        Hidden(StrOptions({"default"})),
        (StrOptions({"worst"})),
Suggested change:
- (StrOptions({"worst"})),
+ 1.0,
So here I used to think the worst case for the two scores is different (0 and 1), but if for both of them it's `1`, then I think it makes sense for the argument option to simply be `1`, while explaining in the docstring (as is already done in this PR) what the ranges for the two scores are.
cc @glemaitre
I tend to think that `"worst"` is more intuitive than `1.0`, but it's true that this would then be different from the other cases where we have implemented `zero_division`.
But no strong opinion.
I prefer `"worst"` for its explicitness, but I would like to remain consistent. I would therefore prefer to go for `1.0`.
I have changed it back from `"worst"` to `1.0` here, in the tests, in the control flow checks, in the warning messages and in the docstring.
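(For illustration, the worst-case call would then look roughly like this; a sketch only, assuming `replace_undefined_by` ends up accepting the scalar `1.0` as discussed above.)

```python
import numpy as np
from sklearn.metrics import class_likelihood_ratios

y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1])  # fp == 0, so LR+ is undefined

# 1.0 is the "worst" (least informative) value for both ratios, so an undefined
# ratio is replaced pessimistically instead of returning nan.
lr_pos, lr_neg = class_likelihood_ratios(y_true, y_pred, replace_undefined_by=1.0)
```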
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
doc/modules/model_evaluation.rst (outdated)

    default an appropriate warning message and returns `nan` to avoid pollution when
    averaging over cross-validation folds.
    The positive likelihood ratio (`LR+`) is undefined when :math:`fp = 0`, meaning the
    classifier does not misclassify any negatives as positives. This condition can either
I would suggest saying "negative samples" (or observations) and "positive samples" instead of "negatives" and "positives".
I have rephrased this into "does not misclassify any negative labels as positives."
I think that is even clearer.
These are only minor changes. The code is pretty great and well documented, I should say (it is quite easy to get lost in this `zero_division` mess otherwise).
Really good job @StefanieSenger.
@adrinjalali you might want to be the one merging after the comments are addressed.
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Thanks for your review @glemaitre.
I have made all the required changes.
    if (isinstance(constraint, str) and constraint == "nan") or (
        isinstance(constraint, float) and np.isnan(constraint)
    ):
I have added the test case.
Not of extreme importance on a Friday afternoon, but FYI this broke the documentation build. Relevant part from the build log:
So, didn't we build the examples properly before in the CI? I don't understand why this didn't show up before.
The whole doc is not built in the CI unless you add …
Otherwise only the files I touched are built?
Yes, exactly. I should have remembered to ask you to submit that commit with the message; happens every now and then. All good, a follow-up PR to fix it is okay.
Reference Issues/PRs
towards #29048
What does this implement/fix? Explain your changes.
This PR adds a `zero_division` param to `class_likelihood_ratios`, like we're doing in the above issue. Since this function returns two scores, the input to the `zero_division` param also needs to encompass two values.

There is a `raise_warning` param already used for a similar purpose, which I deprecated here in a way that translates its functionality (exclusively raising warnings; the return values are not affected) to the new param.

Question:
The output of `zero_division="warn"` (default) is set to `np.nan`, as it is with the current function in case of a zero division (which is also the content of the warning). The idea when we talked about it was to keep backwards compatibility.
I think we don't need to do this and can return the lowest scores for each metric respectively (1 for LR+ and 0 for LR-) in case of `zero_division="warn"` right away, because the return values don't have anything to do with the deprecated param. Does that make sense?

Edit: this question was answered: yes, we keep the `np.nan` default return value for backwards compatibility until version 1.8.
Any other comments?
The warning that was previously raised if `support_pos == 0` has nothing to do with dividing by zero, and thus I decided to decouple it from the new param (and the old one). Since this doesn't change the functionality and only adds an additional warning in a certain case, that is probably alright, isn't it?