Averaging of precision/recall/F1 inconsistent #3122
Hello everyone,
I've come across what I believe is inconsistent behaviour in several functions in the `sklearn.metrics` module. I think the problem is with the documentation, but I'd rather not change it until I've run it past the community and confirmed that my understanding is correct (a fix also seems to be in the works, see #2610). In the example below I'm using `recall_score`, but the same applies to `precision_score` and `f1_score`.
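The original snippet was lost in formatting; here is a minimal sketch of the kind of multiclass example described, where the specific labels and predictions are illustrative, not the original ones:

```python
from sklearn.metrics import recall_score

# Three-class problem; macro averaging should simply be the
# unweighted mean of the per-class recalls.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

print(recall_score(y_true, y_pred, average=None))
# [0.5, 1.0, 0.5]  -- recall of each class separately
print(recall_score(y_true, y_pred, average='macro'))
# 0.666...         -- (0.5 + 1.0 + 0.5) / 3
```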
The output of the above is what I expect it to be; in particular, the macro-averaged recall is just the mean of the three per-class recalls.
When averaging in the binary case, if the labels contain the magic value `1`, you get the recall for class `1`. This is mentioned in the documentation (even though I do not understand why this is done). However, if the labels do not contain the integer `1`, an error is raised.

In #2094 @jnothman suggests binary classification should be handled differently. I am not sure why this is the case. My understanding is that macro averaging works the same way for any number of classes ≥ 2. To get the behaviour I expect out of `recall_score` in the binary case, I had to call it with `pos_label=None`:
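The calls described were also lost in formatting; below is a minimal sketch with data of my own invention, written against the API at the time of this issue (in current scikit-learn versions, `average='macro'` alone gives the same averaged result):

```python
from sklearn.metrics import recall_score

# Binary problem labelled with the magic value 1.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

# Default behaviour: only the recall of the positive class 1 is reported.
print(recall_score(y_true, y_pred))  # 1.0

# The same problem relabelled as {2, 3} contains no class 1, and the
# default call raises a ValueError instead of picking a class:
# recall_score([2, 2, 3, 3], [2, 3, 3, 3])

# Averaging over both classes, as described above:
print(recall_score(y_true, y_pred, pos_label=None, average='macro'))
# 0.75  -- (0.5 + 1.0) / 2
```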
I think two points need to be made more explicit in the documentation: the behaviour of `pos_label` and the behaviour of `labels`.
Comments

It is more that binary classification has been historically handled differently. On the contrary, I would like this to cease, and for users to select binary P/R/F explicitly (see #2610, #2679).

The documentation is already much clearer than it used to be about the interaction between these parameters. The major reason #2610 is still a work in progress, however, is that there is a usage that needs deprecation for a version before it can be implemented (see #2952). In short, this is a feature, not a bug; but it's a bad feature ("explicit is better than implicit") and is gradually being fixed. I'm therefore closing this issue, but if you can word a documentation clarification, go ahead and submit a PR.

PS: this problem has bitten us even in internal tests, so it's not being taken lightly; it's just something that needs gradual change in order to maintain backwards compatibility.