Add P4 classification metric #31218
Thanks for the proposal. I took a look at the paper: it is quite recent (published in 2023) and cited only 31 times according to Google Scholar. As such, this does not meet our inclusion criteria for scikit-learn.

Personal opinion: we already have MCC, and the P4 metric seems very similar in the sense that both metrics penalize models that have at least one bad entry in their confusion matrix. In that respect, P4 seems a bit redundant.

More importantly, I don't think generic metrics computed on thresholded (hard) predictions (MCC, P4, F1, balanced accuracy...) are the best way to choose a binary classifier for a given problem. Instead, I would select the best classifier based on threshold-independent binary classification metrics, either purely discriminative metrics, such as ROC AUC or Average Precision (area under the PR curve), or calibration-aware metrics, such as log-loss and Brier score, and afterward find the optimal decision threshold based on application-specific constraints.
All of those can be implemented with the help of the `TunedThresholdClassifierCV` meta-estimator.
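For illustration, here is a minimal sketch of that two-step workflow (requires scikit-learn ≥ 1.5 for `TunedThresholdClassifierCV`; the dataset, hyperparameter grid and scoring choices below are arbitrary placeholders, not a recommendation):

```python
# Sketch: select a model with a threshold-independent metric,
# then tune the decision threshold as a separate, later step.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TunedThresholdClassifierCV

X, y = make_classification(n_samples=1_000, weights=[0.9, 0.1], random_state=0)

# Step 1: pick hyperparameters with a threshold-independent metric
# (here ROC AUC; Average Precision, log-loss or Brier score work too).
search = GridSearchCV(
    LogisticRegression(max_iter=1_000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    scoring="roc_auc",
).fit(X, y)

# Step 2: tune the decision threshold for an application-specific
# objective (here balanced accuracy, but any scorer can be passed).
tuned = TunedThresholdClassifierCV(
    search.best_estimator_, scoring="balanced_accuracy"
).fit(X, y)
print(tuned.best_threshold_)
```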
I propose to close this feature request as "not planned" for now. If people disagree with what I wrote above, please feel free to upvote this issue and comment below to extend the analysis, and we can consider reopening once the inclusion criteria are met.
Describe the workflow you want to enable
Hi, while working on a classification problem I found out there is no dedicated function to compute the P4 metric in sklearn. As a reminder, P4 is a binary classification metric commonly seen as an extension of the F-beta metric, because it takes into account all four values (True Positive, False Positive, True Negative and False Negative), and because it is symmetrical, unlike F-beta.
P4 is defined as follows: `P4 = 4 / (1/precision + 1/recall + 1/specificity + 1/NPV)`
- Wikipedia page right here
- Medium article right there
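To make the definition concrete, here is a small worked example (toy arrays of my own, chosen so that all four ratios equal 3/4 and therefore P4 = 0.75):

```python
# Illustrative only: compute P4 by hand from a binary confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 0, 1, 1, 1])
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)    # 3/4
recall = tp / (tp + fn)       # 3/4
specificity = tn / (tn + fp)  # 3/4
npv = tn / (tn + fn)          # 3/4

p4 = 4 / (1 / precision + 1 / recall + 1 / specificity + 1 / npv)
print(p4)  # 0.75
```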
Describe your proposed solution
My idea was to create a function `p4_support` similar to `precision_recall_fscore_support`. Since it is a binary metric, multiclass and multilabel inputs would be managed with `multilabel_confusion_matrix`, so the accepted values for the `average` argument would be `'macro'`, `'samples'`, `'weighted'`, `'binary'` and `None`. I would compute all the necessary ratios, 1/precision, 1/recall, 1/specificity and 1/NPV, using `_prf_divide`. If any of these four ratios is a zero division, then P4 would also return the `zero_division` argument. Indeed, if for example precision is zero, then 1/precision is +inf, so the whole denominator of P4 is +inf, which makes P4 = 0. (Incidentally, this behavior is one reason it is harder to achieve a high P4 score than an F-score: all four ratios need to be 1 for P4 to equal 1.) The function would return the tuple (p4_value, support).

A second function, `p4_score`, which would be the one actually used by users, would return only the first element of the previously described `p4_support` function. A rough sketch of the binary case is given below.
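As an illustration of the intended behavior, here is a hypothetical sketch of `p4_score` for the binary case (not an existing scikit-learn API; it also shows where specificity and NPV fall out as by-products, see the "Extras" below):

```python
# Hypothetical sketch of the proposed p4_score, binary case only.
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

def p4_score(y_true, y_pred, zero_division=0.0):
    """P4 for binary labels; returns ``zero_division`` whenever one of
    the four underlying ratios (precision, recall, specificity, NPV)
    is zero or undefined."""
    # Confusion matrix of the positive class; the same per-class
    # matrices would back the multiclass/multilabel averaging modes.
    tn, fp, fn, tp = multilabel_confusion_matrix(y_true, y_pred)[1].ravel()
    numerators = np.array([tp, tp, tn, tn], dtype=float)
    denominators = np.array([tp + fp, tp + fn, tn + fp, tn + fn], dtype=float)
    if np.any(numerators == 0) or np.any(denominators == 0):
        # A zero ratio sends the harmonic-mean denominator to +inf,
        # so P4 collapses to 0 (or whatever zero_division is set to).
        return zero_division
    # ratios = precision, recall, specificity, NPV
    ratios = numerators / denominators
    return 4.0 / np.sum(1.0 / ratios)
```

A `p4_support` variant would additionally return the support (the count of positive labels), mirroring `precision_recall_fscore_support`.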
Describe alternatives you've considered, if relevant
Extras:

- Since specificity and NPV are computed anyway, the `p4_support` function could return the tuple (specificity, NPV, p4_value, support) and then be called `specificity_npv_p4_support`. It would then also be possible to add `specificity` and `NPV` functions as well, using the same scheme as precision or recall.
- Responding to issue #21000, P4 could be added to the `classification_report` function and would be a good summary of all TP, FP, TN and FN values and their combinations.

Additional context
I have checked that this feature is not already in the issues or pull requests.