ENH: add normalize parameter to metrics.classification.confusion_matrix #14478


Closed

Conversation

@Sam1320 Sam1320 commented Jul 26, 2019

Allows getting a normalized confusion matrix directly from the function
call. I use confusion_matrix frequently and find having to normalize
the matrix manually every time a bit unnecessary.

I am aware that other functions like accuracy_score already implement
this exact functionality, so the lack of a normalize parameter here may
be intentional and I'm missing the reason why. But in case it's not
intentional, you might find this contribution useful :).
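For context, the manual normalization this PR aims to replace typically looks like the following sketch (the labels and data are illustrative, not from the PR):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Raw counts: rows are true labels, columns are predictions
cm = confusion_matrix(y_true, y_pred)

# Manual normalization over the true (row) labels, the most common convention
cm_normalized = cm.astype(float) / cm.sum(axis=1, keepdims=True)
```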

Sam1320 added 2 commits July 26, 2019 10:55
Allows getting a normalized confusion matrix directly from the function
call. I use this function frequently and find having to normalize
the matrix manually every time a bit tedious.
@@ -211,6 +212,10 @@ def confusion_matrix(y_true, y_pred, labels=None, sample_weight=None):
If none is given, those that appear at least once
in ``y_true`` or ``y_pred`` are used in sorted order.

normalize : bool, optional (default=False)
If True, return the confusion matrix normalized over the true
(row) labels so that each row sums to 1.
Member


You can normalise by number of samples (like accuracy), by number of samples in the ground truth for each class (like recall), or by number of samples predicted for each class (like precision). I find this description highly ambiguous, and I'm not persuaded we should assume one normalisation is more appropriate than another for the user.
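The three normalizations the reviewer distinguishes differ only in the axis the counts are divided over; a minimal numpy sketch (with an arbitrary example matrix):

```python
import numpy as np

# Example confusion matrix: rows = true labels, columns = predictions
cm = np.array([[5, 1, 0],
               [2, 3, 1],
               [0, 0, 8]], dtype=float)

norm_all = cm / cm.sum()                        # like accuracy: all entries sum to 1
norm_true = cm / cm.sum(axis=1, keepdims=True)  # like recall: each row sums to 1
norm_pred = cm / cm.sum(axis=0, keepdims=True)  # like precision: each column sums to 1
```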

Author

@Sam1320 Sam1320 Jul 29, 2019


Good point. I did it that way because the normalization is most frequently done with respect to the ground truth. When we think of a normalized confusion matrix, the picture that comes to mind is precisely something like this.

To avoid suggesting just one specific type of normalization to the user, the normalize parameter could accept different arguments, e.g., "precision/accuracy/recall".

@jnothman
Member

jnothman commented Jul 29, 2019 via email

@@ -184,7 +184,8 @@ def accuracy_score(y_true, y_pred, normalize=True, sample_weight=None):
return _weighted_sum(score, sample_weight, normalize)


def confusion_matrix(y_true, y_pred, labels=None, sample_weight=None):
def confusion_matrix(y_true, y_pred, labels=None,
normalize=False, sample_weight=None):
Member


You should keep sample_weight before normalize just in case.

@glemaitre
Member

With your proposal, you also need to implement tests to ensure that the function will work properly.

3 participants