Implementing multilabel confusion matrix - Issue #3452 #3614
Thanks. This currently doesn't work with multilabel data. The only For example:
Here, for label 0 (the first column), there is 1 tp, 1 fp (and 0 fn, 0 tn). For label 1, there is 1 fp, 1 fn (and 0 tp, 0 tn). (Basically, this amounts to a binary confusion matrix for each column.) The function will need a test (and not just a doctest). It will also need to be discussed in
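The example data for the comment above did not survive, so here is a minimal sketch of the per-column counting it describes. The indicator matrices `y_true` and `y_pred` and the helper `per_label_counts` are hypothetical, chosen only so that the counts match the ones quoted (label 0: 1 tp, 1 fp; label 1: 1 fp, 1 fn); this is not the PR's implementation.

```python
import numpy as np

# Hypothetical indicator matrices (rows are samples, columns are labels),
# picked to reproduce the counts quoted in the comment above.
y_true = np.array([[1, 1],
                   [0, 0]])
y_pred = np.array([[1, 0],
                   [1, 1]])


def per_label_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), each an array with one entry per label."""
    t = np.asarray(y_true).astype(bool)
    p = np.asarray(y_pred).astype(bool)
    tp = np.sum(t & p, axis=0)    # label present and predicted
    fp = np.sum(~t & p, axis=0)   # label absent but predicted
    fn = np.sum(t & ~p, axis=0)   # label present but not predicted
    tn = np.sum(~t & ~p, axis=0)  # label absent and not predicted
    return tp, fp, fn, tn


tp, fp, fn, tn = per_label_counts(y_true, y_pred)
```

Each column is treated as its own binary problem, which is why the four counts for a label always sum to the number of samples.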
@jnothman Thanks a lot for the reply. I misunderstood the multilabel part for a multi-label classification. It is clear to me now, and I will improve it. Regards
Can you write tests for your function and narrative documentation to advertise your work?
@arjoly Sure, OK!
sklearn/metrics/classification.py
Returns
-------
C : array, shape = [n_classes, 4]
Perhaps rather than the ordering being arbitrary, let's make this a proper contingency matrix, e.g. [n_classes, 2, 2], where axis 1 indicates false/true and axis 2 indicates negative/positive. So true positives for each class can be accessed by C[:, 1, 1] (or C[:, True, True], but it's believable that such indexing will be deprecated in numpy).
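A rough sketch of the [n_classes, 2, 2] layout proposed above, under one assumption: "false/true" on axis 1 is read as whether the prediction was correct, and "negative/positive" on axis 2 as the predicted value. The function name `contingency_per_label` and the sample data are hypothetical and not taken from the PR.

```python
import numpy as np


def contingency_per_label(y_true, y_pred):
    """Build an array C of shape [n_classes, 2, 2].

    Assumed layout (an interpretation of the comment above):
      axis 1: 0 = prediction was wrong (false), 1 = prediction was right (true)
      axis 2: 0 = negative prediction,          1 = positive prediction
    """
    t = np.asarray(y_true).astype(bool)
    p = np.asarray(y_pred).astype(bool)
    C = np.zeros((t.shape[1], 2, 2), dtype=int)
    C[:, 1, 1] = np.sum(t & p, axis=0)    # true positives
    C[:, 0, 1] = np.sum(~t & p, axis=0)   # false positives
    C[:, 1, 0] = np.sum(~t & ~p, axis=0)  # true negatives
    C[:, 0, 0] = np.sum(t & ~p, axis=0)   # false negatives
    return C


# Hypothetical data: 2 samples, 2 labels.
C = contingency_per_label([[1, 1], [0, 0]], [[1, 0], [1, 1]])
true_positives = C[:, 1, 1]
```

Indexing with the integers 1 and 0 rather than True and False sidesteps the deprecation concern raised in the comment.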
It's also possible to have an n_classes by n_classes matrix which has all the information (and even more information).
It's late at night and maybe I'm confused, but how does [n_classes, n_classes] signify a true negative prediction for any particular class?
I could imagine something like [n_classes, 2, n_classes] in which [a, b, c] indicates: when a was b (t/f) in the ground truth, how many times was c predicted? Then true negatives are n_samples - [a, False, a].
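To make the [n_classes, 2, n_classes] idea concrete, here is a small sketch assuming binary indicator inputs. The name `cooccurrence_tensor` and the sample data are hypothetical; the sketch only builds the tensor as described and does not take a position on how true negatives should be derived from it.

```python
import numpy as np


def cooccurrence_tensor(y_true, y_pred):
    """T[a, b, c] counts samples where label a had truth value b (0 or 1)
    and label c was predicted."""
    t = np.asarray(y_true, dtype=int)
    p = np.asarray(y_pred, dtype=int)
    n_classes = t.shape[1]
    T = np.empty((n_classes, 2, n_classes), dtype=int)
    T[:, 0, :] = (1 - t).T @ p  # a absent in the ground truth, c predicted
    T[:, 1, :] = t.T @ p        # a present in the ground truth, c predicted
    return T


# Hypothetical data: 2 samples, 2 labels.
T = cooccurrence_tensor([[1, 1], [0, 0]], [[1, 0], [1, 1]])
idx = np.arange(T.shape[0])
tp_per_label = T[idx, 1, idx]  # a present and a predicted
fp_per_label = T[idx, 0, idx]  # a absent but a predicted
```

The entries with a equal to c recover the per-label true and false positives, while the remaining entries record which other labels were predicted alongside.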
I was expecting something like the table 7 of this article. However, this will be hard to represent with a numpy array. Sorry for the noise.
@jnothman Concerning your first proposal, do you suggest we implement the shape [n_classes, 2, 2] and re-implement it if it's deprecated, or stick with [n_classes, 4]? Could we use a structured array, for example?
The marginals don't need to be shown, so it's possible to implement with a numpy array as long as that additional dimension signified there by / (it seems) is represented explicitly as an axis of size 2 (or via returning a pair of n_classes x n_classes matrices).
@Magellanea the only thing that would be deprecated is whether or not the user can access it with True and False rather than 1 and 0. It's still, I think, a more explicit shape.
columns = np.repeat(range(0, n_labels), 4)
mcm = coo_matrix((data, (rows, columns)), shape=(4, n_labels)).\
    toarray()
return (np.array(list(map(tuple, np.transpose(mcm))),
I'm not sure about the use of struct arrays for return values, but in any case, list(map(tuple, ...)) is much more than needed. You should just be able to use view on the existing array and provide your struct dtype.
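A minimal sketch of the view-based approach suggested above. The field names and their order in `counts_dtype` are assumptions, since the struct dtype used in the PR is not shown in the excerpt, and `mcm` here is a hand-written stand-in for the (4, n_labels) array built in the diff.

```python
import numpy as np

# Hypothetical field names and order; the PR's actual dtype is not shown.
counts_dtype = np.dtype([("tp", np.int64), ("fp", np.int64),
                         ("fn", np.int64), ("tn", np.int64)])

# Stand-in for the (4, n_labels) count array from the diff.
mcm = np.array([[1, 0],
                [1, 1],
                [0, 1],
                [0, 0]], dtype=np.int64)

# view() reinterprets memory in place, so the four counts of each label
# must be adjacent: take a C-contiguous (n_labels, 4) copy of the transpose.
rows = np.ascontiguousarray(mcm.T)

# Each row of four int64 values becomes one record; ravel() drops the
# trailing axis of length 1 that view() leaves behind.
records = rows.view(counts_dtype).ravel()
```

Fields can then be read by name, for example `records["tp"]`, without building an intermediate list of tuples.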
@Magellanea Are you planning to continue working on this? If you are busy, I could cherry-pick your commits and finish it up.
Sure you can. I'm sorry, I've been busy lately.