Magellanea

No description provided.

@jnothman
Member

jnothman commented Sep 1, 2014

Thanks.

This currently doesn't work with multilabel data. The only y_type that should be supported is 'multilabel-indicator', which may be a sparse or dense matrix of 0s and 1s.

For example:

import numpy as np
y_true = np.array([[1, 0],
                   [0, 1]])
y_pred = np.array([[1, 1],
                   [1, 0]])

Here, for label 0 (the first column), there is 1 tp, 1 fp (and 0 fn, 0 tn). For label 1, there is 1 fp, 1 fn (and 0 tp, 0 tn). (Basically, this amounts to a binary confusion matrix for each column.)
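The per-column counts described above can be checked with a few boolean reductions. This is only a minimal sketch using NumPy on dense arrays; the variable names are illustrative and it is not the implementation in this pull request.

```python
import numpy as np

# The example from the comment above, as boolean indicator matrices
# (rows are samples, columns are labels).
y_true = np.array([[1, 0],
                   [0, 1]], dtype=bool)
y_pred = np.array([[1, 1],
                   [1, 0]], dtype=bool)

# One binary confusion matrix per column: count each outcome over samples.
tp = (y_true & y_pred).sum(axis=0)    # label present and predicted
fp = (~y_true & y_pred).sum(axis=0)   # label absent but predicted
fn = (y_true & ~y_pred).sum(axis=0)   # label present but not predicted
tn = (~y_true & ~y_pred).sum(axis=0)  # label absent and not predicted
```

For label 0 this gives 1 tp and 1 fp; for label 1 it gives 1 fp and 1 fn, matching the counts stated in the comment.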

The function will need a test (and not just a doctest). It will also need to be discussed in doc/modules/model_evaluation.rst, and ideally be demonstrated in an example.

@Magellanea
Author

@jnothman Thanks a lot for the reply. I misunderstood the multilabel part for a multi-label classification; it is clear to me now and I will improve it. Regards

@arjoly
Member

arjoly commented Sep 2, 2014

Can you write tests for your function and narrative documentation to advertise your work?

@Magellanea
Author

@arjoly Do you mean for this one, or for the one that I will implement (suggested by @jnothman)?

@arjoly
Member

arjoly commented Sep 2, 2014

I should have read @jnothman's comment more carefully. I have the same opinion as @jnothman.

@Magellanea
Author

@arjoly Sure, Ok !

@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling 904954a on Magellanea:multilabel-classification-metric into 353840c on scikit-learn:master.


Returns
-------
C : array, shape = [n_classes, 4]
Member

Perhaps rather than the ordering being arbitrary, let's make this a proper contingency matrix, e.g. [n_classes, 2, 2] where axis 1 indicates false/true and axis 2 indicates negative/positive. So true positives for each class can be accessed by C[:, 1, 1] (or C[:, True, True] but it's believable that such indexing will be deprecated in numpy).
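The proposed [n_classes, 2, 2] layout might look like the sketch below, where axis 1 records whether the prediction was correct (false/true) and axis 2 records what was predicted (negative/positive). The helper name `contingency_per_label` is hypothetical, and dense indicator input is assumed.

```python
import numpy as np


def contingency_per_label(y_true, y_pred):
    """Hypothetical helper: per-label contingency array of shape (n_labels, 2, 2).

    C[k, correct, positive] counts samples for label k, where `correct`
    says whether the prediction matched the truth and `positive` says
    whether the label was predicted.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    correct = y_true == y_pred
    n_labels = y_true.shape[1]
    C = np.zeros((n_labels, 2, 2), dtype=np.intp)
    for is_correct in (0, 1):
        for is_positive in (0, 1):
            mask = (correct == bool(is_correct)) & (y_pred == bool(is_positive))
            C[:, is_correct, is_positive] = mask.sum(axis=0)
    return C


C = contingency_per_label([[1, 0], [0, 1]], [[1, 1], [1, 0]])
true_positives = C[:, 1, 1]
false_positives = C[:, 0, 1]
```

With this layout the four outcomes have fixed, self-describing positions: `C[:, 0, 0]` holds false negatives and `C[:, 1, 0]` holds true negatives.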

Member

It's also possible to have an n_classes by n_classes matrix which has all the information (and even more information).

Member

It's late at night and maybe I'm confused, but how does [n_classes, n_classes] signify a true negative prediction for any particular class?

Member

I could imagine something like [n_classes, 2, n_classes], in which [a, b, c] indicates, when a was b (t/f) in the ground truth, how many times c was predicted. Then true negatives for a are the number of samples where a was false, minus [a, False, a].
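One possible reading of the [n_classes, 2, n_classes] idea is sketched below, assuming dense 0/1 indicator matrices. The array name `T` and the use of matrix products are illustrative choices, not something proposed in the thread.

```python
import numpy as np

y_true = np.array([[1, 0],
                   [0, 1]])
y_pred = np.array([[1, 1],
                   [1, 0]])
n_samples, n_labels = y_true.shape

# T[a, b, c]: among samples where label a has ground truth b (0 or 1),
# how many times label c was predicted.
T = np.empty((n_labels, 2, n_labels), dtype=np.intp)
T[:, 1, :] = y_true.T @ y_pred        # label a present in the ground truth
T[:, 0, :] = (1 - y_true).T @ y_pred  # label a absent from the ground truth

labels = np.arange(n_labels)
tp = T[labels, 1, labels]
fp = T[labels, 0, labels]

# Negatives are not stored directly; they follow from the label supports.
n_present = y_true.sum(axis=0)
fn = n_present - tp
tn = (n_samples - n_present) - fp
```

The off-diagonal entries carry the extra information: `T[a, 1, c]` with `a != c` says how often label c was predicted on samples that truly had label a.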

Member

I was expecting something like Table 7 of this article. However, this will be hard to represent with a numpy array. Sorry for the noise.

Author

@jnothman Concerning your first proposal: do you suggest we implement the shape [n_classes, 2, 2] and re-implement it if it's deprecated, or stick with [n_classes, 4]? Could we use a structured array, for example?

Member

The marginals don't need to be shown, so it's possible to implement with a numpy array as long as that additional dimension signified there by / (it seems) is represented explicitly as an axis of size 2 (or via returning a pair of n_classes x n_classes matrices).

Member

@Magellanea the only thing that would be deprecated is whether or not the user can access it with True and False rather than 1 and 0. It's still, I think, a more explicit shape.

@Magellanea
Author

@jnothman @arjoly
I've modified it to use a structured array. You can now call it as
a = binarized_multilabel_confusion_matrix(y_true, y_pred)
and access each count by field name,
so a['tp'] will return an array of true positives for each class.

columns = np.repeat(range(0, n_labels), 4)
mcm = coo_matrix((data, (rows, columns)), shape=(4, n_labels)).\
toarray()
return (np.array(list(map(tuple, np.transpose(mcm))),
Member

I'm not sure about the use of struct arrays for return values, but in any case, list(map(tuple, ...)) is much more than needed. You should just be able to use view on the existing array and provide your struct dtype.
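The `view` suggestion could be sketched as follows. The field order (tp, fp, fn, tn) and the sample counts are assumptions made for illustration; the diff excerpt does not show which order the pull request uses.

```python
import numpy as np

# Hypothetical per-label counts, one row per label, with columns in the
# assumed order (tp, fp, fn, tn).
counts = np.array([[1, 1, 0, 0],
                   [0, 1, 1, 0]], dtype=np.int64)

# Structured dtype with one integer field per column (field names assumed).
count_dtype = np.dtype([(name, counts.dtype) for name in ("tp", "fp", "fn", "tn")])

# Reinterpret each row of four integers as a single record. The array must
# be C-contiguous for this; the view yields shape (n_labels, 1), which is
# then flattened to (n_labels,).
records = np.ascontiguousarray(counts).view(count_dtype).reshape(-1)

true_positives = records["tp"]
false_negatives = records["fn"]
```

This avoids building a Python list of tuples: the same buffer is simply reinterpreted with named fields.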

@raghavrv
Member

@Magellanea Are you planning to continue work on this? If not I could cherry-pick your commits and finish it up, if you are busy?

@Magellanea
Author

Sure you can. I'm sorry, I've been busy lately.
On Sun, Jan 18, 2015 at 2:30 AM, ragv notifications@github.com wrote:

@Magellanea https://github.com/Magellanea Are you planning to continue
work on this? If not I could cherry-pick your comments and finish it up, if
you are busy?


