Evaluation metrics for multi label classifiers #558

Closed
amueller opened this issue Jan 16, 2012 · 13 comments

@amueller
Member

As far as I can tell, these are completely missing.
I feel this makes the multi label classifiers much less useful.

I am not sure what common measures there are, but two that seem natural to me would be Hamming loss (how many classes per example were correct?) and 0-1 loss (for how many examples were all classes correct?).

At least these are two losses that are commonly used in structured prediction, afaik.
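
For concreteness, a minimal sketch of both losses on 0/1 label indicator matrices (the matrix format and values here are just for illustration):

```python
import numpy as np

# y_true and y_pred are 0/1 label indicator matrices: one row per
# example, one column per class.
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0]])

# Hamming loss: fraction of individual label assignments that are wrong.
hamming = np.mean(y_true != y_pred)                   # 1 wrong cell of 6 -> 0.1667

# 0-1 (subset) loss: fraction of examples with at least one wrong label.
zero_one = np.mean(np.any(y_true != y_pred, axis=1))  # 1 example of 2 -> 0.5
```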

@mblondel
Member

We thought that multi-label metrics warranted a separate pull request, as the multi-label branch had already been pending for a long time. Plus, the classifiers are useful even without evaluation metrics... :)

Yes, Hamming loss is a popular evaluation metric. Others are precision and recall. I implemented them in the test_multiclass.py file, but they need to be vectorized and merged into the metrics module. A question is whether we should have dedicated functions (say, multilabel_precision and multilabel_recall) or just "overload" the existing ones.
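
A rough sketch of what example-based versions could look like on 0/1 indicator matrices (multilabel_precision/multilabel_recall are just the candidate names from above, not merged code):

```python
import numpy as np

def multilabel_precision(Y_true, Y_pred):
    # Per example: |true ∩ predicted| / |predicted|, averaged over examples.
    # Assumes Y_true and Y_pred are 0/1 integer arrays.
    tp = np.sum(Y_true & Y_pred, axis=1)
    n_pred = np.sum(Y_pred, axis=1)
    return np.mean(np.where(n_pred > 0, tp / np.maximum(n_pred, 1), 0.0))

def multilabel_recall(Y_true, Y_pred):
    # Per example: |true ∩ predicted| / |true|, averaged over examples.
    tp = np.sum(Y_true & Y_pred, axis=1)
    n_true = np.sum(Y_true, axis=1)
    return np.mean(np.where(n_true > 0, tp / np.maximum(n_true, 1), 0.0))
```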

Also note that we need to support both lists of tuples and label indicator matrices as input. Both formats are currently supported by LabelBinarizer (and thus by OneVsRest), and both have their pros and cons.
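
The two formats side by side (values are illustrative only):

```python
# Format 1: a sequence of label tuples, one tuple per example.
y_tuples = [(0, 2), (1,), ()]          # the third example has no labels

# Format 2: a label indicator matrix, one column per class.
y_indicator = [[1, 0, 1],
               [0, 1, 0],
               [0, 0, 0]]
```

Roughly, the tuple format is compact when labels are sparse, while the indicator matrix is what vectorized metric code most naturally consumes.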

@amueller
Member Author

I didn't know you were working on this. Please don't take this as criticism of (the merging of) the multi-label branch.

I just wanted to raise awareness that this is a feature that still needs to be implemented.

The question about whether to create new functions or use the old ones is a good one.

As far as I can tell, the current score functions don't fail gracefully when given multi-label input (zero_one thinks everything is wrong; precision raises an "unhashable type" error). I think we have to do better input validation anyway, since not all classification metrics will support multi-label classification.

I think I would prefer separate functions for multi-label, and maybe branch from the existing functions where necessary/possible.
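
A rough sketch of that branching idea (the detection heuristic below is a made-up placeholder that assumes array-like input, not a proposed final check):

```python
import numpy as np

def _is_label_indicator_matrix(y):
    # Made-up heuristic: a 2-D array containing only 0s and 1s.
    y = np.asarray(y)
    return y.ndim == 2 and np.isin(y, (0, 1)).all()

def zero_one_loss(y_true, y_pred):
    if _is_label_indicator_matrix(y_true):
        # Multi-label branch: an example is wrong unless all labels match.
        return np.mean(np.any(np.asarray(y_true) != np.asarray(y_pred), axis=1))
    # Existing single-label behaviour: fraction of misclassified examples.
    return np.mean(np.asarray(y_true) != np.asarray(y_pred))
```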

@mblondel
Member

No sweat. I added a smiley at the end of my first paragraph :p

@amueller
Member Author

:)

@satra
Member

satra commented Jan 16, 2012

could either of you elaborate on the difference between multilabel and multiclass? are these synonymous or not?

we were working on multiclass metrics (#443) till we ran into possible issues with delayed initialization of these metrics for cross-validation and other testing, i.e. grid searches.

@mblondel
Member

> we were working on multiclass metrics (#443) till we ran into possible issues with delayed initialization of these metrics for cross-validation and other testing, i.e. grid searches.

Multi-label is when an instance can be labeled with 0, 1, or more labels. For example, a newspaper article can be labeled with both "economy" and "politics".

@amueller
Member Author

@satra:
There is an explanation at the top of http://scikit-learn.org/dev/modules/multiclass.html.
Do you think this explanation is sufficiently clear and prominent?

@satra
Member

satra commented Jan 16, 2012

@mblondel: thank you. now i am all squared away; the metrics additions do not cover this.

@satra
Member

satra commented Jan 16, 2012

@amueller thank you. i think the docs are good (i should read the docs more!). they do define multilabel and multiclass.

does this explicitly mean multiclass only, or can that module also do multilabel: "For example, it is possible to use these estimators to turn a binary classifier or a regressor into a multiclass classifier."? and on a side note, perhaps the docs should also point to the tree module as also being able to do multiclass. (sorry for spamming this thread - i'll stop now).

@amueller
Member Author

I am not sure I understood your question. The one-vs-rest and one-vs-one meta-estimators can generate a multiclass or multi-label (OvR only) classifier from any given binary classifier. Was that the question? If not, could you reformulate?
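
For what it's worth, a small end-to-end sketch of the OvR multi-label route, using current estimator names (the synthetic-data helper is just for the demo):

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# OneVsRestClassifier fits one binary LogisticRegression per label and
# predicts a 0/1 label indicator matrix.
X, Y = make_multilabel_classification(n_samples=100, n_classes=3,
                                      random_state=0)
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
print(clf.predict(X[:2]))   # one row of 0/1 labels per example
```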

@satra
Member

satra commented Jan 16, 2012

i meant to ask whether the following sentence in the docs should be augmented to say:

"For example, it is possible to use these estimators to turn a binary classifier or a regressor into a multiclass or multilabel classifier."

or whether those estimators could only turn things multiclass.

from your reply it seems it would be good to point out that only ovr can be used with a binary classifier to do multilabel.

@arjoly
Member

arjoly commented Jan 11, 2013

I have an implementation of several measures for multi-label classification. However, I had to hack the label binarizer.

To avoid writing one function per format, I wrote several check functions; see this gist.

Am I doing it wrong?
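
A hypothetical sketch of what such a check function might do, normalizing either format to an indicator matrix (the name and signature here are made up):

```python
import numpy as np

def to_indicator(y, classes):
    # Accept either a sequence of label tuples or an indicator matrix,
    # and always return a dense 0/1 indicator matrix.
    if len(y) and isinstance(y[0], (tuple, list, set)):
        index = {c: j for j, c in enumerate(classes)}
        out = np.zeros((len(y), len(classes)), dtype=int)
        for i, labels in enumerate(y):
            for label in labels:
                out[i, index[label]] = 1
        return out
    return np.asarray(y, dtype=int)   # assume it is already an indicator matrix

# to_indicator([(0, 2), (1,), ()], classes=[0, 1, 2])
# -> array([[1, 0, 1],
#           [0, 1, 0],
#           [0, 0, 0]])
```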

@arjoly
Member

arjoly commented Jul 22, 2013

There are now several multi-label metrics.
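
For readers landing here later, the standard metric functions now accept label indicator matrices, e.g.:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0]])

print(hamming_loss(y_true, y_pred))               # per-label error rate
print(accuracy_score(y_true, y_pred))             # subset (0-1) accuracy
print(f1_score(y_true, y_pred, average="micro"))  # micro-averaged F1
```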

@arjoly arjoly closed this as completed Jul 22, 2013