Evaluation metrics for multi label classifiers #558
Comments
We thought that multi-label metrics warranted another pull request, as the multi-label branch had been pending for a long time already. Plus, the classifiers are useful even without evaluation metrics... :) Yes, Hamming loss is a popular evaluation metric. Others are precision and recall. I implemented them in the … Also note that we need to support both lists of tuples and label indicator matrices as input. Both formats are currently supported by …
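As a rough illustration of the precision and recall mentioned above (this is a hedged sketch, not the eventual scikit-learn API — the function name is illustrative), one common convention averages per-example precision and recall over the label sets, here for the list-of-tuples format:

```python
# Illustrative sketch (not the scikit-learn API): example-based precision
# and recall for multi-label targets given as lists of label tuples.

def multilabel_precision_recall(Y_true, Y_pred):
    """Average per-example precision and recall over the label sets."""
    precisions, recalls = [], []
    for true, pred in zip(Y_true, Y_pred):
        true, pred = set(true), set(pred)
        hits = len(true & pred)
        # Convention: an empty predicted (resp. true) set scores 1.0.
        precisions.append(hits / len(pred) if pred else 1.0)
        recalls.append(hits / len(true) if true else 1.0)
    n = len(Y_true)
    return sum(precisions) / n, sum(recalls) / n

p, r = multilabel_precision_recall([(0, 1), (2,)], [(0,), (1, 2)])
print(p, r)  # precision = (1/1 + 1/2)/2 = 0.75, recall = (1/2 + 1/1)/2 = 0.75
```

The same computation works for label indicator matrices once each row is converted to the set of indices that are 1.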
I didn't know you were working on this. Please don't take this as criticism of the (merging of the) multi-label branch. I just wanted to raise awareness that this is a feature that still needs to be implemented. The question about whether to create new functions or use the old ones is a good one. As far as I can tell, the current score functions don't die gracefully when given multilabel input. I think I would prefer separate functions for multilabel, and maybe branch from the existing functions if necessary/possible.
No sweat. I added a smiley at the end of my first paragraph :p
:)
could either of you elaborate on the difference between multilabel and multiclass? are these synonymous or not? we were working on multiclass metrics (#443) till we ran into possible issues with delayed initialization of these metrics for cross-validation and other testing i.e. grid searches. |
Multi-label is when an instance can be labeled with 0, 1 or more classes.
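Concretely, the distinction can be shown with a small example (the variable names below are just for illustration):

```python
# Multi-class targets assign exactly one class per instance; multi-label
# targets assign a (possibly empty) set of labels per instance.

y_multiclass = [0, 2, 1, 2]                    # one class per instance
y_multilabel = [(0, 2), (1,), (), (0, 1, 2)]   # 0, 1 or more labels per instance

# Every multi-class problem can be written in multi-label form (singleton
# label sets), but not the other way around:
as_multilabel = [(c,) for c in y_multiclass]
print(as_multilabel)  # [(0,), (2,), (1,), (2,)]
```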
@satra: …
@mblondel: thank you. now i am all squared away, and the metrics additions do not cover this.
@amueller thank you. i think the docs are good ( i should read the docs more! ). they do define multilabel and multiclass. does this explicitly mean a multiclass or can that module also do multilabel: "For example, it is possible to use these estimators to turn a binary classifier or a regressor into a multiclass classifier."? and on a side note, perhaps the docs should also point to the tree module as also being able to do multiclass. (sorry for spamming this thread - i'll stop now).
I am not sure if I understood your question.
i meant to ask whether the following sentence in the docs should be augmented to say: "For example, it is possible to use these estimators to turn a binary classifier or a regressor into a multiclass or multilabel classifier." or whether those estimators could only turn things multiclass. from your reply it seems it would be good to point out that only ovr can be used with a binary classifier to do multilabel.
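To make the "only OvR does multilabel" point concrete, here is a hedged sketch of why one-vs-rest extends naturally to multi-label: fit one independent binary classifier per label, then report every label whose classifier fires. Any binary classifier with `fit`/`predict` would do; the majority-class stub below is only a placeholder, and all names are illustrative, not scikit-learn's API.

```python
# Sketch of one-vs-rest (OvR) for multi-label prediction. The stub
# classifier just predicts the majority class of its binary column.

class MajorityStub:
    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self
    def predict(self, X):
        return [self.label_] * len(X)

def ovr_multilabel_predict(X_train, Y_train, X_test, n_labels, make_clf):
    """Y_train is a list of label tuples; returns predicted label tuples."""
    per_label_preds = []
    for j in range(n_labels):
        # Binarize: does sample i carry label j?
        yj = [1 if j in labels else 0 for labels in Y_train]
        clf = make_clf().fit(X_train, yj)
        per_label_preds.append(clf.predict(X_test))
    return [tuple(j for j in range(n_labels) if per_label_preds[j][i])
            for i in range(len(X_test))]

X_train = [[0], [1], [2]]            # features are irrelevant to the stub
Y_train = [(0,), (0, 1), (0,)]
print(ovr_multilabel_predict(X_train, Y_train, [[3], [4]], 2, MajorityStub))
# → [(0,), (0,)]  (label 0 is the majority in its column, label 1 is not)
```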
I have an implementation of several measures for multilabel classification. To avoid writing one function per format, I wrote several check functions; see this gist. Am I doing it wrong?
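A minimal sketch of that "check function" idea (illustrative names, not the gist's actual code): normalize either accepted input format to one canonical form, so each metric only has to be written once. Note that the two formats can genuinely collide — a row like `(0, 1)` is a valid tuple of labels and a valid indicator row — which is part of why explicit format handling is tricky.

```python
# Normalize both accepted multi-label formats to a list of label sets.

def is_label_indicator(y):
    """Heuristic: equal-length rows containing only 0/1 values."""
    return (len(y) > 0 and hasattr(y[0], '__len__')
            and all(v in (0, 1) for row in y for v in row)
            and len({len(row) for row in y}) == 1)

def as_label_sets(y):
    """Return each sample's labels as a set, whatever the input format."""
    if is_label_indicator(y):
        return [{j for j, v in enumerate(row) if v} for row in y]
    return [set(row) for row in y]  # assume sequences of label ids

print(as_label_sets([[1, 0, 1], [0, 1, 0]]))  # [{0, 2}, {1}]
print(as_label_sets([(0, 2), (1,)]))          # [{0, 2}, {1}]
```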
Now, there are several multi-label metrics.
As far as I can tell, these are completely missing.
I feel this makes the multi label classifiers much less useful.
I am not sure what common measures there are, but two that seem natural to me would be Hamming loss (how many classes per example were correct?) and 0-1 loss (for how many examples were all classes correct?).
At least these are two losses that are commonly used in structured prediction afaik.
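The two losses above can be sketched on label indicator rows (a hedged illustration, not a reference implementation): Hamming loss counts wrong individual label assignments, while 0-1 (subset) loss counts examples whose full label set is not exactly right.

```python
# Hamming loss vs. 0-1 (subset) loss on label indicator matrices.

def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label assignments that are wrong."""
    n, k = len(Y_true), len(Y_true[0])
    return sum(t != p for rt, rp in zip(Y_true, Y_pred)
               for t, p in zip(rt, rp)) / (n * k)

def zero_one_loss(Y_true, Y_pred):
    """Fraction of examples whose label set is not exactly right."""
    return sum(rt != rp for rt, rp in zip(Y_true, Y_pred)) / len(Y_true)

Y_true = [[1, 1, 0], [0, 0, 1]]
Y_pred = [[1, 0, 0], [0, 0, 1]]
print(hamming_loss(Y_true, Y_pred))   # 1 wrong assignment out of 6
print(zero_one_loss(Y_true, Y_pred))  # 1 of 2 examples not exactly right
```

0-1 loss is the stricter of the two: a single wrong label costs the whole example, whereas Hamming loss charges only 1/(n·k) per mistake.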