
[MRG] Multi-label metrics: accuracy, hamming loss and zero-one loss #1606


Closed
arjoly wants to merge 43 commits into master from multilabel-metrics

Conversation

@arjoly
Member

arjoly commented Jan 22, 2013

This pull request intends to bring three new features:

  • a tested and generalized unique_labels function;
  • multi-label support for the accuracy_score and zero_one_loss functions;
  • the Hamming loss metric (hamming_loss) with multi-label support.

Before merging, I would like to suggest adding a new module where multi-label utilities such as unique_labels and _is_label_indicator_matrix are collected.

Furthermore, I have to reorganise (cosmit) some of the functions into a multi-label category in metrics.py, but I will wait until the reviews are done.

This pull request also tackles issue #558. Reviews and comments are welcome! :-)
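
For concreteness, a sketch of the intended multi-label behaviour (module paths as they ended up in this PR; expected outputs in the comments):

import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss, zero_one_loss
from sklearn.utils.multiclass import unique_labels

# Two samples, three labels, encoded as a label indicator matrix.
Y_true = np.array([[1, 1, 0],
                   [0, 1, 1]])
Y_pred = np.array([[1, 0, 0],
                   [0, 1, 1]])

accuracy_score(Y_true, Y_pred)  # 0.5: only the second sample matches exactly
zero_one_loss(Y_true, Y_pred)   # 0.5: the complement of the subset accuracy
hamming_loss(Y_true, Y_pred)    # 0.1666...: one wrong label out of six
unique_labels([0, 1], [1, 2])   # array([0, 1, 2])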


Parameters
----------
y_true : array-like or list of labels or label binary matrix
Member

Very happy that you decided to support both list of labels and label binary matrix. Regarding the name of the latter, maybe label indicator matrix or class membership matrix would be more explicit?

Member Author

Thanks "label indicator matrix" is better name than "label binary matrix".

@mblondel
Member

I think I'm +1 for moving _is_label_indicator_matrix and _is_multilabel to the metrics module (and of course, to make them public).

@arjoly
Member Author

arjoly commented Jan 22, 2013

Since those functions don't assess the performance of an estimator, I am not sure that the metrics module is the best place. I was thinking about a sklearn.multilabel module, a sklearn.utils.multilabels module, or putting those functions in sklearn.multiclass.

@larsmans
Member

Let's put them in sklearn.multiclass.

@mblondel
Member

+1 for multiclass

---------
- :func:`metrics.accuracy_score` and :func:`metrics.zero_one_loss` support
multi-label classification. A new metric :func:`metrics.hamming_loss` is
added with multi-label support.
Member

Add your name here: credit where it belongs!

@GaelVaroquaux
Member

+1 for multiclass too.

@arjoly
Member Author

arjoly commented Jan 23, 2013

When I put is_label_indicator_matrix and is_multilabel into the multiclass module, I get a circular import.
Do you advise doing a lazy import?

@mblondel
Member

Another possible place would be in the utils.

@amueller
Member

what is the circle?

@arjoly
Member Author

arjoly commented Jan 23, 2013

what is the circle?

In preprocessing, LabelBinarizer needs is_multilabel and is_label_indicator_matrix, which are (will be) in multiclass.
In multiclass, unique_labels needs LabelBinarizer from preprocessing.
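
A minimal, self-contained reproduction of that kind of cycle (the package and its contents are synthetic, with names mirroring the ones above):

import pathlib, subprocess, sys, tempfile

# Build a throwaway package containing the import cycle, then import it
# in a subprocess to observe the failure.
with tempfile.TemporaryDirectory() as d:
    pkg = pathlib.Path(d, "pkg")
    pkg.mkdir()
    pkg.joinpath("__init__.py").write_text("")
    pkg.joinpath("preprocessing.py").write_text(
        "from pkg.multiclass import is_multilabel\n"      # LabelBinarizer needs it
        "class LabelBinarizer: ...\n")
    pkg.joinpath("multiclass.py").write_text(
        "from pkg.preprocessing import LabelBinarizer\n"  # unique_labels needs it
        "def is_multilabel(y): ...\n")
    out = subprocess.run([sys.executable, "-c", "import pkg.preprocessing"],
                         cwd=d, capture_output=True, text=True)
    # On recent Pythons the last traceback line reads roughly:
    # ImportError: cannot import name 'LabelBinarizer' from partially
    # initialized module 'pkg.preprocessing' (... circular import)
    print(out.stderr.splitlines()[-1])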

@arjoly
Member Author

arjoly commented Jan 23, 2013

Maybe the best place is in preprocessing. The narrative doc says:

The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators.

So it would be logical to find functions there that check or analyze raw data.

@GaelVaroquaux
Member

In preprocessing, LabelBinarizer needs is_multilabel and
is_label_indicator_matrix, which are in multiclass.
In multiclass, unique_labels needs LabelBinarizer from preprocessing.

OK, I think that tells me that we need to move things into utils.

@arjoly
Member Author

arjoly commented Jan 23, 2013

All right! I will create a new utils module.

@arjoly
Member Author

arjoly commented Jan 23, 2013

There is now a sklearn.utils.multiclass module (to rename to sklearn.utils.multilabels?).

I have pulled the unique_labels functionality out of LabelBinarizer and concentrated everything in unique_labels to get rid of the circular import problem.

@mblondel
Member

Could you add multilabel support to precision / recall / f1 score? Once this is done, the multilabel tests in the multiclass module can be updated to use the metrics directly:
https://github.com/arjoly/scikit-learn/blob/ed98486d0c6b0072afe3b8b96a764037c90d2ad5/sklearn/tests/test_multiclass.py#L35

@arjoly
Member Author

arjoly commented Jan 23, 2013

I intended to do that in my next pull request.
But ok, I will have a look at that tomorrow.

@amueller
Member

+1 for a separate PR

@arjoly
Member Author

arjoly commented Jan 24, 2013

+1 for a separate PR

The voice of reason: small and reviewable PRs.

Don't worry @mblondel, I intend to add another PR with precision, recall and F-score.
It is pretty high on my todo list.

Perhaps one thing that could change is the name of the new utils module:

  • sklearn.utils.classification
  • sklearn.utils.multilabels
  • sklearn.utils.multiclass

@arjoly
Member Author

arjoly commented Jan 24, 2013

I rebased on top of master.

@mblondel
Member

No worries.

Could you discuss the relationship between hamming loss and zero-one loss in the docstring? Thanks.

@arjoly
Member Author

arjoly commented Jan 25, 2013

@mblondel I think that I have taken your remarks into account.

By the way, I added some more invariance tests.

@mblondel
Member

In the multiclass (not multilabel) case, they are the same, right?

@arjoly
Member Author

arjoly commented Jan 25, 2013

No, they differ. In the hamming loss, you divide each error by the number of labels.

One small example:

In [22]: y2 = np.random.randint(0, 4, size=(5, ))

In [23]: y1 = np.random.randint(0, 4, size=(5, ))

In [24]: y1
Out[24]: array([2, 0, 3, 2, 2])

In [25]: y2
Out[25]: array([3, 1, 2, 1, 2])

In [26]: hamming_loss(y1, y2)
Out[26]: 0.40000000000000002

In [27]: zero_one_loss(y1, y2)
Out[27]: 0.80000000000000004

But thinking about it, the hamming loss is never larger than the zero-one loss. I will correct this.
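
For later readers, the inequality in notation of my own choosing (n samples, L labels, \triangle the symmetric difference):

\mathrm{HL} = \frac{1}{n} \sum_{i=1}^{n} \frac{|y_i \,\triangle\, \hat{y}_i|}{L} \;\le\; \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[y_i \ne \hat{y}_i] = L_{0-1}

since each per-sample term on the left is at most 1 and vanishes exactly when the whole label set is predicted correctly.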

@mblondel
Member

Did you decide to add this normalization or is it always implemented like this in multilabel papers? http://en.wikipedia.org/wiki/Hamming_distance uses the unnormalized count.

@mblondel
Member

I'm asking because it is important that our implementation of the metrics is as standard as possible. We could add a normalize option (the question is, what should the default value be?).

@arjoly
Member Author

arjoly commented Jan 25, 2013

The following papers agree on normalization by the number of labels:

  1. Grigorios Tsoumakas, Ioannis Katakis. Multi-Label Classification: An Overview. International Journal of Data Warehousing & Mining, 3(3), 1-13, July–September 2007.
  2. Jesse Read, Bernhard Pfahringer, Geoff Holmes, Eibe Frank. Classifier Chains for Multi-label Classification. Machine Learning Journal, Springer, Vol. 85(3), 2011.
  3. Min-Ling Zhang, Zhi-Hua Zhou. ML-KNN: A Lazy Learning Approach to Multi-Label Learning.
  4. Gjorgji Madjarov, Dragi Kocev, Dejan Gjorgjevikj, Sašo Džeroski. An Extensive Experimental Comparison of Methods for Multi-Label Learning. Pattern Recognition, Vol. 45(9), 2012.
  5. Wei Gao, Zhi-Hua Zhou. On the Consistency of Multi-Label Learning. JMLR.

@mblondel
Member

Great. Maybe you can cite the first one then.

@arjoly
Member Author

arjoly commented Jan 25, 2013

Great. Maybe you can cite the first one then.

Done

@arjoly
Member Author

arjoly commented Mar 2, 2013

I will have time this week to work on the precision, recall and F-measure metrics to support the multi-label format. Furthermore, I would like to add the Jaccard similarity measure (an example-based accuracy measure).

What do you advise? I will need some of the functions in utils.multiclass.

@amueller
Member

amueller commented Mar 2, 2013

Maybe do a PR on top of this one? We really should try to get this one in :-/

@arjoly
Member Author

arjoly commented Mar 2, 2013

If I do a PR on top of this one, will I have problems if I rebase this one on top of master?

@larsmans
Member

larsmans commented Mar 2, 2013

This one merges cleanly, no reason to rebase (though I'd like to rebase -i it prior to the actual merge to squash some commits).

You can branch off current master, merge this branch into your new branch, then add the functionality you want.
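
As a sketch, that workflow in git commands (remote and branch names are illustrative):

git checkout master
git pull upstream master            # start from current master
git checkout -b multilabel-prf      # hypothetical name for the follow-up branch
git merge multilabel-metrics        # bring in this PR's branch
# ...commit the new metrics on top, then open the follow-up PR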

@arjoly
Member Author

arjoly commented Mar 2, 2013

This one merges cleanly, no reason to rebase (though I'd like to rebase -i it prior to the actual merge to squash some commits).

You can branch off current master, merge this branch into your new branch, then add the functionality you want.

I will do as you suggest! Thanks!

@larsmans
Member

larsmans commented Mar 2, 2013

Btw, it's easier if you first squash the commits using a rebase -i. It doesn't have to be all in one commit, but lots of microcommits make it harder to cherry-pick when an intermediate release is done.

@amueller
Member

amueller commented Mar 2, 2013

Or someone can give this one a second +1 and we merge it ;)

@larsmans
Member

larsmans commented Mar 2, 2013

I'll try to review the PR this afternoon. I'll merge it if I think it's ready.

@amueller
Member

amueller commented Mar 2, 2013

awesome, thanks :)

@larsmans
Member

larsmans commented Mar 2, 2013

We've got an inconsistency in the documentation. The dev docs say utils is off-limits to end users, while the multiclass docs now advise their use. I'm going to move the latter remark to the dev docs.

@@ -599,13 +667,16 @@ classification loss (:math:`L_{0-1}`) over :math:`n_{\text{samples}}`. By
default, the function normalizes over the samples. To get the sum of the
:math:`L_{0-1}`, set ``normalize`` to ``False``.

In multilabel classification, the :func:`zero_one_loss` function corresponds
to the subset zero-one loss: the subset of labels must be correctly predicted.
Member

I don't get this sentence.

@larsmans
Member

larsmans commented Mar 2, 2013

Ok, pushed to master after squashing. Thanks @arjoly for tackling this important problem: evaluation can be dull and it can make your head hurt, but it's crucial for a machine learning toolkit.

@larsmans closed this Mar 2, 2013
@arjoly
Member Author

arjoly commented Mar 2, 2013

Thanks to all reviewers !!!

@larsmans
Member

larsmans commented Mar 2, 2013

We have a failure on the Numpy 1.3/Scipy 0.7 build bot: ValueError: 0-d arrays can't be concatenated. It seems limited to the model_selection.rst doctests.

@arjoly
Member Author

arjoly commented Mar 3, 2013

I will have a look this afternoon.

@arjoly
Member Author

arjoly commented Mar 3, 2013

I am not able to install numpy 1.3 with python 2.6. :-$
So it is a bit hard to investigate :-(

@ogrisel
Member

ogrisel commented Mar 3, 2013

I think you can reproduce it with numpy 1.3 on python 2.7. I don't see why it would be specific to 2.6.

@larsmans
Member

larsmans commented Mar 4, 2013

No, I suspect it's Numpy-specific.

@arjoly
Member Author

arjoly commented Mar 4, 2013

I am working on it.
I have the proper numpy version now, but the installation of scipy 0.7 failed...

@larsmans
Member

larsmans commented Mar 4, 2013

I bet SciPy has very little to do with this, so you can try a later version first to see if you get the failures.

(Otherwise, try finding an old version of a Linux distro that has these versions and install it in a VM.)

@arjoly
Member Author

arjoly commented Mar 4, 2013

I haven't been able to get python 2.7, numpy 1.3 and scipy 0.7 installed together.
I constantly got the following error:

error: Command "g++ -pthread -fno-strict-aliasing -I/home/ajoly/opt/local/include -DNDEBUG
-g -fwrapv -O3 -Wall  -fPIC -I/home/ajoly/git/numpy-1.3/numpy/core/include 
-I/home/ajoly/opt/python/include/python2.7 -c scipy/sparse/sparsetools/csr_wrap.cxx
 -o build/temp.linux-i686-2.7/scipy/sparse/sparsetools/csr_wrap.o" failed with exit status 1

Same with scipy 0.6 or any scipy 0.7.x version...

With python 2.6, I am not able to install numpy 1.3 due to a problem with unicode characters (the ucs2/ucs4 build issue).

Lastly, scipy 0.8 needs at least numpy 1.4...
This makes me crazy...


I suppose that a simple np.asarray could solve the problem, but I am not able to investigate it.
Or we could handle the multiclass case directly...
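
For reference, a guess at the failure mode (unverified on NumPy 1.3, for the reasons above): np.concatenate rejecting 0-d arrays, which np.atleast_1d, rather than a bare np.asarray, would work around:

import numpy as np

a = np.array(1)   # 0-d arrays, e.g. from indexing down to a scalar
b = np.array(2)

# On NumPy 1.3 this raised "ValueError: 0-d arrays can't be concatenated";
# recent NumPy versions raise a similar error for 0-d inputs.
# np.concatenate([a, b])

# Promoting to 1-d first avoids it (np.asarray alone would not help,
# since it keeps a 0-d array 0-d).
np.concatenate([np.atleast_1d(a), np.atleast_1d(b)])   # array([1, 2])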

@arjoly
Member Author

arjoly commented Mar 4, 2013

Any suggestion for a Linux distro that ships (and, if possible, allows easy installation of) python 2.6, numpy 1.3 and scipy 0.7?

@amueller
Member

amueller commented Mar 4, 2013

Ubuntu Lucid should do.

@arjoly deleted the multilabel-metrics branch March 7, 2013 10:37