[MRG] Multi class label documentation #2207

Closed
wants to merge 5 commits into from
86 changes: 63 additions & 23 deletions doc/modules/multiclass.rst
@@ -7,23 +7,28 @@ Multiclass and multilabel algorithms

.. currentmodule:: sklearn.multiclass

This module implements multiclass and multilabel learning algorithms:
The :mod:`sklearn.multiclass` module implements *meta-estimators* to perform

I think it says somewhere near the top that you don't need these meta-estimators to do multi-class classification, as all classifiers have built-in multiclass support.

Most estimators, I believe.

All, without exception. If there is an exception, it should be listed below as not supporting multiclass, IMHO.

``multiclass`` and ``multilabel`` classification. These meta-estimators are
meant to turn a binary classifier or a regressor into a multiclass/multilabel classifier.

- **Multiclass classification** means classification with more than two classes;
e.g., classify a set of images of fruits which may be oranges, apples, or pears.

don't you need a newline after the item?

Maybe you can add a word on binary classification?


- **Multilabel classification** assigns to each sample a set of target labels.
This can be thought of as predicting properties of a data point that are not mutually
exclusive, such as topics that are relevant for a document. A text might be about any
of religion, politics, finance or education at the same time.

The estimators provided in this module are meta-estimators.
They require a base estimator to be provided in their constructor:

- one-vs-the-rest / one-vs-all
- one-vs-one
- error correcting output codes

Multiclass classification means classification with more than two classes.
Multilabel classification is a different task, where a classifier is used to
predict a set of target labels for each instance; i.e., the set of target
classes is not assumed to be disjoint as in ordinary (binary or multiclass)
classification. This is also called any-of classification.

.. warning::

    The estimators provided in this module are meta-estimators: they require a
    base estimator to be provided in their constructor. For example, it is
    possible to use these estimators to turn a binary classifier or a regressor
    into a multiclass classifier. It is also possible to use these estimators
    with multiclass estimators in the hope that their accuracy or runtime
    performance improves.

    One-vs-all is currently the only meta-estimator usable for multilabel
    classification.
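As a rough illustration of the meta-estimator pattern described above, here is a toy one-vs-rest wrapper around a hand-rolled binary base estimator. Both classes below (`MeanScoreBinary`, `OneVsRest`) are illustrative stand-ins written for this sketch, not scikit-learn API:

```python
class MeanScoreBinary:
    """Toy 1-d binary base estimator scored by distance to the class means."""
    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.pos_mean = sum(pos) / len(pos)
        self.neg_mean = sum(neg) / len(neg)
        return self

    def decision_function(self, X):
        # Higher score = closer to the positive-class mean.
        return [abs(x - self.neg_mean) - abs(x - self.pos_mean) for x in X]


class OneVsRest:
    """Fit one binary problem per class; predict the most confident class."""
    def __init__(self, base):
        self.base = base

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        # One binary "this class vs. the rest" problem per class.
        self.estimators_ = [self.base().fit(X, [1 if t == c else 0 for t in y])
                            for c in self.classes_]
        return self

    def predict(self, X):
        scores = [est.decision_function(X) for est in self.estimators_]
        return [self.classes_[max(range(len(self.classes_)),
                                  key=lambda k: scores[k][i])]
                for i in range(len(X))]


X = [0.0, 0.2, 1.0, 1.1, 3.0, 3.3]
y = [0, 0, 1, 1, 2, 2]
print(OneVsRest(MeanScoreBinary).fit(X, y).predict(X))  # [0, 0, 1, 1, 2, 2]
```

The real :class:`OneVsRestClassifier` follows the same shape but takes an already-constructed estimator and works on 2-d feature arrays.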

.. note::

@@ -32,13 +37,34 @@ improves.
multiclass classification out-of-the-box. Below is a summary of the
classifiers supported in scikit-learn grouped by the strategy used.

- Inherently multiclass: :ref:`Naive Bayes <naive_bayes>`, :class:`sklearn.lda.LDA`,
:ref:`Decision Trees <tree>`, :ref:`Random Forests <forest>`
- Inherently multiclass: :ref:`Naive Bayes <naive_bayes>`, :ref:`LDA and QDA <lda_qda>`,
:ref:`Decision Trees <tree>`, :ref:`Ensembles of trees <ensemble>`, :ref:`Nearest neighbors <neighbors>`
- One-Vs-One: :class:`sklearn.svm.SVC`.
- One-Vs-All: :class:`sklearn.svm.LinearSVC`,
:class:`sklearn.linear_model.LogisticRegression`,
:class:`sklearn.linear_model.SGDClassifier`,
:class:`sklearn.linear_model.RidgeClassifier`.
- One-Vs-All: all linear models except :class:`sklearn.svm.SVC`.


Multilabel utilities
====================

Multilabel learning requires a specific data structure to assign multiple labels
to the same sample. The One-vs-Rest meta-classifier currently supports two formats:
a sequence of sequences of labels, and a 2d binary array of shape
(n_samples, n_classes) whose non-zero entries correspond to the assigned labels.

:func:`sklearn.preprocessing.label_binarize` and :class:`sklearn.preprocessing.LabelBinarizer`
are helpers that can convert one format to the other::

>>> from sklearn.datasets import make_multilabel_classification
>>> from sklearn.preprocessing import LabelBinarizer
>>> X, Y = make_multilabel_classification(n_samples=5)
>>> Y
([0], [1], [2, 3, 4, 1], [3], [1])
>>> LabelBinarizer().fit_transform(Y)
array([[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 1, 1, 1, 1],
[0, 0, 0, 1, 0],
[0, 1, 0, 0, 0]])
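For intuition, the conversion shown above can be sketched in plain Python (the helper name `to_indicator` is ours, written for this sketch; `LabelBinarizer` is the supported way to do this):

```python
def to_indicator(y_seq, n_classes):
    # One row per sample, one column per class; a 1 marks an assigned label.
    Y = [[0] * n_classes for _ in y_seq]
    for i, labels in enumerate(y_seq):
        for label in labels:
            Y[i][label] = 1
    return Y


print(to_indicator([[0], [1], [2, 3, 4, 1], [3], [1]], 5))
# [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 1, 1, 1, 1],
#  [0, 0, 0, 1, 0], [0, 1, 0, 0, 0]]
```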


One-Vs-The-Rest
@@ -52,7 +78,12 @@ classifiers are needed), one advantage of this approach is its
interpretability. Since each class is represented by one and only one
classifier, it is possible to gain knowledge about the class by inspecting its
corresponding classifier. This is the most commonly used strategy and is a fair
default choice. Below is an example::
default choice.

Multiclass learning
-------------------

Below is an example of multiclass learning using OvR::

>>> from sklearn import datasets
>>> from sklearn.multiclass import OneVsRestClassifier
@@ -68,8 +99,8 @@ default choice. Below is an example::
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

Multilabel learning with OvR
----------------------------
Multilabel learning
-------------------

:class:`OneVsRestClassifier` also supports multilabel classification.
To use this feature, feed the classifier a list of tuples containing

You can also use a binary indicator format.

I think that a small example would be nice, basically showing how to use that estimator with either a list of tuples or the binary indicator format.

@@ -98,7 +129,12 @@ O(n_classes^2) complexity. However, this method may be advantageous for
algorithms such as kernel algorithms which don't scale well with
`n_samples`. This is because each individual learning problem only involves
a small subset of the data whereas, with one-vs-the-rest, the complete
dataset is used `n_classes` times. Below is an example::
dataset is used `n_classes` times.
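The trade-off described above comes down to how many base estimators each strategy fits, and on how much data. A quick back-of-the-envelope check (plain Python, function names ours):

```python
def n_fits_ovr(n_classes):
    # One-vs-the-rest trains one classifier per class,
    # each one on the complete dataset.
    return n_classes


def n_fits_ovo(n_classes):
    # One-vs-one trains one classifier per pair of classes,
    # each one only on the samples of those two classes.
    return n_classes * (n_classes - 1) // 2


for k in (3, 10, 100):
    print(k, n_fits_ovr(k), n_fits_ovo(k))
# 3 3 3
# 10 10 45
# 100 100 4950
```

So OvO fits many more (smaller) problems, which can still win when the base algorithm scales badly with `n_samples`.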

Multiclass learning
-------------------

Below is an example of multiclass learning using OvO::

>>> from sklearn import datasets
>>> from sklearn.multiclass import OneVsOneClassifier
@@ -150,7 +186,11 @@ In practice, however, this may not happen as classifier mistakes will
typically be correlated. The error-correcting output codes have a similar
effect to bagging.
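The error-correcting behaviour mentioned above boils down to nearest-code-word decoding, which can be sketched as follows (the code book values here are made up for illustration, not taken from :class:`OutputCodeClassifier`):

```python
# Each class is assigned a binary code word; one binary classifier predicts
# each bit, and the predicted class is the one whose code word is nearest
# in Hamming distance -- so a few flipped bits can still be corrected.
codebook = {0: (0, 0, 1, 1), 1: (0, 1, 0, 1), 2: (1, 1, 1, 0)}


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    return min(codebook, key=lambda c: hamming(codebook[c], bits))


print(decode((0, 0, 1, 0)))  # 0: one bit away from class 0's code word
print(decode((1, 1, 0, 0)))  # 2: recovered despite two flipped bits
```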

Example::

Multiclass learning
-------------------

Below is an example of multiclass learning using Output-Codes::

>>> from sklearn import datasets
>>> from sklearn.multiclass import OutputCodeClassifier