Add averaging option to AMI and NMI #11124

Merged on Jul 17, 2018 (15 commits)
51 changes: 32 additions & 19 deletions doc/modules/clustering.rst
@@ -1158,8 +1158,8 @@ Given the knowledge of the ground truth class assignments ``labels_true`` and
our clustering algorithm assignments of the same samples ``labels_pred``, the
**Mutual Information** is a function that measures the **agreement** of the two
assignments, ignoring permutations. Two different normalized versions of this
-measure are available, **Normalized Mutual Information(NMI)** and **Adjusted
-Mutual Information(AMI)**. NMI is often used in the literature while AMI was
+measure are available, **Normalized Mutual Information (NMI)** and **Adjusted
+Mutual Information (AMI)**. NMI is often used in the literature, while AMI was
proposed more recently and is **normalized against chance**::

>>> from sklearn import metrics
@@ -1212,17 +1212,11 @@ Advantages
for any value of ``n_clusters`` and ``n_samples`` (which is not the
case for raw Mutual Information or the V-measure for instance).

-- **Bounded range [0, 1]**: Values close to zero indicate two label
+- **Upper bound of 1**: Values close to zero indicate two label
assignments that are largely independent, while values close to one
-indicate significant agreement. Further, values of exactly 0 indicate
-**purely** independent label assignments and a AMI of exactly 1 indicates
+indicate significant agreement. Further, an AMI of exactly 1 indicates
that the two label assignments are equal (with or without permutation).

-- **No assumption is made on the cluster structure**: can be used
-  to compare clustering algorithms such as k-means which assumes isotropic
-  blob shapes with results of spectral clustering algorithms which can
-  find cluster with "folded" shapes.


Drawbacks
~~~~~~~~~
@@ -1274,15 +1268,15 @@ It also can be expressed in set cardinality formulation:

The normalized mutual information is defined as

-.. math:: \text{NMI}(U, V) = \frac{\text{MI}(U, V)}{\sqrt{H(U)H(V)}}
+.. math:: \text{NMI}(U, V) = \frac{\text{MI}(U, V)}{\text{mean}(H(U), H(V))}

Neither the mutual information nor its normalized variant is adjusted for
chance; both will tend to increase as the number of different labels
(clusters) increases, regardless of the actual amount of "mutual information"
between the label assignments.

The expected value for the mutual information can be calculated using the
-following equation, from Vinh, Epps, and Bailey, (2009). In this equation,
+following equation [VEB2009]_. In this equation,
:math:`a_i = |U_i|` (the number of elements in :math:`U_i`) and
:math:`b_j = |V_j|` (the number of elements in :math:`V_j`).

@@ -1295,7 +1289,19 @@ following equation, from Vinh, Epps, and Bailey, (2009). In this equation,
Using the expected value, the adjusted mutual information can then be
calculated using a similar form to that of the adjusted Rand index:

.. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\max(H(U), H(V)) - E[\text{MI}]}
.. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\text{mean}(H(U), H(V)) - E[\text{MI}]}
Review comment (Member): The fact that the mean is configurable and varies in
the literature should be discussed here, perhaps with some notes on when one
is more appropriate than another.


For normalized mutual information and adjusted mutual information, the normalizing
value is typically some *generalized* mean of the entropies of each clustering.
Various generalized means exist, and no firm rules exist for preferring one over the
others. The decision is largely made on a field-by-field basis; for instance, in
community detection, the arithmetic mean is most common. Each normalizing method
provides "qualitatively similar behaviours" [YAT2016]_. In our implementation, this
is controlled by the ``average_method`` parameter.
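
For reference, the four options are all special cases of the generalized
(power) mean of the two entropies:

.. math:: M_p(x, y) = \left(\frac{x^p + y^p}{2}\right)^{1/p}

The limit :math:`p \to -\infty` gives the minimum, :math:`p \to 0` the
geometric mean, :math:`p = 1` the arithmetic mean, and :math:`p \to +\infty`
the maximum.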

Vinh et al. (2010) named variants of NMI and AMI by their averaging method
[VEB2010]_. Their 'sqrt' and 'sum' averages are the geometric and arithmetic
means; we use these more common names.
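
Because the chosen mean enters only through the denominator, and the four
means are totally ordered, the resulting scores are ordered as well. A minimal
sketch, assuming the ``average_method`` parameter introduced by this pull
request::

    from sklearn.metrics import normalized_mutual_info_score

    labels_true = [0, 0, 0, 1, 1, 1]
    labels_pred = [0, 0, 1, 1, 2, 2]

    # min <= geometric <= arithmetic <= max for the two entropies, so the
    # scores printed here are non-increasing down the list.
    for method in ('min', 'geometric', 'arithmetic', 'max'):
        score = normalized_mutual_info_score(labels_true, labels_pred,
                                             average_method=method)
        print(method, score)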

.. topic:: References

@@ -1304,22 +1310,29 @@ calculated using a similar form to that of the adjusted Rand index:
Machine Learning Research 3: 583–617.
`doi:10.1162/153244303321897735 <http://strehl.com/download/strehl-jmlr02.pdf>`_.

-* Vinh, Epps, and Bailey, (2009). "Information theoretic measures
+* [VEB2009] Vinh, Epps, and Bailey, (2009). "Information theoretic measures
for clusterings comparison". Proceedings of the 26th Annual International
Conference on Machine Learning - ICML '09.
`doi:10.1145/1553374.1553511 <https://dl.acm.org/citation.cfm?doid=1553374.1553511>`_.
ISBN 9781605585161.

-* Vinh, Epps, and Bailey, (2010). Information Theoretic Measures for
-  Clusterings Comparison: Variants, Properties, Normalization and
-  Correction for Chance, JMLR
-  http://jmlr.csail.mit.edu/papers/volume11/vinh10a/vinh10a.pdf
+* [VEB2010] Vinh, Epps, and Bailey, (2010). "Information Theoretic Measures for
+  Clusterings Comparison: Variants, Properties, Normalization and
+  Correction for Chance". JMLR
+  <http://jmlr.csail.mit.edu/papers/volume11/vinh10a/vinh10a.pdf>

* `Wikipedia entry for the (normalized) Mutual Information
<https://en.wikipedia.org/wiki/Mutual_Information>`_

* `Wikipedia entry for the Adjusted Mutual Information
<https://en.wikipedia.org/wiki/Adjusted_Mutual_Information>`_

* [YAT2016] Yang, Algesheimer, and Tessone, (2016). "A comparative analysis of
  community detection algorithms on artificial networks". Scientific Reports
  6: 30750. `doi:10.1038/srep30750 <https://www.nature.com/articles/srep30750>`_.



.. _homogeneity_completeness:

@@ -1359,7 +1372,7 @@ Their harmonic mean called **V-measure** is computed by
0.51...

The V-measure is actually equivalent to the mutual information (NMI)
-discussed above normalized by the sum of the label entropies [B2011]_.
+discussed above, with the aggregation function being the arithmetic mean [B2011]_.
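
A minimal sketch of this equivalence, assuming the ``average_method``
parameter introduced by this pull request::

    from sklearn.metrics import normalized_mutual_info_score, v_measure_score

    a = [0, 0, 1, 1, 2, 2]
    b = [0, 0, 1, 2, 2, 2]

    # V-measure is the harmonic mean of homogeneity and completeness, which
    # algebraically reduces to 2 * MI / (H(a) + H(b)), i.e. NMI with the
    # arithmetic mean as normalizer.
    v = v_measure_score(a, b)
    nmi = normalized_mutual_info_score(a, b, average_method='arithmetic')
    assert abs(v - nmi) < 1e-10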

Homogeneity, completeness and V-measure can be computed at once using
:func:`homogeneity_completeness_v_measure` as follows::
@@ -1534,7 +1547,7 @@ Advantages
for any value of ``n_clusters`` and ``n_samples`` (which is not the
case for raw Mutual Information or the V-measure for instance).

-- **Bounded range [0, 1]**: Values close to zero indicate two label
+- **Upper-bounded at 1**: Values close to zero indicate two label
assignments that are largely independent, while values close to one
indicate significant agreement. Further, values of exactly 0 indicate
**purely** independent label assignments and an AMI of exactly 1 indicates
17 changes: 17 additions & 0 deletions doc/whats_new/v0.20.rst
@@ -201,6 +201,12 @@ Metrics
:func:`metrics.roc_auc_score`. :issue:`3273` by
:user:`Alexander Niederbühl <Alexander-N>`.

- Added control over the normalization in
:func:`metrics.normalized_mutual_information_score` and
:func:`metrics.adjusted_mutual_information_score` via the ``average_method``
parameter. In version 0.22, the default normalizer for each will become
the *arithmetic* mean of the entropies of each clustering. :issue:`11124` by
:user:`Arya McCarthy <aryamccarthy>`.
- Added ``output_dict`` parameter in :func:`metrics.classification_report`
to return classification statistics as dictionary.
:issue:`11160` by :user:`Dan Barkhorn <danielbarkhorn>`.
@@ -768,6 +774,17 @@ Metrics
due to floating point error in the input.
:issue:`9851` by :user:`Hanmin Qin <qinhanmin2014>`.

- In :func:`metrics.normalized_mutual_information_score` and
:func:`metrics.adjusted_mutual_information_score`,
warn that ``average_method``
will have a new default value. In version 0.22, the default normalizer for each
will become the *arithmetic* mean of the entropies of each clustering. Currently,
:func:`metrics.normalized_mutual_information_score` uses the default of
``average_method='geometric'``, and :func:`metrics.adjusted_mutual_information_score`
uses the default of ``average_method='max'`` to match their behaviors in
version 0.19.
:issue:`11124` by :user:`Arya McCarthy <aryamccarthy>`.

- The ``batch_size`` parameter to :func:`metrics.pairwise_distances_argmin_min`
and :func:`metrics.pairwise_distances_argmin` is deprecated to be removed in
v0.22. It no longer has any effect, as batch size is determined by global
85 changes: 76 additions & 9 deletions sklearn/metrics/cluster/supervised.py
@@ -11,11 +11,13 @@
# Thierry Guillemot <thierry.guillemot.work@gmail.com>
# Gregory Stupp <stuppie@gmail.com>
# Joel Nothman <joel.nothman@gmail.com>
# Arya McCarthy <arya@jhu.edu>
# License: BSD 3 clause

from __future__ import division

from math import log
import warnings

import numpy as np
from scipy import sparse as sp
@@ -59,6 +61,21 @@ def check_clusterings(labels_true, labels_pred):
return labels_true, labels_pred


def _generalized_average(U, V, average_method):
"""Return a particular mean of two numbers."""
if average_method == "min":
return min(U, V)
elif average_method == "geometric":
return np.sqrt(U * V)
elif average_method == "arithmetic":
return np.mean([U, V])
elif average_method == "max":
return max(U, V)
else:
raise ValueError("'average_method' must be 'min', 'geometric', "
"'arithmetic', or 'max'")


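A quick sanity check of the helper above (an illustrative sketch, not part of
the patch)::

    import numpy as np

    h_true, h_pred = 0.5, 2.0
    means = [min(h_true, h_pred),        # 'min'
             np.sqrt(h_true * h_pred),   # 'geometric'
             np.mean([h_true, h_pred]),  # 'arithmetic'
             max(h_true, h_pred)]        # 'max'
    # For non-negative entropies the four options are totally ordered.
    assert means == sorted(means)
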
def contingency_matrix(labels_true, labels_pred, eps=None, sparse=False):
"""Build a contingency matrix describing the relationship between labels.

@@ -245,7 +262,9 @@ def homogeneity_completeness_v_measure(labels_true, labels_pred):

V-Measure is furthermore symmetric: swapping ``labels_true`` and
``label_pred`` will give the same score. This does not hold for
-homogeneity and completeness.
+homogeneity and completeness. V-Measure is identical to
+:func:`normalized_mutual_info_score` with the arithmetic averaging
+method.

Read more in the :ref:`User Guide <homogeneity_completeness>`.

@@ -444,7 +463,8 @@ def completeness_score(labels_true, labels_pred):
def v_measure_score(labels_true, labels_pred):
"""V-measure cluster labeling given a ground truth.

-This score is identical to :func:`normalized_mutual_info_score`.
+This score is identical to :func:`normalized_mutual_info_score` with
+the ``'arithmetic'`` option for averaging.

The V-measure is the harmonic mean between homogeneity and completeness::

@@ -459,6 +479,7 @@ def v_measure_score(labels_true, labels_pred):
measure the agreement of two independent label assignments strategies
on the same dataset when the real ground truth is not known.


Read more in the :ref:`User Guide <homogeneity_completeness>`.

Parameters
@@ -485,6 +506,7 @@
--------
homogeneity_score
completeness_score
normalized_mutual_info_score

Examples
--------
@@ -617,7 +639,8 @@ def mutual_info_score(labels_true, labels_pred, contingency=None):
return mi.sum()


-def adjusted_mutual_info_score(labels_true, labels_pred):
+def adjusted_mutual_info_score(labels_true, labels_pred,
+                               average_method='warn'):
"""Adjusted Mutual Information between two clusterings.

Adjusted Mutual Information (AMI) is an adjustment of the Mutual
@@ -626,7 +649,7 @@ def adjusted_mutual_info_score(labels_true, labels_pred):
clusters, regardless of whether there is actually more information shared.
For two clusterings :math:`U` and :math:`V`, the AMI is given as::

-AMI(U, V) = [MI(U, V) - E(MI(U, V))] / [max(H(U), H(V)) - E(MI(U, V))]
+AMI(U, V) = [MI(U, V) - E(MI(U, V))] / [avg(H(U), H(V)) - E(MI(U, V))]

This metric is independent of the absolute values of the labels:
a permutation of the class or cluster label values won't change the
Expand All @@ -650,9 +673,17 @@ def adjusted_mutual_info_score(labels_true, labels_pred):
labels_pred : array, shape = [n_samples]
A clustering of the data into disjoint subsets.

average_method : string, optional (default: 'warn')
How to compute the normalizer in the denominator. Possible options
are 'min', 'geometric', 'arithmetic', and 'max'.
If 'warn', 'max' will be used. The default will change to
'arithmetic' in version 0.22.

.. versionadded:: 0.20

Returns
-------
-ami: float(upperlimited by 1.0)
+ami: float (upper-limited by 1.0)
The AMI returns a value of 1 when the two partitions are identical
(i.e. perfectly matched). Random partitions (independent labellings) have
an expected AMI around 0 on average hence can be negative.
@@ -691,6 +722,12 @@ def adjusted_mutual_info_score(labels_true, labels_pred):
<https://en.wikipedia.org/wiki/Adjusted_Mutual_Information>`_

"""
if average_method == 'warn':
warnings.warn("The behavior of AMI will change in version 0.22. "
"To match the behavior of 'v_measure_score', AMI will "
"use average_method='arithmetic' by default.",
FutureWarning)
average_method = 'max'
labels_true, labels_pred = check_clusterings(labels_true, labels_pred)
n_samples = labels_true.shape[0]
classes = np.unique(labels_true)
@@ -709,17 +746,29 @@ def adjusted_mutual_info_score(labels_true, labels_pred):
emi = expected_mutual_information(contingency, n_samples)
# Calculate entropy for each labeling
h_true, h_pred = entropy(labels_true), entropy(labels_pred)
-ami = (mi - emi) / (max(h_true, h_pred) - emi)
+normalizer = _generalized_average(h_true, h_pred, average_method)
+denominator = normalizer - emi
+# Avoid 0.0 / 0.0 when expectation equals maximum, i.e. a perfect match.
+# normalizer should always be >= emi, but because of floating-point
+# representation, sometimes emi is slightly larger. Correct this
+# by preserving the sign.
+if denominator < 0:
+    denominator = min(denominator, -np.finfo('float64').eps)
+else:
+    denominator = max(denominator, np.finfo('float64').eps)
+ami = (mi - emi) / denominator
return ami


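Callers can silence the FutureWarning above by passing ``average_method``
explicitly (a usage sketch; the labels are illustrative)::

    from sklearn.metrics import adjusted_mutual_info_score

    # 'max' reproduces the 0.19 behavior; 'arithmetic' opts in to the
    # future default announced for 0.22.
    ami = adjusted_mutual_info_score([0, 0, 1, 1], [0, 0, 1, 2],
                                     average_method='max')
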
-def normalized_mutual_info_score(labels_true, labels_pred):
+def normalized_mutual_info_score(labels_true, labels_pred,
+                                 average_method='warn'):
"""Normalized Mutual Information between two clusterings.

Normalized Mutual Information (NMI) is a normalization of the Mutual
Information (MI) score to scale the results between 0 (no mutual
information) and 1 (perfect correlation). In this function, mutual
-information is normalized by ``sqrt(H(labels_true) * H(labels_pred))``.
+information is normalized by some generalized mean of ``H(labels_true)``
+and ``H(labels_pred)``, defined by the ``average_method``.

This measure is not adjusted for chance. Therefore
:func:`adjusted_mutual_info_score` might be preferred.
@@ -743,13 +792,22 @@ def normalized_mutual_info_score(labels_true, labels_pred):
labels_pred : array, shape = [n_samples]
A clustering of the data into disjoint subsets.

average_method : string, optional (default: 'warn')
How to compute the normalizer in the denominator. Possible options
are 'min', 'geometric', 'arithmetic', and 'max'.
If 'warn', 'geometric' will be used. The default will change to
'arithmetic' in version 0.22.

.. versionadded:: 0.20

Returns
-------
nmi : float
score between 0.0 and 1.0. 1.0 stands for perfectly complete labeling

See also
--------
v_measure_score: V-Measure (NMI with arithmetic mean option)
adjusted_rand_score: Adjusted Rand Index
adjusted_mutual_info_score: Adjusted Mutual Information (adjusted
against chance)
@@ -773,6 +831,12 @@ def normalized_mutual_info_score(labels_true, labels_pred):
0.0

"""
if average_method == 'warn':
warnings.warn("The behavior of NMI will change in version 0.22. "
"To match the behavior of 'v_measure_score', NMI will "
"use average_method='arithmetic' by default.",
FutureWarning)
average_method = 'geometric'
labels_true, labels_pred = check_clusterings(labels_true, labels_pred)
classes = np.unique(labels_true)
clusters = np.unique(labels_pred)
@@ -789,7 +853,10 @@ def normalized_mutual_info_score(labels_true, labels_pred):
# Calculate the expected value for the mutual information
# Calculate entropy for each labeling
h_true, h_pred = entropy(labels_true), entropy(labels_pred)
-nmi = mi / max(np.sqrt(h_true * h_pred), 1e-10)
+normalizer = _generalized_average(h_true, h_pred, average_method)
+# Avoid 0.0 / 0.0 when either entropy is zero.
+normalizer = max(normalizer, np.finfo('float64').eps)
+nmi = mi / normalizer
return nmi


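The ``eps`` clamp above matters when one labeling has zero entropy (a single
cluster): MI is then zero as well, and the clamp turns a 0/0 into a
well-defined score. An illustrative sketch, assuming the ``average_method``
parameter introduced by this pull request::

    from sklearn.metrics import normalized_mutual_info_score

    # h_true is 0 here, so with average_method='min' the normalizer would
    # be 0 without the eps clamp; MI is exactly 0, and the score is 0.0.
    score = normalized_mutual_info_score([0, 0, 0, 0], [0, 1, 2, 3],
                                         average_method='min')
    assert score == 0.0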