Add averaging option to AMI and NMI #11124

Merged: 15 commits into scikit-learn:master on Jul 17, 2018

Conversation

@aryamccarthy (Contributor) commented May 23, 2018:

Reference Issues/PRs

See #10308; this is a first step toward eventually deprecating one behavior and making the measures' behavior consistent.

What does this implement/fix?

Background: The measures AMI, NMI, and V-measure are intimately related. Each is a normalized version of mutual information, and AMI incorporates adjustment for chance.

  • AMI, NMI, and V-measure use different normalization strategies: the max (i.e. the infinity norm), the geometric mean, and the arithmetic mean of the two clusterings' entropies, respectively. (V-measure is NMI with the arithmetic mean.)
  • This makes the measures difficult to compare directly.
  • Added a switch (average_method) for NMI and AMI to allow a choice of normalization; see the sketch after this list.
  • Long-term plan: unify their behavior, with a warning about the future change in defaults.
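For context, a minimal, illustrative sketch of the four candidate normalizers (the method names follow this PR's options; this is not the library's internal _generalized_average):

import numpy as np

def generalized_average(h_true, h_pred, method):
    # Aggregate two entropies: the candidate normalizers for mutual information.
    if method == "min":
        return min(h_true, h_pred)
    if method == "geometric":
        return np.sqrt(h_true * h_pred)
    if method == "arithmetic":
        return np.mean([h_true, h_pred])
    if method == "max":
        return max(h_true, h_pred)
    raise ValueError("unknown method: %s" % method)

# Historic defaults: AMI normalizes by the max, NMI by the geometric mean,
# and V-measure is NMI with the arithmetic mean.
h_true, h_pred = 0.5, 2.0
for method in ("min", "geometric", "arithmetic", "max"):
    print(method, generalized_average(h_true, h_pred, method))

For equal entropies all four coincide; the further apart H(U) and H(V) are, the more the choice of mean matters.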

Leave current behavior unchanged
@amueller (Member) left a comment:

Needs tests, otherwise great!

"""Normalized Mutual Information between two clusterings.

Normalized Mutual Information (NMI) is a normalization of the Mutual
Information (MI) score to scale the results between 0 (no mutual
information) and 1 (perfect correlation). In this function, mutual
information is normalized by some generalized mean of ``H(labels_true)``
and ``H(labels_pred)``.
Member:

maybe either mention the default here or say "as defined by average_method".

How to compute the normalizer in the denominator. Possible options
are 'min', 'sqrt', 'sum', and 'max'.
If None, 'sqrt' will be used, matching the behavior of
`v_measure_score`.
Member:

single backticks do nothing.

Member:

Add ..versionadded

Contributor Author:

Unsure what ..versionadded is.

Contributor Author:

Ah, so which version would that be?

Member:

0.20

@amueller (Member):

The test failures are flake8; you should run flake8 in your editor.

@amueller (Member):

Oh, this is related. I just saw #8645.

@@ -682,6 +705,12 @@ def adjusted_mutual_info_score(labels_true, labels_pred):
    <https://en.wikipedia.org/wiki/Adjusted_Mutual_Information>`_

    """
    if average_method is None:
        warnings.warn("The behavior of AMI will change in a future version. "
                      "To match the behavior of 'v_measure_score', AMI will "
Member:

and NMI?

Contributor Author:

There's a separate warning in the NMI function; perhaps I should mention both functions in each warning? V-measure is the hardest to change since it's computed differently, and we have to change 2 of the 3.

Member:

Sorry, I thought we changed only one. Isn't V-measure using sqrt?

Contributor Author:

Surprisingly enough, no. Check test_v_measure_and_mutual_information.

Member:

Ok, I was going by #10308 (comment) and memory. So after this, NMI and v_measure_score will be identical, right? Are they identical in all cases? Then we should either add it as an alias or possibly deprecate one and just mention it in the docs.

Contributor Author:

They are identical in all cases—the formulas are equivalent. Let's not alias due to homogeneity_completeness_v_measure. If you're computing all three, the last is simple to draw from the first two; otherwise, the computation is wasteful. (Then again, to my knowledge V-measure never caught on. People picked up too quickly on the fact that it's just repackaging a measure proposed >20 years beforehand. Do we even need to keep it?)
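A quick check of the claimed identity, assuming the average_method parameter this PR adds:

import numpy as np
from sklearn.metrics import normalized_mutual_info_score, v_measure_score

rng = np.random.RandomState(0)
a = rng.randint(0, 4, size=200)
b = rng.randint(0, 4, size=200)

# NMI with the arithmetic mean should equal V-measure on any labeling.
nmi = normalized_mutual_info_score(a, b, average_method='arithmetic')
assert np.isclose(nmi, v_measure_score(a, b))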

Member:

We should at least make it very clear in the docs. We can consider deprecating it; maybe open an issue for discussion. Can you add a note about the identity to the docstrings of v_measure_score and NMI, and to the user guide sections on V-measure, NMI, and homogeneity_completeness_v_measure?

Contributor Author:

Added in latest commit.

@@ -700,17 +729,21 @@ def adjusted_mutual_info_score(labels_true, labels_pred):
    emi = expected_mutual_information(contingency, n_samples)
    # Calculate entropy for each labeling
    h_true, h_pred = entropy(labels_true), entropy(labels_pred)
    ami = (mi - emi) / (max(h_true, h_pred) - emi)
    normalizer = _generalized_average(h_true, h_pred, average_method)
    print(normalizer)
Member:

no ;)

Contributor Author:

Hah, whoops!

* Correct the NMI and AMI descriptions in docs

* Update docstrings due to averaging changes

- V-measure
- Homogeneity
- Completeness
- NMI
- AMI
@aryamccarthy (Contributor Author):

Looks ready to squash and merge; the fix is implemented and the tests pass. @amueller?

@jnothman (Member) left a comment:

Not so fast ;)

@@ -1185,7 +1179,7 @@ following equation, from Vinh, Epps, and Bailey (2009). In this equation,
Using the expected value, the adjusted mutual information can then be
calculated using a similar form to that of the adjusted Rand index:

.. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\max(H(U), H(V)) - E[\text{MI}]}
.. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\text{mean}(H(U), H(V)) - E[\text{MI}]}
Member:

The fact that the mean is configurable and varies in the literature should be discussed here, perhaps with some notes on when one is more appropriate than another.

c, d = 12, 12
means = [_generalized_average(c, d, method) for method in methods]
assert_equal(means[0], means[1])
assert_equal(means[1], means[2])
Member:

We no longer use nosetests; use bare assert instead of such helper functions in all new code.
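For instance, the quoted lines could become plain assertions (a sketch; the import path for the private helper is assumed from this PR's branch):

from sklearn.metrics.cluster.supervised import _generalized_average

methods = ["min", "geometric", "arithmetic", "max"]
c, d = 12, 12
means = [_generalized_average(c, d, method) for method in methods]
# Every generalized mean of two equal values is that value itself.
assert means[0] == means[1] == means[2]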

@jnothman (Member):

Please add an entry to the change log at doc/whats_new/v0.20.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

* Update v0.20.rst

* Update test_supervised.py

* Update clustering.rst
@aryamccarthy (Contributor Author):

Mission accomplished :)

@jnothman (Member) left a comment:

I think this is a bit confusing. The normalising constant is always the max of some elementwise mean of U and V.

sqrt and sum don't make sense as names of means: call them geometric and arithmetic

@aryamccarthy (Contributor Author) commented May 27, 2018 via email.

@jnothman (Member) commented May 28, 2018 via email.

@aryamccarthy (Contributor Author):

Checking again: is this ready to pull?

@jnothman (Member) left a comment:

Not yet looked at tests

- Added control over the normalizer in
  :func:`metrics.normalized_mutual_information_score` and
  :func:`metrics.adjusted_mutual_information_score` via the ``average_method``
  parameter. In a future version, the default normalizer for each will become
Member:

Let's call this future version 0.22, and put this changing default value into the API Changes Summary.

@aryamccarthy (Contributor Author), Jun 6, 2018:

The changing default value should be explained in the API Changes Summary for 0.22, right? I don't see that file yet.

Member:

There should be a section on deprecations for 0.20 in the API Changes Summary.

Contributor Author:

What should the text be here? Maybe "In metrics.normalized_mutual_information_score and
metrics.adjusted_mutual_information_score, warn that average_method
will have a new default value. In version 0.22, the default normalizer for each
will become the arithmetic mean of the entropies of each clustering."

Member:

Say "In ... added the parameter average_method ... The default averaging method will change from (whatever it is in each) to arithmetic in 0.22``. I think?

if average_method == "min":
    return max(min(U, V), 1e-10)
elif average_method == "geometric":
    return max(np.sqrt(U * V), 1e-10)  # Avoids zero-division error
Member:

... but you'll end up with near-infinite measures anyway? Maybe we're better off warning the user.

Member:

Perhaps use eps, defined as np.finfo('float64').eps, instead of this arbitrary value.

It's also awkward having this comment here and not at the first place this eps is used.

Member:

I think the normalizer = max(normalizer, eps) should happen in NMI, actually, not here. It's not directly applicable in AMI, which should check for normalizer - emi == 0.

I think in the case that the denominator is 0, we should be issuing at least a warning, whether or not we then return a finite (backwards-compatible in NMI) or infinite value.
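A minimal sketch of that suggestion (the helper names here are hypothetical, not from the PR):

import warnings
import numpy as np

eps = np.finfo('float64').eps

def _nmi_denominator(normalizer):
    # In NMI, clamp the normalizer so a zero denominator degrades
    # gracefully instead of dividing by zero.
    if normalizer == 0:
        warnings.warn("NMI is undefined when both labelings have zero "
                      "entropy; clamping the denominator.")
    return max(normalizer, eps)

def _ami_denominator(normalizer, emi):
    # In AMI, the relevant check is on the adjusted denominator.
    denominator = normalizer - emi
    if denominator == 0:
        warnings.warn("AMI is undefined when the normalizer equals the "
                      "expected mutual information.")
        denominator = eps
    return denominator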

@@ -764,6 +812,12 @@ def normalized_mutual_info_score(labels_true, labels_pred):
    0.0

    """
    if average_method is None:
        warnings.warn("The behavior of NMI will change in a future version. "
Member:

0.22

.. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\text{mean}(H(U), H(V)) - E[\text{MI}]}

For normalized mutual information and adjusted mutual information, the normalizing
value is typically some mean of the entropies of each clustering. Various means exist,
Member:

Note that it is controlled by average_method in our implementation.

Member:

I'm not sure if I like "some mean" because in probability theory there is only one mean. Maybe "some aggregate"? Also @jnothman's comment is not addressed, right?

Contributor Author:

Right; it's some generalized mean. The comment slipped through the cracks; I'll address it with the next update.

Contributor Author:

Actually, it didn't slip through. It's addressed at the end of that paragraph.

if average_method is None:
    warnings.warn("The behavior of NMI will change in a future version. "
                  "To match the behavior of 'v_measure_score', NMI will "
                  "use arithmetic mean by default."
Member:

State "average_method='arithmetic' by default"

elif average_method == "arithmetic":
    return max(np.mean([U, V]), 1e-10)
elif average_method == "max":
    return max(U, V)
Member:

This could still be 0 in theory

average_method : string or None, optional (default: None)
    How to compute the normalizer in the denominator. Possible options
    are 'min', 'geometric', 'arithmetic', and 'max'.
    If None, 'max' will be used. This is likely to change in a future
Member:

Don't say "likely to". Say it will change to 'arithmetic' in 0.22.

average_method : string or None, optional (default: None)
    How to compute the normalizer in the denominator. Possible options
    are 'min', 'geometric', 'arithmetic', and 'max'.
    If None, 'geometric' will be used. This is likely to change in a
Member:

As above

@aryamccarthy (Contributor Author):

@jnothman or @amueller, is this ready?

@jnothman (Member) left a comment:

Yes, I think this looks good now.

@aryamccarthy (Contributor Author):

Ah, then pull away! ;)

@jnothman (Member) commented Jun 9, 2018:

We require two approvals before merge (sorry!)

Hopefully @amueller can give this another glance.

@aryamccarthy (Contributor Author):

@amueller?

@jnothman (Member) commented Jun 26, 2018 via email.

@aryamccarthy (Contributor Author):

Bump @qinhanmin2014

@aryamccarthy (Contributor Author):

Bump @qinhanmin2014 @jnothman

@jnothman (Member) commented Jul 12, 2018 via email.

@qinhanmin2014 (Member):

Apologies for the delay, and thanks @aryamccarthy for your great work.
I'll mark this as 0.20 to help you attract reviewers. As for me, I can only promise to give a review after the release.

@amueller (Member) left a comment:

There are probably lots of deprecation warnings in the tests now. Can you please either catch them or explicitly pass the new parameter? (Do we have a standard procedure for this, btw? Is it documented?)

value is typically some mean of the entropies of each clustering. Various means exist,
and no firm rules exist for preferring one over the others. The decision is made largely
on a field-by-field basis; for instance, in community detection, the arithmetic mean is
most common. Yang et al. (2016) found that each normalizing method provided
Member:

Link to reference for citation?

"qualitatively similar behaviours". In our implementation, this is
controlled by the ``average_method`` parameter.

Vinh et al. (2010) named variants of NMI and AMI by their averaging method. Their
Member:

Link to reference for citation?

@@ -1249,7 +1261,8 @@ Their harmonic mean called **V-measure** is computed by
0.51...

The V-measure is actually equivalent to the mutual information (NMI)
discussed above normalized by the sum of the label entropies [B2011]_.
discussed above normalized by the arithmetic mean of the label
Member:

maybe say "when normalized" or "with the aggregation function being the arithmetic mean"?

@@ -121,6 +121,12 @@ Metrics
- Partial AUC is available via ``max_fpr`` parameter in
  :func:`metrics.roc_auc_score`. :issue:`3273` by
  :user:`Alexander Niederbühl <Alexander-N>`.
- Added control over the normalizer in
Member:

add blank line

@@ -531,6 +537,17 @@ Metrics
  for :func:`metrics.roc_auc_score`. Moreover using ``reorder=True`` can hide bugs
  due to floating point error in the input.
  :issue:`9851` by :user:`Hanmin Qin <qinhanmin2014>`.
- In :func:`metrics.normalized_mutual_information_score` and
Member:

blank line

    normalized_mutual_info_score,
    adjusted_mutual_info_score,
]
means = {"min", "geometric", "arithmetic", "max"}
Member:

I feel kinda weird about calling these means.

@aryamccarthy (Contributor Author), Jul 12, 2018:

I can switch to generalized_means if you prefer, but it has to be clear that this is a specific class of aggregations. Product, for instance, wouldn't work. Let me know and I'll ship all changes in one PR update.

@amueller (Member):

Did you check that you're catching all deprecation warnings?

@aryamccarthy (Contributor Author):

With this?

with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=PendingDeprecationWarning)

Also someone pushed in a way that introduces merge conflicts in the whats-new file, so tests aren't passing.

@amueller (Member):

Can you merge master to fix the conflict?

And I think we're using

      with ignore_warnings(category=DeprecationWarning):

right now.
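For example (a sketch; ignore_warnings lived in sklearn.utils.testing at the time):

from sklearn.metrics import normalized_mutual_info_score
from sklearn.utils.testing import ignore_warnings

# Silence the deprecation-style warning while exercising the old default.
with ignore_warnings(category=DeprecationWarning):
    score = normalized_mutual_info_score([0, 0, 1, 1], [1, 1, 0, 0])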

@amueller (Member):

fixed the conflict

@aryamccarthy (Contributor Author):

For my future knowledge, what's the command to do that?

@amueller (Member):

I merged master into it, fixed the merge conflict, and pushed to your branch.
So, assuming you're on your branch and have the main repo as the upstream remote:

git pull upstream master
# fix merge conflict, commit
git push origin master

FYI, you shouldn't send PRs from your master branch; you should ideally create a feature branch.

@amueller (Member):

Can you please also add a test that there are deprecation warnings? Also see the updated docs at http://scikit-learn.org/dev/developers/contributing.html#change-the-default-value-of-a-parameter
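A sketch of such a test, assuming the PR raises FutureWarning when average_method is left unset:

import pytest
from sklearn.metrics import (adjusted_mutual_info_score,
                             normalized_mutual_info_score)

def test_warns_on_default_average_method():
    labels = [0, 0, 1, 1]
    # Omitting average_method should warn that the default will change.
    with pytest.warns(FutureWarning):
        normalized_mutual_info_score(labels, labels)
    with pytest.warns(FutureWarning):
        adjusted_mutual_info_score(labels, labels)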

@aryamccarthy (Contributor Author):

I've written (but not committed) that. My concern is catching all of the FutureWarnings; I'd have to infect the entire test_supervised.py file with "with ignore_warnings(category=FutureWarning):" blocks.

@massich (Contributor) commented Jul 17, 2018:

LGTM

+1 to merge

@GaelVaroquaux (Member):

LGTM. Merging.

@GaelVaroquaux merged commit 52b6a66 into scikit-learn:master on Jul 17, 2018
@jnothman (Member) commented Jul 18, 2018 via email.
