Add averaging option to AMI and NMI #11124

Conversation
Leave current behavior unchanged
Needs tests, otherwise great!
        """Normalized Mutual Information between two clusterings.

        Normalized Mutual Information (NMI) is a normalization of the Mutual
        Information (MI) score to scale the results between 0 (no mutual
        information) and 1 (perfect correlation). In this function, mutual
    -   information is normalized by ``sqrt(H(labels_true) * H(labels_pred))``.
    +   information is normalized by some generalized mean of ``H(labels_true)``
    +   and ``H(labels_pred)``.
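To make "some generalized mean" concrete, here is a minimal pure-Python sketch. The function name `generalized_average` and the option strings mirror the PR's private helper `_generalized_average`, but this is an illustration, not the actual implementation:

```python
import math

def generalized_average(u, v, average_method):
    """One of several generalized means of two non-negative entropies."""
    if average_method == "min":
        return min(u, v)
    elif average_method == "geometric":
        return math.sqrt(u * v)
    elif average_method == "arithmetic":
        return (u + v) / 2
    elif average_method == "max":
        return max(u, v)
    raise ValueError("'average_method' must be 'min', 'geometric', "
                     "'arithmetic', or 'max'")

# For any pair: min <= geometric <= arithmetic <= max
h_true, h_pred = 0.5, 2.0
values = [generalized_average(h_true, h_pred, m)
          for m in ("min", "geometric", "arithmetic", "max")]
print(values)  # [0.5, 1.0, 1.25, 2.0]
```

Because the normalizer sits in the denominator, a smaller mean (e.g. `min`) yields a larger NMI score for the same mutual information.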
maybe either mention the default here or say "as defined by average_method".
        How to compute the normalizer in the denominator. Possible options
        are 'min', 'sqrt', 'sum', and 'max'.
        If None, 'sqrt' will be used, matching the behavior of
        `v_measure_score`.
single backticks do nothing.
Add ..versionadded
Unsure what ``.. versionadded`` is.
http://www.sphinx-doc.org/en/stable/markup/para.html#directive-versionadded
do git grep versionadded
to see how we use it.
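For readers who, like the author, are unfamiliar with the directive: in numpydoc-style docstrings, `.. versionadded::` typically sits at the end of the parameter description it applies to. A hedged sketch (the signature and stub body below are illustrative only, not scikit-learn's actual code):

```python
def normalized_mutual_info_score(labels_true, labels_pred,
                                 average_method=None):
    """Illustrative stub showing directive placement only.

    Parameters
    ----------
    average_method : string or None, optional (default: None)
        How to compute the normalizer in the denominator.

        .. versionadded:: 0.20
    """
    pass
```

Sphinx then renders a "New in version 0.20" note next to the parameter.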
Ah, so which version would that be?
0.20
test failures are flake8. you should run flake8 in your editor.

oh this is related I just saw #8645
    @@ -682,6 +705,12 @@ def adjusted_mutual_info_score(labels_true, labels_pred):
        <https://en.wikipedia.org/wiki/Adjusted_Mutual_Information>`_

        """
        if average_method is None:
            warnings.warn("The behavior of AMI will change in a future version. "
                          "To match the behavior of 'v_measure_score', AMI will "
and NMI?
There's a separate warning in the NMI function; perhaps I should mention both functions in each warning? V-measure is the hardest to change since it's computed differently, and we have to change 2 of the 3.
Sorry, I thought we changed only one. Isn't V-measure using sqrt?
Surprisingly enough, no. Check ``test_v_measure_and_mutual_information``.
Ok, I was going by #10308 (comment) and memory. But so after this NMI and v_measure_score will be identical, right? Are they identical in all cases? Then we should either add it as an alias or possibly deprecate and just mention in the docs.
They are identical in all cases; the formulas are equivalent. Let's not alias, due to ``homogeneity_completeness_v_measure``. If you're computing all three, the last is simple to derive from the first two; otherwise, the computation is wasteful. (Then again, to my knowledge V-measure never caught on. People picked up too quickly on the fact that it's just a repackaging of a measure proposed more than 20 years beforehand. Do we even need to keep it?)
We should at least make it very clear in the doc. We can consider deprecating it, maybe open an issue for discussion. Can you add a note about the identity in the docstring of v-measure and NMI, and in the user guide of v-measure, NMI, and ``homogeneity_completeness_v_measure``?
Added in latest commit.
    @@ -700,17 +729,21 @@ def adjusted_mutual_info_score(labels_true, labels_pred):
        emi = expected_mutual_information(contingency, n_samples)
        # Calculate entropy for each labeling
        h_true, h_pred = entropy(labels_true), entropy(labels_pred)
    -   ami = (mi - emi) / (max(h_true, h_pred) - emi)
    +   normalizer = _generalized_average(h_true, h_pred, average_method)
    +   print(normalizer)
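To see why the choice of normalizer matters for AMI, a small numeric sketch; the entropy and expected-MI values below are made up purely for illustration:

```python
# Illustrative numbers: mutual information, its expected value under the
# permutation model, and the entropies of the two labelings.
mi, emi = 0.7, 0.3
h_true, h_pred = 0.8, 1.6

# Old behavior: normalize by the max of the two entropies.
ami_max = (mi - emi) / (max(h_true, h_pred) - emi)
# Proposed future default: normalize by their arithmetic mean.
ami_arithmetic = (mi - emi) / ((h_true + h_pred) / 2 - emi)

print(round(ami_max, 4), round(ami_arithmetic, 4))
```

The same clustering pair scores noticeably higher under the arithmetic mean, since the denominator shrinks while the numerator is unchanged.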
no ;)
Hah, whoops!
* Correct the NMI and AMI descriptions in docs
* Update docstrings due to averaging changes
  - V-measure
  - Homogeneity
  - Completeness
  - NMI
  - AMI

Looks ready to squash and merge: the fix is implemented and the tests passed. @amueller ?
Not so fast ;)
    @@ -1185,7 +1179,7 @@ following equation, from Vinh, Epps, and Bailey, (2009). In this equation,
    Using the expected value, the adjusted mutual information can then be
    calculated using a similar form to that of the adjusted Rand index:

    -.. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\max(H(U), H(V)) - E[\text{MI}]}
    +.. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\text{mean}(H(U), H(V)) - E[\text{MI}]}
The fact that the mean is configurable and varies in the literature should be discussed here, perhaps with some notes on when one is more appropriate than another.
    c, d = 12, 12
    means = [_generalized_average(c, d, method) for method in methods]
    assert_equal(means[0], means[1])
    assert_equal(means[1], means[2])
We no longer use nosetests. Use bare ``assert`` instead of such helper functions in all new code.
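A stand-alone sketch of the same check in the bare-assert style; `generalized_average` here is a hypothetical stand-in for the helper under test, not the actual code:

```python
import math

def generalized_average(u, v, method):
    # Minimal stand-in for the PR's private helper (hypothetical).
    return {"min": min(u, v),
            "geometric": math.sqrt(u * v),
            "arithmetic": (u + v) / 2,
            "max": max(u, v)}[method]

methods = ["min", "geometric", "arithmetic", "max"]
c, d = 12, 12
means = [generalized_average(c, d, method) for method in methods]

# Every generalized mean of two equal values is that value itself,
# so bare asserts replace the nose-style assert_equal calls.
assert all(m == 12 for m in means)
```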
Please add an entry to the change log at

* Update v0.20.rst
* Update test_supervised.py
* Update clustering.rst

Mission accomplished :)
I think this is a bit confusing. The normalising constant is always the max of some elementwise mean of U and V.
sqrt and sum don't make sense as names of means: call them geometric and arithmetic
Right: I wanted to stick to the notation started by Vinh et al. and used elsewhere, but the names are awful. I can switch them.
You can provide a mapping from reasonable names to Vinh in the docs. Thanks

Checking again: is this ready to pull?
Not yet looked at tests
doc/whats_new/v0.20.rst
Outdated
    - Added control over the normalizer in
      :func:`metrics.normalized_mutual_information_score` and
      :func:`metrics.adjusted_mutual_information_score` via the ``average_method``
      parameter. In a future version, the default normalizer for each will become
Let's call this future version 0.22, and put this changing default value into the API Changes Summary.
The changing default value should be explained in the API Changes Summary for 0.22, right? I don't see that file yet.
There should be a section on deprecations for 0.20 in the API changes Summary.
What should the text be here? Maybe "In ``metrics.normalized_mutual_information_score`` and ``metrics.adjusted_mutual_information_score``, warn that ``average_method`` will have a new default value. In version 0.22, the default normalizer for each will become the arithmetic mean of the entropies of each clustering."
Say "In ... added the parameter ``average_method`` ... The default averaging method will change from (whatever it is in each) to arithmetic in 0.22." I think?
    if average_method == "min":
        return max(min(U, V), 1e-10)
    elif average_method == "geometric":
        return max(np.sqrt(U * V), 1e-10)  # Avoids zero-division error
... but you'll land up with near-infinite measures anyway?? Maybe we're better off warning the user.
perhaps use ``eps`` defined as ``np.finfo('float64').eps`` instead of this arbitrary value. It's also awkward having this comment here and not the first time this eps is used.
I think the ``normalizer = max(normalizer, eps)`` should happen in NMI, actually, not here. It's not directly applicable in AMI, which should check for ``normalizer - emi == 0``.

I think in the case that the denominator is 0, we should be issuing at least a warning, whether or not we then return a finite (backwards compatible in NMI) or infinite value.
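A sketch of the guard being discussed, using the stdlib's `sys.float_info.epsilon` (numerically the same value as `np.finfo('float64').eps`); the function name `safe_nmi` is hypothetical:

```python
import sys

# Same value as np.finfo('float64').eps, without needing NumPy here.
eps = sys.float_info.epsilon

def safe_nmi(mi, normalizer):
    # Guard the denominator inside NMI itself, as suggested above.
    # A zero normalizer means both labelings have zero entropy; issuing
    # a warning (not shown) would arguably be more honest than clamping.
    return mi / max(normalizer, eps)

print(safe_nmi(0.0, 0.0))  # 0.0 rather than a ZeroDivisionError
```

Note the reviewer's point still stands: if `mi` is nonzero while the normalizer is ~0, clamping produces a near-infinite score, so the clamp only papers over the degenerate case.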
    @@ -764,6 +812,12 @@ def normalized_mutual_info_score(labels_true, labels_pred):
        0.0

        """
        if average_method is None:
            warnings.warn("The behavior of NMI will change in a future version. "
0.22
doc/modules/clustering.rst
Outdated
    .. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\text{mean}(H(U), H(V)) - E[\text{MI}]}

    For normalized mutual information and adjusted mutual information, the normalizing
    value is typically some mean of the entropies of each clustering. Various means exist,
Note that it is controlled by ``average_method`` in our implementation
I'm not sure if I like "some mean" because in probability theory there is only one mean. Maybe "some aggregate"? Also @jnothman's comment is not addressed, right?
Right: it's some generalized mean. The comment slipped through the cracks; I'll address it with the next update.
Actually, it didn't slip through. It's addressed at the end of that paragraph.
    if average_method is None:
        warnings.warn("The behavior of NMI will change in a future version. "
                      "To match the behavior of 'v_measure_score', NMI will "
                      "use arithmetic mean by default."
State "average_method='arithmetic' by default"
    elif average_method == "arithmetic":
        return max(np.mean([U, V]), 1e-10)
    elif average_method == "max":
        return max(U, V)
This could still be 0 in theory
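The "could still be 0" case is easy to trigger: two constant labelings both have zero entropy, so even the `max` branch, the one without a floor, returns 0. A quick pure-Python check (the `entropy` helper here is a simple stand-in, not scikit-learn's):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a labeling, from class frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

# Two single-cluster labelings: both entropies are exactly zero, so a
# max-based normalizer is zero and NMI's denominator vanishes.
u = entropy([0, 0, 0, 0])
v = entropy([1, 1, 1, 1])
assert max(u, v) == 0
```

This is the degenerate case the earlier discussion suggests warning about rather than silently clamping.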
    average_method : string or None, optional (default: None)
        How to compute the normalizer in the denominator. Possible options
        are 'min', 'geometric', 'arithmetic', and 'max'.
        If None, 'max' will be used. This is likely to change in a future
Don't say likely to. Say it will change to arithmetic in 0.22
    average_method : string or None, optional (default: None)
        How to compute the normalizer in the denominator. Possible options
        are 'min', 'geometric', 'arithmetic', and 'max'.
        If None, 'geometric' will be used. This is likely to change in a
As above
Yes, I think this looks good now.
Ah, then pull away! ;)

We require two approvals before merge (sorry!) Hopefully @amueller can give this another glance.

might be better to have someone else review. @qinhanmin2014 knows metrics, so maybe...

Bump @qinhanmin2014

Bump @qinhanmin2014 @jnothman

This might get some attention with the sprints next week, but probably not from Hanmin.

Apologies for the delay and thanks @aryamccarthy for your great work.
There's probably lots of deprecation warnings in the tests now. Can you please either catch them or explicitly pass the new parameter? (do we have a standard procedure for this btw? Is it documented?)
doc/modules/clustering.rst
Outdated
    value is typically some mean of the entropies of each clustering. Various means exist,
    and no firm rules exist for preferring one over the others. The decision is largely
    made on a field-by-field basis; for instance, in community detection, the arithmetic
    mean is most common. Yang et al. (2016) found that each normalizing method provided
link to reference for citation?
doc/modules/clustering.rst
Outdated
    "qualitatively similar behaviours". In our implementation, this is
    controlled by the ``average_method`` parameter.

    Vinh et al. (2010) named variants of NMI and AMI by their averaging method. Their
Link to reference for citation?
doc/modules/clustering.rst
Outdated
    @@ -1249,7 +1261,8 @@ Their harmonic mean called **V-measure** is computed by
        0.51...

        The V-measure is actually equivalent to the mutual information (NMI)
    -   discussed above normalized by the sum of the label entropies [B2011]_.
    +   discussed above normalized by the arithmetic mean of the label
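The identity is quick to verify algebraically: with homogeneity h = MI/H(C) and completeness c = MI/H(K), the harmonic mean 2hc/(h+c) simplifies to MI divided by the arithmetic mean of the two entropies. A self-contained pure-Python check (helpers written from the standard definitions, not using scikit-learn itself):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_info(a, b):
    n = len(a)
    joint = Counter(zip(a, b))          # contingency counts n_ij
    ca, cb = Counter(a), Counter(b)     # marginal counts a_i, b_j
    return sum((nij / n) * math.log(n * nij / (ca[i] * cb[j]))
               for (i, j), nij in joint.items())

labels_true = [0, 0, 1, 1]
labels_pred = [0, 1, 1, 1]

h_t, h_p = entropy(labels_true), entropy(labels_pred)
mi = mutual_info(labels_true, labels_pred)

homogeneity = mi / h_t    # 1 - H(C|K)/H(C)
completeness = mi / h_p   # 1 - H(K|C)/H(K)
v_measure = 2 * homogeneity * completeness / (homogeneity + completeness)
nmi_arithmetic = mi / ((h_t + h_p) / 2)

assert abs(v_measure - nmi_arithmetic) < 1e-12
```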
maybe say "when normalized" or "with the aggregation function being the arithmetic mean"?
doc/whats_new/v0.20.rst
Outdated
    @@ -121,6 +121,12 @@ Metrics
    - Partial AUC is available via ``max_fpr`` parameter in
      :func:`metrics.roc_auc_score`. :issue:`3273` by
      :user:`Alexander Niederbühl <Alexander-N>`.
    - Added control over the normalizer in
add blank line
doc/whats_new/v0.20.rst
Outdated
    @@ -531,6 +537,17 @@ Metrics
      for :func:`metrics.roc_auc_score`. Moreover using ``reorder=True`` can hide bugs
      due to floating point error in the input.
      :issue:`9851` by :user:`Hanmin Qin <qinhanmin2014>`.
    - In :func:`metrics.normalized_mutual_information_score` and
blank line
        normalized_mutual_info_score,
        adjusted_mutual_info_score,
    ]
    means = {"min", "geometric", "arithmetic", "max"}
feel kinda weird about calling these means.
I can switch to ``generalized_means`` if you prefer, but it has to be clear that this is a specific class of aggregations. Product, for instance, wouldn't work. Let me know and I'll ship all changes in one PR update.
did you check that you're catching all deprecation warnings?
With this?

    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=PendingDeprecationWarning)

Also someone pushed in a way that introduces merge conflicts in the whats-new file, so tests aren't passing.
can you merge master to fix the conflict? And I think we're using
right now.

fixed the conflict

For my future knowledge, what's the command to do that?

I merged master into it and fixed the merge conflict and pushed into your branch.
fyi you shouldn't send PRs from your master branch, you should ideally create a feature branch.

can you please also add a test that there are deprecation warnings? Also see the updated docs at http://scikit-learn.org/dev/developers/contributing.html#change-the-default-value-of-a-parameter

I've written (but not committed) that. My concern is catching all of the FutureWarnings. I'd have to infect the entire

LGTM +1 to merge

LGTM. Merging.

Thanks Arya!
Reference Issues/PRs
See #10308; this is a first step toward eventually deprecating one behavior and making their behavior consistent.
What does this implement/fix?
Background: The measures AMI, NMI, and V-measure are intimately related. Each is a normalized version of mutual information, and AMI incorporates adjustment for chance.