-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
DOC: Specify Units for Mutual Information Metrics #18288 #18641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Specify Units for Mutual Information Metrics #18288 #18641
Conversation
3e32ad7
to
f8a079d
Compare
Yes I apologize--mixed in a rebase that shouldn't have been there. Thanks! |
You have lint issues
To check the code that you changed for lint issues , you can run the following command:
or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @armcode. Just a minor comment to fix.
Also, adding U
and V
is not directly related to this pull request, but let's wait for a core dev review.
Looks like it's ok now. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of nitpicks but this is a good change.
@@ -740,10 +740,10 @@ def mutual_info_score(labels_true, labels_pred, *, contingency=None): | |||
Parameters | |||
---------- | |||
labels_true : int array, shape = [n_samples] | |||
A clustering of the data into disjoint subsets. | |||
A clustering of the data into disjoint subsets (U). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could be nice to have something like
A clustering of the data into disjoint subsets (U). | |
A clustering of the data into disjoint subsets, called $U$ in the above formula. |
|
||
labels_pred : int array-like of shape (n_samples,) | ||
A clustering of the data into disjoint subsets. | ||
A clustering of the data into disjoint subsets (V). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same reference here.
@@ -965,7 +967,8 @@ def normalized_mutual_info_score(labels_true, labels_pred, *, | |||
Returns | |||
------- | |||
nmi : float | |||
score between 0.0 and 1.0. 1.0 stands for perfectly complete labeling | |||
score between 0.0 and 1.0 in normalized nats (based on the natural |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
score between 0.0 and 1.0 in normalized nats (based on the natural | |
Score between 0.0 and 1.0 in normalized nats (based on the natural |
@@ -828,10 +829,10 @@ def adjusted_mutual_info_score(labels_true, labels_pred, *, | |||
Parameters | |||
---------- | |||
labels_true : int array, shape = [n_samples] | |||
A clustering of the data into disjoint subsets. | |||
A clustering of the data into disjoint subsets (U). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refer to the formula
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these make sense to me. I just pushed an update that should incorporate all of these. Thanks a lot!
The linter is failing. This is probably due to some lines that are too long. Can you ensure to not have more than 79 characters? |
Sorry about that; I fixed the line lengths and couldn't figure out why the linter was still failing. There was a trailing space on one line that sneaked in. Looks good now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @amrcode!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @amrcode! Here are a few suggestions.
A clustering of the data into disjoint subsets, called $U$ in the | ||
above formula. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To render the math formatting.
A clustering of the data into disjoint subsets, called $U$ in the | |
above formula. | |
A clustering of the data into disjoint subsets, called :math:`U` in the | |
above formula. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also maybe of the sentence in a paragraph above can be changed to use:
This metric is furthermore symmetric: switching :math:`U` (i.e
``label_true``) with :math:`V` (i.e. ``label_pred``) will return the
same score value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that those patches can also be applied in some of the others metrics' signatures (e.g. the one of normalized_mutual_info_score
for the math notation).
A clustering of the data into disjoint subsets, called $V$ in the | ||
above formula. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
A clustering of the data into disjoint subsets, called $V$ in the | |
above formula. | |
A clustering of the data into disjoint subsets, called :math:`V` in the | |
above formula. |
A clustering of the data into disjoint subsets, called $U$ in the | ||
above formula. | ||
|
||
labels_pred : int array-like of shape (n_samples,) | ||
A clustering of the data into disjoint subsets. | ||
A clustering of the data into disjoint subsets, called $V$ in the | ||
above formula. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same suggestions apply also here.
Fix lint issues in doc updates.
…return value doc Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
Line length adjustment
349ceb7
to
26e2064
Compare
@glemaitre , @jjerphan, if all your comments have been addressed this pull request can probably be merged? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I can't merge but @glemaitre can). |
Thanks @amrcode LGTM |
You're welcome! Thanks for reviewing and merging! |
Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
Fixes #18288
This PR adds documentation to the return values of the mutual-information-score functions to make it clear that the units are based on the natural logarithm. It also indicates how the parameters are related to the mathematical representations.