DOC: Specify Units for Mutual Information Metrics #18288 #18641

amrcode · 2020-10-19T12:36:01Z

This PR adds documentation to the return values of the mutual-information-score functions to make it clear that the units are based on the natural logarithm. It also indicates how the parameters are related to the mathematical representations.
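To make the unit concrete, here is a minimal standard-library sketch of the quantity being documented. It is not scikit-learn's implementation: the helper name `mutual_info_nats` and the sample labels are assumptions made for illustration. It computes the mutual information between two labelings with the natural logarithm, so the result is in nats; dividing by ln 2 converts it to bits.

```python
import math
from collections import Counter


def mutual_info_nats(labels_u, labels_v):
    """Mutual information between two labelings, in nats (hypothetical helper)."""
    n = len(labels_u)
    count_u = Counter(labels_u)                   # cluster sizes |U_i|
    count_v = Counter(labels_v)                   # cluster sizes |V_j|
    count_uv = Counter(zip(labels_u, labels_v))   # overlaps |U_i ∩ V_j|
    mi = 0.0
    for (u, v), n_uv in count_uv.items():
        # P(u, v) * ln(P(u, v) / (P(u) * P(v))), written in terms of counts
        mi += (n_uv / n) * math.log(n * n_uv / (count_u[u] * count_v[v]))
    return mi


labels_true = [0, 0, 1, 1]
labels_pred = [1, 1, 0, 0]

mi_nats = mutual_info_nats(labels_true, labels_pred)
mi_bits = mi_nats / math.log(2)  # nats -> bits
print(mi_nats, mi_bits)
```

For two perfectly matching clusterings of equal size, the value is ln 2 nats, which is exactly 1 bit.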

cmarmo · 2020-10-19T14:25:34Z

Hi @amrcode, thanks for your pull request. I'm under the impression that something went wrong in your last merge with upstream.
To revert your branch to your first commit (f8a079d) you might want to use

git reset --hard f8a079d0717ac15dbcc37b409782c13f8ffc9055

amrcode · 2020-10-19T14:47:57Z

Yes, I apologize; I mixed in a rebase that shouldn't have been there. Thanks!

cmarmo · 2020-10-20T08:31:43Z

You have lint issues:

sklearn/metrics/cluster/_supervised.py:625:80: E501 line too long (93 > 79 characters)
sklearn/metrics/cluster/_supervised.py:838:74: W291 trailing whitespace

To check the code that you changed for lint issues, you can run the following command:

git diff upstream/master -u -- "*.py" | flake8 --diff

or make flake8-diff (on Unix-like systems)

cmarmo

Thanks @amrcode. Just a minor comment to fix.
Also, adding U and V is not directly related to this pull request, but let's wait for a core dev review.

sklearn/metrics/cluster/_supervised.py

cmarmo · 2020-12-15T15:42:06Z

Hi @amrcode, the failing test is probably unrelated to your pull request. Do you mind synchronizing with the latest version of upstream? A useful tutorial is available here if you need help in doing that. Thanks!

amrcode · 2020-12-15T20:47:50Z

Looks like it's ok now. Thanks!

glemaitre

A couple of nitpicks but this is a good change.

glemaitre · 2021-01-05T13:40:36Z

sklearn/metrics/cluster/_supervised.py

@@ -740,10 +740,10 @@ def mutual_info_score(labels_true, labels_pred, *, contingency=None):
    Parameters
    ----------
    labels_true : int array, shape = [n_samples]
-        A clustering of the data into disjoint subsets.
+        A clustering of the data into disjoint subsets (U).


I think it could be nice to have something like

Suggested change

-        A clustering of the data into disjoint subsets (U).
+        A clustering of the data into disjoint subsets, called $U$ in the above formula.

glemaitre · 2021-01-05T13:40:56Z

sklearn/metrics/cluster/_supervised.py


    labels_pred : int array-like of shape (n_samples,)
-        A clustering of the data into disjoint subsets.
+        A clustering of the data into disjoint subsets (V).


The same reference here.

glemaitre · 2021-01-05T13:45:09Z

sklearn/metrics/cluster/_supervised.py

@@ -965,7 +967,8 @@ def normalized_mutual_info_score(labels_true, labels_pred, *,
    Returns
    -------
    nmi : float
-       score between 0.0 and 1.0. 1.0 stands for perfectly complete labeling
+       score between 0.0 and 1.0 in normalized nats (based on the natural


Suggested change

-       score between 0.0 and 1.0 in normalized nats (based on the natural
+       Score between 0.0 and 1.0 in normalized nats (based on the natural

glemaitre · 2021-01-05T13:45:17Z

sklearn/metrics/cluster/_supervised.py

@@ -828,10 +829,10 @@ def adjusted_mutual_info_score(labels_true, labels_pred, *,
    Parameters
    ----------
    labels_true : int array, shape = [n_samples]
-        A clustering of the data into disjoint subsets.
+        A clustering of the data into disjoint subsets (U).


Refer to the formula here as well.

amrcode · 2021-01-05

All of these make sense to me. I just pushed an update that should incorporate all of them. Thanks a lot!

glemaitre · 2021-01-05T17:13:06Z

The linter is failing. This is probably due to some lines being too long. Can you make sure that no line exceeds 79 characters?

amrcode · 2021-02-02T14:54:36Z

Sorry about that; I fixed the line lengths and couldn't figure out why the linter was still failing. There was a trailing space on one line that sneaked in. Looks good now.
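For anyone hitting the same two failures, a rough standard-library sketch of the checks involved follows. It only approximates what flake8 reports as E501 and W291; the function name `find_style_issues` and the sample text are assumptions made for illustration, and the real linter remains the reference.

```python
def find_style_issues(source, max_len=79):
    """Return (line number, message) pairs for overlong lines and trailing spaces."""
    issues = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_len:
            issues.append((lineno, "line too long"))
        if line != line.rstrip():
            issues.append((lineno, "trailing whitespace"))
    return issues


# Line 1 ends with a space, line 2 is 86 characters long, line 3 is clean.
sample = "x = 1 \n" + "y = '" + "a" * 80 + "'\n" + "z = 3\n"
issues = find_style_issues(sample)
print(issues)
```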

cmarmo

Thanks @amrcode!

jjerphan

Thanks @amrcode! Here are a few suggestions.

jjerphan · 2021-06-10T15:39:22Z

sklearn/metrics/cluster/_supervised.py

+        A clustering of the data into disjoint subsets, called $U$ in the
+        above formula.


To render the math formatting.

Suggested change

-        A clustering of the data into disjoint subsets, called $U$ in the
-        above formula.
+        A clustering of the data into disjoint subsets, called :math:`U` in the
+        above formula.

Also, maybe one of the sentences in the paragraph above can be changed to:

This metric is furthermore symmetric: switching :math:`U` (i.e. ``labels_true``) with :math:`V` (i.e. ``labels_pred``) will return the same score value.

Also note that the same changes can be applied to the docstrings of some of the other metrics (e.g. normalized_mutual_info_score, for the math notation).
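As a quick sanity check of the symmetry statement, here is a minimal standard-library sketch. The helper `mutual_info_nats` and the sample labelings are assumptions made for illustration and are not the scikit-learn code; the helper simply evaluates the mutual information formula from label counts using the natural logarithm.

```python
import math
from collections import Counter


def mutual_info_nats(labels_u, labels_v):
    """Mutual information between two labelings, in nats (hypothetical helper)."""
    n = len(labels_u)
    count_u = Counter(labels_u)
    count_v = Counter(labels_v)
    count_uv = Counter(zip(labels_u, labels_v))
    return sum(
        (n_uv / n) * math.log(n * n_uv / (count_u[u] * count_v[v]))
        for (u, v), n_uv in count_uv.items()
    )


U = [0, 0, 1, 1, 2, 2]  # plays the role of labels_true
V = [0, 1, 1, 1, 2, 0]  # plays the role of labels_pred

score_uv = mutual_info_nats(U, V)
score_vu = mutual_info_nats(V, U)
print(score_uv, score_vu)
```

Swapping the arguments only swaps the roles of the row and column counts in each term, so both calls should agree up to floating-point rounding.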

jjerphan · 2021-06-10T15:39:41Z

sklearn/metrics/cluster/_supervised.py

+        A clustering of the data into disjoint subsets, called $V$ in the
+        above formula.


Ditto.

Suggested change

-        A clustering of the data into disjoint subsets, called $V$ in the
-        above formula.
+        A clustering of the data into disjoint subsets, called :math:`V` in the
+        above formula.

jjerphan · 2021-06-10T15:42:40Z

sklearn/metrics/cluster/_supervised.py

+        A clustering of the data into disjoint subsets, called $U$ in the
+        above formula.

    labels_pred : int array-like of shape (n_samples,)
-        A clustering of the data into disjoint subsets.
+        A clustering of the data into disjoint subsets, called $V$ in the
+        above formula.


The same suggestions apply also here.

cmarmo · 2021-10-05T02:03:24Z

@glemaitre , @jjerphan, if all your comments have been addressed this pull request can probably be merged? Thanks!

jjerphan

Thanks for the heads up, @cmarmo.

LGTM, thank you @amrcode!

jjerphan · 2021-10-06T09:09:30Z

(I can't merge but @glemaitre can).

glemaitre · 2021-10-12T20:10:22Z

Thanks @amrcode LGTM

amrcode · 2021-10-13T12:00:51Z

You're welcome! Thanks for reviewing and merging!

github-actions bot added module:metrics Documentation labels Oct 19, 2020

amrcode force-pushed the specify-units-for-mutual-information-metrics branch from 3e32ad7 to f8a079d Compare October 19, 2020 14:46

cmarmo reviewed Oct 21, 2020

cmarmo added the Waiting for Reviewer label Dec 15, 2020

glemaitre reviewed Jan 5, 2021

glemaitre removed the Waiting for Reviewer label Jan 5, 2021

Base automatically changed from master to main January 22, 2021 10:53

cmarmo approved these changes Feb 5, 2021

cmarmo added the Waiting for Reviewer label Feb 5, 2021

jjerphan requested changes Jun 10, 2021

amrcode and others added 10 commits June 11, 2021 08:56

Update _supervised.py

bf380eb

Update _supervised.py

7421170

Fix lint issues in doc updates.

Update sklearn/metrics/cluster/_supervised.py to add a period in the …

7cb0dbe

…return value doc Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>

Updates from review

14df87f

Fix line lengths

6c1c787

Update _supervised.py

b6e1275

Line length adjustment

Remove trailing whitespace

85aa9ec

Update math notation

e8ff404

Fix line lengths

bdc8ba9

Remove trailing whitespace

26e2064

amrcode force-pushed the specify-units-for-mutual-information-metrics branch from 349ceb7 to 26e2064 Compare June 11, 2021 12:59

jjerphan approved these changes Oct 5, 2021

glemaitre merged commit 3bfb033 into scikit-learn:main Oct 12, 2021

glemaitre mentioned this pull request Oct 23, 2021

Release 1.0.1 #21404

Merged

10 tasks

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Oct 23, 2021

DOC Specify Units for Mutual Information Metrics (scikit-learn#18641)

c77fc4e

Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>

glemaitre pushed a commit that referenced this pull request Oct 25, 2021

DOC Specify Units for Mutual Information Metrics (#18641)

45dc4d4

Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>

samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021

DOC Specify Units for Mutual Information Metrics (scikit-learn#18641)

8545cc8

Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
