FIX Fixes issue with exatly_zero_info_score #19179

thomasjpfan · 2021-01-14T21:31:24Z

Reference Issues/PRs

Fixes #19165

What does this implement/fix? Explain your changes.

The issue stemmed from a numerical error where log_N did not cancel out log_a.
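A minimal sketch of the kind of cancellation involved, not the code from this PR. The values below are hypothetical and chosen so that `N * nij == a * b`, which makes the log term mathematically zero; the variable names `combined`, `split`, and `tol` are assumptions made for illustration. Whether either form comes out as exactly `0.0` can depend on the platform's `log` implementation, which would be consistent with the failure showing up only on the CI.

```python
import numpy as np

# Hypothetical values (not taken from the failing test): an independent
# contingency cell, so N * nij == a * b and the log term should vanish.
N = 12.0
nij = 2.0
a = 4.0
b = 6.0

# Form 1: take the log of each product, then subtract.
combined = np.log(N * nij) - np.log(a * b)

# Form 2: take the log of each factor separately, then add and subtract.
split = np.log(N) + np.log(nij) - np.log(a) - np.log(b)

# Both are zero mathematically, but neither is guaranteed to be
# bit-for-bit zero everywhere, so compare against a tolerance.
tol = 1e-12  # arbitrary illustration tolerance
print(combined, split, abs(combined) < tol, abs(split) < tol)
```

Any check that relies on such a term being exactly zero is therefore fragile across platforms and library versions.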

glemaitre · 2021-01-14T21:41:27Z

@thomasjpfan How did you reproduce the error? I installed the same library versions as on the CI and I could not reproduce it. Could it be linked to a different compiler?

thomasjpfan · 2021-01-14T21:53:41Z

I was never able to reproduce the error locally. I had to slowly use the CI to figure out where the issue was.

glemaitre · 2021-01-14T21:54:47Z

Thanks for this :)


thomasjpfan · 2021-01-14T21:54:50Z

Even so, the issue persists. I think I can create a small test example using np.log that fails on the CI, but passes locally.

glemaitre · 2021-01-15T07:59:37Z

It seems that this is only an issue in openml now?

thomasjpfan

This is now ready for review. This is the smallest diff I can come up with while also resolving the issue on [scipy-dev].

thomasjpfan · 2021-01-15T22:48:36Z

sklearn/metrics/cluster/_expected_mutual_info_fast.pyx

+    log_a = np.log(a)
+    log_b = np.log(b)


This is slightly more memory efficient because we no longer need to create the 2D array.
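As a rough illustration of the memory point, assuming the 2D array in question holds `log(a_i * b_j)` for every pair: the sketch below compares that up-front array with keeping only the two 1D arrays and combining them per element. The marginals and the name `log_ab_outer` are hypothetical and used only for this example.

```python
import numpy as np

# Hypothetical marginals (row sums `a`, column sums `b`) for illustration.
a = np.array([4.0, 8.0])
b = np.array([6.0, 3.0, 3.0])

# 2D approach: materialise log(a_i * b_j) for every (i, j) pair up front.
log_ab_outer = np.log(np.outer(a, b))  # shape (2, 3)

# 1D approach: keep only log(a) and log(b), and combine them on demand.
log_a = np.log(a)
log_b = np.log(b)

for i in range(a.shape[0]):
    for j in range(b.shape[0]):
        on_demand = log_a[i] + log_b[j]
        # The two agree up to rounding, not necessarily bit-for-bit.
        assert np.isclose(on_demand, log_ab_outer[i, j])
```

The 1D version stores R + C values instead of R * C, at the cost of one extra addition per lookup.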

lorentzenchr · 2021-01-16T11:05:14Z

sklearn/metrics/cluster/_expected_mutual_info_fast.pyx

@@ -38,9 +38,10 @@ def expected_mutual_information(contingency, int n_samples):
    term1 = nijs / N
    # term2 is log((N*nij) / (a * b)) == log(N * nij) - log(a * b)
    # term2 uses the outer product


Could you adapt or remove this comment? With this PR, there is no outer product anymore.

adrinjalali

Thanks @thomasjpfan

adrinjalali · 2021-01-16T18:16:14Z

sklearn/metrics/cluster/_supervised.py

@@ -795,6 +795,7 @@ def mutual_info_score(labels_true, labels_pred, *, contingency=None):
    log_outer = -np.log(outer) + log(pi.sum()) + log(pj.sum())
    mi = (contingency_nm * (log_contingency_nm - log(contingency_sum)) +
          contingency_nm * log_outer)
+    mi = np.where(np.abs(mi) < np.finfo(mi.dtype).eps, 0.0, mi)


Fair enough, but I wonder in what other places we need to be doing this!
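For reference, a small sketch of what the added `np.where` line does, using a hypothetical `mi` array rather than values from a real contingency matrix. Entries whose magnitude is below machine epsilon for the array's dtype are replaced by exactly `0.0`; everything else passes through unchanged.

```python
import numpy as np

# Hypothetical per-cell contributions: two are rounding noise, one is real.
mi = np.array([1e-17, -3e-17, 0.25])

eps = np.finfo(mi.dtype).eps  # machine epsilon for float64
clamped = np.where(np.abs(mi) < eps, 0.0, mi)

print(clamped)
```

Note that the threshold is absolute rather than relative, so it only removes noise around zero and does not rescale with the magnitude of the other entries.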

adrinjalali · 2021-01-16T18:17:08Z

sklearn/metrics/cluster/_expected_mutual_info_fast.pyx

@@ -54,12 +54,12 @@ def expected_mutual_information(contingency, int n_samples):
    start = np.maximum(start, 1)
    end = np.minimum(np.resize(a, (C, R)).T, np.resize(b, (R, C))) + 1
    # emi itself is a summation over the various values.
-    emi = 0
+    emi = 0.0


emi is defined as DOUBLE, not sure why you've added the .0

it is more explicit if you did not read the definition :)

* ENH Fixes issue with exatly_zero_info_score [scipy-dev]
* ENH Remove unneeded line [scipy-dev]
* WIP Keep types [scipy-dev]
* REV Smaller diff [scipy-dev]
* WIP Expand mutual_info_score [scipy-dev]
* WIP Removes float casting [scipy-dev]
* WIP Adds casting back in
* CI [scipy-dev]
* WIP Casting is not needed [scipy-dev]
* WIP Only clip [scipy-dev]
* REV Smaller diff [scipy-dev]
* WIP Place expected_mutual_information diff back [scipy-dev]
* ENH Uses around
* WIP Use where again [scipy-dev]
* ENH Adjust comments to match code

thomasjpfan added 2 commits January 14, 2021 15:58

ENH Fixes issue with exatly_zero_info_score [scipy-dev]

240de07

ENH Remove unneeded line [scipy-dev]

637681c

github-actions bot added the module:metrics label Jan 14, 2021

thomasjpfan marked this pull request as draft January 14, 2021 21:53

thomasjpfan added 8 commits January 14, 2021 18:00

WIP Keep types [scipy-dev]

d6fa33a

REV Smaller diff [scipy-dev]

98e9480

WIP Expand mutual_info_score [scipy-dev]

8178a8b

WIP Removes float casting [scipy-dev]

4065ad4

WIP Adds casting back in

e4d17f1

CI [scipy-dev]

e311d89

WIP Casting is not needed [scipy-dev]

6600968

WIP Only clip [scipy-dev]

d261ff1

thomasjpfan added 4 commits January 15, 2021 09:47

REV Smaller diff [scipy-dev]

f67c5cf

WIP Place expected_mutual_information diff back [scipy-dev]

ea3094a

ENH Uses around

d1e3a49

WIP Use where again [scipy-dev]

4572831

thomasjpfan marked this pull request as ready for review January 15, 2021 22:45

thomasjpfan commented Jan 15, 2021


lorentzenchr approved these changes Jan 16, 2021


ENH Adjust comments to match code

0e705f6

adrinjalali approved these changes Jan 16, 2021


adrinjalali merged commit 4b8ab92 into scikit-learn:master Jan 16, 2021

glemaitre mentioned this pull request Jan 18, 2021

DOC add entry for numerical instability mutual information #19200

Merged

dependabot bot mentioned this pull request Mar 12, 2021

Bump scikit-learn from 0.23.1 to 0.24.1 JadHADDAD92/covid-mask-detector#24

Closed
