-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX Fixes issue with exatly_zero_info_score #19179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX Fixes issue with exatly_zero_info_score #19179
Conversation
@thomasjpfan How did you reproduce the error. I installed the same library version as on the CI and I could not reproduce it. Would it be linked to a different compiler? |
I was never able to reproduce the error locally. I had to slowly use the CI to figure out where the issue was. |
Thanks for this :)
…On Thu, 14 Jan 2021 at 22:53, Thomas J. Fan ***@***.***> wrote:
I was never able to reproduce the error locally. I had to slowly use the
CI to figure out where the issue was.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#19179 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABY32P3PHDWWEILFDIDILY3SZ5RXLANCNFSM4WDDNLGQ>
.
--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
|
Even so, the issue persist. I think I can create a small test example using |
It seems that this is only an issue in openml now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now ready for review. This is the smallest diff I can come up with while also resolving the issue on [scipy-dev]
.
log_a = np.log(a) | ||
log_b = np.log(b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly more memory efficient because we would not need to create the 2d array anymore.
@@ -38,9 +38,10 @@ def expected_mutual_information(contingency, int n_samples): | |||
term1 = nijs / N | |||
# term2 is log((N*nij) / (a * b)) == log(N * nij) - log(a * b) | |||
# term2 uses the outer product |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you adapt/remove this comment. With this PR, no outer product anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @thomasjpfan
@@ -795,6 +795,7 @@ def mutual_info_score(labels_true, labels_pred, *, contingency=None): | |||
log_outer = -np.log(outer) + log(pi.sum()) + log(pj.sum()) | |||
mi = (contingency_nm * (log_contingency_nm - log(contingency_sum)) + | |||
contingency_nm * log_outer) | |||
mi = np.where(np.abs(mi) < np.finfo(mi.dtype).eps, 0.0, mi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough, but I wonder what other places we need to be doing it!
@@ -54,12 +54,12 @@ def expected_mutual_information(contingency, int n_samples): | |||
start = np.maximum(start, 1) | |||
end = np.minimum(np.resize(a, (C, R)).T, np.resize(b, (R, C))) + 1 | |||
# emi itself is a summation over the various values. | |||
emi = 0 | |||
emi = 0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emi is defined as DOUBLE
, not sure why you've added the .0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is more explicit if you did not read the definition :)
* ENH Fixes issue with exatly_zero_info_score [scipy-dev] * ENH Remove unneeded line [scipy-dev] * WIP Keep types [scipy-dev] * REV Smaller diff [scipy-dev] * WIP Expand mutual_info_score [scipy-dev] * WIP Removes float casting [scipy-dev] * WIP Adds casting back in * CI [scipy-dev] * WIP Casting is not needed [scipy-dev] * WIP Only clip [scipy-dev] * REV Smaller diff [scipy-dev] * WIP Place expected_mutual_information diff back [scipy-dev] * ENH Uses around * WIP Use where again [scipy-dev] * ENH Adjust comments to match code
* ENH Fixes issue with exatly_zero_info_score [scipy-dev] * ENH Remove unneeded line [scipy-dev] * WIP Keep types [scipy-dev] * REV Smaller diff [scipy-dev] * WIP Expand mutual_info_score [scipy-dev] * WIP Removes float casting [scipy-dev] * WIP Adds casting back in * CI [scipy-dev] * WIP Casting is not needed [scipy-dev] * WIP Only clip [scipy-dev] * REV Smaller diff [scipy-dev] * WIP Place expected_mutual_information diff back [scipy-dev] * ENH Uses around * WIP Use where again [scipy-dev] * ENH Adjust comments to match code
Reference Issues/PRs
Fixes #19165
What does this implement/fix? Explain your changes.
The issue stemmed from an numerical error where
log_N
did not cancel outlog_a