-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: Protect generators from log(0.0) #13416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
At least one of the distributions could be fixed without recursion at the cost of substantially changing the stream. I thought this is a bad idea since the recursion will only be called 1 in 10**53 draws. |
xref #13361 |
5ca5a4e
to
b1daecc
Compare
If this goes in, it should have a release note. Given that changing the stream is unlikely, and that rejection algorithms have (unlikely) problems with maintaining reproducible streams due to compiler/platform quirks in any case, I am OK with this fix. I'll let @rkern make the decision. |
I think these all only reject if the integer value(s) is exactly (int) 0. |
Would the release note be 1.16.4 or 1.17.0, or both? If this gets accepted into mtrand then I'll port it to the RandomState in #13163 |
In this PR the release note should be 1.17.0. When backported, the person doing the backport should add the 1.16.x note. After the release of that version, the entire release note will be forward-merged to master. |
6999986
to
24ca469
Compare
the stream changes for any given seed is extremely small. If a 0 is encountered in the | ||
underlying generator, then incorrect value (either ``np.inf`` or ``np.nan``) which | ||
was produced in rejected and dropped from the stream. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"was produced in rejected and dropped" is not clear. Did you mean "was produced, then rejected and dropped"?
An aside: do we describe the use of a rejection-based algorithm in the overview or documentation of the distributions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I clarified. The specific algo isn't often discussed and so I removed "rejected" from the note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"then the incorrect value ... is now dropped" reads a little better I think
24ca469
to
cee32a5
Compare
Ensure log(0.0) doesn't produce inf/nan values when generating random values
cee32a5
to
52cb238
Compare
LGTM. @rkern any thoughts? |
After the new PRNG system #13163 gets merged, NEP 19 says that these kinds of changes to the legacy distributions are forbidden. They get frozen, bugs and all. What to do up until the minute #13163 gets merged is formally up in the air, but it seems near enough to me to lean towards not making these changes. My mind might be changed if there were a report of someone actually running into this and having a problem because of it, and the next release would have this change but not #13163, but I think we only have theoretical issues. |
The main reason why I suggested fixing this now, which is the last chance, is that it simplifies the shared code between the future |
I suppose one could argue that a fix that hardly ever matters is a change that is hardly ever there. @rkern What do you think of the code argument? |
Thanks @bashtage . |
The distributions code is not shared between |
Or at least, I thought we had agreed to do so: #13163 (comment) My argument was meant to apply with respect to the C distributions code too. Completely sever the two. |
I guess with just the C distributions, my argument doesn't matter as much since we can just add new C functions to the same file and point Nonetheless, in that scheme, it was still unnecessary to make this change to |
@rkern This isn't how it has been implemented (as of now). They share when the function is common and split when a break occurs. This simplifies having 2 copies around of these 5 distributions (for now). |
Fix entropy, port numpy#13416 to newer code
Ensure log(0.0) doesn't produce inf/nan values when generating random values
These are all edge cases where
a. Either NaN, inf, or an undefined value are returned. Each has approx probability 1 in 10**53 of happening
b. They all technically change the stream, and so it may be the case that this PR cannot be accepted. This PR only changes the stream for the invalid values, in which case the invalid value is rejected so that the stream is "as-if" the bad value is skipped, and otherwise the same.
@mattip I'm submitting this so that it can receive some thought since these bugs are all in #13163 . If this is acceptable, then I'll make a PR against the branch.
@rkern Do you think these are worth fixing?