Skip to content

BUG: Protect generators from log(0.0) #13416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 11, 2019

Conversation

bashtage
Copy link
Contributor

Ensure log(0.0) doesn't produce inf/nan values when generating random values

These are all edge cases where

a. Either NaN, inf, or an undefined value are returned. Each has approx probability 1 in 10**53 of happening
b. They all technically change the stream, and so it may be the case that this PR cannot be accepted. This PR only changes the stream for the invalid values, in which case the invalid value is rejected so that the stream is "as-if" the bad value is skipped, and otherwise the same.

@mattip I'm submitting this so that it can receive some thought since these bugs are all in #13163 . If this is acceptable, then I'll make a PR against the branch.

@rkern Do you think these are worth fixing?

@bashtage
Copy link
Contributor Author

At least one of the distributions could be fixed without recursion at the cost of substantially changing the stream. I thought this is a bad idea since the recursion will only be called 1 in 10**53 draws.

@bashtage
Copy link
Contributor Author

xref #13361

@charris charris changed the title BUG: Proect generators form log(0.0) BUG: Protect generators from log(0.0) Apr 27, 2019
@bashtage bashtage force-pushed the protect-log-random-generation branch from 5ca5a4e to b1daecc Compare April 27, 2019 16:53
@charris
Copy link
Member

charris commented Apr 27, 2019

If this goes in, it should have a release note. Given that changing the stream is unlikely, and that rejection algorithms have (unlikely) problems with maintaining reproducible streams due to compiler/platform quirks in any case, I am OK with this fix. I'll let @rkern make the decision.

@bashtage
Copy link
Contributor Author

I think these all only reject if the integer value(s) is exactly (int) 0.

@bashtage
Copy link
Contributor Author

Would the release note be 1.16.4 or 1.17.0, or both? If this gets accepted into mtrand then I'll port it to the RandomState in #13163

@mattip
Copy link
Member

mattip commented Apr 28, 2019

In this PR the release note should be 1.17.0. When backported, the person doing the backport should add the 1.16.x note.

After the release of that version, the entire release note will be forward-merged to master.

@bashtage bashtage force-pushed the protect-log-random-generation branch 2 times, most recently from 6999986 to 24ca469 Compare April 29, 2019 07:11
@numpy numpy deleted a comment from bashtage Apr 29, 2019
the stream changes for any given seed is extremely small. If a 0 is encountered in the
underlying generator, then incorrect value (either ``np.inf`` or ``np.nan``) which
was produced in rejected and dropped from the stream.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"was produced in rejected and dropped" is not clear. Did you mean "was produced, then rejected and dropped"?

An aside: do we describe the use of a rejection-based algorithm in the overview or documentation of the distributions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I clarified. The specific algo isn't often discussed and so I removed "rejected" from the note.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"then the incorrect value ... is now dropped" reads a little better I think

@bashtage bashtage force-pushed the protect-log-random-generation branch from 24ca469 to cee32a5 Compare April 29, 2019 14:48
Ensure log(0.0) doesn't produce inf/nan values when generating random values
@bashtage bashtage force-pushed the protect-log-random-generation branch from cee32a5 to 52cb238 Compare April 29, 2019 18:13
@mattip
Copy link
Member

mattip commented Apr 30, 2019

LGTM. @rkern any thoughts?

@rkern
Copy link
Member

rkern commented Apr 30, 2019

After the new PRNG system #13163 gets merged, NEP 19 says that these kinds of changes to the legacy distributions are forbidden. They get frozen, bugs and all. What to do up until the minute #13163 gets merged is formally up in the air, but it seems near enough to me to lean towards not making these changes.

My mind might be changed if there were a report of someone actually running into this and having a problem because of it, and the next release would have this change but not #13163, but I think we only have theoretical issues.

@bashtage
Copy link
Contributor Author

bashtage commented May 1, 2019

The main reason why I suggested fixing this now, which is the last chance, is that it simplifies the shared code between the future RandomState and RandomGenerator. If this isn't fixed, then the fixed version of these functions will need be available for RandomGenerator while the broken version will need to be retailed for RandomState.

@charris
Copy link
Member

charris commented May 1, 2019

I suppose one could argue that a fix that hardly ever matters is a change that is hardly ever there. @rkern What do you think of the code argument?

@charris charris merged commit 4a2b6a7 into numpy:master May 11, 2019
@charris
Copy link
Member

charris commented May 11, 2019

Thanks @bashtage .

@rkern
Copy link
Member

rkern commented May 11, 2019

The distributions code is not shared between RandomState and RandomGenerator. Fixing the distributions for RandomState does not help RandomGenerator. There is no pressure to fix RandomState for the sake of RandomGenerator.

@rkern
Copy link
Member

rkern commented May 11, 2019

Or at least, I thought we had agreed to do so: #13163 (comment)

My argument was meant to apply with respect to the C distributions code too. Completely sever the two.

@rkern
Copy link
Member

rkern commented May 11, 2019

I guess with just the C distributions, my argument doesn't matter as much since we can just add new C functions to the same file and point RandomGenerator to the new functions as we evolve them. That's less problematic than the sharing of methods via subclassing.

Nonetheless, in that scheme, it was still unnecessary to make this change to RandomState. It would be fine to have had random_beta() for the legacy RandomState and add the new random_beta_2() for the RandomGenerator. We'll be doing that more and more going forward. It's not costly. RandomState should never hold us back from evolving RandomGenerator.

@bashtage
Copy link
Contributor Author

The distributions code is not shared between RandomState and RandomGenerator. Fixing the distributions for RandomState does not help RandomGenerator. There is no pressure to fix RandomState for the sake of RandomGenerator.

@rkern This isn't how it has been implemented (as of now). They share when the function is common and split when a break occurs. This simplifies having 2 copies around of these 5 distributions (for now).

@bashtage bashtage deleted the protect-log-random-generation branch May 12, 2019 07:58
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label May 14, 2019
mattip added a commit to mattip/numpy that referenced this pull request May 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants