BUG: fix negative samples generated by Wald distribution #29609
In `numpy.random`, when generating Wald samples, if the mean and scale have a large discrepancy, the current implementation sometimes generates negative samples. An explicit example of this has been added to `test_generator_mt19937.py` with `mean=1e8, scale=2.25` (which is specific to the seed used in that test file).

The key line implicated is in `distributions.c`; it has numerical issues when `Y >> scale`. I have replaced it with the equivalent rationalized form.
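The replaced line itself is not shown above. As an illustration, here is a sketch in Python (the function names are mine, and I am assuming the implicated line computes `X = mean + mu_2l * (Y - sqrt(4 * scale * Y + Y * Y))`, as in the Michael–Schucany–Haas transformation) of the naive update next to the rationalized form:

```python
import math

def wald_x_naive(mean, scale, Y):
    # Direct transcription of the implicated update.  When Y >> scale,
    # sqrt(4*scale*Y + Y*Y) is nearly equal to Y, so the subtraction
    # cancels catastrophically; the rounding error, amplified by mu_2l,
    # can push the result below zero.
    mu_2l = mean / (2.0 * scale)
    return mean + mu_2l * (Y - math.sqrt(4.0 * scale * Y + Y * Y))

def wald_x_rationalized(mean, scale, Y):
    # Multiply and divide by the conjugate Y + sqrt(...): the difference
    # of nearly-equal terms disappears.  Since denom >= 2*Y, the ratio
    # 2*Y/denom never exceeds 1, so the result cannot round below zero.
    denom = Y + math.sqrt(4.0 * scale * Y + Y * Y)
    return mean * (1.0 - 2.0 * Y / denom)
```

For moderate arguments the two forms agree to machine precision; for `mean=1e8, scale=2.25` and large `Y` the naive form can return small negative values while the rationalized form stays within `[0, mean]`.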
This can be seen by noting the algebraic identity obtained by multiplying and dividing by the conjugate, `Y + sqrt(4 * scale * Y + Y * Y)`. As `mu_2l = mean / (2 * scale)`, the factor of `scale` then cancels. Introducing a new variable for the denominator and factoring gives the final expressions, in which the problematic subtraction no longer appears.
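The equations referenced here did not survive extraction; assuming the implicated line computes `X = mean + mu_2l * (Y - sqrt(4 * scale * Y + Y * Y))` (writing `m` for mean and `s` for scale), the derivation can be reconstructed as:

```latex
% Rationalize by multiplying and dividing by the conjugate:
Y - \sqrt{4sY + Y^2}
  = \frac{\left(Y - \sqrt{4sY + Y^2}\right)\left(Y + \sqrt{4sY + Y^2}\right)}
         {Y + \sqrt{4sY + Y^2}}
  = \frac{-4sY}{Y + \sqrt{4sY + Y^2}}

% Substituting mu_2l = m / (2s), the factor of s cancels:
m + \frac{m}{2s}\left(Y - \sqrt{4sY + Y^2}\right)
  = m - \frac{2mY}{Y + \sqrt{4sY + Y^2}}

% Introducing d = Y + sqrt(4sY + Y^2) for the denominator and factoring m:
X = m\left(1 - \frac{2Y}{d}\right)

% Since d >= 2Y >= 0, the ratio 2Y/d is at most 1 and X lies in [0, m].
```

This is a reconstruction of the argument, not the PR's exact text, but it is consistent with the surrounding prose (the `mu_2l` substitution, the new denominator variable, and the final factored expressions).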
And now the test passes, i.e., all samples are non-negative.
I have not made this change to the legacy implementation (nor the legacy test), as I figure it is deprecated, although it suffers from the same issue.
Disclaimer: I found this failing property during a research project I am working on using LLM agents to write property-based tests (i.e., in Hypothesis). This property, that all Wald samples should be non-negative, was proposed and written "autonomously" by the agent. I spent several hours analyzing the failure and becoming acquainted with the NumPy codebase myself in order to verify it was a real issue, running it with the seed in the test suite, etc. I also pinpointed the cause of the problem and wrote this fix and the PR myself. I take full responsibility for it.
The Hypothesis test that caught the issue was simply:
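The test itself is not reproduced above; a sketch of what such a property looks like follows (the strategy bounds, seed, and sample size here are my own illustrative choices, not necessarily the agent's original test):

```python
import numpy as np
from hypothesis import given, strategies as st

# Illustrative parameter ranges; the original test's strategies may differ.
positive_floats = st.floats(min_value=0.25, max_value=1e6,
                            allow_nan=False, allow_infinity=False)

@given(mean=positive_floats, scale=positive_floats)
def test_wald_samples_are_nonnegative(mean, scale):
    # Property: every Wald (inverse Gaussian) sample is non-negative,
    # for any valid (mean, scale) pair.
    rng = np.random.default_rng(0)
    samples = rng.wald(mean, scale, size=200)
    assert (samples >= 0).all()
```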
Note that even after this PR, this property fails, although the minimal failing example is now `mean=280432326968266.0, scale=0.25` (i.e., even more extreme values). At such values, though, it seems we are hitting the limits of floating-point computation.