BUG: np.random.Generator.hypergeometric is weaker than np.random.RandomState.hypergeometric #27880

Closed
vincent-163 opened this issue Nov 30, 2024 · 1 comment
Labels
33 - Question (Question about NumPy usage or development)

Comments

@vincent-163
Describe the issue:

The documentation suggests that np.random.Generator should be used in favor of np.random.RandomState, except to reproduce the exact random sequences generated prior to NumPy v1.16. The expectation is therefore that np.random.Generator supports everything np.random.RandomState does. This is not the case for np.random.Generator.hypergeometric, which only allows ngood and nbad below $10^9$, whereas np.random.RandomState.hypergeometric allows any ngood and nbad within the int64 range. Either np.random.Generator should be updated to support the same cases that np.random.RandomState does, or the docs should be updated to mention that the legacy RandomState has to be used for this specific function.

Reproduce the code example:

import numpy as np
np.random.RandomState(1337).hypergeometric(10**15, 10**15, 10**9)  # 499964666
np.random.default_rng(1337).hypergeometric(10**15, 10**15, 10**9)  # raises ValueError

Error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "numpy/random/_generator.pyx", line 3496, in numpy.random._generator.Generator.hypergeometric
ValueError: both ngood and nbad must be less than 1000000000
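
A minimal sketch of where the limit applies, assuming only the behaviour shown in the traceback above. The helper `try_hypergeometric` and the argument values `10**8` and `10**6` are illustrative choices, not part of NumPy.

```python
import numpy as np


def try_hypergeometric(rng, ngood, nbad, nsample):
    """Return (value, None) on success, or (None, message) if NumPy rejects the call."""
    try:
        return int(rng.hypergeometric(ngood, nbad, nsample)), None
    except ValueError as exc:
        return None, str(exc)


rng = np.random.default_rng(1337)

# Arguments well below the 10**9 limit are accepted.
ok_value, ok_err = try_hypergeometric(rng, 10**8, 10**8, 10**6)

# The arguments from the report above are rejected with a ValueError.
bad_value, bad_err = try_hypergeometric(rng, 10**15, 10**15, 10**9)

print("below limit:", ok_value, ok_err)
print("above limit:", bad_value, bad_err)
```

Catching the exception this way lets a caller detect out-of-range populations up front instead of failing partway through a larger computation.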

Python and NumPy Versions:

NumPy 1.26.4
Python 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]

Runtime Environment:

No response

Context for the issue:

The implementation for np.random.Generator.hypergeometric is at

cdef double HYPERGEOM_MAX = 10**9
written in Cython, which begins by checking whether ngood and nbad are both less than 10^9.

The implementation for np.random.RandomState.hypergeometric is at

int64_t random_hypergeometric(bitgen_t *bitgen_state,
written in C.

I think the math involved in np.random.RandomState.hypergeometric (the "HRUA" algorithm) may have been complex enough to make it hard to translate into Python. However, the only primitives used in the C implementation are next_double and logfactorial, where logfactorial is implemented in https://github.com/numpy/numpy/blob/main/numpy/random/src/distributions/logfactorial.c. With the help of generative coding assistants, translating the code into Python might be easier than ever. I don't have experience with NumPy development yet, but I hope someone will pick this up.
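
As a rough illustration of how small those two primitives are in Python, the sketch below uses stand-ins rather than NumPy's own routines: `math.lgamma` replaces the C `logfactorial`, and the standard library's `random.Random` replaces `next_double` as a source of uniform draws in [0, 1). Both function bodies are assumptions made for illustration. The sketch says nothing about the numerical behaviour of HRUA for very large arguments.

```python
import math
import random


def logfactorial(k: int) -> float:
    """Approximate log(k!) using the identity log(k!) = lgamma(k + 1)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return math.lgamma(k + 1)


def next_double(rng: random.Random) -> float:
    """Stand-in for a uniform double in [0, 1)."""
    return rng.random()


rng = random.Random(1337)
u = next_double(rng)

# Compare against a directly computed value: 5! = 120.
print(logfactorial(5), math.log(120))
```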

@rkern
Member

rkern commented Nov 30, 2024

Please see #13761 for why these limits are in place. The older implementation in RandomState suffers from all the same issues; we just can't forbid those inputs because of our compatibility policy.

@rkern closed this as not planned Nov 30, 2024
@rkern added the 33 - Question label and removed the 00 - Bug label Nov 30, 2024