BUG: np.random.Generator.hypergeometric is weaker than np.random.RandomState.hypergeometric #27880

vincent-163 · 2024-11-30T17:03:13Z

Describe the issue:

The documentation suggests that np.random.Generator should be used in favor of np.random.RandomState, except to reproduce the exact random sequences produced prior to NumPy v1.16. The expectation is that np.random.Generator supports everything np.random.RandomState does, at least as well. This is not the case for np.random.Generator.hypergeometric, which only allows ngood and nbad below $10^9$, whereas np.random.RandomState.hypergeometric allows any ngood and nbad within the int64 range. Either np.random.Generator should be updated to support the same cases that np.random.RandomState does, or the docs should be updated to mention that the legacy RandomState has to be used for this specific function.

Reproduce the code example:

import numpy as np
np.random.RandomState(1337).hypergeometric(10**15,10**15,10**9) # 499964666
np.random.default_rng(1337).hypergeometric(10**15,10**15,10**9) # ValueError

Error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "numpy/random/_generator.pyx", line 3496, in numpy.random._generator.Generator.hypergeometric
ValueError: both ngood and nbad must be less than 1000000000

Python and NumPy Versions:

1.26.4
3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]

Runtime Environment:

No response

Context for the issue:

The implementation for np.random.Generator.hypergeometric is at

numpy/numpy/random/_generator.pyx

Line 3599 in 421546e

cdef double HYPERGEOM_MAX = 10**9

written in Cython (a .pyx file), which begins by checking that ngood and nbad are both less than 10^9.
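The limit check described above can be sketched in plain Python as follows. This is only an illustration, not the NumPy source: the helper name check_hypergeometric_args is hypothetical, and only the constant and the error message are taken from the snippet and traceback quoted in this issue.

```python
# Limit quoted in the issue (HYPERGEOM_MAX in _generator.pyx).
HYPERGEOM_MAX = 10**9


def check_hypergeometric_args(ngood, nbad):
    """Hypothetical helper: reject ngood/nbad at or above the limit.

    Mirrors the behaviour reported in the traceback; it is not the
    actual NumPy validation code.
    """
    if ngood >= HYPERGEOM_MAX or nbad >= HYPERGEOM_MAX:
        raise ValueError(
            "both ngood and nbad must be less than %d" % HYPERGEOM_MAX
        )


# Arguments just under the limit pass silently.
check_hypergeometric_args(10**9 - 1, 10**9 - 1)

# The arguments from the reproduction are rejected.
try:
    check_hypergeometric_args(10**15, 10**15)
except ValueError as exc:
    print(exc)
```

Calling such a check before Generator.hypergeometric would let a caller detect out-of-range inputs up front instead of catching the ValueError afterwards.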

The implementation for np.random.RandomState.hypergeometric is at

numpy/numpy/random/src/distributions/random_hypergeometric.c

Line 246 in 421546e

int64_t random_hypergeometric(bitgen_t *bitgen_state,

written in C.

I think the math involved in np.random.RandomState.hypergeometric (the "HRUA" algorithm) may be complex enough that it was hard to translate into Python. However, the only primitives used in the C implementation are next_double and logfactorial, where logfactorial is implemented in https://github.com/numpy/numpy/blob/main/numpy/random/src/distributions/logfactorial.c. With the help of generative coding assistants, translating the code into Python might be easier than ever. I don't have experience with NumPy development yet, but I hope someone will pick this up.
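To make the logfactorial primitive mentioned above concrete, here is a minimal Python sketch. It is an assumption-laden stand-in, not a translation of logfactorial.c: it uses math.lgamma from the standard library, relying on the identity log(k!) = lgamma(k + 1). The function log_hypergeometric_pmf is a hypothetical helper that shows how log-factorials combine into the hypergeometric log-probability; it is not the HRUA sampler.

```python
import math


def logfactorial(k):
    """Return log(k!) via lgamma(k + 1); a stand-in for the C routine."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return math.lgamma(k + 1)


def log_binomial(n, k):
    """log of the binomial coefficient C(n, k) for 0 <= k <= n."""
    return logfactorial(n) - logfactorial(k) - logfactorial(n - k)


def log_hypergeometric_pmf(k, ngood, nbad, nsample):
    """Hypothetical helper: log P(X = k) for a hypergeometric draw.

    P(X = k) = C(ngood, k) * C(nbad, nsample - k) / C(ngood + nbad, nsample)
    """
    if k < 0 or k > ngood or nsample - k < 0 or nsample - k > nbad:
        return float("-inf")  # outside the support
    return (
        log_binomial(ngood, k)
        + log_binomial(nbad, nsample - k)
        - log_binomial(ngood + nbad, nsample)
    )


# Small example: 5 good, 5 bad, 2 drawn; the probabilities sum to 1.
probs = [math.exp(log_hypergeometric_pmf(k, 5, 5, 2)) for k in range(3)]
print(probs, sum(probs))
```

One caveat on this sketch: for very large arguments the individual log-factorial terms are enormous, so their differences lose precision in double arithmetic.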


rkern · 2024-11-30T18:28:57Z

Please see #13761 for why these limits are in place. The older implementation in RandomState suffers from all the same issues; we just can't forbid those inputs because of our compatibility policy.

vincent-163 added the 00 - Bug label Nov 30, 2024

rkern closed this as not planned Nov 30, 2024

rkern added the 33 - Question label and removed the 00 - Bug label Nov 30, 2024
