Description
The issue of np.linalg.eigh returning wrong results or crashing is still real (using numpy 1.26.2):
>>> import numpy as np
>>> n=32767
>>> b=np.random.rand(n)
>>> m_32767=np.diag(b)
>>> m_32767.shape
(32767, 32767)
>>> V_32767=np.linalg.eigh(m_32767)
** On entry to DSTEDC parameter number 8 had an illegal value
** On entry to DORMTR parameter number 12 had an illegal value
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pearu/miniconda3/envs/jax-cuda-dev/lib/python3.11/site-packages/numpy/linalg/linalg.py", line 1487, in eigh
w, vt = gufunc(a, signature=signature, extobj=extobj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/pearu/miniconda3/envs/jax-cuda-dev/lib/python3.11/site-packages/numpy/linalg/linalg.py", line 118, in _raise_linalgerror_eigenvalues_nonconvergence
raise LinAlgError("Eigenvalues did not converge")
numpy.linalg.LinAlgError: Eigenvalues did not converge
where the exception "Eigenvalues did not converge" is very likely wrong and misleading. With n == 32766, the above example works fine.
The underlying problem is that when computing lwork
for the lapack syevd
function using expression 1 + 6 * n + 2 * n * n
, it will overflow when n == 32767
. This problem is not unique to syevd
but it exists for all lapack functions that work array sizes are quadratic wrt input sizes.
While switching to lapack implementations that uses 64-bit integer inputs, the overflow issue is seemingly resolved but in fact it is just harder to reproduce because the critical n
size will be 2147483647
where the issue re-merges when the lwork
expression above will overflow for int64.
I have implemented a solution to the same problem in JAX (jax-ml/jax#19288) that will lead to an overflow exception rather than wrong results or crashes. I think something similar is appropriate for NumPy as well.
Originally posted by @pearu in #13956 (comment)