
BUG: mkl errors and segfaults with numpy 1.26.0 on i9-10900K #24846


Closed
stuart-nolan opened this issue Oct 3, 2023 · 16 comments · Fixed by #24893

@stuart-nolan

stuart-nolan commented Oct 3, 2023

Describe the issue:

This is cpu architecture and numpy version dependent. I observe these issues on an i9-10900K with numpy 1.26.0. I do not observe an issue on the same i9-10900K and numpy 1.25.2 - all other factors unchanged. I do not observe any issues on an i5-4300U laptop (same version for os, python, numpy - 1.25.2 or 1.26.0, mkl, and test script). See context below for additional detail.

os: Ubuntu 22.04.3 LTS
python (in a virtual env): Python 3.11.4
Note that I use "-march=native" among other flags for both the python and numpy builds on both CPUs.
mkl: 2023.2.0

Reproduce the code example:

import numpy as np                                                               
from time import time                                                            
import sysconfig                                                                 
import psutil                                                                    
import re                                                                        

mkl = True                                                                                  
# Print numpy config to see whether mkl/blas is available
np.show_config()                                                                 
print("python compile flags")                                                    
print(sysconfig.get_config_var('CFLAGS'))                                        
                                                                                 
re_cpu = re.compile(r"^model name\s*: (.*)")
with open('/proc/cpuinfo') as f:                                                 
    for line in f:                                                               
        model = re_cpu.match(line)                                               
        if model:                                                                
            print(model.group(1))                                                
            break                                                                
                                                                                 
print("mkl: %s" % mkl)                                                           
print("physical cores: %s" % psutil.cpu_count(logical=False))
print("logical cores: %s" % psutil.cpu_count(logical=True))                      
print("cpu min freq: %s" % psutil.cpu_freq().min)                                
print("cpu current freq: %s" % psutil.cpu_freq().current)                        
print("cpu max freq: %s" % psutil.cpu_freq().max)                                
print("load average: %s %s %s" % psutil.getloadavg())                            
print("Total memory: %s GB" % round(psutil.virtual_memory().total/1000000000, 2))                                                                                
print("Available memory: %s GB" % round(psutil.virtual_memory().available/1000000000, 2))                                                                        

np.random.seed(0)                                                                
                                                                                 
size = 4096                                                                      
A, B = np.random.random((size, size)), np.random.random((size, size))            
C, D = np.random.random((size * 128,)), np.random.random((size * 128,))          
E = np.random.random((int(size / 2), int(size / 4)))                             
F = np.random.random((int(size / 2), int(size / 2)))                             
F = np.dot(F, F.T)                                                               
G = np.random.random((int(size / 2), int(size / 2)))                             
                                                                                 
# Matrix multiplication                                                          
N = 10                                                                           
t = time()                                                                       
for i in range(N):                                                               
    np.dot(A, B)                                                                 
delta = time() - t                                                               
print('Dotted two %dx%d matrices in %0.2f s.' % (size, size, delta / N))         
del A, B                                                                         
# Vector multiplication                                                          
N = 10                                                                           
t = time()                                                                       
for i in range(N):                                                               
    np.dot(C, D)                                                                 
delta = time() - t                                                               
print('Dotted two vectors of length %d in %0.2f ms.' % (size * 128, 1e3 * delta / N))                                                                            
del C, D                                                                         
                                                                                 
# Singular Value Decomposition (SVD)                                             
N = 3                                                                            
t = time()                                                                       
for i in range(N):                                                               
    np.linalg.svd(E, full_matrices = False)                                      
delta = time() - t                                                               
print("SVD of a %dx%d matrix in %0.2f s." % (size / 2, size / 4, delta / N))     
del E

# Cholesky Decomposition                                                         
N = 3                                                                            
t = time()                                                                       
for i in range(N):                                                               
    np.linalg.cholesky(F)                                                        
delta = time() - t                                                               
print("Cholesky decomposition of a %dx%d matrix in %0.2f s." % (size / 2, size / 2, delta / N))

# tests continue, but segfault on i9-10900K, numpy 1.26 at "Cholesky Decomposition"

Error message:

With MKL_VERBOSE=1 there is no traceback; the full output from the numpy 1.26 env is included below in the hope it helps...

MKL_VERBOSE oneMKL 2023.0 Update 2 Product build 20230613 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.70GHz ilp64 intel_thread
MKL_VERBOSE SDOT(2,0x556cea252820,1,0x556cea252820,1) 765.28us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
Build Dependencies:
  blas:
    detection method: cmake
    found: true
    name: MKL
    version: 2023.2.0
  lapack:
    detection method: cmake
    found: true
    name: MKL
    version: 2023.2.0
Compilers:
  c:
    commands: ccache, cc
    linker: ld.bfd
    name: gcc
    version: 11.4.0
  c++:
    commands: ccache, c++
    linker: ld.bfd
    name: gcc
    version: 11.4.0
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 3.0.2
Machine Information:
  build:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
  host:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
Python Information:
  path: /tmp/build-env-3t41cf80/bin/python
  version: '3.11'
SIMD Extensions:
  baseline:
  - SSE
  - SSE2
  - SSE3
  - SSSE3
  - SSE41
  - POPCNT
  - SSE42
  - AVX
  - F16C
  - FMA3
  - AVX2
  not found:
  - AVX512F
  - AVX512CD
  - AVX512_KNL
  - AVX512_KNM
  - AVX512_SKX
  - AVX512_CLX
  - AVX512_CNL
  - AVX512_ICL

python compile flags
-Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -O3 -march=native -O3 -march=native
Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz
mkl: True
physical cores: 10
logical cores: 10
cpu min freq: 800.0
cpu current freq: 4395.1093
cpu max freq: 5300.0
load average: 1.39501953125 0.7900390625 0.330078125
Total memory: 33.48 GB
Available memory: 31.87 GB
MKL_VERBOSE DSYRK(L,T,2048,2048,0x7fff7c2069a0,0x7f9d51624010,2048,0x7fff7c2069a8,0x7f9d4f623010,2048) 18.58ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 281.93ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 280.20ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 285.86ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 281.18ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 283.32ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 279.40ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 286.43ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 281.06ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 281.67ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 285.34ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
Dotted two 4096x4096 matrices in 0.29 s.
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 352.45us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 173.90us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 29.56us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 28.30us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 28.26us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 27.08us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 26.16us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 28.40us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 26.52us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 25.99us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
Dotted two vectors of length 524288 in 0.08 ms.

Intel MKL ERROR: Parameter 10 was incorrect on entry to DGESDD.
MKL_VERBOSE DGESDD(S,4398046513152,8796093023232,0x7f9d60e1d010,8796093024256,0x7f9d61e1d010,0x7f9d61e1f010,4398046513152,0x7f9d62e1f010,-4294966272,0x7fff7c204cd0,8944236082153652223,0x7f9d6361f010,-10) 42.27us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
init_gesdd failed init

Intel MKL ERROR: Parameter 10 was incorrect on entry to DGESDD.
MKL_VERBOSE DGESDD(S,4398046513152,8796093023232,0x7f9d6261f010,8796093024256,0x7f9d6361f010,0x7f9d63621010,4398046513152,0x7f9d64621010,-4294966272,0x7fff7c204cd0,8944236082153652223,0x7f9d64e21010,-10) 8.10us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
init_gesdd failed init

Intel MKL ERROR: Parameter 10 was incorrect on entry to DGESDD.
MKL_VERBOSE DGESDD(S,4398046513152,8796093023232,0x7f9d6261f010,8796093024256,0x7f9d6361f010,0x7f9d63621010,4398046513152,0x7f9d64621010,-4294966272,0x7fff7c204cd0,8944236082153652223,0x7f9d64e21010,-10) 6.55us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
init_gesdd failed init
SVD of a 2048x1024 matrix in 0.00 s.
Segmentation fault (core dumped)
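For what it's worth, the oversized integer arguments in the DGESDD trace look like two adjacent 32-bit values being read as one little-endian 64-bit integer, which is the classic signature of LP64 (32-bit int) caller arguments reaching an ILP64 (64-bit int) LAPACK. A quick sanity check of that interpretation (illustrative only, not a confirmed diagnosis):

```python
# Illustrative check: if LP64 (32-bit) LAPACK arguments m=2048, n=1024
# sit in adjacent memory and an ILP64 library reads them as 64-bit
# little-endian integers, each read picks up one value in the low word
# and its neighbor in the high word -- reproducing the logged garbage.
m, n = 2048, 1024  # shape of E in the script above

assert (m | (n << 32)) == 4398046513152   # 2nd logged DGESDD argument
assert (n | (m << 32)) == 8796093023232   # 3rd logged DGESDD argument
```

Whether the two integers actually end up adjacent depends on how the wrapper lays out its arguments, so this only shows the values are consistent with an interface-width mismatch.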

Runtime information:

MKL_VERBOSE oneMKL 2023.0 Update 2 Product build 20230613 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.70GHz ilp64 intel_thread
MKL_VERBOSE SDOT(2,0x5568bfa04550,1,0x5568bfa04550,1) 723.91us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
1.26.0
3.11.4 (main, Jul 26 2023, 10:09:23) [GCC 11.3.0]

Context for the issue:

Note that suitesparse 7.2.0 demos (not python) built with the same mkl installation all complete without error on the i9-10900K. scikit-sparse and scikits.odes tests (python based and linked against the same suitesparse libs), also now error and seg fault with numpy 1.26.0 but not numpy 1.25.2.

EDIT: different errors for scikit-sparse, e.g.

nose2 -v sksparse
... output truncated ...
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x5582018360b8,4,0x55820160dee8,4) 91ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x5582018360c0,4,0x55820160def0,4) 61ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x5582018360c8,4,0x55820160def8,4) 50ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
ok
sksparse.test_cholmod.test_cholesky_matrix_market ... /home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:189: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
answer = np.linalg.lstsq(X.todense(), y)[0]

Intel MKL ERROR: Parameter 5 was incorrect on entry to DGELSD.
MKL_VERBOSE DGELSD(1374389535753,4294967616,72151610872037377,0x7f0d8b96c010,1033,0x7f0d8bbf1a10,140728898421769,0x7f0d8bbf3a58,0x7f0da1f65d80,139696528745944,0x7ffee9469e40,139698106269695,0x7ffee9469de0,-5) 31.53us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
init_gelsd failed init
FAIL
sksparse.test_cholmod.test_cholesky_smoke_test ... /home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:66: CholmodTypeConversionWarning: converting matrix of class dia_matrix to CSC format
f = cholesky(sparse.eye(10, 10))
dense
sparse
MKL_VERBOSE DAXPY(20,0x7ffee946acc8,0x5582018e0040,1,0x558201858840,1) 1.05us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
csr
/home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:76: CholmodTypeConversionWarning: converting matrix of class csr_matrix to CSC format
assert sparse.issparse(f(s_csr))
/home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:77: CholmodTypeConversionWarning: converting matrix of class csr_matrix to CSC format
assert_allclose(f(s_csr).todense(), s_csr.todense())
MKL_VERBOSE DAXPY(20,0x7ffee946acc8,0x5582018e0040,1,0x558201858840,1) 292ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
extract
ok
sksparse.test_cholmod.test_complex ... MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de50,4,0x55820160dee0,4) 341ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de58,4,0x55820160dee8,4) 147ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de60,4,0x55820160def0,4) 37ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de68,4,0x55820160def8,4) 37ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
Segmentation fault (core dumped)

@kernel-loophole

kernel-loophole commented Oct 3, 2023

Hi @stuart-nolan, after running the same code in Colab I got the following output.

openblas64__info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
    runtime_library_dirs = ['/usr/local/lib']
blas_ilp64_opt_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
    runtime_library_dirs = ['/usr/local/lib']
openblas64__lapack_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
    runtime_library_dirs = ['/usr/local/lib']
lapack_ilp64_opt_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
    runtime_library_dirs = ['/usr/local/lib']
Supported SIMD extensions in this NumPy install:
    baseline = SSE,SSE2,SSE3
    found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
    not found = AVX512F,AVX512CD,AVX512_KNL,AVX512_KNM,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL
python compile flags
-Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g       -fstack-protector-strong -Wformat -Werror=format-security  -g -fwrapv -O2   
mkl: True
physical cores: 2
logical cores: 2
cpu min freq: 0.0
cpu current freq: 2200.15
cpu max freq: 0.0
load average: 1.0537109375 0.6015625 0.23486328125
Total memory: 13.61 GB
Available memory: 12.64 GB
Dotted two 4096x4096 matrices in 4.40 s.
Dotted two vectors of length 524288 in 1.39 ms.
SVD of a 2048x1024 matrix in 1.56 s.
Cholesky decomposition of a 2048x2048 matrix in 0.31 s.

Also if you wanted to run code here is link for colab notebook https://colab.research.google.com/drive/1hOXH9qPc0MU4RseMYJVCbmhuGTljus8q?usp=sharing

@stuart-nolan
Author

@kernel-loophole
I'm not sure how your post is relevant to this bug report. Your output shows numpy using openblas, and it does not include the CPU detection output. If you're suggesting using openblas, thank you - I am aware of that option.

@seberg
Member

seberg commented Oct 3, 2023

You should tell us how exactly you installed numpy and MKL, since we do not distribute NumPy linked against MKL.

One possibility is that MKL may offer both a 64-bit and a 32-bit LAPACK, and if you dynamically swap things out, weird things may happen (you seem to use a 64-bit one, but I'm not sure about the symbol suffix). If there were a real issue in NumPy, I would expect more reports about it, so I think a setup problem is more likely, and you need to clarify your setup. You can also try np.show_runtime() with threadpoolctl installed.

@rgommers
Member

rgommers commented Oct 3, 2023

You should tell us how exactly you installed numpy and MKL

That would indeed be helpful. Up to 1.25.2, we linked MKL through the Single Dynamic Library (i.e., with -lmkl_rt for libmkl_rt.so) in numpy.distutils. With 1.26.0, you're likely building with Meson, so both BLAS linking and SIMD flags had major changes.

@stuart-nolan
Author

stuart-nolan commented Oct 3, 2023

MKL is installed via:
sudo apt install intel-oneapi-mkl intel-oneapi-mkl-devel

numpy install is via a custom bash script. I'll indicate below what is different between numpy 1.25.2 and 1.26.0.

the bash shell environment is set up with

       intel_setvars_sh="/opt/intel/oneapi/setvars.sh"                           
       if [ -f ${intel_setvars_sh} ]; then                                       
          source ${intel_setvars_sh} >/dev/null                                  
          export MKL_THREADING_LAYER=GNU                                         
          export MKL_INTERFACE_LAYER=GNU                                         
       fi                                                                        

excerpts from the "custom bash script"

     git clone https://github.com/numpy/numpy.git numpy --recursive         
     cd numpy
     latestTag=$(git describe --tags $(git rev-list --tags --max-count=1))       
     git checkout ${latestTag}                                              
     export CFLAGS="-fopenmp -m64 -march=native -DNDEBUG -O3 -Wl,--no-as-needed" 
     export CXXFLAGS="-fopenmp -m64 -march=native -DNDEBUG -O3 -Wl,--no-as-needed"
     export LDFLAGS="-ldl -lm"
     export FFLAGS="-fopenmp -m64 -march=native -DNDEBUG -O3"                    

For numpy 1.26.0 (on both the i5-4300U and the i9-10900K):

     python -m build -Csetup-args=-Dblas=MKL -Csetup-args=-Dlapack=MKL      
     pip install dist/numpy*.whl                                            

For numpy 1.25.2 (on both the i5-4300U and the i9-10900K):

     ncp1=$(expr $(cat /proc/cpuinfo | grep processor | wc -l) + 1)          
     NPY_BLAS_ORDER='MKL' NPY_LAPACK_ORDER='MKL' python setup.py build -j ${ncp1} install                                                                      

I'll check whether mkl is using 64-bit and 32-bit lapack, whether it differs between the i5-4300U and the i9-10900K, and report back in a bit. The mkl install does not change on the i9-10900K between the functional numpy 1.25.2 and non-functional numpy 1.26.0.

@rgommers
Member

rgommers commented Oct 3, 2023

  • I imagine that the problem remains if you leave out the -march=native flag?
  • If you want to compare apples to apples, I think you want to install pkg-config and use -Dblas=mkl-sdl and -Dlapack=mkl-sdl (I don't know what CMake defaulted to in your current build)

@stuart-nolan
Author

stuart-nolan commented Oct 3, 2023

  • leave out the -march=native flag
  • install pkg-config and use -Dblas=mkl-sdl and -Dlapack=mkl-sdl

Give me a bit. I do need to check without -march=native - a holdover from my Gentoo days, and I'm not sure it does anything for me now.

pkg-config should be installed. I'll try mkl-sdl

@stuart-nolan
Author

stuart-nolan commented Oct 3, 2023

Using
python -m build -Csetup-args=-Dblas=mkl-sdl -Csetup-args=-Dlapack=mkl-sdl
results in a functional numpy 1.26.0 on the i9-10900K.

FWIW, removing -march=native (as well as -Wl,--no-as-needed) has no effect.

I have seen your comment in #24808

a plain pip install numpy with only MKL installed in a standard system location ... should just work

and would like to use something like that. In my past experience, having three BLAS libraries installed (no amount of update-alternatives configuration seems to stop numpy from using openblas if it finds it) while also wanting to tune CFLAGS tends to get in the way. I hope this changes with Meson.

EDIT: the i5-4300U device is indeed using mkl_rt with numpy 1.26.0 despite the same (non mkl-sdl) build arguments. Scipy also now uses meson (I adapted my original numpy/meson/mkl based build commands from that), and on the i5-4300U does not default to mkl_rt (as it did before meson). I had not noticed the difference nor have I observed issues with scipy yet.

The short of it is, this is beyond my ability to troubleshoot or fix. If there is something you would like me to test please let me know.

Thank you for your fast and helpful responses.

@rgommers
Member

rgommers commented Oct 3, 2023

and would like to use something like that.

Great - should arrive soon (working on it right now), I hope days rather than weeks.

Thanks for digging into this @stuart-nolan. It sounds like you're good for now - I'll finish my updates to this detection machinery first, and will then circle back to this.

I think for the default -Dblas=mkl we can aim for mkl_rt, but in addition I'd like to support ILP64 builds and threading options - for those, we do need the split libraries.

@stuart-nolan
Author

@seberg

One possibility is that MKL might offer a 64bit and 32bit lapack

I'd like to be able to show what MKL is doing at run time. Perhaps there are clues in the MKL_VERBOSE output; however I don't have the knowledge at this time to decode that.

For now, if I ldd the *.so files in the scipy package dir, I see the following (currently it's the same on both devices):

ldd _fblas.cpython-311-x86_64-linux-gnu.so
libiomp5.so => /opt/intel/oneapi/compiler/2023.2.1/linux/compiler/lib/intel64_lin/libiomp5.so (0x00007fc9f8600000)
libmkl_core.so.2 => /opt/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_core.so.2 (0x00007fc9f4200000)
libmkl_gf_ilp64.so.2 => /opt/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_gf_ilp64.so.2 (0x00007fc9f3200000)
libmkl_intel_thread.so.2 => /opt/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_intel_thread.so.2 (0x00007fc9efa00000)

So all links go back to the intel64 directory (I do not even have an intel32 dir). Also, it looks like mkl wants to use 64-bit integers (ilp64). IIRC, I saw similar with numpy 1.26.0 on the i9-10900K before reverting to mkl_rt, which still points back to the intel64 dir. I do recall having issues trying to use 64-bit integers with mkl and another library (sundials?), but that is out of scope for this bug report.

Thanks for the tip on threadpoolctl and for responding.

@rgommers
Member

This should be fixed now with gh-24893. It also adds a CI job with MKL that passed with layered LP64, layered ILP64 and LP64 SDL.

@stuart-nolan
Author

stuart-nolan commented Oct 15, 2023

@rgommers

First, ty for the above.

If I build numpy 1.26.1 with -Csetup-args=-Dblas=mkl-sdl -Csetup-args=-Dlapack=mkl-sdl, I'm still good (i.e. I do not observe the segfault below).

However, if I build numpy 1.26.1 with

python -m build -Csetup-args=-Dblas=MKL -Csetup-args=-Dlapack=MKL -Csetup-args=-Duse-ilp64=true

(64 bit integers do not seem to be the issue here. I rebuilt numpy w/ and wo/ -Duse-ilp64=true. Scipy 1.11.3 built with -Csetup-args=-Dblas=MKL -Csetup-args=-Dlapack=MKL is apparently built with 64 bit integer support by default - and, yes, I've tried rebuilding scipy after each build/install of numpy 1.26.1)

I get a (new) segfault from ipython after:

... <python code after # Cholesky Decomposition mentioned above - which now does not segfault>
import scipy
import scipy.sparse as sps
matrix = sps.rand(1000, 1000, density=0.001) + sps.diags(np.random.rand(1000))   
scipy.linalg.qr(matrix.toarray(),pivoting=True)

Segmentation fault (core dumped)

I'm not really sure where to go from here. Please advise.

@rgommers
Member

Thanks for testing @stuart-nolan.

Scipy 1.11.3 built with -Csetup-args=-Dblas=MKL -Csetup-args=-Dlapack=MKL is apparently built with 64 bit integer support by default

That can't be (at least not without further manual tweaks), since support for that currently isn't implemented in SciPy's meson.build files.

The use of =MKL (upper-case) means the detection is going through CMake, and I don't know off the top of my head what that defaults to - I'll check. It can be seen from the link arguments for MKL in build/build.ninja.

I'm not really sure where to go from here.

I expect that the problem is linking MKL in two different ways in numpy and scipy, and then MKL being unhappy after both packages get imported.

Can you add how you installed MKL? It matters, because various MKL distributions are broken in different ways. I tried to exclude picking those up in the NumPy build, but going via CMake bypasses that logic. If you use lower-case =mkl, that may help.
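One way to check for exactly that kind of mismatch is to compare which MKL libraries each package's compiled extension modules link against. A rough sketch (Linux-only, shells out to ldd; linked_mkl_libs is a hypothetical helper written for illustration, not an existing API):

```python
# Sketch: list the MKL shared libraries that compiled extension modules
# under a package directory are dynamically linked against.
import glob
import os
import shutil
import subprocess

def linked_mkl_libs(package_dir):
    """Return the set of MKL shared-library names linked by *.so files
    under package_dir (empty if ldd is unavailable, e.g. non-Linux)."""
    libs = set()
    if shutil.which("ldd") is None:
        return libs
    for so in glob.glob(os.path.join(package_dir, "**", "*.so"), recursive=True):
        out = subprocess.run(["ldd", so], capture_output=True, text=True).stdout
        for line in out.splitlines():
            parts = line.split()
            if parts and "mkl" in parts[0]:
                libs.add(parts[0])
    return libs

import numpy
print("numpy links:", sorted(linked_mkl_libs(os.path.dirname(numpy.__file__))))
```

Seeing, say, libmkl_rt.so.2 under numpy but libmkl_gf_ilp64.so.2 under scipy in the same process would point at the mixed-interface problem.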

@stuart-nolan
Author

stuart-nolan commented Oct 15, 2023

use lower-case =mkl

After building numpy with

python -m build -Csetup-args=-Dblas=mkl -Csetup-args=-Dlapack=mkl

Note that scipy was built with upper-case MKL:

python -m build -Csetup-args=-Dblas=MKL -Csetup-args=-Dlapack=MKL

Using lower-case mkl with scipy gives the build-time error:

../scipy/meson.build:161:9: ERROR: Dependency "mkl" not found, tried pkgconfig and cmake

I still see the same segfault with scipy's qr (numpy's qr works, as does sparseqr).

I cannot rule out user (aka my) ignorance about properly configuring numpy/scipy/MKL with the latest build system changes. As I have multiple options to build and install functional numpy/scipy/MKL, I don't want to create too much noise here.

That said, I am willing to provide logs or additional troubleshooting if it helps. E.g.:

Warnings observed during building scipy

[292/1628] Generating scipy/linalg/fblas_module with a custom command
[293/1628] Compiling C object scipy/linalg/fblas.cpython-311-x86_64-linux-gnu.so.p/meson-generated..__fblasmodule.c.o
scipy/linalg/_fblasmodule.c: In function ‘complex_double_from_pyobj’:
scipy/linalg/_fblasmodule.c:221:47: warning: passing argument 1 of ‘PyArray_DATA’ from incompatible pointer type [-Wincompatible-pointer-types]
221 | (*v).r = ((npy_cdouble *)PyArray_DATA(arr))->real;
| ^~~
| |
| PyObject * {aka struct _object *}
In file included from ../../../build-env-qszcrox7/lib/python3.11/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from ../../../build-env-qszcrox7/lib/python3.11/site-packages/numpy/core/include/numpy/arrayobject.h:5,
from ../../../build-env-qszcrox7/lib/python3.11/site-packages/numpy/f2py/src/fortranobject.h:13,
from scipy/linalg/_fblasmodule.c:23:
../../../build-env-qszcrox7/lib/python3.11/site-packages/numpy/core/include/numpy/ndarraytypes.h:1532:29: note: expected ‘PyArrayObject *’ {aka ‘struct tagPyArrayObject *’} but argument is of type ‘PyObject *’ {aka ‘struct _object *’}
1532 | PyArray_DATA(PyArrayObject *arr)
| ~~~~~~~~~~~~~~~^~~
scipy/linalg/_fblasmodule.c:222:47: warning: passing argument 1 of ‘PyArray_DATA’ from incompatible pointer type [-Wincompatible-pointer-types]
222 | (*v).i = ((npy_cdouble *)PyArray_DATA(arr))->imag;
| ^~~
| |
| PyObject * {aka struct _object *}
In file included from ../../../build-env-qszcrox7/lib/python3.11/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from ../../../build-env-qszcrox7/lib/python3.11/site-packages/numpy/core/include/numpy/arrayobject.h:5,
from ../../../build-env-qszcrox7/lib/python3.11/site-packages/numpy/f2py/src/fortranobject.h:13,
from scipy/linalg/_fblasmodule.c:23:
../../../build-env-qszcrox7/lib/python3.11/site-packages/numpy/core/include/numpy/ndarraytypes.h:1532:29: note: expected ‘PyArrayObject *’ {aka ‘struct tagPyArrayObject *’} but argument is of type ‘PyObject *’ {aka ‘struct _object *’}
1532 | PyArray_DATA(PyArrayObject *arr)

Can you add how you installed MKL? It matters

Same as posted above:

MKL install on ubuntu 22.04 lts

MKL is installed via: sudo apt install intel-oneapi-mkl intel-oneapi-mkl-devel

...
the bash shell environment is set up with

       intel_setvars_sh="/opt/intel/oneapi/setvars.sh"                           
       if [ -f ${intel_setvars_sh} ]; then                                       
          source ${intel_setvars_sh} >/dev/null                                  
          export MKL_THREADING_LAYER=GNU                                         
          export MKL_INTERFACE_LAYER=GNU                                         
       fi                                                                        

If the above responses are missing something you would like to see wrt the MKL install, please let me know.

FWIW, the bash environment is set both for building numpy/scipy and run time/testing. I've tried unsetting the MKL_THREADING_LAYER and MKL_INTERFACE_LAYER variables at run time and I still observe the segfault. I do need to review why I set these - I seem to recall it's from past troubleshooting building scipy with MKL but I'm not sure about that.

That can't be (at least without further manual tweaks), since support for that currently isn't implemented in SciPy's meson.build files.

My comment about scipy "defaulting to using MKL 64 bit integers" is based on

cd ~/.pyenv/versions/p3114.mkl/lib/python3.11/site-packages/scipy/linalg/
ldd _fblas.cpython-311-x86_64-linux-gnu.so _flapack.cpython-311-x86_64-linux-gnu.so | grep ilp64
        libmkl_gf_ilp64.so.2 => /opt/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_gf_ilp64.so.2 (0x00007fccf5600000)
        libmkl_gf_ilp64.so.2 => /opt/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_gf_ilp64.so.2 (0x00007faa4e000000)
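As context for why an LP64/ILP64 mismatch corrupts data or segfaults rather than failing at link time: the two MKL variants export the same symbol names but disagree on integer width, so a 32-bit length argument from an LP64 caller gets read as 64 bits by an ILP64 library, swallowing the neighbouring argument into the high bits. A pure-Python illustration of that misread (no MKL required; the values are made up):

```python
import struct

# An LP64 caller writes two adjacent 32-bit ints (say, n and lda).
n, lda = 5, 7
buf = struct.pack("<ii", n, lda)

# An ILP64 library reads the first argument as one 64-bit int, so the
# second value ends up in the high 32 bits of a garbage dimension.
(misread_n,) = struct.unpack("<q", buf)
print(misread_n)  # 7 * 2**32 + 5 = 30064771077
```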

I'll defer to your greater knowledge and experience here and I'm also happy to grep through build/build.ninja if it comes to that.

@rgommers

Member

MKL is installed via: sudo apt install intel-oneapi-mkl intel-oneapi-mkl-devel

That isn't part of the default Ubuntu package repository, so I think that's coming from https://apt.repos.intel.com/oneapi (as documented in this Intel install guide). That should be fine - and it's up-to-date (2023.2.0).

My comment about scipy "defaulting to using MKL 64 bit integers" is based on

Thanks. I'm surprised that that worked - guess there must be 32-bit symbols in that library too somehow.

Warnings observed during building scipy

Those warnings don't look related.

As I have multiple options to build and install functional numpy/scipy/MKL, I don't want to create too much noise here.

No worries at all - this is very useful feedback. Given how many ways there are to install MKL and that they're semi-broken in various ways, I'd like to make this as robust as possible.

This is all still in motion, so I understand it's hard to figure out what options to pass to select MKL. The goal here is ensuring that NumPy and SciPy are built with the exact same MKL config. There should be two ways right now:

  1. If you build numpy with -Dblas=mkl (and same for lapack) or you don't provide any -D flag and rely on auto-detection, then you want to build SciPy with -Dblas=mkl-dynamic-lp64-seq
  2. Or, build both numpy and scipy with =mkl-sdl

@stuart-nolan
Author

stuart-nolan commented Oct 16, 2023

@rgommers

I suspect the scipy.linalg.qr segfault is due to MKL threading "interface". Using the hint you provided above for scipy/numpy/MKL build commands and inspecting meson_options.txt in the numpy build directory, I used:

numpy 1.26.1

python -m build -Csetup-args=-Dblas=mkl -Csetup-args=-Dlapack=mkl -Csetup-args=-Dmkl-threading=gomp

scipy 1.11.3

python -m build -Csetup-args=-Dblas=mkl-dynamic-lp64-gomp -Csetup-args=-Dlapack=mkl-dynamic-lp64-gomp

scipy.linalg.qr no longer segfaults.
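For anyone rebuilding and wanting a quick regression check: a minimal QR smoke test. This sketch uses numpy.linalg.qr so it runs even without SciPy installed; the crash in this thread was in scipy.linalg.qr, which should exercise similar MKL LAPACK paths. Any crash or wrong result here points at the BLAS/LAPACK link configuration rather than user code:

```python
import numpy as np

# Factor a moderately sized random matrix and verify the result
# round-trips; this is the operation that segfaulted with the
# mismatched MKL threading layer.
rng = np.random.default_rng(0)
a = rng.standard_normal((200, 200))
q, r = np.linalg.qr(a)

assert np.allclose(q @ r, a)            # factorization reconstructs A
assert np.allclose(q.T @ q, np.eye(200))  # Q is orthogonal
print("qr ok")
```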

Before, my builds above for both numpy and scipy were using iomp. For reasons I do not want to discuss, I don't want Intel threading (or 64-bit integers), but I tolerated them temporarily while working through this, thinking it should not matter at this point. Unfortunately, threading matters.

More importantly, I most definitely want to know how to configure numpy/scipy/MKL for threading and 32/64 bit integers should my requirements change. I understand that this is a work in progress. I hope the final result is a consistent "MKL configuration interface" between numpy and scipy.

Thank you again for your help and the effort you are putting into this.

EDIT: should anyone else find their way here, the build command

python -m build -Csetup-args=-Dblas=mkl-dynamic-lp64-gomp -Csetup-args=-Dlapack=mkl-dynamic-lp64-gomp

currently works for both scipy and numpy. I prefer using mkl-dynamic-lp64-gomp since it explicitly states the entire intent in a compact way (and it's what numpy.show_config() outputs).

Lastly, it is possible the scipy.linalg.qr segfault I experienced is due to "user error." When building with iomp, I likely still exported -fopenmp in the cflags - I'm not sure if this is a compatible combination.
