BUG: mkl errors and segfaults with numpy 1.26.0 on i9-10900K #24846
Comments
Hi @stuart-nolan, after running the same code in Colab I got the following output.
Also, if you want to run the code, here is a link to the Colab notebook: https://colab.research.google.com/drive/1hOXH9qPc0MU4RseMYJVCbmhuGTljus8q?usp=sharing
@kernel-loophole |
You should tell us how exactly you installed numpy and MKL, since we do not distribute NumPy linked against MKL. One possibility is that MKL might offer a 64bit and 32bit lapack and if you dynamically swap things out, weird things may happen (you seem to use a 64bit one, but not sure about symbol suffix). If we had a real issue in NumPy, I would expect more reports about it, so I think that is more likely and you need to clarify your setup. You can also try using: |
That would indeed be helpful. Up to 1.25.2, we linked MKL through the Single Dynamic Library (i.e., with |
MKL is installed via: numpy install is via a custom bash script. I'll indicate below what is different between numpy 1.25.2 and 1.26.0. the bash shell environment is set up with
excerpts from the "custom bash script"
For numpy 1.26.0 (on both the i5-4300U and the i9-10900K):
For numpy 1.25.2 (on both the i5-4300U and the i9-10900K):
I'll check whether mkl is using 64bit or 32bit lapack, and whether that differs between the i5-4300U and the i9-10900K, and report back in a bit. The mkl install does not change on the i9-10900K between the functional numpy 1.25.2 and the non-functional numpy 1.26.0.
|
give me a bit. I do need to check without pkg-config should be installed. I'll try |
Using FWIW, removing I have seen your comment in #24808
and would like to use something like that. Past experience is that having 3 blas libraries installed (no amount of EDIT: the i5-4300U device is indeed using mkl_rt with numpy 1.26.0 despite the same (non mkl-sdl) build arguments. Scipy also now uses meson (I adapted my original numpy/meson/mkl based build commands from that), and on the i5-4300U does not default to mkl_rt (as it did before meson). I had not noticed the difference nor have I observed issues with scipy yet. The short of it is, this is beyond my ability to troubleshoot or fix. If there is something you would like me to test please let me know. Thank you for your fast and helpful responses. |
Great - should arrive soon (working on it right now), I hope days rather than weeks. Thanks for digging into this @stuart-nolan. It sounds like you're good for now - I'll finish my updates to this detection machinery first, and will then circle back to this. I think for the default |
I'd like to be able to show what MKL is doing at run time. Perhaps there are clues in the MKL_VERBOSE output; however I don't have the knowledge at this time to decode that. For now, if I
So all links go back to the Thanks for the tip on threadpoolctl and for responding. |
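For anyone who wants to see which BLAS library and threading layer are actually loaded at run time, a minimal sketch using threadpoolctl's `threadpool_info()` could look like the following. The helper name `describe_blas_libraries` is made up for this illustration, and the keys that threadpoolctl reports can differ between versions and BLAS implementations, so the code reads them defensively with `.get`.

```python
def describe_blas_libraries():
    """Return a short summary of each BLAS library loaded in this process.

    Hypothetical helper for illustration only. Returns an empty list when
    threadpoolctl is not installed.
    """
    try:
        from threadpoolctl import threadpool_info
    except ImportError:
        return []

    try:
        # Importing numpy loads the BLAS/LAPACK library it was linked against,
        # which makes that library visible to threadpoolctl.
        import numpy  # noqa: F401
    except ImportError:
        pass

    wanted = ("internal_api", "filepath", "version", "threading_layer", "num_threads")
    return [
        {key: info.get(key) for key in wanted}
        for info in threadpool_info()
        if info.get("user_api") == "blas"
    ]


if __name__ == "__main__":
    for library in describe_blas_libraries():
        print(library)
```

Running this once after `import numpy` and once after also importing scipy may show whether a second BLAS library, or a different threading layer, gets loaded.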
This should be fixed now with gh-24893. It also adds a CI job with MKL that passed with layered LP64, layered ILP64 and LP64 SDL. |
First, ty for the above. If I build numpy 1.26.1 with However, if I build numpy 1.26.1 with
(64 bit integers do not seem to be the issue here. I rebuilt numpy with and without -Duse-ilp64=true. Scipy 1.11.3 built with I get a (new) segfault from ipython after:
I'm not really sure where to go from here. Please advise. |
Thanks for testing @stuart-nolan.
That can't be (at least without further manual tweaks), since support for that currently isn't implemented in SciPy's The use of
I expect that the problem is linking MKL in two different ways in numpy and scipy, and then MKL being unhappy after both packages get imported. Can you add how you installed MKL? It matters, because various MKL distributions are broken in different ways. I tried to exclude picking those up in the NumPy build, but going via CMake bypasses that logic. If you use lower-case |
After building numpy with
Note scipy built with upper case MKL
using a lower case mkl with scipy gives the build time error:
I still see the same segfault with scipy's qr (numpy's qr works as does sparseqr). I cannot rule out user (aka my) ignorance about properly configuring numpy/scipy/MKL with the latest build system changes. As I have multiple options to build and install functional numpy/scipy/MKL, I don't want to create too much noise here. That said, I am willing to provide logs or additional troubleshooting if it helps. E.g.: Warnings observed during building scipy:
[292/1628] Generating scipy/linalg/fblas_module with a custom command
Same as posted above: MKL install on ubuntu 22.04 lts
If the above responses are missing something you would like to see wrt the MKL install, please let me know. FWIW, the bash environment is set both for building numpy/scipy and run time/testing. I've tried unsetting the MKL_THREADING_LAYER and MKL_INTERFACE_LAYER variables at run time and I still observe the segfault. I do need to review why I set these - I seem to recall it's from past troubleshooting building scipy with MKL but I'm not sure about that.
My comment about scipy "defaulting to using MKL 64 bit integers" is based on
I'll defer to your greater knowledge and experience here and I'm also happy to grep through |
That isn't part of the default Ubuntu package repository, so I think that's coming from https://apt.repos.intel.com/oneapi (as documented in this Intel install guide). That should be fine - and it's up-to-date (2023.2.0).
Thanks. I'm surprised that that worked - guess there must be 32-bit symbols in that library too somehow.
Those warnings don't look related.
No worries at all - this is very useful feedback. Given how many ways there are to install MKL and that they're semi-broken in various ways, I'd like to make this as robust as possible. This is all still in motion, so I understand it's hard to figure out what options to pass to select MKL. The goal here is ensuring that NumPy and SciPy are built with the exact same MKL config. There should be two ways right now:
|
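Independent of which build options are chosen, one way to check after the fact whether NumPy and SciPy report the same BLAS/LAPACK dependency is to compare their build configuration dictionaries. This is only a sketch: the helper names are hypothetical, and it assumes that both `numpy.show_config(mode="dicts")` and `scipy.show_config(mode="dicts")` are available in the installed versions and expose a "Build Dependencies" entry. It compares only the recorded dependency name and version, so a match here does not prove the two packages link MKL identically.

```python
def blas_lapack_summary(config):
    """Extract (name, version) for blas and lapack from a show_config dict.

    Hypothetical helper: assumes the "Build Dependencies" layout used by
    meson-built NumPy/SciPy; missing entries come back as (None, None).
    """
    deps = config.get("Build Dependencies", {})
    return {
        lib: (deps.get(lib, {}).get("name"), deps.get(lib, {}).get("version"))
        for lib in ("blas", "lapack")
    }


def same_blas_lapack(config_a, config_b):
    """True when both configs record the same blas and lapack name/version."""
    return blas_lapack_summary(config_a) == blas_lapack_summary(config_b)


if __name__ == "__main__":
    try:
        import numpy
        import scipy

        numpy_config = numpy.show_config(mode="dicts")
        scipy_config = scipy.show_config(mode="dicts")
    except (ImportError, TypeError):
        # Either package is missing, or show_config has no "mode" argument.
        print("numpy/scipy with show_config(mode='dicts') not available")
    else:
        print("numpy:", blas_lapack_summary(numpy_config))
        print("scipy:", blas_lapack_summary(scipy_config))
        print("match:", same_blas_lapack(numpy_config, scipy_config))
```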
I suspect the scipy.linalg.qr segfault is due to MKL threading "interface". Using the hint you provided above for scipy/numpy/MKL build commands and inspecting numpy 1.26.1
scipy 1.11.3
scipy.linalg.qr no longer segfaults. Before, my builds above for both numpy and scipy were using iomp. For reasons I do not want to discuss, I don't want intel threading (or 64 bit integers) but I tolerated them temporarily while working through this thinking it should not matter at this point. Unfortunately, threading matters. More importantly, I most definitely want to know how to configure numpy/scipy/MKL for threading and 32/64 bit integers should my requirements change. I understand that this is a work in progress. I hope the final result is a consistent "MKL configuration interface" between numpy and scipy. Thank you again for your help and the effort you are putting into this. EDIT: should anyone else find their way here, the build command
currently works for both scipy and numpy. I prefer using Lastly, it is possible the scipy.linalg.qr segfault I experienced is due to "user error." When building with |
Describe the issue:
This is cpu architecture and numpy version dependent. I observe these issues on an i9-10900K with numpy 1.26.0. I do not observe an issue on the same i9-10900K and numpy 1.25.2 - all other factors unchanged. I do not observe any issues on an i5-4300U laptop (same version for os, python, numpy - 1.25.2 or 1.26.0, mkl, and test script). See context below for additional detail.
os: Ubuntu 22.04.3 LTS
python (in a virtual env): Python 3.11.4
Note that I use "-march=native", among other flags, for both the python and numpy builds on both CPUs.
mkl: 2023.2.0
Reproduce the code example:
Error message:
Runtime information:
MKL_VERBOSE oneMKL 2023.0 Update 2 Product build 20230613 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.70GHz ilp64 intel_thread
MKL_VERBOSE SDOT(2,0x5568bfa04550,1,0x5568bfa04550,1) 723.91us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
1.26.0
3.11.4 (main, Jul 26 2023, 10:09:23) [GCC 11.3.0]
Context for the issue:
Note that suitesparse 7.2.0 demos (not python) built with the same mkl installation all complete without error on the i9-10900K. scikit-sparse and scikits.odes tests (python based and linked against the same suitesparse libs), also now error and seg fault with numpy 1.26.0 but not numpy 1.25.2.
EDIT: different errors for scikit-sparse, e.g.
nose2 -v sksparse
... output truncated ...
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x5582018360b8,4,0x55820160dee8,4) 91ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x5582018360c0,4,0x55820160def0,4) 61ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x5582018360c8,4,0x55820160def8,4) 50ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
ok
sksparse.test_cholmod.test_cholesky_matrix_market ... /home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:189: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
answer = np.linalg.lstsq(X.todense(), y)[0]
Intel MKL ERROR: Parameter 5 was incorrect on entry to DGELSD.
MKL_VERBOSE DGELSD(1374389535753,4294967616,72151610872037377,0x7f0d8b96c010,1033,0x7f0d8bbf1a10,140728898421769,0x7f0d8bbf3a58,0x7f0da1f65d80,139696528745944,0x7ffee9469e40,139698106269695,0x7ffee9469de0,-5) 31.53us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
init_gelsd failed init
FAIL
sksparse.test_cholmod.test_cholesky_smoke_test ... /home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:66: CholmodTypeConversionWarning: converting matrix of class dia_matrix to CSC format
f = cholesky(sparse.eye(10, 10))
dense
sparse
MKL_VERBOSE DAXPY(20,0x7ffee946acc8,0x5582018e0040,1,0x558201858840,1) 1.05us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
csr
/home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:76: CholmodTypeConversionWarning: converting matrix of class csr_matrix to CSC format
assert sparse.issparse(f(s_csr))
/home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:77: CholmodTypeConversionWarning: converting matrix of class csr_matrix to CSC format
assert_allclose(f(s_csr).todense(), s_csr.todense())
MKL_VERBOSE DAXPY(20,0x7ffee946acc8,0x5582018e0040,1,0x558201858840,1) 292ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
extract
ok
sksparse.test_cholmod.test_complex ... MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de50,4,0x55820160dee0,4) 341ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de58,4,0x55820160dee8,4) 147ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de60,4,0x55820160def0,4) 37ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de68,4,0x55820160def8,4) 37ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
Segmentation fault (core dumped)
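One possible reading of the DGELSD line in the log above: the first three integer arguments look like pairs of 32-bit values that were read as single 64-bit integers. That is what one would expect if a caller passing 32-bit (LP64) integers reached an ILP64 MKL interface, which matches the "ilp64" in the MKL_VERBOSE banner. The sketch below splits the logged values into 32-bit halves; the function names are made up for this illustration, and the mapping to m, n and nrhs is an interpretation based on DGELSD's argument order, not something the log states.

```python
import struct


def split_int64(value):
    """Return the (low, high) 32-bit halves of a 64-bit integer."""
    low = value & 0xFFFFFFFF
    high = (value >> 32) & 0xFFFFFFFF
    return low, high


def two_int32_read_as_int64(first, second):
    """Pack two adjacent little-endian int32 values and read them as one int64."""
    return struct.unpack("<q", struct.pack("<ii", first, second))[0]


# First three arguments of the DGELSD call exactly as logged by MKL_VERBOSE.
logged = {
    "m": 1374389535753,
    "n": 4294967616,
    "nrhs": 72151610872037377,
}

for name, value in logged.items():
    low, high = split_int64(value)
    print(f"{name}: low 32 bits = {low}, high 32 bits = {high}")

# The low halves are 1033, 320 and 1. The low half of m equals the logged
# lda (1033), and the high half of m equals the low half of n (320).
print(two_int32_read_as_int64(1033, 320) == logged["m"])
```

Under this reading the intended call would have had m=1033, n=320, nrhs=1 and lda=1033, and the "Parameter 5 was incorrect" message would follow from lda being far smaller than the misread m.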