Skip to content

Power9 instruction gets executed on Power8 #28124

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
limburgher opened this issue Jan 7, 2025 · 10 comments
Closed

Power9 instruction gets executed on Power8 #28124

limburgher opened this issue Jan 7, 2025 · 10 comments
Labels
00 - Bug 57 - Close? Issues which may be closable unless discussion continued component: SIMD Issues in SIMD (fast instruction sets) code or machinery

Comments

@limburgher
Copy link
Contributor

Describe the issue:

From: https://bugzilla.redhat.com/show_bug.cgi?id=2336127

In 2024-12-21, we started to see an MLIR test failure on LLVM daily snapshots running on Rawhide on Power8.
We can only reproduce this issue on Rawhide.

After investigating, I found a numpy function using a Power9/Power ISA 3.0 instruction (mtvsrws):

Disassembly:
(gdb) disas
Dump of assembler code for function HALF_exp2(char**, npy_intp const*, npy_intp const*, void*):
...
0x00007fffe3bad210 <+160>: addis r9,r9,14336
0x00007fffe3bad214 <+164>: add r7,r7,r9
=> 0x00007fffe3bad218 <+168>: mtvsrws vs1,r7
0x00007fffe3bad21c <+172>: xscvspdpn vs1,vs1
0x00007fffe3bad220 <+176>: bl 0x7fffe38b3580 <0000001a.plt_call.exp2f@@GLIBC_2.27>

Backtrace:
(gdb) bt
#0 HALF_exp2 (args=, dimensions=, steps=, __NPY_UNUSED_TAGGEDdata=)
at ../numpy/_core/src/umath/loops_umath_fp.dispatch.c.src:182
#1 0x00007fffe3af615c in generic_wrapped_legacy_loop (__NPY_UNUSED_TAGGEDcontext=, data=, dimensions=,
strides=, auxdata=) at ../numpy/_core/src/umath/legacy_array_method.c:98
#2 0x00007fffe3b0d2f0 in try_trivial_single_output_loop (context=0x7fffffff8410, op=0x7fffffff8b30, order=,
errormask=) at ../numpy/_core/src/umath/ufunc_object.c:969
#3 PyUFunc_GenericFunctionInternal (ufunc=, ufuncimpl=, operation_descrs=0x7fffffff8730, op=0x7fffffff8b30,
casting=NPY_SAME_KIND_CASTING, order=, wheremask=0x0) at ../numpy/_core/src/umath/ufunc_object.c:2237
#4 ufunc_generic_fastcall (ufunc=, args=, len_args=, kwnames=, outer=)
at ../numpy/_core/src/umath/ufunc_object.c:4530
#5 0x00007ffff79e9e30 in PyObject_Vectorcall () from /lib64/libpython3.13.so.1.0

Reproducible: Always

Reproduce the code example:

N/A

Error message:

No response

Python and NumPy Versions:

Python 3.13, NumPy 2.2.1

Runtime Environment:

No response

Context for the issue:

No response

@seberg seberg added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Jan 8, 2025
@seberg
Copy link
Member

seberg commented Jan 8, 2025

MLIR test failure on LLVM daily snapshots

Haven't looked into it closely, thought it was SIMD related first, but not sure now.

Unless I am missing build related changes (totally possible), there were no 2.2.1 changes in this code and the code isn't hand-vectorized (very few changes in 2.2.1 and this didn't even change since 2.1 at least).

So if you only build with "LLVM daily" when there is a new release of NumPy, then the change here may also be the LLVM daily part?

@tuliom
Copy link

tuliom commented Jan 8, 2025

@seberg The code with this instruction is in Numpy.
Keep in mind that LLVM daily is not being used to build Numpy.
It's still possible the compiler used to build Numpy (gcc 14.2.1) generated that instruction by mistake. We need more information before taking conclusions.

Another important thing: numpy on Fedora got updated from version 1.26.4 to version to 2.2.0 around December 12th. Has there been any changes to this code between those versions?

@limburgher would you be able to run a numpy scratch build on Fedora with more verbosity so that it prints the entire compiler commands, please? I'd like to see which parameters are being passed to GCC when compiling numpy/_core/src/umath/loops_umath_fp.dispatch.c.src.

@seberg
Copy link
Member

seberg commented Jan 8, 2025

from version 1.26.4 to version to 2.2.0 around December 12th

Ah, that is a big step and changes seem very plausible. So SIMD seems not unlikely (i.e. VSX3 is being used now), @seiko2plus would probably know without looking around.

If you can easily, it would help to to print:

np.show_runtime()

and you could also try to run with NPY_DISABLE_CPU_FEATURES=VSX3 (note that I don't know if this is a VSX3 instruction, but it would make sense), to see if we are dynamically dispatching to an incorrect target.

@tuliom
Copy link

tuliom commented Jan 8, 2025

@seberg It crashed in both cases:

$ NPY_DISABLE_CPU_FEATURES=VSX3 PYTHONPATH=/build/llvm-build/tools/mlir/python_packages/mlir_core  /usr/bin/python3.13 /root/src/llvm-project/mlir/test/python/pass_manager.py
/root/src/llvm-project/mlir/test/python/pass_manager.py:397: SyntaxWarning: invalid escape sequence '\-'
  connector = "\-- " if i == len(entries) - 1 else "|-- "
Illegal instruction (core dumped)
$ cat test.py 
#!/usr/bin/python3

import numpy

numpy.show_runtime()

numpy.show_config()
$ NPY_DISABLE_CPU_FEATURES=VSX3 ./test.py 
Illegal instruction (core dumped)

I wonder if there are Power9 instructions being executed very early.

@tuliom
Copy link

tuliom commented Jan 8, 2025

@limburgher I learned how to build numpy with higher verbosity. I apologize for the noise.

The log is available at https://kojipkgs.fedoraproject.org//work/tasks/1824/127671824/build.log

Most, if not all files, are built with -mcpu=power9 despite the this distribution being expected to work on Power8.

@seberg
Copy link
Member

seberg commented Jan 8, 2025

Nice, I think that narrows it down. It would seem that these tests for CPU capabilities misfire for some reason:

VSX3 = mod_features.new(
  'VSX3', 3, implies: VSX2, args: {'val': '-mcpu=power9', 'match': '.*[mcpu=|vsx].*'},
  detect: {'val': 'VSX3', 'match': 'VSX.*'},
  test_code: files(source_root + '/numpy/distutils/checks/cpu_vsx3.c')[0],
  extra_tests: {
    'VSX3_HALF_DOUBLE': files(source_root + '/numpy/distutils/checks/extra_vsx3_half_double.c')[0]
  }
)

unless this is cross compiling on a power9 machine and the baseline architecture needs to be explicitly set?

@limburgher
Copy link
Contributor Author

@tuliom no worries. :)
We do explicitly set power9 in the spec for ppc64le, but only for rhel10+.

@tuliom
Copy link

tuliom commented Jan 8, 2025

@limburgher Thanks!

I believe I found the issue. That rhel10 code is also being executed on Fedora.
I proposed a fix downstream here: https://src.fedoraproject.org/rpms/numpy/pull-request/51

I believe we can fix this upstream issue.

@seberg seberg added the 57 - Close? Issues which may be closable unless discussion continued label Jan 8, 2025
@limburgher
Copy link
Contributor Author

Excellent, thank you! I'll keep and eye on that PR.

@ngoldbaum
Copy link
Member

Closing since this came down to a bug in fedora. Please reopen if this is a bug in NumPy and I'm misunderstanding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug 57 - Close? Issues which may be closable unless discussion continued component: SIMD Issues in SIMD (fast instruction sets) code or machinery
Projects
None yet
Development

No branches or pull requests

4 participants