test_capi.test_basic_loop(): _PyInstruction_GetLength() assertion error on s390x Fedora Clang 3.x buildbot #107082
Comments
This issue looks like issue #107083.
@vstinner Thanks, I think I have a fix. Should I trigger the buildbots once CI passes, to verify? (It might waste fewer resources if you trigger that particular buildbot manually? I don't recall how.)
Since there are many broken buildbots, I could not identify which buildbots hit this error. I don't recall if "s390x Fedora Clang 3.x" is connected to the GitHub "buildbot" checks. Well, you can try to add the buildbot label and see whether it runs :-) I don't have access to this machine to validate a fix manually :-(
My theory about how the debug fragment However, maybe the culprit is byte order. The machine where this fails is big-endian. IIRC all machines in regular CI are little-endian (Intel, ARM). I think this structure defined in Include/cpython/code.h is at fault:
In particular, the CC @markshannon
Is the issue fixed or not? This issue was closed.
Can you just exchange code and arg depending on the endianness? You can use PY_BIG_ENDIAN and PY_LITTLE_ENDIAN macros (they are equal to 0 or 1).
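To make the suggestion above concrete, here is a minimal sketch of what swapping the two fields by endianness could look like. It is not the actual `_Py_CODEUNIT` definition and not necessarily the fix that was applied: `demo_codeunit`, `DEMO_BIG_ENDIAN` and `demo_opcode_from_cache` are hypothetical names, and `DEMO_BIG_ENDIAN` stands in for `PY_BIG_ENDIAN` so the snippet compiles without `Python.h` (it assumes the GCC/Clang predefined `__BYTE_ORDER__` macro).

```c
#include <stdint.h>

/* Hypothetical stand-in for PY_BIG_ENDIAN, assuming GCC/Clang predefined
   macros. Falls back to little-endian when they are not available. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#  define DEMO_BIG_ENDIAN 1
#else
#  define DEMO_BIG_ENDIAN 0
#endif

/* Hypothetical 16-bit code unit (NOT the real _Py_CODEUNIT).
   The field order is swapped on big-endian hosts so that `code`
   always overlays the low 8 bits of `cache`. */
typedef union {
    uint16_t cache;
    struct {
#if DEMO_BIG_ENDIAN
        uint8_t arg;
        uint8_t code;
#else
        uint8_t code;
        uint8_t arg;
#endif
    } op;
} demo_codeunit;

/* Read the opcode through the 16-bit view: always the low 8 bits. */
static uint8_t demo_opcode_from_cache(demo_codeunit u)
{
    return (uint8_t)(u.cache & 0xFF);
}
```

With this layout, storing `code = 5` and `arg = 7` should give the same 16-bit value `(7 << 8) | 5` on both byte orders, so code that reads the unit as a 16-bit integer and code that reads the individual fields agree.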
The buildbot still fails, so I'm reopening the issue: https://buildbot.python.org/all/#/builders/3/builds/4314
…pi.test_misc (python#107085) (Even though it doesn't look like it fixes pythongh-107082 -- see discussion there -- it still removes debug code that should never have been committed.)
Thanks to the GCC Farm, I got access to a PPC64 big endian machine. I hacked PyCodeObject to dump the optimized bytecode:
The assertion occurs in PyUnstable_GetExecutor() at the
On little endian x86-64, I get:
Here in the optimized code, the
Thanks, that's super helpful! It seems that in
On ppc64, by the time _PyOptimizer_BackEdge() is called, it seems the code has already been modified, and POP_TOP was written in the wrong byte order: {0, POP_TOP} instead of {POP_TOP, 0}. Which code is responsible for writing POP_TOP?
There is not really a On a big-endian machine, the high byte of the counter is seen as the opcode, and as it is zero it triggers the assert. The length calculation doesn't understand (yet) that it should special-case
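The byte-order effect described above can be sketched in isolation. This is only an illustration, assuming that the opcode is the byte at the lowest address of a 16-bit unit and that the counter is a small 16-bit value such as `0x0010`; the helper names and the counter value are hypothetical, not taken from CPython.

```c
#include <stdint.h>
#include <string.h>

/* Byte that ends up at the lowest address when `value` is stored
   in the given byte order (pure arithmetic, host-independent). */
static uint8_t first_byte(uint16_t value, int big_endian)
{
    return big_endian ? (uint8_t)(value >> 8) : (uint8_t)(value & 0xFF);
}

/* Same question, answered by the host itself via its memory layout. */
static uint8_t first_byte_on_host(uint16_t value)
{
    uint8_t bytes[2];
    memcpy(bytes, &value, sizeof bytes);
    return bytes[0];
}

/* Detect the host byte order at run time. */
static int host_is_big_endian(void)
{
    return first_byte_on_host(0x0100) == 0x01;
}
```

For a counter of `0x0010`, `first_byte(0x0010, 0)` is `0x10` (little-endian: a nonzero byte is misread as an opcode), while `first_byte(0x0010, 1)` is `0x00` (big-endian: a zero opcode, which is what trips an assertion that expects a real instruction).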
Ah right. Using hardware watchpoint in gdb I found
I found other code that could potentially run into the same problem. There's a lot of code that walks over code, instruction by instruction, using
Co-authored-by: Victor Stinner <vstinner@python.org>
Fixed by 233b878
…ython#107256) Co-authored-by: Victor Stinner <vstinner@python.org>
s390x Fedora Clang 3.x: https://buildbot.python.org/all/#/builders/3/builds/4312
Differences between the two builds:
Error:
cc @gvanrossum
Linked PRs