Skip to content

test_capi.test_basic_loop(): _PyInstruction_GetLength() assertion error on s390x Fedora Clang 3.x buildbot #107082

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
vstinner opened this issue Jul 22, 2023 · 15 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Copy link
Member

vstinner commented Jul 22, 2023

s390x Fedora Clang 3.x: https://buildbot.python.org/all/#/builders/3/builds/4312

Differences between the two builds:

commit e1d45b8ed43e1590862319fec33539f8adbc0849
Author: Guido van Rossum <guido@python.org>
Date:   Thu Jul 6 16:46:06 2023 -0700

     gh-104584: Handle EXTENDED_ARG in superblock creation (#106489)
    
    With test.

commit c60df361ce2d734148d503f4a711e67c110fe223
Author: Gregory P. Smith <greg@krypto.org>
Date:   Thu Jul 6 15:46:50 2023 -0700

    gh-90876: Restore the ability to import multiprocessing when `sys.executable` is `None` (#106464)
    
    Prevent `multiprocessing.spawn` from failing to *import* in environments
    where `sys.executable` is `None`.  This regressed in 3.11 with the addition
    of support for path-like objects in multiprocessing.
    
    Adds a test decorator to have tests only run when part of test_multiprocessing_spawn to `_test_multiprocessing.py` so we can start to avoid re-running the same not-global-state specific test in all 3 modes when there is no need.

commit 76fac7bce55302a8e9a524d72f5384fd89e6dfde
Author: Guido van Rossum <guido@python.org>
Date:   Thu Jul 6 15:45:56 2023 -0700

    gh-104584: Clean up and fix uops tests and fix crash (#106492)
    
    The uops test wasn't testing anything by default,
    and was failing when run with -Xuops.
    
    Made the two executor-related context managers global,
    so TestUops can use them (notably `with temporary_optimizer(opt)`).
    
    Made clear_executor() a little more thorough.
    
    Fixed a crash upon finalizing a uop optimizer,
    by adding a `tp_dealloc` handler.

Error:

0:05:26 load avg: 7.08 [435/447/1] test_capi crashed (Exit code -6) -- running: (...)
python: Python/instrumentation.c:262: int _PyInstruction_GetLength(PyCodeObject *, int): Assertion `opcode != 0' failed.
Fatal Python error: Aborted

Current thread 0x000003ff990770a0 (most recent call first):
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.clang/build/Lib/test/test_capi/test_misc.py", line 2441 in get_first_executor
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.clang/build/Lib/test/test_capi/test_misc.py", line 2459 in test_basic_loop
(...)

0:07:22 load avg: 3.60 Re-running test_capi in verbose mode
(...)
test_gilstate_ensure_no_deadlock (test.test_capi.test_misc.TestThreadState.test_gilstate_ensure_no_deadlock) ... ok
test_gilstate_matches_current (test.test_capi.test_misc.TestThreadState.test_gilstate_matches_current) ... ok
test_thread_state (test.test_capi.test_misc.TestThreadState.test_thread_state) ... ok
python: Python/instrumentation.c:262: int _PyInstruction_GetLength(PyCodeObject *, int): Assertion `opcode != 0' failed.
Fatal Python error: Aborted

Current thread 0x000003ff854770a0 (most recent call first):
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.clang/build/Lib/test/test_capi/test_misc.py", line 2441 in get_first_executor
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.clang/build/Lib/test/test_capi/test_misc.py", line 2459 in test_basic_loop
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.clang/build/Lib/unittest/case.py", line 589 in _callTestMethod
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.clang/build/Lib/unittest/case.py", line 634 in run
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.clang/build/Lib/unittest/case.py", line 690 in __call__
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.clang/build/Lib/unittest/suite.py", line 122 in run
(...)

cc @gvanrossum

Linked PRs

@vstinner
Copy link
Member Author

This issue looks like issue #107083.

@gvanrossum
Copy link
Member

@vstinner Thanks, I think I have a fix. Should I trigger the buildbots once CI passes, to verify? (It might waste fewer resources if you trigger that particular buildbot manually? I don't recall how.)

@vstinner
Copy link
Member Author

Should I trigger the buildbots once CI passes, to verify?

Since there are many broken buildbots, I failed to identify on which buildbots this error occurs. I don't recall if "s390x Fedora Clang 3.x" is connected to the GitHub "buildbot" checks. Well, you can try to add the buildbot label, and so if it's run or not :-)

I don'thave access to this machine to validate a fix manually :-(

@gvanrossum
Copy link
Member

My theory about how the debug fragment or 1 would cause the C assertion failure is flawed -- the code in PyUnstable_GetExecutor that calls _PyInstruction_GetLength always starts at the first instruction and only ever calls that function with a valid starting point, so opcode should never be zero.

However, maybe the culprit is byte order. The machine where this fails is big-endian. IIRC all machines in regular CI are little-endian (Intel, ARM).

I think this structure defined in Include/cpython/code.h is at fault:

typedef union {
    uint16_t cache;
    struct {
        uint8_t code;
        uint8_t arg;
    } op;
} _Py_CODEUNIT;

In particular, the struct is supposed to always place code in the first byte and arg in the second byte. But maybe Clang on s390x somehow has its own ideas? If somehow the bytes were swapped (and arg came before code, or some weird padding occurred) I could easily see how we'd end up with opcode == 0, failing the assert.

CC @markshannon

@vstinner
Copy link
Member Author

Is the issue fixed or not? This issue was closed.

@vstinner
Copy link
Member Author

I think this structure defined in Include/cpython/code.h is at fault:

Can you just exchange code and arg depending on the endianness? You can use PY_BIG_ENDIAN and PY_LITTLE_ENDIAN macros (they are equal to 0 or 1).

@vstinner
Copy link
Member Author

The buildbot still fails, so I reopen the issue: https://buildbot.python.org/all/#/builders/3/builds/4314

@vstinner vstinner reopened this Jul 23, 2023
jtcave pushed a commit to jtcave/cpython that referenced this issue Jul 23, 2023
…pi.test_misc (python#107085)

(Even though it doesn't look like it fixes pythongh-107082 -- see discussion there -- it still removes debug code that should never have been committed.)
mementum pushed a commit to mementum/cpython that referenced this issue Jul 23, 2023
…pi.test_misc (python#107085)

(Even though it doesn't look like it fixes pythongh-107082 -- see discussion there -- it still removes debug code that should never have been committed.)
vstinner added a commit to vstinner/cpython that referenced this issue Jul 25, 2023
vstinner added a commit to vstinner/cpython that referenced this issue Jul 25, 2023
@vstinner
Copy link
Member Author

vstinner commented Jul 25, 2023

Thanks to the GCC Farm, I got access to a PPC64 big endian machine. I hacked PyCodeObject to dump the optimized bytecode:

test_pop_jump_if_not_none (test.test_capi.test_misc.TestUops.test_pop_jump_if_not_none) ...

co_code:
          0 RESUME                   0                                            
          2 LOAD_FAST                0                                            
          4 GET_ITER                                                              
    >>    6 FOR_ITER                 9 (to 28)                        
         10 STORE_FAST               1                                            
         12 LOAD_FAST                1                                            
         14 POP_JUMP_IF_NOT_NONE     2 (to 20)                            
         16 JUMP_BACKWARD            7 (to 6)                         
    >>   20 LOAD_CONST               1                                            
         22 STORE_FAST               1                                            
         24 JUMP_BACKWARD           11 (to 6)                             
    >>   28 END_FOR                                                               
         30 RETURN_CONST             0                                            
co_code_opt:                                                                      
          0 RESUME                   0                                                                                                                              
          2 LOAD_FAST                0                                                                                                                              
          4 GET_ITER                                                                                                                                                
    >>    6 FOR_ITER_LIST            9 (to 28)                                                                                                                      
         10 STORE_FAST               1                                                                                                                              
         12 LOAD_FAST                1                                                                                                                              
         14 POP_JUMP_IF_NOT_NONE     2 (to 20)                                                                                                                      
         16 JUMP_BACKWARD            7 (to 6)                                                                                                                       
    >>   20 LOAD_CONST               1                                                                                                                              
         22 STORE_FAST               1                                            
         24 ENTER_EXECUTOR           0 (to 26)                                
    >>   26 CACHE                                                                 
    >>   28 END_FOR                                                               
         30 RETURN_CONST             0                                            
GetExecutor: 0x7fff8bd17e40 - query offset=16                                
GetExecutor: i=0, i*2=0, code=151, arg=0
GetExecutor: i=1, i*2=2, code=124, arg=0                                                                                                                            
GetExecutor: i=2, i*2=4, code=68, arg=0
GetExecutor: i=3, i*2=6, code=111, arg=9
GetExecutor: i=5, i*2=10, code=125, arg=1 
GetExecutor: i=6, i*2=12, code=124, arg=1 
GetExecutor: i=7, i*2=14, code=128, arg=2 
GetExecutor: i=8, i*2=16, code=140, arg=7 
GetExecutor: i=10, i*2=20, code=100, arg=1
GetExecutor: i=11, i*2=22, code=125, arg=1
GetExecutor: i=12, i*2=24, code=230, arg=0
GetExecutor: i=13, i*2=26, code=0, arg=1
python: Python/instrumentation.c:262: _PyInstruction_GetLength: Assertion `opcode != 0' failed.

The assertion occurs in PyUnstable_GetExecutor() at the CACHE instruction (i=13, i*2=26, code=0, arg=1).

@vstinner
Copy link
Member Author

On little endian x86-64, I get:

test_pop_jump_if_not_none (test.test_capi.test_misc.TestUops.test_pop_jump_if_not_none) ... co_code:
          0 RESUME                   0
          2 LOAD_FAST                0
          4 GET_ITER
    >>    6 FOR_ITER                 9 (to 28)
         10 STORE_FAST               1
         12 LOAD_FAST                1
         14 POP_JUMP_IF_NOT_NONE     2 (to 20)
         16 JUMP_BACKWARD            7 (to 6)
    >>   20 LOAD_CONST               1
         22 STORE_FAST               1
         24 JUMP_BACKWARD           11 (to 6)
    >>   28 END_FOR
         30 RETURN_CONST             0
co_code_opt:
          0 RESUME                   0
          2 LOAD_FAST                0
          4 GET_ITER
    >>    6 FOR_ITER_LIST            9 (to 28)
         10 STORE_FAST               1
         12 LOAD_FAST                1
         14 POP_JUMP_IF_NOT_NONE     2 (to 20)
         16 JUMP_BACKWARD            7 (to 6)
    >>   20 LOAD_CONST               1
         22 STORE_FAST               1
         24 ENTER_EXECUTOR           0 (to 26)
    >>   26 POP_TOP
    >>   28 END_FOR
         30 RETURN_CONST             0
GetExecutor: 0x7fea3b387e40 - query offset=16
GetExecutor: i=0, i*2=0, code=151, arg=0
GetExecutor: i=1, i*2=2, code=124, arg=0
GetExecutor: i=2, i*2=4, code=68, arg=0
GetExecutor: i=3, i*2=6, code=111, arg=9
GetExecutor: i=5, i*2=10, code=125, arg=1
GetExecutor: i=6, i*2=12, code=124, arg=1
GetExecutor: i=7, i*2=14, code=128, arg=2
GetExecutor: i=8, i*2=16, code=140, arg=7
GetExecutor: i=10, i*2=20, code=100, arg=1
GetExecutor: i=11, i*2=22, code=125, arg=1
GetExecutor: i=12, i*2=24, code=230, arg=0
GetExecutor: i=13, i*2=26, code=1, arg=0
GetExecutor: i=14, i*2=28, code=4, arg=0
GetExecutor: i=15, i*2=30, code=121, arg=0

Here in the optimized code, the ENTER_EXECUTOR instruction is followed by: POP_TOP, END_FOR and RETURN_CONST. There is no CACHE instruction.

@gvanrossum
Copy link
Member

gvanrossum commented Jul 25, 2023

Thanks, that's super helpful! It seems that in _PyInstruction_GetLength() the instruction length for the instruction containing ENTER_EXECUTOR should dig the underlying JUMP_BACKWARD instruction and base the instruction width on that. This is a bit unfortunate but we have to do it.

@vstinner
Copy link
Member Author

On ppc64, when _PyOptimizer_BackEdge() is called, code->co_code_adaptive[24] is 140 (JUMP_BACKWARD) as expected, but code->co_code_adaptive[26] is 0 and code->co_code_adaptive[27] is 1 (can it be the POP_TOP instruction?).

It seems like the code was modified already and when POP_TOP was written, it was written in the wrong endian: {0, POP_TOP} instead of {POP_TOP, 0}.

Which code is responsible to write POP_TOP?

@gvanrossum
Copy link
Member

There is not really a POP_TOP instruction. There is a cache entry immediately after the JUMP_BACKWARD instruction containing a 16-bit counter that has the value 1. This cache entry is for counting how many times the jump has been executed, to decide whether to replace the JUMP_BACKWARD instruction with an ENTER_EXECUTOR instruction. On little-endian machines, the low byte of the counter is seen as the opcode, which is typically a valid instruction (but nonsensical in context). 1 is indeed POP_TOP (see opcode.py).

On a big-endian machine, the high byte of the counter is seen as the opcode, and as it is zero it triggers the assert.

The length calculation doesn't understand (yet) that it should special-case ENTER_EXECUTOR, so the length of the ENTER_EXECUTOR is seen as 1 unit rather than 2 (the length calculated must include the cache size), and on the next call we are trying to decode the cache entry as if it was an instruction. See my fix in gh-107256 for details.

@vstinner
Copy link
Member Author

There is not really a POP_TOP instruction. There is a cache entry immediately after the JUMP_BACKWARD instruction containing a 16-bit counter that has the value 1.

Ah right. Using hardware watchpoint in gdb I found here[1].cache += (1 << OPTIMIZER_BITS_IN_COUNTER); write in TARGET(JUMP_BACKWARD). I understood that it's not really an instruction and I noticed that dis.dis() gently hides this cache :-)

@gvanrossum
Copy link
Member

I found other code that could potentially run into the same problem. There's a lot of code that walks over code, instruction by instruction, using opcode = _PyOpcode_Deopt[opcode] and then index += _PyOpcode_Caches[opcode]. I opened gh-107265 for this.

gvanrossum added a commit that referenced this issue Jul 25, 2023
Co-authored-by: Victor Stinner <vstinner@python.org>
@vstinner
Copy link
Member Author

Fixed by 233b878

jtcave pushed a commit to jtcave/cpython that referenced this issue Jul 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants