Bug report
Bug description:
Support for inlining list/dict/set comprehensions in c3b595e introduced a CO_FAST_HIDDEN flag, which is applied in combination with another kind flag, for example CO_FAST_LOCAL. However, when the code object is copied via a code.replace() call, this additional flag is lost; consequently, executing the returned code object results in a bizarre-looking error.
Example:
Consider the following example program:
# program.py
import sys
if len(sys.argv) != 2:
    print(f"usage: {sys.argv[0]} <dir|locals|globals>")
    sys.exit(1)
mode = sys.argv[1]
# The comprehension must use same variable name as the code that attempts `del`.
_allvalues = ''.join([myobj for myobj in ['a', 'b', 'c']])
myobj = None # for del below
if mode == 'dir':
    print("DIR():", dir())
elif mode == 'locals':
    print("LOCALS():", locals())
elif mode == 'globals':
    print("GLOBALS():", globals())
del myobj
and the following script, which compiles the program to a byte-code .pyc file:
# compile_script.py
import sys
import os
import struct
import marshal
import importlib.util
if len(sys.argv) < 3:
    print(f"usage: {sys.argv[0]} <source> <dest> [0|1]")
    sys.exit(1)
filename = sys.argv[1]
out_filename = sys.argv[2]
strip_co = False if len(sys.argv) < 4 else sys.argv[3] != '0'
with open(filename, 'rb') as fp:
    src = fp.read()
co = compile(src, filename, 'exec')
if strip_co:
    co = co.replace()  # In real use-case, we would be replacing filename here
with open(out_filename, 'wb') as fp:
    fp.write(importlib.util.MAGIC_NUMBER)
    fp.write(struct.pack('<I', 0b01))  # PEP-552: hash-based pyc, check_source=False
    fp.write(b'\00' * 8)  # Zero the source hash
    marshal.dump(co, fp)
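To inspect what actually ended up in a generated .pyc, the file can be read back the same way it was written. This is a minimal sketch, assuming the 16-byte PEP 552 header layout used by the script above; `load_pyc_code` is a hypothetical helper name, not part of the standard library.

```python
import importlib.util
import marshal
import struct


def load_pyc_code(path):
    """Return (flags, code) from a .pyc file with a 16-byte PEP 552 header."""
    with open(path, "rb") as fp:
        magic = fp.read(4)
        if magic != importlib.util.MAGIC_NUMBER:
            raise ValueError("pyc was written by a different Python version")
        (flags,) = struct.unpack("<I", fp.read(4))
        fp.read(8)  # source hash (or mtime + size for timestamp-based pycs)
        return flags, marshal.load(fp)
```

The returned code object can then be compared between compiled-orig.pyc and compiled-copy.pyc, for example with `dis.dis()`.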
For some context, program.py and compile_script.py are a distilled reproduction of what is going on with PyInstaller and the scipy.stats._distn_infrastructure module in pyinstaller/pyinstaller#7992: the collected module is byte-compiled, and the absolute filename in the code object is anonymized into an environment-relative path via co.replace() (see here for details).
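For illustration, a filename rewrite of this kind might look roughly like the sketch below. It is not PyInstaller's actual implementation; `replace_filename` is a hypothetical helper, and the only assumption is that nested code objects (functions, classes) live in `co_consts` and need the same treatment.

```python
import types


def replace_filename(co, new_filename):
    """Return a copy of `co` (and all nested code objects) with a new co_filename."""
    consts = tuple(
        replace_filename(const, new_filename)
        if isinstance(const, types.CodeType)
        else const
        for const in co.co_consts
    )
    # Every level of the tree goes through code.replace(), so every level
    # is exposed to the flag loss described in this report.
    return co.replace(co_filename=new_filename, co_consts=consts)
```

Usage would be along the lines of `co = replace_filename(co, "scipy/stats/_distn_infrastructure.py")` before marshaling.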
In compile_script.py, however, no replacement is done, so one would expect co.replace() to return an identical code object. However, this is not the case (even though co == co.replace() in Python claims that they are identical):
$ python3.12 compile_script.py program.py compiled-orig.pyc 0 # Compile without co.replace()
$ python3.12 compile_script.py program.py compiled-copy.pyc 1 # Compile with co.replace()
$ sha256sum *.pyc
2e03af03bcbb41b3a6cc6f592f5143acf7d82edc089913504c1f8446764795e1 compiled-copy.pyc
5034955819efba0dc7ff3ee94101c1f6dfe33b102d547efc77577d77a99f1732 compiled-orig.pyc
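The same comparison can be approximated in-process, without writing .pyc files. The sketch below marshals a code object and its `replace()` copy and lists the byte offsets that differ; `differing_offsets` is a hypothetical helper. Note that marshal output is not guaranteed to be byte-stable in general, so a difference here is a hint to investigate rather than proof of this bug, and on interpreters where the problem is fixed the list may well be empty.

```python
import marshal


def differing_offsets(a, b):
    """Offsets at which two byte strings differ (only the common prefix length is scanned)."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


source = (
    "_allvalues = ''.join([myobj for myobj in ['a', 'b', 'c']])\n"
    "myobj = None\n"
    "del myobj\n"
)
co = compile(source, "program.py", "exec")
copy = co.replace()  # no arguments: nominally an identical copy

orig_bytes = marshal.dumps(co)
copy_bytes = marshal.dumps(copy)

print("co == copy:", co == copy)
print("same length:", len(orig_bytes) == len(copy_bytes))
print("differing offsets:", differing_offsets(orig_bytes, copy_bytes))
```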
Running the original version:
$ python3.12 compiled-orig.pyc globals
GLOBALS(): {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourcelessFileLoader object at 0x7fe7fb327830>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '[...]/compiled-orig.pyc', '__cached__': None, 'sys': <module 'sys' (built-in)>, 'mode': 'globals', '_allvalues': 'abc', 'myobj': None}
$ python3.12 compiled-orig.pyc dir
DIR(): ['__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_allvalues', 'mode', 'myobj', 'sys']
$ python3.12 compiled-orig.pyc locals
LOCALS(): {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourcelessFileLoader object at 0x7f2846527830>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '[...]/compiled-orig.pyc', '__cached__': None, 'sys': <module 'sys' (built-in)>, 'mode': 'locals', '_allvalues': 'abc', 'myobj': None}
Running the version with co.replace():
$ python3.12 compiled-copy.pyc globals
GLOBALS(): {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourcelessFileLoader object at 0x7fd7f1b27830>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '[...]/compiled-copy.pyc', '__cached__': None, 'sys': <module 'sys' (built-in)>, 'mode': 'globals', '_allvalues': 'abc', 'myobj': None}
$ python3.12 compiled-copy.pyc dir
DIR(): ['__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_allvalues', 'mode', 'sys']
Traceback (most recent call last):
  File "program.py", line 20, in <module>
    del myobj
        ^^^^^
NameError: name 'myobj' is not defined
$ python3.12 compiled-copy.pyc locals
LOCALS(): {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourcelessFileLoader object at 0x7f8a35d27830>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '[...]/compiled-copy.pyc', '__cached__': None, 'sys': <module 'sys' (built-in)>, 'mode': 'locals', '_allvalues': 'abc'}
Traceback (most recent call last):
  File "program.py", line 20, in <module>
    del myobj
        ^^^^^
NameError: name 'myobj' is not defined
Comparing compiled-orig.pyc and compiled-copy.pyc in a hex editor, there is one byte of difference; its position corresponds to the marshaled co_localspluskinds, and the value is 0x30 (CO_FAST_LOCAL | CO_FAST_HIDDEN) in the original and 0x20 (CO_FAST_LOCAL) in the copy.
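The two byte values can be decoded with a few lines of Python. The flag constants below are copied from CPython 3.12's internal header (Include/internal/pycore_code.h); they are not public API, and treating them as valid for other versions is an assumption. `describe_kind` is a hypothetical helper.

```python
# Values from CPython 3.12's internal pycore_code.h (assumption for other versions).
CO_FAST_HIDDEN = 0x10
CO_FAST_LOCAL = 0x20
CO_FAST_CELL = 0x40
CO_FAST_FREE = 0x80

_KIND_FLAGS = {
    "CO_FAST_LOCAL": CO_FAST_LOCAL,
    "CO_FAST_CELL": CO_FAST_CELL,
    "CO_FAST_FREE": CO_FAST_FREE,
    "CO_FAST_HIDDEN": CO_FAST_HIDDEN,
}


def describe_kind(kind):
    """Render one co_localspluskinds byte as a '|'-joined list of flag names."""
    names = [name for name, flag in _KIND_FLAGS.items() if kind & flag]
    return " | ".join(names) if names else "0"


for value in (0x30, 0x20):
    print(hex(value), "=", describe_kind(value))
```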
CPython versions tested on:
3.12
Operating systems tested on:
Linux, Windows