code.replace() fails to preserve CO_FAST_HIDDEN flag on locals #110543

Open
@rokm

Bug report

Bug description:

Support for inlining list/dict/set comprehensions in c3b595e introduced the CO_FAST_HIDDEN flag, which is applied in combination with another kind flag, for example CO_FAST_LOCAL. However, when the code object is copied via a code.replace() call, this additional flag is lost; consequently, executing the returned code object results in a bizarre-looking error.

Example:

Consider the following example program:

# program.py
import sys

if len(sys.argv) != 2:
    print(f"usage: {sys.argv[0]} <dir|locals|globals>")
    sys.exit(1)
mode = sys.argv[1]

# The comprehension must use same variable name as the code that attempts `del`.
_allvalues = ''.join([myobj for myobj in ['a', 'b', 'c']])

myobj = None  # for del below
if mode == 'dir':
    print("DIR():", dir())
elif mode == 'locals':
    print("LOCALS():", locals())
elif mode == 'globals':
    print("GLOBALS():", globals())

del myobj

and the following script, which compiles the program to a byte-code .pyc file:

# compile_script.py
import sys
import os
import struct
import marshal
import importlib.util

if len(sys.argv) < 3:
    print(f"usage: {sys.argv[0]} <source> <dest> [0|1]")
    sys.exit(1)

filename = sys.argv[1]
out_filename = sys.argv[2]

strip_co = False if len(sys.argv) < 4 else sys.argv[3] != '0'

with open(filename, 'rb') as fp:
    src = fp.read()

co = compile(src, filename, 'exec')
if strip_co:
    co = co.replace()  # In real use-case, we would be replacing filename here

with open(out_filename, 'wb') as fp:
    fp.write(importlib.util.MAGIC_NUMBER)
    fp.write(struct.pack('<I', 0b01))  # PEP-552: hash-based pyc, check_source=False
    fp.write(b'\00' * 8)  # Zero the source hash
    marshal.dump(co, fp)

For some context, the above example is a distilled reproduction of what is going on with PyInstaller and the scipy.stats._distn_infrastructure module in pyinstaller/pyinstaller#7992: the collected module is byte-compiled, and the absolute filename in the code object is anonymized into an environment-relative path via co.replace() (see here for details).

In the above example, however, no replacement is done, so one would expect co.replace() to return an identical code object.

This is not the case, even though co == co.replace() in Python claims that the two are identical:

$ python3.12 compile_script.py program.py compiled-orig.pyc 0  # Compile without co.replace()
$ python3.12 compile_script.py program.py compiled-copy.pyc 1  # Compile with co.replace()
$ sha256sum *.pyc
2e03af03bcbb41b3a6cc6f592f5143acf7d82edc089913504c1f8446764795e1  compiled-copy.pyc
5034955819efba0dc7ff3ee94101c1f6dfe33b102d547efc77577d77a99f1732  compiled-orig.pyc

Running the original version:

$ python3.12 compiled-orig.pyc globals
GLOBALS(): {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourcelessFileLoader object at 0x7fe7fb327830>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '[...]/compiled-orig.pyc', '__cached__': None, 'sys': <module 'sys' (built-in)>, 'mode': 'globals', '_allvalues': 'abc', 'myobj': None}

$ python3.12 compiled-orig.pyc dir
DIR(): ['__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_allvalues', 'mode', 'myobj', 'sys']

$ python3.12 compiled-orig.pyc locals
LOCALS(): {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourcelessFileLoader object at 0x7f2846527830>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '[...]/compiled-orig.pyc', '__cached__': None, 'sys': <module 'sys' (built-in)>, 'mode': 'locals', '_allvalues': 'abc', 'myobj': None}

Running the version with co.replace():

$ python3.12 compiled-copy.pyc globals
GLOBALS(): {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourcelessFileLoader object at 0x7fd7f1b27830>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '[...]/compiled-copy.pyc', '__cached__': None, 'sys': <module 'sys' (built-in)>, 'mode': 'globals', '_allvalues': 'abc', 'myobj': None}

$ python3.12 compiled-copy.pyc dir
DIR(): ['__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_allvalues', 'mode', 'sys']
Traceback (most recent call last):
  File "program.py", line 20, in <module>
    del myobj
        ^^^^^
NameError: name 'myobj' is not defined

$ python3.12 compiled-copy.pyc locals
LOCALS(): {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourcelessFileLoader object at 0x7f8a35d27830>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '[...]/compiled-copy.pyc', '__cached__': None, 'sys': <module 'sys' (built-in)>, 'mode': 'locals', '_allvalues': 'abc'}
Traceback (most recent call last):
  File "program.py", line 20, in <module>
    del myobj
        ^^^^^
NameError: name 'myobj' is not defined
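
The same failure can be sketched in a single process, without writing .pyc files. This is a minimal sketch, not part of the original reproduction: the names SOURCE and run() are illustrative, and it assumes that executing the code object with exec() in a fresh dict behaves like running the module (globals and locals are the same mapping). On an affected 3.12 build the copy would be expected to report the NameError, while on an interpreter without the bug both lines should report "ok".

```python
# Minimal in-process sketch; SOURCE and run() are illustrative names.
SOURCE = """\
_allvalues = ''.join([myobj for myobj in ['a', 'b', 'c']])
myobj = None  # rebind the comprehension variable name at module level
_names = dir()  # per the report, dir() or locals() triggers the failure
del myobj
"""


def run(code):
    """Execute a code object in a fresh namespace and report the outcome."""
    namespace = {"__name__": "__sketch__"}
    try:
        exec(code, namespace)
    except NameError as exc:
        return f"NameError: {exc}"
    return "ok"


original = compile(SOURCE, "<sketch>", "exec")
copy = original.replace()  # no fields are changed

print("original:", run(original))
print("copy:    ", run(copy))
```

The dir() call is kept deliberately: in the runs above, the globals mode succeeds even with the copied code object, and only the dir and locals modes fail.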

Comparing compiled-orig.pyc and compiled-copy.pyc in a hex editor shows a single differing byte; its position corresponds to the marshaled co_localspluskinds, and its value is 0x30 (CO_FAST_LOCAL | CO_FAST_HIDDEN) in the original and 0x20 (CO_FAST_LOCAL) in the copy.
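
The hex-editor comparison can be approximated with a small helper. This is a sketch under stated assumptions: differing_offsets() and describe_kind() are hypothetical names, the values 0x10 and 0x20 follow from the byte values quoted above, and the values 0x40 and 0x80 are assumed from CPython 3.12's internal pycore_code.h header rather than taken from this report.

```python
# Kind flags stored in co_localspluskinds.
# 0x20 and 0x10 follow from the bytes quoted in this report; 0x40 and 0x80
# are assumptions based on CPython 3.12's internal pycore_code.h.
KIND_FLAGS = {
    0x20: "CO_FAST_LOCAL",
    0x40: "CO_FAST_CELL",
    0x80: "CO_FAST_FREE",
    0x10: "CO_FAST_HIDDEN",
}


def differing_offsets(a: bytes, b: bytes) -> list[int]:
    """Return the offsets at which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("payloads differ in length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def describe_kind(value: int) -> str:
    """Render one co_localspluskinds byte as a '|'-joined list of flag names."""
    names = [name for bit, name in KIND_FLAGS.items() if value & bit]
    return " | ".join(names) or "0"


print(describe_kind(0x30))  # CO_FAST_LOCAL | CO_FAST_HIDDEN
print(describe_kind(0x20))  # CO_FAST_LOCAL
```

To use it on the files from this report, one could read compiled-orig.pyc and compiled-copy.pyc in binary mode, pass both byte strings to differing_offsets(), and feed the byte found at the reported offset to describe_kind().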

CPython versions tested on:

3.12

Operating systems tested on:

Linux, Windows

Labels

interpreter-core (Objects, Python, Grammar, and Parser dirs), type-bug (An unexpected behavior, bug, or error)
