Skip to content

Pickle BINSTRING incorrect data type for size #135321

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Legoclones opened this issue Jun 10, 2025 · 0 comments
Open

Pickle BINSTRING incorrect data type for size #135321

Legoclones opened this issue Jun 10, 2025 · 0 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@Legoclones
Copy link
Contributor

Legoclones commented Jun 10, 2025

Bug report

Bug description:

The BINSTRING opcode in Pickle has two arguments: a "4-byte little-endian signed int" length, and a string of that many bytes (see the code comment for BINSTRING in pickletools). Since it is signed, this means that any provided value over 0x7fffffff would be interpreted as a negative number. The Python pickle implementation specifically treats it as a signed 32-bit length and checks to see if the length is < 0:

cpython/Lib/pickle.py

Lines 1454 to 1458 in 754e7c9

def load_binstring(self):
# Deprecated BINSTRING uses signed 32-bit length
len, = unpack('<i', self.read(4))
if len < 0:
raise UnpicklingError("BINSTRING pickle has negative byte count")

However, the C pickle implementation runs calc_binsize(s, 4) for BINSTRING and returns a Py_ssize_t. Since Py_ssize_t data types are the same size as the compiler's size_t type (PEP0353), this means a Py_ssize_t is 64-bits long on 64-bit systems. Since the size variable here is also a Py_ssize_t, that means the threshold for negative values is much higher.

cpython/Modules/_pickle.c

Lines 5546 to 5558 in a58026a

Py_ssize_t size;
char *s;
if (_Unpickler_Read(self, st, &s, nbytes) < 0)
return -1;
size = calc_binsize(s, nbytes);
if (size < 0) {
PyErr_Format(st->UnpicklingError,
"BINSTRING exceeds system's maximum size of %zd bytes",
PY_SSIZE_T_MAX);
return -1;
}

This is all just the background to illustrate that because size is not an int, a pickle with the BINSTRING opcode using a length > 0x7fffffff will fail in the Python implementation (since it's negative), but deserialize just fine in the C implementation.

The following payload will demonstrate the difference:

payload:      b'T\x00\x00\x00\x80'+b'a'*0x80000000 + b'.'

pickle:       FAILURE BINSTRING pickle has negative byte count
_pickle.c:    aaaaaaaaaaaaaaaaaaa....
pickletools:  pickletools failed string4 byte count < 0: -2147483648

The required payload does need to be 2GB in size which is very large, but not impossible on modern systems.

Note that the LONG4 opcode is in a similar situation, except the output for calc_binint() is an int data type so this issue does not exist there.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Linked PRs

@Legoclones Legoclones added the type-bug An unexpected behavior, bug, or error label Jun 10, 2025
@picnixz picnixz added the stdlib Python modules in the Lib dir label Jun 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
Status: No status
Development

No branches or pull requests

2 participants