Skip to content

Bug: binascii.a2b_uu incorrectly assumes padded bytes are always whitespace #100308

Open
@ajmedeio

Description

@ajmedeio

Bug Description

I was decoding some UUEncoded data when I encountered a 'Trailing Garbage' error from the binascii.a2b_uu function. After digging into Linux's uu decode implementation(L248) and other resources (linked below) I'm decently certain the python implementation is bugged.

The following is what I tried:

from binascii import a2b_uu
s = '%-@     !'
decoded = a2b_uu(s)

The expected output is:

print(decoded)  # b'6\x00\x00\x00\x00'

The actual output is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
binascii.Error: Trailing garbage

Notice there are 5 bytes in the expected output (b'6\x00\x00\x00\x00') because the % (first byte of input string, s) means 5 bytes of data follow (ascii code 37 - 32 = 5). UUEncoding requires output be divisible by 3 bytes so an extra padding character is added. In this case it's an !.

The python implementation assumes the padding is always whitespace. Different uuencoders will use different characters for padding though. I've seen three so far: , `, and !.

The following several lines of code are the issue

Proposed fix

Simply remove the following lines (279 - 296). Or if we really want the verification of padding we can include the '!' in the condition of valid padding chars. (The linked linux implementation does not verify padding, however.) And based on my research, there isn't a well defined padding character so we will be jumping to the same potentially false conclusion that we have here: believing we've accounted for all the padding characters that exist in the wild.

/*
** Finally, check that if there's anything left on the line
** that it's whitespace only.
*/
while( ascii_len-- > 0 ) {
    this_ch = *ascii_data++;
    /* Extra '`' may be written as padding in some cases */
    if ( this_ch != ' ' && this_ch != ' '+64 &&
         this_ch != '\n' && this_ch != '\r' ) {
        state = get_binascii_state(module);
        if (state == NULL) {
            return NULL;
        }
        PyErr_SetString(state->Error, "Trailing garbage");
        Py_DECREF(rv);
        return NULL;
    }
}

Problematically, this bug propagated up to the uu_codec decode implementation as well. See the following code

A comment indicates the caught exception and "workaround" are due to broken uuencoders. According to what I've read, it's the broken python binascii.a2b_uu that incorrectly assumes any padding bytes are or `.

Here are the sources for my understanding of uu encoding:
Examples of non whitespace padding
Wikipedia uuencoding
Busybox uudecode implementation

Following is an illustration that helped me find a sense of understanding:
uuencode-bug-explanation

[1] I couldn't find an RFC or other standards document so I looked for the earliest implementation I could find (1983 Linux implementation) along with the wikipedia entry.

In the meantime

If others encounter this issue I'm using the following workaround:

import binascii
from binascii import a2b_uu
from io import BytesIO

my_bytes = BytesIO()
line_bytes = b'%-@     !'
line = line_bytes.decode(encoding='ascii')
try:
    my_bytes.write(a2b_uu(line))
except binascii.Error as err:
    if 'trailing garbage' in str(err).lower():
        n_bytes = line_bytes[0] - 32
        assert n_bytes <= 45 and n_bytes <= len(line[1:])
        workaround_line = f'M{line[1:]}'  # replace first byte of UUEncoded line with max length specifier (M)
        data = a2b_uu(workaround_line)[:n_bytes]
        my_bytes.write(data)
    else:
        raise err

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions