Clang undefined behavior sanitizer diagnostics (mostly uninteresting??) #17924

@jepler

Port, board and/or hardware

unix port, coverage build, x86_64 linux, clang-19

MicroPython version

v1.27.0-preview-15-g744270ac1b

Reproduction

Perform the undefined behavior sanitizer build, but with CC=clang, then try doing pretty much anything (such as starting micropython and getting to the REPL).

Expected behaviour

It works and is essentially free of undefined behavior diagnostics.

Observed behaviour

Several classes of diagnostic appear almost immediately.

I investigated two main classes of diagnostic:

  • Applying zero offsets to NULL pointers
  • Calling functions without exactly matching prototypes

Here's an example of each kind:

../../py/map.c:193:37: runtime error: applying zero offset to null pointer
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../py/map.c:193:37 
../../py/stream.c:60:28: runtime error: call to function vfs_posix_file_write through pointer to incorrect function type 'unsigned long (*)(void *, void *, unsigned long, int *)'
/home/jepler/src/micropython/ports/unix/../../extmod/vfs_posix_file.c:129: note: vfs_posix_file_write defined here

These are both classes of "technically forbidden per the C specification but work fine almost always in practice".

The first can be resolved with an extra guard check, but at the possible cost of code size. For example:

-    const mp_obj_t *kwargs = args + n_args;
+    const mp_obj_t *kwargs = args ? args + n_args : NULL;

As discussed in the old sanitizer threads, I think this specific behavior (NULL + 0 is NULL) is set to become defined in a future C standard.
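The guard in the diff above can be factored into a small helper so the check is written once. This is only a sketch: `obj_t` is a stand-in for `mp_obj_t`, and `ptr_add_or_null` is a hypothetical name, not something that exists in MicroPython.

```c
#include <stddef.h>

// Stand-in for mp_obj_t, for illustration only.
typedef void *obj_t;

// Hypothetical helper: returns base + n, except that a NULL base stays NULL.
// This avoids ever evaluating NULL + 0, which is what the sanitizer reports
// as "applying zero offset to null pointer".
static inline const obj_t *ptr_add_or_null(const obj_t *base, size_t n) {
    return base ? base + n : NULL;
}
```

Every call site still pays for the extra compare and branch, which is the code-size cost mentioned above.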

The second is harder to resolve. For instance, it technically means that the trick of calling either a read or a write function through a function pointer of the read type is incorrect (the prototypes differ only in whether the data argument is const):

    if (flags & MP_STREAM_RW_WRITE) {
        io_func = (io_func_t)stream_p->write;
    } else {
        io_func = stream_p->read;
    }
... mp_uint_t out_sz = io_func(stream, buf, size, errcode); ...
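One way to stay within the rules is to call each function through a pointer of its own exact type, at the cost of a branch at the call site instead of a single cast. The following is a minimal sketch with made-up names (`read_fn_t`, `write_fn_t`, `stream_p_t`, `RW_WRITE`, `stream_rw_once` and the two demo functions are all hypothetical); it is not the code in py/stream.c.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical stand-ins for the stream protocol types.
typedef size_t (*read_fn_t)(void *self, void *buf, size_t size, int *errcode);
typedef size_t (*write_fn_t)(void *self, const void *buf, size_t size, int *errcode);

typedef struct {
    read_fn_t read;
    write_fn_t write;
} stream_p_t;

#define RW_WRITE 2 // hypothetical flag value

// Each function is called through a pointer of its own declared type,
// so no function-pointer cast is needed.
static size_t stream_rw_once(const stream_p_t *p, void *self, void *buf,
    size_t size, int *errcode, int flags) {
    if (flags & RW_WRITE) {
        return p->write(self, buf, size, errcode);
    }
    return p->read(self, buf, size, errcode);
}

// Demo read: fills the buffer with 'r'.
static size_t demo_read(void *self, void *buf, size_t size, int *errcode) {
    (void)self; (void)errcode;
    memset(buf, 'r', size);
    return size;
}

// Demo write: records the first byte written into *self.
static size_t demo_write(void *self, const void *buf, size_t size, int *errcode) {
    (void)errcode;
    if (size > 0) {
        *(char *)self = *(const char *)buf;
    }
    return size;
}
```

Whether the duplicated call costs more code than the cast saves would have to be measured per port.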

I didn't find a fine-grained way to turn off these diagnostics. For instance, the first one falls under the general umbrella of "pointer overflow" checks, which also cover actual overflow in pointer arithmetic, such as uint32_t *ptr; ptr[large] when large * sizeof(uint32_t) makes the address wrap around.
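A coarser knob does exist: clang documents a per-function `no_sanitize` attribute that accepts the same check names as `-fsanitize=`. It does not solve the granularity problem, since it silences genuine pointer overflow in the annotated function as well as the zero-offset case. A sketch, where the macro name `NO_UBSAN_POINTER_OVERFLOW` and the function `offset_unchecked` are hypothetical:

```c
#include <stddef.h>

// Hypothetical macro: expands to clang's no_sanitize attribute and to
// nothing on other compilers.
#if defined(__clang__)
#define NO_UBSAN_POINTER_OVERFLOW __attribute__((no_sanitize("pointer-overflow")))
#else
#define NO_UBSAN_POINTER_OVERFLOW
#endif

// Pointer arithmetic in this function is not instrumented by the
// pointer-overflow check, so it would also hide a real wraparound here.
NO_UBSAN_POINTER_OVERFLOW
static const int *offset_unchecked(const int *base, size_t n) {
    return base + n;
}
```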

Additional Information

I was interested in clang ubsan because the AFLplusplus fuzzer can be run in a mode where it treats sanitizer diagnostics as crashes. However, it defaulted to using clang rather than gcc, and I discovered that it really doesn't like the current state of micropython, so it can't make any interesting findings.

Oh, here's a bonus that I found when preparing this issue. It occurs when building an empty list (and, probably, tuple). It arises because unsigned subtraction is used where the intent is to grow the stack by an element. Technically it is an overflowed subtraction, so it is undefined behavior, but it is not interesting. More uninteresting signed overflows appear in vm.c, and touching any of them is likely to cause code growth without benefit.

Starting program: /home/jepler/src/micropython/ports/unix/build-coverage/micropython -c '[]'
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
../../py/vm.c:832:24: runtime error: subtraction of unsigned offset from 0x7fffffffd920 overflowed to 0x7fffffffd928
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../py/vm.c:832:24 
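The addresses in the diagnostic (0x7fffffffd920 to 0x7fffffffd928, one 8-byte slot upward) are consistent with a statement of the form `sp -= n - 1` where `n` is unsigned and equal to 0. The vm.c line itself is not quoted here, so that form is an assumption. Under that assumption, doing the arithmetic in a signed type expresses the same stack adjustment without wraparound. `obj_t` and `adjust_sp` are hypothetical names:

```c
#include <stddef.h>

// Stand-in for mp_obj_t, for illustration only.
typedef void *obj_t;

// Pops n_items entries and pushes one result, so the net change is
// 1 - n_items slots. Computing that in ptrdiff_t gives +1 for n_items == 0,
// rather than subtracting a wrapped (size_t)-1 offset from the pointer.
static obj_t *adjust_sp(obj_t *sp, size_t n_items) {
    return sp + ((ptrdiff_t)1 - (ptrdiff_t)n_items);
}
```

On common targets both forms produce the same address, which is why the diagnostic is uninteresting in practice.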
