Skip to content

gh-132097: allow AC to disable fastcall convention to avoid UBSan failures #131605

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Apr 18, 2025

Conversation

picnixz
Copy link
Member

@picnixz picnixz commented Mar 23, 2025

We need a way to explicitly disable fastcall convention for non-new methods. Let me explain why. Consider the following case:

/*[clinic input]
@classmethod
_testclinic.TestClass.__new__ as varpos_no_fastcall
    *args: tuple
[clinic start generated code]*/

What clinic generates is two signatures:

PyObject *varpos_no_fastcall(PyTypeObject *type, PyObject *args, PyObject *kwargs);
PyObject *varpos_no_fastcall_impl(PyTypeObject *type, PyObject *args);

At runtime, the function being called will be varpos_no_fastcall. In particular, we need to make it a METH_VARARGS | METH_KEYWORD | METH_CLASS. However, and this is where the runtime UB happens, the fact that we are having METH_VARARGS in methodobject.c::cfunction_call() implies that it will be used as follows:

static PyObject *
cfunction_call(PyObject *func, PyObject *args, PyObject *kwargs)
{
    assert(kwargs == NULL || PyDict_Check(kwargs));

    PyThreadState *tstate = _PyThreadState_GET();
    assert(!_PyErr_Occurred(tstate));

    int flags = PyCFunction_GET_FLAGS(func);
    if (!(flags & METH_VARARGS)) {
        /* If this is not a METH_VARARGS function, delegate to vectorcall */
        return PyVectorcall_Call(func, args, kwargs);
    }

    /* For METH_VARARGS, we cannot use vectorcall as the vectorcall pointer
     * is NULL. This is intentional, since vectorcall would be slower. */
    PyCFunction meth = PyCFunction_GET_FUNCTION(func);
    PyObject *self = PyCFunction_GET_SELF(func);

    PyObject *result;
    if (flags & METH_KEYWORDS) {
        result = _PyCFunctionWithKeywords_TrampolineCall(
            (*(PyCFunctionWithKeywords)(void(*)(void))meth),
            self, args, kwargs);
    }
    else {
        if (kwargs != NULL && PyDict_GET_SIZE(kwargs) != 0) {
            _PyErr_Format(tstate, PyExc_TypeError,
                          "%.200s() takes no keyword arguments",
                          ((PyCFunctionObject*)func)->m_ml->ml_name);
            return NULL;
        }
        result = _PyCFunction_TrampolineCall(meth, self, args);
    }
    return _Py_CheckFunctionResult(tstate, func, result, NULL);
}

The important bits are:

    if (flags & METH_KEYWORDS) {
        result = _PyCFunctionWithKeywords_TrampolineCall(
            (*(PyCFunctionWithKeywords)(void(*)(void))meth),
            self, args, kwargs);
    }

As you can see, self is being passed as a PyObject * to the called method. But, here, meth is varpos_no_fastcall, which expects a PyTypeObject * instead, hence the undefined behaviour. Now, the unfortunate part is that we must use all flags METH_VARARGS | METH_KEYWORD | METH_CLASS here.

One could wonder: why not using something else as a base classmethod instead of __new__? well.. the answer is that if a classmethod that is not __new__ is generated by AC, then that method will use a METH_FASTCALL convention, which entirely defeats the purpose of the test.

For efficiency purposes (and that's what we want obviously), if we use

/*[clinic input]
@classmethod
_testclinic.TestClass.varpos_no_fastcall as varpos_no_fastcall
    *args: tuple
[clinic start generated code]*/

AC will generate:

PyObject *varpos_no_fastcall(PyObject *type, PyObject *const *args, Py_ssize_t nargs)
PyObject *varpos_no_fastcall_impl(PyObject *type, PyObject *args)

and we will also have some autogenerated METHODDEF macro containing METH_FASTCALL|METH_CLASS flags. We also cannot just use varpos_no_fastcall without the METH_FASTCALL flag as its signature wouldn't be compatible for cfunction_call which expects either of the two forms:

PyObject *meth(PyObject *self, PyObject *args);
PyObject *meth(PyObject *self, PyObject *args, PyObject *kwargs);

So the solution I came up with is a generic @disable directive that can disable the fastcall convention explicitly even for non-__new__ class methods:

/*[clinic input]
@disable fastcall
@classmethod
_testclinic.TestClass.varpos_no_fastcall as varpos_no_fastcall
    *args: tuple
[clinic start generated code]*/

AC will generate:

PyObject *varpos_no_fastcall(PyObject *type, PyObject *args);
PyObject *varpos_no_fastcall_impl(PyTypeObject *type, PyObject *args);

and a METHODDEF macro with METH_VARARGS|METH_CLASS. Note that now varpos_no_fastcall has the expected signature and can be correctly called at runtime by cfunction_call. I've explained all of this in the comments of the tested functions (_testclinic.c). Since it's part of the tests, I think we can safely keep those comments.

@picnixz
Copy link
Member Author

picnixz commented Mar 23, 2025

cc @vstinner @erlend-aasland

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR changes two things: remove cast and add @disable. It changes 119 files so it's hard to review. You should split your PR int two PRs.

I'm not excited by the removal of the cast since it impacts tons of files (all of them?). I'm +0 on it.

@picnixz
Copy link
Member Author

picnixz commented Mar 24, 2025

I'm not excited by the removal of the cast since it impacts tons of files (all of them?). I'm +0 on it.

It helped me for grepping contents but I'll cancel that one in this PR and make it in a different one (after everything is done). Indeed, let's focus on actually fixing something.

@vstinner
Copy link
Member

The PR title seems to be outdated.

@picnixz picnixz changed the title gh-111178: remove un-necessary casts in clinic autogenerated code gh-111178: allow AC to disable fastcall convention to avoid UBSan failures Mar 24, 2025
@picnixz picnixz marked this pull request as ready for review March 24, 2025 10:07
@picnixz picnixz requested a review from erlend-aasland as a code owner March 24, 2025 10:07
@picnixz picnixz requested a review from vstinner March 24, 2025 11:01
@encukou
Copy link
Member

encukou commented Mar 24, 2025

It seems to me that rather than disabling fastcall, what's needed here is to force a particular calling convention. That is, if Clinic changes from fastcall to something else as the new best way of doing things, these functions will still need to stay METH_VARARGS|METH_CLASS. So I'd expect something like this, to be used whenever the generated is called by non-clinic code:

@calling_convention METH_VARARGS|METH_CLASS

Disclaimer: I haven't looked into Clinic to check how hard this would be to implement; I'm only commenting at the “API level”.
(But, the feature could be initially limited to METH_VARARGS|METH_CLASS only.)

@picnixz
Copy link
Member Author

picnixz commented Mar 24, 2025

to force a particular calling convention

That's what I first thought about but as I'm really not familiar with clinic and its assumptions, it was faster to disable fastcall explicitly as there is a fastcall boolean that is determines solely by whether it's __init__/__new__ or not. If @erlend-aasland has an idea on how to force a calling convention easily, then I'm also fine!

@vstinner
Copy link
Member

vstinner commented Apr 2, 2025

It seems like test_clinic pass on Python built with -fsanitize=function. Is this change really needed?

@picnixz
Copy link
Member Author

picnixz commented Apr 4, 2025

It seems like test_clinic pass on Python built with -fsanitize=function. Is this change really needed?

When I tested this, there was a runtime UB that I caught. I've used: -fsanitize=undefined and not just -fsanitize=function and I also used --with-undefined-behaviour for configure (maybe it's doing the same). I'll try it again this evening or tomorrow to see if this is still needed.

Note that I've caugh the UB when running the entire test suite, so I don't know if it was test_clinic that caused the UB.

@picnixz
Copy link
Member Author

picnixz commented Apr 4, 2025

I can confirm that I still have the UB on main:

./python -m test test_clinic
Using random seed: 1409063504
0:00:00 load avg: 9.45 Run 1 test sequentially in a single process
0:00:00 load avg: 9.45 [1/1] test_clinic
Objects/methodobject.c:551:18: runtime error: call to function posonly_poskw_varpos_no_fastcall through pointer to incorrect function type 'struct _object *(*)(struct _object *, struct _object *, struct _object *)'
/$HOME/lib/python/cpython/./Modules/clinic/_testclinic.c.h:4174: note: posonly_poskw_varpos_no_fastcall defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Obje

My clang version:

clang version 18.1.8
Target: x86_64-suse-linux
Thread model: posix
InstalledDir: /usr/bin

The configure command:

./configure --with-undefined-behavior-sanitizer --with-pydebug --prefix="$(pwd)/build" CC=clang LD=clang CFLAGS="-fsanitize=undefined -fno-sanitize-recover"  LDFLAGS="-fsanitize=undefined -fno-sanitize-recover"

@picnixz
Copy link
Member Author

picnixz commented Apr 4, 2025

Note that I still have a UB on test_xml_etree_c, though I don't know why:

./python -m test test_xml_etree_c
Using random seed: 644741740
0:00:00 load avg: 3.09 Run 1 test sequentially in a single process
0:00:00 load avg: 3.09 [1/1] test_xml_etree_c
Modules/expat/xmlparse.c:6779:5: runtime error: call to function expat_default_handler through pointer to incorrect function type 'void (*)(void *, const char *, int)'
/$HOME/lib/python/cpython/./Modules/_elementtree.c:3212: note: expat_default_handler defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Modules/expat/xmlparse.c:6779:

I thought you fixed this one? (I can comment on the XML PR if you prefer)

EDIT: Fixed in #132265.

@picnixz picnixz marked this pull request as draft April 4, 2025 17:15
@vstinner
Copy link
Member

vstinner commented Apr 4, 2025

Aha, I only tested -fsanitize=function, not -fsanitize=undefined. Would you mind to open a new issue, follow-up of gh-111178, for the remaining issues?

@picnixz
Copy link
Member Author

picnixz commented Apr 4, 2025

Yes, I can

@picnixz picnixz changed the title gh-111178: allow AC to disable fastcall convention to avoid UBSan failures gh-132097: allow AC to disable fastcall convention to avoid UBSan failures Apr 4, 2025
markshannon and others added 2 commits April 5, 2025 11:36
A new extension module, `_hmac`, now exposes the HACL* HMAC (formally verified) implementation.

The HACL* implementation is used as a fallback implementation when the OpenSSL implementation of HMAC
is not available or disabled. For now, only named hash algorithms are recognized and SIMD support provided
by HACL* for the BLAKE2 hash functions is not yet used.
@picnixz picnixz force-pushed the fix/ubsan/clinic-test-111178 branch from 0b34b63 to 24475b1 Compare April 5, 2025 09:36
@picnixz

This comment was marked as resolved.

@picnixz picnixz force-pushed the fix/ubsan/clinic-test-111178 branch from 24475b1 to 65c64bc Compare April 5, 2025 09:38
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@picnixz picnixz self-assigned this Apr 17, 2025
@picnixz picnixz merged commit 50e518e into python:main Apr 18, 2025
54 checks passed
@picnixz picnixz deleted the fix/ubsan/clinic-test-111178 branch April 18, 2025 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants