Skip to content

gh-127065: Make methodcaller thread-safe and re-entrant #127245

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 21 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions Lib/test/test_operator.py
Original file line number Diff line number Diff line change
Expand Up @@ -482,6 +482,8 @@ def bar(self, f=42):
return f
def baz(*args, **kwds):
return kwds['name'], kwds['self']
def return_arguments(self, *args, **kwds):
return args, kwds
a = A()
f = operator.methodcaller('foo')
self.assertRaises(IndexError, f, a)
Expand All @@ -498,6 +500,17 @@ def baz(*args, **kwds):
f = operator.methodcaller('baz', name='spam', self='eggs')
self.assertEqual(f(a), ('spam', 'eggs'))

many_positional_arguments = tuple(range(10))
many_kw_arguments = dict(zip('abcdefghij', range(10)))
f = operator.methodcaller('return_arguments', *many_positional_arguments)
self.assertEqual(f(a), (many_positional_arguments, {}))

f = operator.methodcaller('return_arguments', **many_kw_arguments)
self.assertEqual(f(a), ((), many_kw_arguments))

f = operator.methodcaller('return_arguments', *many_positional_arguments, **many_kw_arguments)
self.assertEqual(f(a), (many_positional_arguments, many_kw_arguments))

def test_inplace(self):
operator = self.module
class C(object):
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Make :func:`operator.methodcaller` thread-safe and re-entrant safe.
140 changes: 66 additions & 74 deletions Modules/_operator.c
Original file line number Diff line number Diff line change
Expand Up @@ -1595,36 +1595,59 @@ static PyType_Spec attrgetter_type_spec = {
typedef struct {
PyObject_HEAD
PyObject *name;
PyObject *xargs; // reference to arguments passed in constructor
PyObject *args;
PyObject *kwds;
PyObject **vectorcall_args; /* Borrowed references */
PyObject *vectorcall_kwnames;
Comment on lines 1600 to 1601
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ownership here seems a bit complicated and I think it can be simplified. As I understand it, vectorcall_kwnames only exists to ensure that some entries in vectorcall_args stay alive.

Instead, I'd suggest:

  • Make PyObject *vectorcall_args a tuple (that holds strong references to its contents as usual)
  • Get rid of vectorcall_kwnames
  • Use _PyTuple_ITEMS for fast access to the contents of the tuple (for memcpy())
  • Visit vectorcall_args in methodcaller_traverse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vectorcall_kwnames is needed as an argument for PyObject_VectorcallMethod in methodcaller_vectorcall (https://github.com/python/cpython/blob/main/Modules/_operator.c#L1666), so we cannot get rid of it.

The ownership is not too hard I think: the objects in vectorcall_args have references borrowed from either mc->args or (the keys from) mc->kwds. I added a comment to clarify this.

Making the vectorcall_args a tuple is still an option though. It requires a bit more memory and a tiny bit of computation in the initialization. It would be the C equivalent of vectorcall_args = args + tuple(kwds). I'll work it out in a branch to see how it compares

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, in that case don't worry about it unless you prefer it as a tuple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff between the two approaches is this:

eendebakpt/cpython@methodcaller_ft...eendebakpt:cpython:methodcaller_ft_v2

What is nice about making vectorcall_args a tuple is that if there are no keyword arguments, we can reuse mc->args. It does require more operations in the construction though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either approach is fine! My guess is that the vast majority of uses of methodcaller() do not use keyword arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running benchmarks shows what is to be expected from the implementations: using a tuple for vectorcall_args is a bit slower in initializing, except when there are no keyword arguments (since then we reuse the arg tuple). Differences are small though.

Since using a tuple leads to cleaner code and the majority of uses is without keywords I slightly prefer the tuple approach. I will open a new PR for it.

vectorcallfunc vectorcall;
} methodcallerobject;

#ifndef Py_GIL_DISABLED
static int _methodcaller_initialize_vectorcall(methodcallerobject* mc)

#define _METHODCALLER_MAX_ARGS 8

static PyObject *
methodcaller_vectorcall(methodcallerobject *mc, PyObject *const *args,
size_t nargsf, PyObject* kwnames)
{
PyObject* args = mc->xargs;
if (!_PyArg_CheckPositional("methodcaller", PyVectorcall_NARGS(nargsf), 1, 1)
|| !_PyArg_NoKwnames("methodcaller", kwnames)) {
return NULL;
}
assert(mc->vectorcall_args != NULL);

Py_ssize_t number_of_arguments = PyTuple_GET_SIZE(mc->args) +
(mc->vectorcall_kwnames ? PyTuple_GET_SIZE(mc->vectorcall_kwnames) : 0);

PyObject *tmp_args[_METHODCALLER_MAX_ARGS];
tmp_args[0] = args[0];
assert(1 + number_of_arguments <= _METHODCALLER_MAX_ARGS);
memcpy(tmp_args + 1, mc->vectorcall_args, sizeof(PyObject *) * number_of_arguments);

return PyObject_VectorcallMethod(mc->name, tmp_args,
(1 + PyTuple_GET_SIZE(mc->args)) | PY_VECTORCALL_ARGUMENTS_OFFSET,
mc->vectorcall_kwnames);
}

static int
_methodcaller_initialize_vectorcall(methodcallerobject* mc)
{
PyObject* args = mc->args;
PyObject* kwds = mc->kwds;

Py_ssize_t nargs = PyTuple_GET_SIZE(args);
assert(nargs > 0);
mc->vectorcall_args = PyMem_Calloc(
nargs + (kwds ? PyDict_Size(kwds) : 0),
sizeof(PyObject*));
if (!mc->vectorcall_args) {
PyErr_NoMemory();
return -1;
}
/* The first item of vectorcall_args will be filled with obj later */
if (nargs > 1) {
memcpy(mc->vectorcall_args, PySequence_Fast_ITEMS(args),
nargs * sizeof(PyObject*));
}
if (kwds) {
// the objects in mc->vectorcall_args have references borrowed
// from mc->args and the keys from mc->kwds
memcpy(mc->vectorcall_args, _PyTuple_ITEMS(args),
nargs * sizeof(PyObject *)); // borrowed references
if (kwds && PyDict_Size(kwds)) {
const Py_ssize_t nkwds = PyDict_Size(kwds);

mc->vectorcall_kwnames = PyTuple_New(nkwds);
if (!mc->vectorcall_kwnames) {
return -1;
Expand All @@ -1640,33 +1663,10 @@ static int _methodcaller_initialize_vectorcall(methodcallerobject* mc)
else {
mc->vectorcall_kwnames = NULL;
}
return 1;
}


static PyObject *
methodcaller_vectorcall(
methodcallerobject *mc, PyObject *const *args, size_t nargsf, PyObject* kwnames)
{
if (!_PyArg_CheckPositional("methodcaller", PyVectorcall_NARGS(nargsf), 1, 1)
|| !_PyArg_NoKwnames("methodcaller", kwnames)) {
return NULL;
}
if (mc->vectorcall_args == NULL) {
if (_methodcaller_initialize_vectorcall(mc) < 0) {
return NULL;
}
}
mc->vectorcall = (vectorcallfunc)methodcaller_vectorcall;

assert(mc->vectorcall_args != 0);
mc->vectorcall_args[0] = args[0];
return PyObject_VectorcallMethod(
mc->name, mc->vectorcall_args,
(PyTuple_GET_SIZE(mc->xargs)) | PY_VECTORCALL_ARGUMENTS_OFFSET,
mc->vectorcall_kwnames);
return 1;
}
#endif


/* AC 3.5: variable number of arguments, not currently support by AC */
static PyObject *
Expand Down Expand Up @@ -1694,25 +1694,30 @@ methodcaller_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
if (mc == NULL) {
return NULL;
}
mc->vectorcall = NULL;
mc->vectorcall_args = NULL;
mc->vectorcall_kwnames = NULL;
mc->kwds = Py_XNewRef(kwds);

Py_INCREF(name);
PyInterpreterState *interp = _PyInterpreterState_GET();
_PyUnicode_InternMortal(interp, &name);
mc->name = name;

mc->xargs = Py_XNewRef(args); // allows us to use borrowed references
mc->kwds = Py_XNewRef(kwds);
mc->vectorcall_args = 0;

mc->args = PyTuple_GetSlice(args, 1, PyTuple_GET_SIZE(args));
if (mc->args == NULL) {
Py_DECREF(mc);
return NULL;
}

#ifdef Py_GIL_DISABLED
// gh-127065: The current implementation of methodcaller_vectorcall
// is not thread-safe because it modifies the `vectorcall_args` array,
// which is shared across calls.
mc->vectorcall = NULL;
#else
mc->vectorcall = (vectorcallfunc)methodcaller_vectorcall;
#endif
Py_ssize_t vectorcall_size = PyTuple_GET_SIZE(args)
+ (kwds ? PyDict_Size(kwds) : 0);
if (vectorcall_size < (_METHODCALLER_MAX_ARGS)) {
if (_methodcaller_initialize_vectorcall(mc) < 0) {
Py_DECREF(mc);
return NULL;
}
}

PyObject_GC_Track(mc);
return (PyObject *)mc;
Expand All @@ -1722,11 +1727,11 @@ static void
methodcaller_clear(methodcallerobject *mc)
{
Py_CLEAR(mc->name);
Py_CLEAR(mc->xargs);
Py_CLEAR(mc->args);
Py_CLEAR(mc->kwds);
if (mc->vectorcall_args != NULL) {
PyMem_Free(mc->vectorcall_args);
mc->vectorcall_args = 0;
mc->vectorcall_args = NULL;
Py_CLEAR(mc->vectorcall_kwnames);
}
}
Expand All @@ -1745,7 +1750,7 @@ static int
methodcaller_traverse(methodcallerobject *mc, visitproc visit, void *arg)
{
Py_VISIT(mc->name);
Py_VISIT(mc->xargs);
Py_VISIT(mc->args);
Py_VISIT(mc->kwds);
Py_VISIT(Py_TYPE(mc));
return 0;
Expand All @@ -1765,15 +1770,7 @@ methodcaller_call(methodcallerobject *mc, PyObject *args, PyObject *kw)
if (method == NULL)
return NULL;


PyObject *cargs = PyTuple_GetSlice(mc->xargs, 1, PyTuple_GET_SIZE(mc->xargs));
if (cargs == NULL) {
Py_DECREF(method);
return NULL;
}

result = PyObject_Call(method, cargs, mc->kwds);
Py_DECREF(cargs);
result = PyObject_Call(method, mc->args, mc->kwds);
Py_DECREF(method);
return result;
}
Expand All @@ -1791,7 +1788,7 @@ methodcaller_repr(methodcallerobject *mc)
}

numkwdargs = mc->kwds != NULL ? PyDict_GET_SIZE(mc->kwds) : 0;
numposargs = PyTuple_GET_SIZE(mc->xargs) - 1;
numposargs = PyTuple_GET_SIZE(mc->args);
numtotalargs = numposargs + numkwdargs;

if (numtotalargs == 0) {
Expand All @@ -1807,7 +1804,7 @@ methodcaller_repr(methodcallerobject *mc)
}

for (i = 0; i < numposargs; ++i) {
PyObject *onerepr = PyObject_Repr(PyTuple_GET_ITEM(mc->xargs, i+1));
PyObject *onerepr = PyObject_Repr(PyTuple_GET_ITEM(mc->args, i));
if (onerepr == NULL)
goto done;
PyTuple_SET_ITEM(argreprs, i, onerepr);
Expand Down Expand Up @@ -1859,14 +1856,14 @@ methodcaller_reduce(methodcallerobject *mc, PyObject *Py_UNUSED(ignored))
{
if (!mc->kwds || PyDict_GET_SIZE(mc->kwds) == 0) {
Py_ssize_t i;
Py_ssize_t newarg_size = PyTuple_GET_SIZE(mc->xargs);
PyObject *newargs = PyTuple_New(newarg_size);
Py_ssize_t callargcount = PyTuple_GET_SIZE(mc->args);
PyObject *newargs = PyTuple_New(1 + callargcount);
if (newargs == NULL)
return NULL;
PyTuple_SET_ITEM(newargs, 0, Py_NewRef(mc->name));
for (i = 1; i < newarg_size; ++i) {
PyObject *arg = PyTuple_GET_ITEM(mc->xargs, i);
PyTuple_SET_ITEM(newargs, i, Py_NewRef(arg));
for (i = 0; i < callargcount; ++i) {
PyObject *arg = PyTuple_GET_ITEM(mc->args, i);
PyTuple_SET_ITEM(newargs, i + 1, Py_NewRef(arg));
}
return Py_BuildValue("ON", Py_TYPE(mc), newargs);
}
Expand All @@ -1884,12 +1881,7 @@ methodcaller_reduce(methodcallerobject *mc, PyObject *Py_UNUSED(ignored))
constructor = PyObject_VectorcallDict(partial, newargs, 2, mc->kwds);

Py_DECREF(partial);
PyObject *args = PyTuple_GetSlice(mc->xargs, 1, PyTuple_GET_SIZE(mc->xargs));
if (!args) {
Py_DECREF(constructor);
return NULL;
}
return Py_BuildValue("NO", constructor, args);
return Py_BuildValue("NO", constructor, mc->args);
}
}

Expand Down
Loading