Skip to content

gh-112075: Make some dict operations thread-safe without GIL #112247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

chgnrdv
Copy link
Contributor

@chgnrdv chgnrdv commented Nov 18, 2023

#112075

For dict.__len__, _Py_atomic_load_ssize_relaxed is used to access PyDictObject ma_used field.

For the following methods

  • dict.fromkeys
  • dict.copy
  • dict_richcompare
  • dict.clear
  • dict.__sizeof__
  • dict.__or__
  • dict.__ior__
  • dict.__reversed__
  • dict.keys
  • dict.items
  • dict.values
  • dict.update (in dict_update_common)
  • dict.__init__ (in dict_update_common)
  • dict.__repr__

the critical section API is used, either in form of AC directive or macro.

For `dict.__len__`, use `_Py_atomic_load_ssize_relaxed` to access `PyDictObject` `ma_used` field.

For the following methods
* `dict.fromkeys`
* `dict.copy`
* `dict_richcompare`
* `dict.clear`
* `dict.__sizeof__`
* `dict.__or__`
* `dict.__ior__`
* `dict.__reversed__`
* `dict.keys`
* `dict.items`
* `dict.values`

use critical section API, either in form of AC directive or macro.
@@ -117,6 +117,7 @@ As a consequence of this, split keys have a maximum size of 16.
#include "pycore_call.h" // _PyObject_CallNoArgs()
#include "pycore_ceval.h" // _PyEval_GetBuiltin()
#include "pycore_code.h" // stats
#include "pycore_critical_section.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment like: #include "pycore_critical_section.h" // Py_BEGIN_CRITICAL_SECTION()

@@ -2746,6 +2755,7 @@ dict_items(PyDictObject *mp)
}

/*[clinic input]
@critical_section
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict_fromkeys_impl should not have a critical_section. This locks the type object, which isn't necessary.

We may need critical sections for some code paths of _PyDict_FromKeys, but that will need to be within _PyDict_FromKeys.

@@ -4610,14 +4641,15 @@ PyTypeObject PyDictRevIterKey_Type = {


/*[clinic input]
@critical_section
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want a critical section here. Creating the reverse iterator is thread-safe without any special handling, although iterating over the reverse iterator may require special handling.

return _PyDictView_New(dict, &PyDictKeys_Type);
PyObject *view;
Py_BEGIN_CRITICAL_SECTION(dict);
view = _PyDictView_New(dict, &PyDictKeys_Type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No critical section here. Creating the view objects, like _PyDictView_New (and the reverse iterator above) do not access any of the dictionary internals so they don't need to lock the dictionary.

return _PyDictView_New(dict, &PyDictItems_Type);
PyObject *view;
Py_BEGIN_CRITICAL_SECTION(dict);
view = _PyDictView_New(dict, &PyDictItems_Type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No critical section here.

return _PyDictView_New(dict, &PyDictValues_Type);
PyObject *view;
Py_BEGIN_CRITICAL_SECTION(dict);
view = _PyDictView_New(dict, &PyDictValues_Type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No critical section here

@@ -3511,7 +3529,9 @@ dict_setdefault_impl(PyDictObject *self, PyObject *key,
static PyObject *
dict_clear(PyDictObject *mp, PyObject *Py_UNUSED(ignored))
{
Py_BEGIN_CRITICAL_SECTION(mp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need public APIs like PyDict_Clear() to be thread-safe. The critical sections need to be pushed down into that API.

I suggest doing something like in colesbury/nogil-3.12@d896dfc8db:

  • Rename the current implementation of PyDict_Clear() and make it static (e.g. static void _dict_clear(...))
  • Make PyDict_Clear() a simple wrapper around _dict_clear() that adds critical section calls.
  • Just call PyDict_Clear() from this function dict_clear(...).

Py_ReprLeave((PyObject *)mp);
return PyUnicode_FromString("{}");
repr = PyUnicode_FromString("{}");
goto end;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thin this may incorrectly call _PyUnicodeWriter_Dealloc() on an uninitialized writer below if repr is NULL (i.e., if PyUnicode_FromString returns an error).

For these functions with complex control flow, I think it's better to add a wrapper function than try to work Py_BEGIN_CRITICAL_SECTION() calls into the existing control flow. In other words:

  • Rename dict_repr to e.g. dict_repr_impl.
  • Add a new dict_repr that just calls dict_repr_impl under a Py_BEGIN_CRITICAL_SECTION() calls.

@erlend-aasland
Copy link
Contributor

@colesbury, is this PR still relevant, given @DinoV's recent work on dict?

@colesbury
Copy link
Contributor

I think dict thread-safety has been addressed by Dino's PRs and this PR can be closed now. Thanks for your work @chgnrdv.

@colesbury colesbury closed this Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants