Skip to content

gh-112087: Make list_repr and list_length to be thread-safe #114582

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 26, 2024

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Jan 26, 2024

@corona10
Copy link
Member Author

corona10 commented Jan 26, 2024

I just worked on 2 methods because list_length is the first method that requires _Py_atomic_load_ssize_relaxed which creates fragmented implementation. until now, just using Py_BEGIN_CRITICAL_SECTION API or @critical_section annotation was enough, but from now it's not.

So from now, I would like to check other core devs opinions about this.

I expect that @serhiy-storchaka can propose a better solution :)
Also, @ambv can have other ideas about this.
The reference implementation is based on colesbury/nogil-3.12@df4c51f82b

Once we decide about the fragmented implementations with atomic API, we can start to implement the rest of things in a unified and satisfied way.

@corona10 corona10 requested a review from ambv January 26, 2024 08:16
@encukou
Copy link
Member

encukou commented Jan 26, 2024

This looks maintainable enough to me.

I can't quite check if it's “thread safe” -- AFAIK we don't really define what that is, yet.
One thing that's not clear to me is whether Py_SIZE itself should be atomic, or if all calls to it should be replaced by an atomic load (or in critical sections).

static Py_ssize_t
list_length(PyObject *a)
{
#ifdef Py_GIL_DISABLED
return _Py_atomic_load_ssize_relaxed(&(_PyVarObject_CAST(a)->ob_size));
#else
return Py_SIZE(a);
Copy link
Member Author

@corona10 corona10 Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@encukou said that making Py_SIZE to be thread-safe for free-threaded build would be beneficial rather than implementing list_length with macro if possible :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could. I don't know enough about the grand plan to know for sure :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pushing the atomic load down to Py_SIZE() is reasonable and will probably simplify call sites.

That was something I tried and then abandoned in nogil-3.9 because at the time the macro was also used as an l-value like Py_SIZE(ob) = 1. Fortunately, that's no longer a concern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could just do the atomic load in PyList_GET_SIZE() and consistently use that within this file. I think I'd prefer that.

There are some loops that use Py_SIZE() on immutable objects that might slow down if we make Py_SIZE use an atomic load. For example, in codeobject.c:

while (entry_point < Py_SIZE(co) &&
_PyCode_CODE(co)[entry_point].op.code != RESUME) {
entry_point++;
}

The compiler will currently lift the Py_SIZE() load outside the loop. But it won't do that optimization with an atomic load.

It's hard to know if these sorts of things will make a difference overall, but it's often easier to avoid the potential performance issues in the first place than trying to find, benchmark, and fix the issues later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my opinion keeps changing as I read through listobject.c. Changing all the Py_SIZE() calls to PyList_GET_SIZE() seems like it would be a lot of noise. To be honest, I think there are a lot of reasonable approaches, and whatever you decide to do is fine with me. Here is my current thinking:

  • Most of the Py_SIZE() calls are reads from functions that will be within critical sections in the future. Those are fine as is.
  • If we read ob_size outside of a critical section, it should use an atomic load. Either directly or indirectly, such as by calling list_length().

For list_repr, my inclination would be to push the zero-size check down into the critical section, but using an atomic load also seems fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fortunately, that's no longer a concern.

Thanks to @vstinner for fixing this! 👏

Copy link
Member Author

@corona10 corona10 Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. well, from the view of maintaining implementation detail as not changed as possible for the list object itself, updating PyList_GET_SIZE() to be an atomic operation is worth doing. It's good negotiation point I guess.

If we need to make Py_SIZE() as atomic operation, we can handle it later.

@serhiy-storchaka
Copy link
Member

I think that either all reads and writes of ob_size should be atomic, or they all should be in critical sections.

@corona10
Copy link
Member Author

corona10 commented Jan 26, 2024

@colesbury
What about making Py_SIZE() to be atomic instead of touching this from runtime implementation?
A side effect is that there will be some minor effect to something like a tuple, but not that huge IIUC.

If we are okay with this, I will submit a new PR.

@corona10
Copy link
Member Author

@colesbury

Updated! The implementations become more clear and easy to maintain.
Now I understand our consensus about how to handle atomic operations in runtime.
I will submit PRs with similar approaches.

@colesbury
Copy link
Contributor

@serhiy-storchaka wrote:

I think that either all reads and writes of ob_size should be atomic, or they all should be in critical sections.

I think we should follow slightly different pattern in list:

  • Writes to ob_size should be atomic AND in critical sections
  • Reads from ob_size should be either atomic OR in critical sections

Exceptions: initialization, deallocation, and certain special functions can use plain accesses without critical sections (e.g., PyList_New(), list_dealloc, list_traverse).

The motivation behind this is that we will have some accesses outside of critical sections (see "Optimistic dict and list Access Summary" in PEP 703), so we can't use critical sections everywhere, but we will use them for most operations.

At the same time, I think we should prefer plain, non-atomic reads from within critical sections. We could conservatively use atomic reads and still be correct, but my experience is that adding atomic operations where they are not necessary tends to hide data races from thread sanitizer.

Note that writes to ob_size have to be at atomic even if they are within critical sections because there might be concurrent reads.

@serhiy-storchaka
Copy link
Member

You at least need to make writing the list size atomic.

@colesbury
Copy link
Contributor

You at least need to make writing the list size atomic.

I agree, but I don't think we should do that in this PR. I think it's easier to write and review the PRs organized around functions than around fields (i.e., "fix list_repr" instead of "fix ob_size"). It keeps the changes smaller and close together.

The current changes look sufficient to me if we later address all the functions that write to ob_size.

Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM

@corona10
Copy link
Member Author

The current changes look sufficient to me if we later address all the functions that write to ob_size.

Yeah, I will prepare another PR for this.

@corona10
Copy link
Member Author

Thank you @serhiy-storchaka, @encukou and @colesbury about discussing how we handle atomic operation issue :)
It's now become clear :)

@corona10 corona10 merged commit f9c5056 into python:main Jan 26, 2024
@corona10 corona10 deleted the gh-112087-repr-length branch January 26, 2024 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants