Skip to content

gh-137583: Only lock the SSL context, not the SSL socket #137588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Aug 10, 2025

Conversation

ZeroIntensity
Copy link
Member

@ZeroIntensity ZeroIntensity commented Aug 9, 2025

Remove locking around SSLSocket calls (and more importantly, calls to send and recv). I think they were unnecessary, as OpenSSL seems to be fine with those being called concurrently.

The test here is likely a bit flaky and I'm not really sure why. It deadlocks on the current main, but also seems to have periodic deadlocks even when e047a35 is not applied. It's either a bug in OpenSSL, an existing bug in ssl, or most likely, a bug in the ThreadedEchoServer used for the tests. I've added a time.sleep(0) to intentionally yield the GIL, and that seems to somewhat fix the race condition. (Test now works correctly without flakiness.)

@ZeroIntensity
Copy link
Member Author

!buildbot ^((?!refleak).)* PR$

@bedevere-bot

This comment was marked as outdated.

@picnixz
Copy link
Member

picnixz commented Aug 9, 2025

(You'll get failures due to the buildbots without any built-in hashes and with FIPS mode on)

I spent all that time trying to make sure the test worked and of course
it was the blurb that was wrong too.
@picnixz
Copy link
Member

picnixz commented Aug 9, 2025

The blurb was ok. We just don't document those methods because I think they are actually available simply as SSLSocket.send. Actually, I think they are wrongly documented (they are documented as methods of the module, not the class)

@ZeroIntensity
Copy link
Member Author

That's frustrating. I can't reproduce it emitting warnings locally, so I thought d45f6a8 fixed it.

@picnixz
Copy link
Member

picnixz commented Aug 9, 2025

Oh no, actually, recv() doesn't exist on SSLSockets. It's socket.socket.recv(). Same for send(). If you want you can do:

:meth:`SSLSocket.recv <socket.socket.recv>`

@ZeroIntensity
Copy link
Member Author

We'll deal with the blurb later, I avoided the reference altogether now. There's an actual test failure here: https://github.com/python/cpython/actions/runs/16849797228/job/47734333301

I think this is that stupid race condition. I'll look into fixing ThreadedEchoServer or switching the test to something more robust.

Apparently, the thread can start waiting at a point where the server
isn't ready to receive messages, which caused it to block indefinitely
for some reason. Before we create any threads or set any events, do a
simple exchange with the server to ensure that it's in its message loop.
This took way too damn long to figure out.
@ZeroIntensity
Copy link
Member Author

There we go, that race is finally fixed. Let's try the buildbots again.

@ZeroIntensity ZeroIntensity added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Aug 9, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit 38e29f9 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F137588%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Aug 9, 2025
@@ -0,0 +1,4 @@
Fix a deadlock introduced in the :mod:`ssl` module when a call to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mention the specific Python 3.13 version the deadlock was introduced in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and 3.14.0rc1)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backport didn't make it to 3.14, so it's fine there. Only 3.13.6 is affected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@gpshead gpshead self-assigned this Aug 9, 2025
Apparently, settimeout() implicitly makes the socket non-blocking.
@ZeroIntensity
Copy link
Member Author

!buildbot windows

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit 6da074e 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F137588%2Fmerge

The command will test the builders whose names match following regular expression: windows

The builders matched are:

  • AMD64 Windows11 Refleaks PR
  • AMD64 Windows PGO NoGIL PR
  • AMD64 Windows11 Non-Debug PR
  • AMD64 Windows Server 2022 NoGIL PR
  • ARM64 Windows PR
  • AMD64 Windows11 Bigmem PR
  • AMD64 Windows10 PR
  • AMD64 Windows PGO PR
  • ARM64 Windows Non-Debug PR

@ZeroIntensity
Copy link
Member Author

@picnixz Did you want a chance to review this, or are you fine with me merging it?

@picnixz
Copy link
Member

picnixz commented Aug 10, 2025

Go ahead, I don't have a lot of time today and I'm leaving tomorrow for 10 days

Fix a deadlock introduced in 3.13.6 when a call to
:meth:`ssl.SSLSocket.recv <socket.socket.recv>` was blocked in one thread, while another
attempted to call :meth:`ssl.SSLSocket.send <socket.socket.send>` to unblock it in another
thread.
Copy link

@aaugustin aaugustin Aug 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd simplify this as follows:

Fix a regression in 3.13.6 when a call to
:meth:`ssl.SSLSocket.recv <socket.socket.recv>` in a thread would block calling
:meth:`ssl.SSLSocket.send <socket.socket.send>` in another thread.

Indeed, the problem happens even without a dependency between recv and send.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we even need to mention send at all. If recv was blocked, it's not possible to access most methods on an SSLSocket, because they'll also try to acquire that same lock.

@ZeroIntensity ZeroIntensity enabled auto-merge (squash) August 10, 2025 13:33
@ZeroIntensity ZeroIntensity merged commit 55788a9 into python:main Aug 10, 2025
42 checks passed
@miss-islington-app
Copy link

Thanks @ZeroIntensity for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry, @ZeroIntensity, I could not cleanly backport this to 3.14 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 55788a90967e82a9ea05b45c06a293b46ec53d72 3.14

@miss-islington-app
Copy link

Sorry, @ZeroIntensity, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 55788a90967e82a9ea05b45c06a293b46ec53d72 3.13

@ZeroIntensity ZeroIntensity deleted the ssl-deadlock branch August 10, 2025 14:56
ZeroIntensity added a commit to ZeroIntensity/cpython that referenced this pull request Aug 10, 2025
ZeroIntensity added a commit to ZeroIntensity/cpython that referenced this pull request Aug 10, 2025
…pythonGH-137588)

Fixes a deadlock in 3.13.6.
(cherry picked from commit 55788a9)

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Aug 10, 2025

GH-137613 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Aug 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs backport to 3.14 bugs and security fixes release-blocker topic-SSL type-bug An unexpected behavior, bug, or error
Projects
Development

Successfully merging this pull request may close these issues.

5 participants