-
Notifications
You must be signed in to change notification settings - Fork 1.4k
DP-238 Test a fix for race condition with closed connection #1746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DP-238 Test a fix for race condition with closed connection #1746
Conversation
self.in_flight_requests[correlation_id] = (future, sent_time) | ||
with self._ifr_lock: | ||
if self.disconnected(): | ||
log.warning("%s: Race condition: connection already closed.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are actually 2 problems:
- Connection is closed with "Protocol out of sync". I suspect I know the root cause but I am intentionally not fixing it now. When I deploy this version, this log message will tell me whether I'm right about it and I will know for sure. That's when I fix it.
- When connection is closed, it leaves in_flight_requests hanging.
The timeline is as follows:
- send() is called, sends a message
- before future is saved to
in_flight_requests
, another thread callsclose()
(such as because of problem number 1) close()
terminates all futures inin_flight_requests
, but there are none there- thread which called
send()
stores the future to disconnected connection'sin_flight_requests
- the future remains unresolved forever = worker deadlocks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will deploy this on production and check the behavior. If my theory is right, I will eventually get to see this sequence of events:
- "protocol out of sync error"
- Connection close
- Race condition warning
- Worker reestabilishing connection successfully
Interesting theory -- let us know the results of your test! |
Reproduced this (while testing together with #1752): it's exactly as I suspected.
The worker would have deadlocked if the future was not closed by the new I'm going to improve this a little and then reopen in a new PR. |
Fantastic sleuthing |
Closing, resolved in #1757 |
This change is