-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Fix lambda timeout race condition #12465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
LocalStack Community integration with Pro 2 files ± 0 2 suites ±0 1h 27m 58s ⏱️ - 23m 41s Results for commit 46754dc. ± Comparison against base commit 9383d50. This pull request removes 1166 tests.
♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Glad we caught this race condition and good logging improvements 📈
"Keepalive timer passed, but current runtime status is %s. Aborting keepalive stop.", | ||
self.status, | ||
) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch ✨
@@ -274,6 +275,14 @@ def keepalive_passed(self) -> None: | |||
self.id, | |||
self.function_version.qualified_arn, | |||
) | |||
with self.status_lock: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs: Would it make sense to add some explanation about the motivation why this is needed to guard against the race condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, added!
Motivation
We recently discovered that a race condition is possible, killing in-progress invocations of lambda functions if the environment is scheduled for removal due to inactivity at the same time.
This will lead to different errors (depending on the phase of the execution), and has to be fixed.
Also, the error messages we return to the caller in case of internal errors during invocations do not convey the actual error - making it harder to trace the error, especially for internal calls.
Here we fix this, to clearly communicate the error to the caller.
Old error message:
New error message (containing request id and function name + qualifier):
The error logging in various places has also been improved to log the exception type as well as the message.
Changes