-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
LDM as provisioner of debug-enabled containers #12851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
90030a5
to
6bc6c40
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactoring to centralize the scattered LDM changes and enable multi-invoke debugging with the same debug connection 👏👏 .
I tested these changes with the base scenario of the adjusted pro sample in and they work as expected localstack-samples/localstack-pro-samples#258
I raised a couple of clarification questions, most seem already addressed/answered.
@@ -79,10 +75,7 @@ def get_environment( | |||
|
|||
try: | |||
yield execution_environment | |||
if is_lambda_debug_timeout_enabled_for(lambda_arn=function_version.qualified_arn): | |||
self.stop_environment(execution_environment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: Nice simplification and quality of life improvement to not stop the LDM-enabled container anymore 👍
@@ -139,7 +135,7 @@ def get_environment_variables(self) -> Dict[str, str]: | |||
# AWS_LAMBDA_DOTNET_PREJIT | |||
"TZ": ":UTC", | |||
# 2) Public AWS RIE interface: https://github.com/aws/aws-lambda-runtime-interface-emulator | |||
"AWS_LAMBDA_FUNCTION_TIMEOUT": self._get_execution_timeout_seconds(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Is a debug-enabled Lambda function killed pre-maturely by the Golang code if we don't extend this timeout anymore?
Background: AWS_LAMBDA_FUNCTION_TIMEOUT
is used by the Golang Lambda RIE here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that LDM overwrites this later here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, like you pointed out, this value continues to be refined by the LDM through the debug-enabled execution environment
@@ -191,6 +192,28 @@ def invoke(self, *, invocation: Invocation) -> InvocationResult: | |||
LOG.warning(message) | |||
raise ServiceException(message) | |||
|
|||
if debug_execution_environment := LDM.get_execution_environment( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs(nit): It might be worth adding some docstring here, clarifying that debug-enabled execution environments don't consider Lambda quotas
|
||
LOG = logging.getLogger(__name__) | ||
|
||
# Specifies the fault timeout value in seconds to be used by time restricted workflows when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo (nit): fault -> default
with self._mutex: | ||
if self._debug_execution_environment is not None: | ||
return | ||
self.stop_debug_enabled_execution_environment() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why are we stopping the environment here? Can this happen given the return
above if there is an existing environment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you, yes this is indeed redundant, I removed this statement
lambda_function_debug_config=self.lambda_function_debug_config, | ||
on_timeout=self._on_execution_environment_timeout, | ||
) | ||
LOG.info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Shouldn't that info log happen after starting the environment? Like status == RuntimeStatus.READY
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle yes, I would like this Log to occur after the container is READY. However, due to the debug layer, the docker container will not notify LS until the user has connected the debug client. I think this is something we should refine soon: I added a todo.
@@ -66,6 +66,7 @@ class Invocation: | |||
# = invocation_id | |||
request_id: str | |||
trace_context: dict | |||
user_agent: Optional[str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: The Invocation
is used in the event manager for async invokes. The optional new field should be ok, given it initializes with a default
@@ -0,0 +1,178 @@ | |||
from __future__ import annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: refactoring where code is moved from https://github.com/localstack/localstack/pull/12851/files#diff-2266754b538bc83024fa9e051519928adf58f50124f471c186512e25731e8736L1
def get_execution_environment(self) -> DebugEnabledExecutionEnvironment: | ||
# TODO: add support for concurrent invokes, such as invoke object queuing, new container spinup | ||
with self._mutex: | ||
# TODO: move this start-up logic to lambda function creation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question (regression): Does this limitation introduce a regression such that LDM is not applied to Lambda functions created or updated AFTER adding an LDM configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing showed that this is not the case.
Brainstorming the list of change points potentially to consider (in the future):
- CreateFunction: Marco pointed out that function creation is handled here
- UpdateFunction: doesn't change ARN, should be fine
- UpdateFunctionConfiguration: doesn't change ARN, should be fine
- PublishVersion: creates a new function version, which might require spawning a debug-enabled container if the new function version ARN matches a debug config
- CreateAlias: creates a new function ARN via the alias
Name
, which could match a debug config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this list, it's very helpful. I'll refer to this to refine the revision of this logic we already implemented in follow-up changes to these.
def _on_execution_environment_timeout( | ||
self, version_manager_id: str, environment_id: str | ||
) -> None: | ||
# TODO: add support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: What does "add support" include (e.g., raising aws-validated timeout error)?
Does this happen if DEFAULT_LAMBDA_DEBUG_MODE_TIMEOUT_SECONDS
(i.e., 1h) is exceeded or enforce-timeout
is disabled and then running into a short function timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This functionality should not be applicable for debug-enabled containers. It is used in the release of on-demand containers, however these are provisioned-concurrency
. I revised the comment to be more informative on the situation.
Motivation
The current LDM implementation depends heavily on the Lambda service’s container provisioning logic, injecting LDM-specific parameter overrides throughout various parts of the service to support debug-enabled containers. This fragmented integration makes LDM difficult to maintain and limits control over the lifecycle of debug-enabled containers. This refactor centralizes LDM responsibilities by transforming it into a dedicated provisioner for debug-enabled containers. As a result, the Lambda service is relieved of most LDM-related logic. Additionally, these changes ensure that debug-enabled containers persist across multiple invocations, allowing the debug client to remain connected or reconnect after disconnections.
Changes