-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
X-Ray trace header parsing and Lambda tracing propagation #11708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
S3 Image Test Results (AMD64 / ARM64) 2 files 2 suites 3m 34s ⏱️ Results for commit 006b34b. ♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to hear your feedback about X-Ray trace propagation (presented in today's demo) :)
@@ -45,6 +45,7 @@ def __init__(self, service_manager: ServiceManager = None) -> None: | |||
handlers.add_region_from_header, | |||
handlers.rewrite_region, | |||
handlers.add_account_id, | |||
handlers.parse_trace_context, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good! At first, I thought it'd be nicer to have it before the edge routes, because we could also make use of this in API Gateway execute-api
routes for example. However, we do not pass the RequestContext
to the edge routes at the moment. (@thrau laments this design decision still today 😄) so there is no point in making it happen early for now. We can always move it later on if anything changes! But for now, this looks to me 👌
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Feel free to change later :)
@@ -57,6 +58,11 @@ class SQSInvocation: | |||
exception_retries: int = 0 | |||
|
|||
def encode(self) -> str: | |||
# Encode TraceHeader | |||
aws_trace_header_str = TraceHeader( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dfangl not super happy about this encoding/decoding hack. Feel free to share your suggestion.
This doesn't go well if aws_trace_header
is not of type TraceHeader
d35c6d4
to
8fbd77c
Compare
d68d520
to
5a9693f
Compare
This fixes the warning "Subsegment ## handler discarded due to Lambda worker still initializing" Fixes the regression for the ext test: `tests.aws.services.lambda_.test_lambda_xray.TestLambdaXrayIntegration.test_basic_xray_integration`
0a44ee9
to
006b34b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, this looks great! 🚀 I've only reviewed the parts that are not related to lambda, as I'm lacking a bit of context and knowledge there.
The handler looks great, I only have a few nits in the code taken from the AWS Python X-Ray SDK, as I think we could simplify it a bit by using new available Python standard lib utils and f-strings. Great work! 🚀
@@ -45,6 +45,7 @@ def __init__(self, service_manager: ServiceManager = None) -> None: | |||
handlers.add_region_from_header, | |||
handlers.rewrite_region, | |||
handlers.add_account_id, | |||
handlers.parse_trace_context, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good! At first, I thought it'd be nicer to have it before the edge routes, because we could also make use of this in API Gateway execute-api
routes for example. However, we do not pass the RequestContext
to the edge routes at the moment. (@thrau laments this design decision still today 😄) so there is no point in making it happen early for now. We can always move it later on if anything changes! But for now, this looks to me 👌
Generate a random 16-digit hex str. | ||
This is used for generating segment/subsegment id. | ||
""" | ||
return binascii.b2a_hex(os.urandom(8)).decode("utf-8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is taken from AWS code, but I think we could use some Python utils to make it a bit nicer.
from secrets import token_hex
return binascii.b2a_hex(os.urandom(8)).decode("utf-8") | |
return token_hex(8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice suggestions ✨ This looks much cleaner 🤩
I fully agree on a technical level. For licensing and maintenance reasons, I opted to minimize changes to the original AWS code here (it is easier to match the authoritative AWS implementation regarding header parsing). Do you think it's worth deviating here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose not, we could also keep the code as it is, if it makes it easier to port new changes as they come (not sure how often this happens?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also crying every time I see %s%s%s%s%s
😭 . Nevertheless, I tend to prefer keeping the original for full parity (it's easier to relate to the AWS repo). Yeah, I don't expect direct changes here, maybe indirect ones for other X-Ray parts from the same source repo.
Generate a random trace id. | ||
""" | ||
self.start_time = int(time.time()) | ||
self.__number = binascii.b2a_hex(os.urandom(12)).decode("utf-8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not sure about the double __
here, should we just use a single underscore?
And we can also use token_hex
instead of binascii
and os.random
return "%s%s%s%s%s" % ( | ||
TraceId.VERSION, | ||
TraceId.DELIMITER, | ||
format(self.start_time, "x"), | ||
TraceId.DELIMITER, | ||
self.__number, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably use f-strings here, so we could even avoid the call to format
:
return "%s%s%s%s%s" % ( | |
TraceId.VERSION, | |
TraceId.DELIMITER, | |
format(self.start_time, "x"), | |
TraceId.DELIMITER, | |
self.__number, | |
) | |
return f"{TraceId.VERSION}{TraceId.DELIMITER}{self.start_time:x}{TraceId.DELIMITER}{self._number}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great 👌
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, most points have already been discussed and the current state seems very well documented and thought through. There's a few things we still need to tackle in the future like the sampling strategies but I actually quite like that we leave that out for now and can delegate this in the future.
Great work on re-using existing components. It makes a lot of sense to stay close to the official tooling here to make sure our behavior isn't diverging from AWS.
In terms of adjusting the internalized code from the xray lib, I'd generally prefer to stay closer to the original code if possible but as long as it doesn't impede porting new changes too much, it's of course also ok to modernize it a bit.
) | ||
# The Lambda RIE requires a full tracing header including Root, Parent, and Samples. Otherwise, tracing fails | ||
# with the warning "Subsegment ## handler discarded due to Lambda worker still initializing" | ||
aws_trace_header.ensure_sampled_exists() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so if I understand that right, for lambda it's always Sampled=1
while other services will by default get a Sampled=0
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, the sampled field is missing altogether (i.e., not present rather than sampled=0) for other services if no incoming tracing header is present with the sampled field.
That's not ideal, but this depends on the (service-dependent) sampling strategies and other services. We're lacking parity tests here, so idk what's the best behavior. Just leaving it open for now ...
# TODO: replace this random parent id with actual parent segment created within the Lambda provider using X-Ray | ||
aws_trace_header.ensure_parent_exists() | ||
# TODO: test and implement Active and PassThrough tracing and sampling decisions. | ||
# TODO: implement Lambda lineage: https://docs.aws.amazon.com/lambda/latest/dg/invocation-recursion.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for leaving the TODOs here. Those seem like really interesting tasks for engineers getting started at LocalStack 😉
:param payload: Invocation payload | ||
:return: The invocation result | ||
""" | ||
# NOTE: consider making the trace_context mandatory once we update all usages (should be easier after v4.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, wanted to point out the same here 🙌
root_trace_id = "1-3152b799-8954dae64eda91bc9a23a7e8" | ||
xray_trace_header = TraceHeader(root=root_trace_id, parent="7fa8c0f79203be72", sampled=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this present any issues when testing repeatedly against AWS with the same trace id / parent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, works fine.
AWS does not perform any magic validation trying to re-construct the implicit start timestamp encoded here :)
PS: I fixed the repeated AWS validation issues in other X-Ray tests in this ext PR (awaiting review, low prio): https://github.com/localstack/localstack-ext/pull/3596
Motivation
We do not parse incoming X-Ray trace headers. Sample trace header (with Lambda lineage):
Changes
trace_context
field to the LocalStackRequestContext
trace_context
through the Lambda service (including encoding and decoding hacks for async invokes)test_xray_trace_propagation
TraceHeader
helper class implementation based on the open source AWS Python X-Ray SDK (Apache2.0 license-attributed): https://github.com/aws/aws-xray-sdk-python/blob/master/aws_xray_sdk/core/models/trace_header.pyTraceId
helper class implementation based on the open source AWS Python X-Ray SDK (Apache2.0 license-attributed): https://github.com/aws/aws-xray-sdk-python/blob/master/aws_xray_sdk/core/models/traceid.pyTesting
Usage
Use
context.trace_context
within any provider to access the parsed X-Ray trace header. Check out theTraceHeader
class for more details.Create new TraceHeader: