Skip to content

improve X-Amz-Trace-Id for APIGW NG #11327

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 8, 2024
Merged

improve X-Amz-Trace-Id for APIGW NG #11327

merged 2 commits into from
Aug 8, 2024

Conversation

bentsku
Copy link
Contributor

@bentsku bentsku commented Aug 7, 2024

Motivation

As talked about in #11253 (comment), we could use the same trace-id throughout the invocation, this was left as a TODO. I have a bit of time, so I implemented it quickly.

It seems the Lambda integration is overwriting the trace header for now, so it is hard to test. It might be done with a regular AWS integration, to SQS for example. We can work on this as a follow-up.

The implementation is pretty simple for now, I've written an AWS test to check how AWS populate the trace-id in a naive way.

Changes

  • implement a trace-id field to the context so that it is accessible throughout the invocation
  • add a util to generate a trace and parse a trace and some unit tests around it
  • adding the newly generated trace-id field where we would generate a random id before
  • add the logic to populate the missing part of an incoming trace if there is one, or generate a full one

@bentsku bentsku added aws:apigateway Amazon API Gateway semver: patch Non-breaking changes which can be included in patch releases labels Aug 7, 2024
@bentsku bentsku self-assigned this Aug 7, 2024
@bentsku bentsku requested a review from cloutierMat as a code owner August 7, 2024 13:54
@bentsku bentsku changed the title improve X-Amz-Trace-Id to APIGW NG improve X-Amz-Trace-Id for APIGW NG Aug 7, 2024
@peter-smith-phd
Copy link
Contributor

Thanks for implementing this so quickly! For my use case, it would be super helpful to pass the caller's X-Amzn-Trace-Id, if one is provided. This allows me to have part of my application running in LocalStack, while keeping the trace ID consistent with the rest of my application that has already generated the ID.

Copy link

github-actions bot commented Aug 7, 2024

LocalStack Community integration with Pro

  2 files  ±    0    2 suites  ±0   24m 35s ⏱️ - 1h 10m 42s
780 tests  - 2 529  727 ✅  - 2 188  53 💤  - 341  0 ❌ ±0 
782 runs   - 2 529  727 ✅  - 2 188  55 💤  - 341  0 ❌ ±0 

Results for commit 735d404. ± Comparison against base commit cc6b849.

This pull request removes 2530 and adds 1 tests. Note that renamed tests count towards both.
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_lambda_dynamodb
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_opensearch_crud
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_search_books
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_setup
tests.aws.scenario.kinesis_firehose.test_kinesis_firehose.TestKinesisFirehoseScenario ‑ test_kinesis_firehose_s3
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_destination_sns
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_infra
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_prefill_dynamodb_table
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input0-SUCCEEDED]
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input1-SUCCEEDED]
…
tests.aws.services.apigateway.test_apigateway_common.TestApiGatewayCommon ‑ test_invocation_trace_id
This pull request removes 342 skipped tests and adds 1 skipped test. Note that renamed tests count towards both.
tests.aws.scenario.kinesis_firehose.test_kinesis_firehose.TestKinesisFirehoseScenario ‑ test_kinesis_firehose_s3
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input4-FAILED]
tests.aws.scenario.mythical_mysfits.test_mythical_misfits.TestMythicalMisfitsScenario ‑ test_deployed_infra_state
tests.aws.scenario.mythical_mysfits.test_mythical_misfits.TestMythicalMisfitsScenario ‑ test_populate_data
tests.aws.scenario.mythical_mysfits.test_mythical_misfits.TestMythicalMisfitsScenario ‑ test_user_clicks_are_stored
tests.aws.services.cloudcontrol.test_cloudcontrol_api.TestCloudControlResourceApi ‑ test_api_exceptions
tests.aws.services.cloudcontrol.test_cloudcontrol_api.TestCloudControlResourceApi ‑ test_create_exceptions
tests.aws.services.cloudcontrol.test_cloudcontrol_api.TestCloudControlResourceApi ‑ test_create_invalid_desiredstate
tests.aws.services.cloudcontrol.test_cloudcontrol_api.TestCloudControlResourceApi ‑ test_double_create_with_client_token
tests.aws.services.cloudcontrol.test_cloudcontrol_api.TestCloudControlResourceApi ‑ test_lifecycle
…
tests.aws.services.apigateway.test_apigateway_common.TestApiGatewayCommon ‑ test_invocation_trace_id

♻️ This comment has been updated with latest results.

@bentsku
Copy link
Contributor Author

bentsku commented Aug 7, 2024

@northvankiwiguy thanks a lot, I'll update the PR with a test and this functionality. Thanks for the quick feedback!

Copy link
Contributor

@cloutierMat cloutierMat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this!

Nice and clean implementation! Another proof of the strength of the NG provider!

I just left a question regarding the location of the helper methods!

Comment on lines +120 to +141
def generate_trace_id():
"""https://docs.aws.amazon.com/xray/latest/devguide/xray-api-sendingdata.html#xray-api-traceids"""
original_request_epoch = int(time.time())
timestamp_hex = hex(original_request_epoch)[2:]
version_number = "1"
unique_id = token_hex(12)
return f"{version_number}-{timestamp_hex}-{unique_id}"


def generate_trace_parent():
return token_hex(8)


def parse_trace_id(trace_id: str) -> dict[str, str]:
split_trace = trace_id.split(";")
trace_values = {}
for trace_part in split_trace:
key_value = trace_part.split("=")
if len(key_value) == 2:
trace_values[key_value[0].capitalize()] = key_value[1]

return trace_values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Do you think we should put those method in a shared utils for more services to have access to? I feel like they will mostly be the same for everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point! Not sure where it should live for now 🤔 we might keep it here for now, and if we hear about more x-ray traces, move it somewhere else?

I believe lambda might have a similar implementation..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Lambda is only using long_uid(), unless I missed it.

I was thinking that it could make sense in localstack-core/utils maybe create a trace.py but not super convinced either.

I think it is fine with it here for now, but we should stay alert to prevent this behavior to be implemented many times in the code base!

@bentsku bentsku merged commit 2bef1bd into master Aug 8, 2024
36 checks passed
@bentsku bentsku deleted the apigw-ng-trace-id branch August 8, 2024 12:03
@peter-smith-phd
Copy link
Contributor

Thanks for the bug fix! I'm glad you handle both Active and Passive cases.

However... right now I'm not able to see the HTTP header being passed to Lambda. I know that Lambda incorrectly overwrites the value, but I added LOG.debug("Headers in Lambda: %s", context.request.headers) in the Lambda service provider.py, and I get the following output (which doesn't include X-Amzn-Trace-Id).

There's a chance I'm looking in the wrong place, and I'll keep debugging it, but I thought I'd mention it in case you immediately knew the problem.

Accept-Encoding: identity
X-Amz-Invocation-Type: RequestResponse
User-Agent: Boto3/1.34.149 md/Botocore#1.34.149 md/awscrt#0.21.2 ua/2.0 os/macos#23.5.0 md/arch#x86_64 lang/python#3.12.4 md/pyimpl#CPython cfg/retry-mode#legacy Botocore/1.34.149
x-localstack-data: {"source_arn":"arn:aws:execute-api:us-west-2:000000000000:kvlt04h4ft/prod/GET/orders","service_principal":"apigateway"}
host: localhost:4566
X-Amz-Date: 20240808T142740Z
Authorization: AWS4-HMAC-SHA256 Credential=__internal_call__/20240808/us-west-2/lambda/aws4_request, SignedHeaders=host;x-amz-date;x-amz-invocation-type;x-localstack-data, Signature=136d245635dbc9dae16f42734841dae0bcefe4c0377c640123e4eaccd2829df5
amz-sdk-invocation-id: 4fccd077-753a-4c10-a106-af1cebb4d06f
amz-sdk-request: attempt=1
Content-Length: 1680
x-moto-account-id: 000000000000

I have your latest commit in my workspace: improve X-Amz-Trace-Id for APIGW NG (#11327)

@bentsku
Copy link
Contributor Author

bentsku commented Aug 8, 2024

Yes, it is currently not working for AWS_PROXY integration type. We are properly passing it in the event object passed to Lambda, but I'm not sure we have a proper way to pass it through our custom client for now (the connect_to client we use for internal requests between services). I'll see with the owner of this particular piece of code, but I'm not sure it will be available soon, sorry 😕 I'll come back to you with more info!

@peter-smith-phd
Copy link
Contributor

Thanks, I forgot I was using Proxy Integration. I double-checked and the header value is correctly being passed into the event object, and my application (using Fastify) is correctly seeing the header (in request.headers['x-amzn-trace-id']).

However, the Lambda functions _X_AMZN_TRACE_ID is being set incorrectly, but I realize this is because the Lambda service is not correctly respecting the existing header, and it goes ahead and generates a new Trace ID. Is this the problem you mentioned you were looking into, or should I open a separate issue for Lambda not supporting Active Tracing?

Again... thanks for adding the API Gateway feature so quickly!

@bentsku
Copy link
Contributor Author

bentsku commented Aug 9, 2024

It is somewhat similar, yes! There are 2 issues:

  • our call to Lambda in the case of AWS_PROXY is done via an internal boto3 call, where we don't pass the x-amzn-trace-id header like you showed with your log statement, so even if Lambda was not overriding the trace-id, it would not get it
  • but I did not focus too much on the point above during implementation because Lambda does not set _X_AMZN_TRACE_ID as you said, and it overrides anything that is passed.

The point I was talking about above was the point 1, but we need to address point 2 first. It is still good to think about how to address passing the trace id along!

@peter-smith-phd
Copy link
Contributor

Thank you, that makes sense. For Point 2 (Lambda overriding the passed-in trace ID), will I need to open a new issue to get that fixed?

@bentsku
Copy link
Contributor Author

bentsku commented Aug 9, 2024

Yes, I think that would be good!
edit: for Point 2, the opened issue is #11336

incoming_parent = incoming_trace.get("Parent")
parent = incoming_parent or generate_trace_parent()
sampled = incoming_trace.get("Sampled", "1" if incoming_parent else "0")
# TODO: lineage? not sure what it related to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lineage is related to detecting recursive Lambda loops (July 2023): https://aws.amazon.com/blogs/compute/detecting-and-stopping-recursive-loops-in-aws-lambda-functions/

They extended support for S3, SQS, and SNS in Aug 2024: https://aws.amazon.com/blogs/compute/aws-lambda-introduces-recursive-loop-detection-apis/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws:apigateway Amazon API Gateway semver: patch Non-breaking changes which can be included in patch releases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants