Skip to content

[ESM] Fix constantly triggering SQS back-off #12319

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 3, 2025

Conversation

gregfurman
Copy link
Contributor

@gregfurman gregfurman commented Feb 28, 2025

Motivation

This fixes an issue that causes our SQS poller to always trigger exponential backoff between poller calls instead of only when a response is empty or exception is raised. Currently, this bug is slowing down performance and should be merged ASAP.

Why did this happen?

GitHub's automatic rebase (which did not raise a merge conflict) caused our if-else conditional to change into a for-else. Since the else block was where we raised a backoff exception, this has caused our poller to always raise an exception and signals the ESM worker to constantly back-off.

See:

And:

Changes

  • Moves the backoff exception from always being raised to instead be within the conditional check for no records being found.

@gregfurman gregfurman added semver: patch Non-breaking changes which can be included in patch releases aws:lambda:event-source-mapping AWS Lambda Event Source Mapping (ESM) labels Feb 28, 2025
@gregfurman gregfurman self-assigned this Feb 28, 2025
@gregfurman gregfurman added this to the 4.3 milestone Feb 28, 2025
Copy link
Member

@joe4dev joe4dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for jumping on this one quickly @gregfurman

It's unfortunate that GitHub didn't raise a conflict here. Next time, we can try to anticipate potentially conflicting changes (in this case the messages logic changes significantly in #12002, which required a manual rebase and logic adjustment).

This fix addresses a performance issue affecting all Lambda ESM SQS users: Users might experience slow SQS polling with Lambda ESM because we are adding backoff (up to 10 seconds) after every polling iteration (in addition to the 1s inter-poll waiting time).

Copy link

github-actions bot commented Feb 28, 2025

LocalStack Community integration with Pro

    2 files  ±  0      2 suites  ±0   1h 29m 12s ⏱️ - 22m 46s
3 110 tests  - 994  2 890 ✅  - 882  220 💤  - 112  0 ❌ ±0 
3 112 runs   - 994  2 890 ✅  - 882  222 💤  - 112  0 ❌ ±0 

Results for commit 55bb047. ± Comparison against base commit 4285fb1.

This pull request removes 994 tests.
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_lambda_dynamodb
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_opensearch_crud
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_search_books
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_setup
tests.aws.scenario.kinesis_firehose.test_kinesis_firehose.TestKinesisFirehoseScenario ‑ test_kinesis_firehose_s3
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_destination_sns
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_infra
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_prefill_dynamodb_table
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input0-SUCCEEDED]
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input1-SUCCEEDED]
…

♻️ This comment has been updated with latest results.

@gregfurman
Copy link
Contributor Author

gregfurman commented Mar 3, 2025

@joe4dev There is a flaky test (both against AWS and LocalStack) that can be explained by this line in the docs:

If you're using a batch window and your SQS queue contains very low traffic, Lambda might wait for up to 20 seconds before invoking your function. This is true even if you set a batch window lower than 20 seconds.

# TODO: flaky against AWS
@markers.aws.validated
def test_report_batch_item_failures(

Basically, we cannot assume that an ESM will always invoke a target Lambda with a very precise timeframe -- where AWS (and now LocalStack) dynamically adjusts this frequency based on traffic to the data source.

I think the behavioural changes should remain, but we should keep an eye out for tests that assume invocations happen at precise intervals.

@gregfurman gregfurman force-pushed the fix/esm/sqs-slow-polling branch from 352915b to 55bb047 Compare March 3, 2025 11:54
@gregfurman gregfurman marked this pull request as ready for review March 3, 2025 13:08
@gregfurman gregfurman merged commit 288a34a into master Mar 3, 2025
33 checks passed
@gregfurman gregfurman deleted the fix/esm/sqs-slow-polling branch March 3, 2025 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws:lambda:event-source-mapping AWS Lambda Event Source Mapping (ESM) semver: patch Non-breaking changes which can be included in patch releases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants