Skip to content

Conversation

gregfurman
Copy link
Contributor

@gregfurman gregfurman commented Feb 28, 2025

Motivation

This fixes an issue that causes our SQS poller to always trigger exponential backoff between poller calls instead of only when a response is empty or exception is raised. Currently, this bug is slowing down performance and should be merged ASAP.

Why did this happen?

GitHub's automatic rebase (which did not raise a merge conflict) caused our if-else conditional to change into a for-else. Since the else block was where we raised a backoff exception, this has caused our poller to always raise an exception and signals the ESM worker to constantly back-off.

See:

And:

Changes

  • Moves the backoff exception from always being raised to instead be within the conditional check for no records being found.

@gregfurman gregfurman added semver: patch Non-breaking changes which can be included in patch releases aws:lambda:event-source-mapping AWS Lambda Event Source Mapping (ESM) labels Feb 28, 2025
@gregfurman gregfurman self-assigned this Feb 28, 2025
@gregfurman gregfurman added this to the 4.3 milestone Feb 28, 2025
Copy link
Member

@joe4dev joe4dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for jumping on this one quickly @gregfurman

It's unfortunate that GitHub didn't raise a conflict here. Next time, we can try to anticipate potentially conflicting changes (in this case the messages logic changes significantly in #12002, which required a manual rebase and logic adjustment).

This fix addresses a performance issue affecting all Lambda ESM SQS users: Users might experience slow SQS polling with Lambda ESM because we are adding backoff (up to 10 seconds) after every polling iteration (in addition to the 1s inter-poll waiting time).

Copy link

github-actions bot commented Feb 28, 2025

LocalStack Community integration with Pro

    2 files  ±  0      2 suites  ±0   1h 29m 12s ⏱️ - 22m 46s
3 110 tests  - 994  2 890 ✅  - 882  220 💤  - 112  0 ❌ ±0 
3 112 runs   - 994  2 890 ✅  - 882  222 💤  - 112  0 ❌ ±0 

Results for commit 55bb047. ± Comparison against base commit 4285fb1.

This pull request removes 994 tests.
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_lambda_dynamodb
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_opensearch_crud
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_search_books
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_setup
tests.aws.scenario.kinesis_firehose.test_kinesis_firehose.TestKinesisFirehoseScenario ‑ test_kinesis_firehose_s3
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_destination_sns
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_infra
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_prefill_dynamodb_table
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input0-SUCCEEDED]
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input1-SUCCEEDED]
…

♻️ This comment has been updated with latest results.

@gregfurman
Copy link
Contributor Author

gregfurman commented Mar 3, 2025

@joe4dev There is a flaky test (both against AWS and LocalStack) that can be explained by this line in the docs:

If you're using a batch window and your SQS queue contains very low traffic, Lambda might wait for up to 20 seconds before invoking your function. This is true even if you set a batch window lower than 20 seconds.

# TODO: flaky against AWS
@markers.aws.validated
def test_report_batch_item_failures(

Basically, we cannot assume that an ESM will always invoke a target Lambda with a very precise timeframe -- where AWS (and now LocalStack) dynamically adjusts this frequency based on traffic to the data source.

I think the behavioural changes should remain, but we should keep an eye out for tests that assume invocations happen at precise intervals.

@gregfurman gregfurman force-pushed the fix/esm/sqs-slow-polling branch from 352915b to 55bb047 Compare March 3, 2025 11:54
@gregfurman gregfurman marked this pull request as ready for review March 3, 2025 13:08
@gregfurman gregfurman merged commit 288a34a into master Mar 3, 2025
33 checks passed
@gregfurman gregfurman deleted the fix/esm/sqs-slow-polling branch March 3, 2025 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws:lambda:event-source-mapping AWS Lambda Event Source Mapping (ESM) semver: patch Non-breaking changes which can be included in patch releases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants