Step Functions: Increase Retry Attempts on Service Integrations for Resilience Against Transient Network Errors #12512

Merged
merged 2 commits into from
Apr 11, 2025

MEPalma
Contributor

@MEPalma MEPalma commented Apr 10, 2025

Motivation

As a temporary mitigation for transient network instability during concurrent Lambda executions, this PR raises the total number of attempts the boto client makes for service integrations from 1 to 5. With the previous setting, an occasional “connection refused” error immediately triggered the retry workflow in the state machine, which could significantly increase the program's runtime when large backoff rates or wait settings were configured for failures (see #12399). This change does not affect the semantics of evaluation: Catch and Retry blocks still function as intended. It only makes the state machine slightly more resilient against non-service errors such as transient connection failures.

Changes

  • increase the total_max_attempts of service integrations
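
The effect described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not the code changed in this PR: `FlakyService`, `call_with_total_attempts`, and `run_task_state` are hypothetical names, and the loop omits the backoff a real client applies between attempts. It assumes, as in botocore's `retries` configuration, that `total_max_attempts` counts the initial request as well as the retries.

```python
class FlakyService:
    """Hypothetical endpoint that refuses the first few connections."""

    def __init__(self, failures_before_success: int):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def invoke(self) -> str:
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionRefusedError("connection refused")
        return "ok"


def call_with_total_attempts(service: FlakyService, total_max_attempts: int) -> str:
    """Client-level retry loop; the limit includes the initial attempt."""
    if total_max_attempts < 1:
        raise ValueError("total_max_attempts must be at least 1")
    last_error = None
    for _ in range(total_max_attempts):
        try:
            return service.invoke()
        except ConnectionRefusedError as error:
            last_error = error  # transient failure: try again if attempts remain
    raise last_error


def run_task_state(service: FlakyService, total_max_attempts: int):
    """Return (result, escalated).

    escalated is True when the connection error is not absorbed by the
    client and would reach the state machine's Retry/Catch handling.
    """
    try:
        return call_with_total_attempts(service, total_max_attempts), False
    except ConnectionRefusedError:
        return None, True


# Two transient failures: one attempt escalates, five attempts absorb them.
old = run_task_state(FlakyService(2), total_max_attempts=1)
new = run_task_state(FlakyService(2), total_max_attempts=5)
print(old)  # (None, True)
print(new)  # ('ok', False)
```

With a single attempt, the first refused connection surfaces to the state machine and any configured Retry backoff applies. With five attempts, short-lived failures are handled inside the client call, while a persistent failure still escalates once the attempts are exhausted, so Catch and Retry behave as before.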

@MEPalma MEPalma added the semver: minor Non-breaking changes which can be included in minor releases, but not in patch releases label Apr 10, 2025
@MEPalma MEPalma requested a review from dfangl April 10, 2025 13:54
@MEPalma MEPalma self-assigned this Apr 10, 2025

LocalStack Community integration with Pro

    2 files  ±    0      2 suites  ±0   35m 3s ⏱️ - 1h 17m 47s
1 469 tests  - 2 881  1 396 ✅  - 2 587  73 💤  - 294  0 ❌ ±0 
1 471 runs   - 2 881  1 396 ✅  - 2 587  75 💤  - 294  0 ❌ ±0 

Results for commit dbfc57c. ± Comparison against base commit 073eab9.

This pull request removes 2881 tests.
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_lambda_dynamodb
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_opensearch_crud
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_search_books
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_setup
tests.aws.scenario.kinesis_firehose.test_kinesis_firehose.TestKinesisFirehoseScenario ‑ test_kinesis_firehose_s3
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_destination_sns
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_infra
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_prefill_dynamodb_table
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input0-SUCCEEDED]
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input1-SUCCEEDED]
…

Member

@joe4dev joe4dev left a comment

Looks like a sensible approach to temporarily work around a known issue.

Any idea how we can debug this to discover the root cause? 🤔

@MEPalma MEPalma merged commit ab2d6a4 into master Apr 11, 2025
33 checks passed
@MEPalma MEPalma deleted the MEP-SFN-temp_increase_integration_max_attempts branch April 11, 2025 09:59
2 participants