-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
fix put-metric-alarm test failure rate #12598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
LocalStack Community integration with Pro 2 files ± 0 2 suites ±0 43m 16s ⏱️ - 1h 0m 22s Results for commit 9c95b14. ± Comparison against base commit a7b4250. This pull request removes 3386 tests.
♻️ This comment has been updated with latest results. |
@@ -403,7 +403,7 @@ | |||
"StateUpdatedTimestamp": "timestamp", | |||
"StateValue": "INSUFFICIENT_DATA", | |||
"Statistic": "Average", | |||
"Threshold": 2.0, | |||
"Threshold": 21.0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good adjustment of threshold to be equal to one of the data points! 👍 Indeed since metric data is added sequentially in a loop and alarm evaluation happens in a separate thread it is likely that in those 14% of cases evaluation happened after 21.0 was added but before 22.0. Since 21 was already bigger than 2 alarm was triggered. Now it shouldn't be the case - alarm needs both data points to be triggered.
@@ -1338,8 +1338,8 @@ def test_put_metric_alarm( | |||
retry( | |||
_sqs_messages_snapshot, | |||
retries=60, | |||
sleep=3 if is_aws_cloud() else 1, | |||
sleep_before=5 if is_aws_cloud() else 0, | |||
sleep=3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here retries are applied to receiving messages from SQS. Increasing sleep won't change the resulting alarm. Currently alarm has period of 30 seconds, 60 retries wit sleep 1 covers it more than enough. Locally test succeeds exactly after 30 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, this made me think that we can reduce period to a minimum value of 10 and therefore shave 20 seconds off test execution!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed changes to the branch - reverted sleeps and reduced period to 10. Test now consistently executes successfully locally in just over 10 seconds. If you're happy with those changes please merge the PR! Otherwise happy to revert if there is any consideration.
The changing of the period is a better change than the edition of retry. Thanks @tiurin. 👍 |
Motivation
As discussed internally, the test_put_metric_alarm test currently has a failure rate of around 14%, which makes it frustrating during full test suite runs. The root cause appears to be that the alarm-triggering value can vary in LocalStack—sometimes evaluating to 21 instead of the expected average 21.5 (which is consistently correct on AWS). This is likely due to timing differences or slight inconsistencies in how LocalStack handles metric ingestion and evaluation windows.
Changes
Testing