
Record validated tests duration #12638


Open · wants to merge 21 commits into master from tests/test-duration

Conversation

@tiurin tiurin (Contributor) commented May 19, 2025

Motivation

Gain more insight into how long it takes to run a parity test against AWS, and into how long each test execution phase takes.

Changes

  • Add a pytest hook that collects the duration of each test execution phase (setup, call, teardown) and writes the result into *.validation.json, alongside the already existing last validation date. A sketch of such a hook follows this list.
  • Remove the old hook and some unused code.
  • Re-validate one existing test to demonstrate the result. It is especially interesting to see that teardown can account for almost half of the total duration in some cases. This is the case when the test shows as passed in PyCharm but the wheels are still spinning for a while.
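
As a minimal sketch, the recording hook could look roughly like the following. This is not the PR's actual code: the wrapper style, the stash key, the rounding, and the file-path handling are assumptions based on the description and commit messages.

```python
import json
from datetime import datetime, timezone

import pytest

# Module-level stash key so per-phase data survives across setup/call/teardown.
_durations_key = pytest.StashKey[dict]()


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    phases = item.stash.setdefault(_durations_key, {})
    phases[report.when] = round(report.duration, 2)
    # Persist only once, after a successful teardown. The real hook also skips
    # non-AWS runs (is_aws_cloud()) and failed tests (excinfo), per the PR.
    if report.when == "teardown" and not report.failed:
        entry = {
            "last_validated_date": datetime.now(timezone.utc).isoformat(),
            "durations_in_seconds": {**phases, "total": round(sum(phases.values()), 2)},
        }
        path = item.path.with_suffix(".validation.json")  # e.g. test_foo.validation.json
        data = json.loads(path.read_text()) if path.exists() else {}
        data[item.nodeid] = entry
        # Sort test entries for stable diffs; keep insert order within each entry.
        path.write_text(json.dumps(dict(sorted(data.items())), indent=2))
```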

Considerations

Machines on which tests are executed can vary significantly in processing power. However, in the case of AWS-validated tests, a significant amount of time is spent in I/O, waiting for AWS endpoint responses, which can also be affected by geographic location. A quick test on a small sample showed under 2% difference for the test in this PR across 3 different machines between Spain and South Africa (all runs were executed against the us-east-1 region). Repeated runs at different times of day on one machine were similarly stable.

In any case, durations are meant to give an idea of how long a test runs rather than to serve as exact numbers.
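
For illustration, a recorded entry might look roughly like the following. The key names follow the PR discussion (a time unit in the durations key, the total stored alongside the phases), but the node id and values are made up:

```json
{
  "tests/aws/services/lambda_/test_lambda.py::TestLambda::test_invoke": {
    "last_validated_date": "2025-05-28T08:15:00+00:00",
    "durations_in_seconds": {
      "setup": 12.34,
      "call": 3.21,
      "teardown": 25.67,
      "total": 41.22
    }
  }
}
```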

@tiurin tiurin requested review from simonrw and anisaoshafi May 19, 2025 08:40
@tiurin tiurin added the semver: minor and area: testing labels May 19, 2025
@anisaoshafi anisaoshafi (Contributor) left a comment

Looks good to me, tried it out and works like a charm ✨
Thanks for tackling this Misha. 👏🏼


github-actions bot commented May 19, 2025

LocalStack Community integration with Pro

    2 files  ±0      2 suites  ±0   1h 44m 45s ⏱️ -39s
4 468 tests ±0  4 080 ✅ ±0  388 💤 ±0  0 ❌ ±0 
4 470 runs  ±0  4 080 ✅ ±0  390 💤 ±0  0 ❌ ±0 

Results for commit 49adf9d. ± Comparison against base commit 433aeff.

♻️ This comment has been updated with latest results.

@joe4dev joe4dev (Member) left a comment

Love it ❤️ Thank you @tiurin for making this happen 🚀

I can confirm that time reporting works as expected, both in localstack and localstack-ext. I also tested that failing tests or non-AWS test executions are ignored.

if not is_aws_cloud() or outcome.excinfo:
    return
# For json.dump, sorted test entries enable consistent diffs.
# But test execution data is more readable in insert order for each step (setup, call, teardown).

praise: neat attention to detail ✨
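
As an aside, a minimal illustration of the ordering that comment describes (the data is invented):

```python
import json

# Two hypothetical test entries, recorded out of alphabetical order.
validation = {
    "test_two": {"durations_in_seconds": {"setup": 1.2, "call": 3.4, "teardown": 0.5}},
    "test_one": {"durations_in_seconds": {"setup": 0.1, "call": 2.0, "teardown": 4.1}},
}

# Sort the top-level test entries for stable diffs, but leave each inner dict
# in insert order so the phases read setup -> call -> teardown.
ordered = dict(sorted(validation.items()))
print(json.dumps(ordered, indent=2))  # no sort_keys=True, which would also sort the phases
```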

@tiurin tiurin force-pushed the tests/test-duration branch from 95dd620 to dd929d6 on May 23, 2025 10:49
@tiurin tiurin requested review from thrau and HarshCasper as code owners May 23, 2025 10:49
@tiurin tiurin force-pushed the tests/test-duration branch from dd929d6 to a1477d0 on May 23, 2025 10:51

github-actions bot commented May 23, 2025

Test Results - Preflight, Unit

21 579 tests  ±0   19 927 ✅ ±0   6m 16s ⏱️ ±0s
     1 suites ±0    1 652 💤 ±0 
     1 files   ±0        0 ❌ ±0 

Results for commit 49adf9d. ± Comparison against base commit 433aeff.

♻️ This comment has been updated with latest results.


github-actions bot commented May 23, 2025

Test Results (amd64) - Acceptance

7 tests  ±0   5 ✅ ±0   3m 5s ⏱️ -2s
1 suites ±0   2 💤 ±0 
1 files   ±0   0 ❌ ±0 

Results for commit 49adf9d. ± Comparison against base commit 433aeff.

♻️ This comment has been updated with latest results.


github-actions bot commented May 23, 2025

Test Results - Alternative Providers

597 tests  ±0   420 ✅ ±0   14m 56s ⏱️ +4s
  4 suites ±0   177 💤 ±0 
  4 files   ±0     0 ❌ ±0 

Results for commit 49adf9d. ± Comparison against base commit 433aeff.

♻️ This comment has been updated with latest results.


github-actions bot commented May 23, 2025

Test Results (amd64) - Integration, Bootstrap

    5 files  ±0      5 suites  ±0   2h 22m 38s ⏱️ -12s
4 823 tests ±0  4 282 ✅ ±0  541 💤 ±0  0 ❌ ±0 
4 829 runs  ±0  4 282 ✅ ±0  547 💤 ±0  0 ❌ ±0 

Results for commit 49adf9d. ± Comparison against base commit 433aeff.

♻️ This comment has been updated with latest results.

@tiurin tiurin added the review: merge when ready label May 27, 2025
@dominikschubert dominikschubert (Member) left a comment

Gave it a quick glance and just had some minor suggestions.

Thanks for tackling this! One somewhat immediate improvement I could think of is having a pytest flag that would allow you to print statistics of these execution times or at least collect them for each known pytest node id (i.e. for each test). We could use this to track how far along we are in collecting execution data over the whole test suite as well. 🤔

Just some thoughts though, nothing immediately actionable besides the import comment 👍
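
For what it's worth, a hypothetical sketch of such a flag; the option name --validation-stats and the tests/ search root are invented:

```python
import json
from pathlib import Path


def pytest_addoption(parser):
    parser.addoption(
        "--validation-stats",
        action="store_true",
        help="print durations recorded in *.validation.json files",
    )


def pytest_sessionfinish(session):
    if not session.config.getoption("--validation-stats"):
        return
    # Walk the test tree and print whatever durations have been recorded so far.
    for path in Path("tests").rglob("*.validation.json"):
        for nodeid, entry in json.loads(path.read_text()).items():
            print(nodeid, entry.get("durations_in_seconds"))
```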


When a test runs successfully against AWS, its last validation date and duration are recorded in a corresponding *.validation.json file.
The validation date is recorded precisely, while test durations can vary between runs.
For example, test setup time may differ depending on whether a test runs in isolation or as part of a class test suite with class-level fixtures.

For example, test setup time may differ depending on whether a test runs in isolation or as part of a class test suite with class-level fixtures.

Can't we track that anyway and capture it? That way we would avoid flipping between potentially minutes of setup time and only a few microseconds otherwise 🤔

@tiurin tiurin (Contributor, Author) replied:

One thing I thought about was getting the test collection name via item.session.config.args, e.g. ['aws/services/lambda_/event_source_mapping']. It could be used as part of a key, or as a unique property, so that a test's duration is only updated if the test has been validated within the same test collection.
However, that would mean recording durations for each new test collection, which is confusing. Or recording them only for predefined collections, e.g. only for individual runs, or only for class runs, which is also confusing and can be opaque ("why haven't my durations been updated?"). Also, test ordering might come into play for collections. Plus, the args would need to be sanitized, as they may contain full paths and reveal local setup details. Quite hard to factor in, given the many unknown details and their unknown impact.

I'd bet on simplicity for now, see whether durations actually flip a lot (a somewhat good sign, since it means tests are being re-validated, hehe), and learn how we can adapt the format if needed.

tiurin added 7 commits May 28, 2025 10:15
validation entry is generated with durations

TODO
- don't sort keys in json.dumps - have phases ordered from first to last
tests should still be ordered by name (nodeid)

- format floats
- run on existing AWS tests
- remove dummy test
Alphabetic sorting between tests as before but insert order for data inside.
setup and teardown are always successful
tiurin added 13 commits May 28, 2025 10:15
Writing the file 3 different times, as before, could lead to inconsistencies if the test failed or was interrupted.
Only write validation data once, and only when sure the test has passed.

Use test item's stash mechanism to store data between phases.
Another wrapper for pytest_runtest_makereport hook
is defined in localstack-snapshot using "old-style" hookwrapper.

It is not recommended to mix new and old-style wrappers in the same plugin, see a note here:
https://pluggy.readthedocs.io/en/latest/index.html#wrappers
- Add time unit to key name
- Move total to the same object as phases
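
For context on the pluggy note above, a minimal illustration of the two wrapper styles (simplified, not the PR's code):

```python
import pytest


# Old-style hookwrapper, as used in localstack-snapshot: the yield produces an
# opaque Result object that has to be unwrapped (and exposes outcome.excinfo).
@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()  # the TestReport for the current phase

# New-style wrapper, for comparison; pluggy advises against mixing the two
# styles in one plugin:
#
# @pytest.hookimpl(wrapper=True)
# def pytest_runtest_makereport(item, call):
#     report = yield  # the result itself is yielded directly
#     return report
```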
@tiurin tiurin force-pushed the tests/test-duration branch from fe12552 to 6b207eb on May 28, 2025 08:15
Co-authored-by: Dominik Schubert <dominik.schubert91@gmail.com>
@tiurin tiurin force-pushed the tests/test-duration branch from 6b207eb to 49adf9d on May 28, 2025 08:34
@tiurin tiurin (Contributor, Author) left a comment

One somewhat immediate improvement I could think of is having a pytest flag that would allow you to print statistics of these execution times or at least collect them for each known pytest node id (i.e. for each test). We could use this to track how far along we are in collecting execution data over the whole test suite as well.

@dominikschubert I think this calls for reporting rather than printing, maybe a dashboard? I'll think about it.



Labels
  • area: testing (Testing Localstack)
  • review: merge when ready (Signals to the reviewer that a PR can be merged if accepted)
  • semver: minor (Non-breaking changes which can be included in minor releases, but not in patch releases)

6 participants