
Record validated tests duration #12638


Open · wants to merge 21 commits into master from tests/test-duration

Conversation

@tiurin tiurin (Contributor) commented May 19, 2025

Motivation

Gain more insight into how long it takes to run a parity test against AWS, and into how long each test execution phase takes.

Changes

  • Add a pytest hook that collects the duration of each test execution phase (setup, call, teardown) and writes the result into *.validation.json, alongside the already existing last validation date. A sketch of such a hook follows this list.
  • Remove the old hook and some unused code.
  • Re-validate one existing test to demonstrate the result. It is especially interesting to see that teardown can account for almost half of the total duration in some cases. This is the case when the test shows as passed in PyCharm but the wheels are still spinning for a while.
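
As a minimal sketch, the recording hook could look roughly like the following. This is not the PR's actual code: the wrapper style, the stash key, the rounding, and the file-path handling are assumptions based on the description and commit messages.

```python
import json
from datetime import datetime, timezone

import pytest

# Module-level stash key so per-phase data survives across setup/call/teardown.
_durations_key = pytest.StashKey[dict]()


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    phases = item.stash.setdefault(_durations_key, {})
    phases[report.when] = round(report.duration, 2)
    # Persist only once, after a successful teardown. The real hook also skips
    # non-AWS runs (is_aws_cloud()) and failed tests (excinfo), per the PR.
    if report.when == "teardown" and not report.failed:
        entry = {
            "last_validated_date": datetime.now(timezone.utc).isoformat(),
            "durations_in_seconds": {**phases, "total": round(sum(phases.values()), 2)},
        }
        path = item.path.with_suffix(".validation.json")  # e.g. test_foo.validation.json
        data = json.loads(path.read_text()) if path.exists() else {}
        data[item.nodeid] = entry
        # Sort test entries for stable diffs; keep insert order within each entry.
        path.write_text(json.dumps(dict(sorted(data.items())), indent=2))
```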

Considerations

Machines on which tests are executed can vary significantly in processing power. However, in the case of AWS-validated tests, a significant amount of time is spent in I/O, waiting for AWS endpoint responses, which can also be affected by geographic location. A quick test on a small sample showed under 2% difference for the test in this PR across 3 different machines between Spain and South Africa (all runs were executed against the us-east-1 region). Repeated runs at different times of day on one machine were similarly stable.

In any case, durations are meant to give an idea of how long a test runs rather than to serve as exact numbers.
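
For illustration, a recorded entry might look roughly like the following. The key names follow the PR discussion (a time unit in the durations key, the total stored alongside the phases), but the node id and values are made up:

```json
{
  "tests/aws/services/lambda_/test_lambda.py::TestLambda::test_invoke": {
    "last_validated_date": "2025-05-28T08:15:00+00:00",
    "durations_in_seconds": {
      "setup": 12.34,
      "call": 3.21,
      "teardown": 25.67,
      "total": 41.22
    }
  }
}
```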

@tiurin tiurin requested review from simonrw and anisaoshafi May 19, 2025 08:40
@tiurin tiurin added the semver: minor and area: testing labels May 19, 2025
@anisaoshafi anisaoshafi (Contributor) left a comment

Looks good to me, tried it out and works like a charm ✨
Thanks for tackling this Misha. 👏🏼


github-actions bot commented May 19, 2025

LocalStack Community integration with Pro

    2 files  ±0      2 suites  ±0   1h 44m 45s ⏱️ -39s
4 468 tests ±0  4 080 ✅ ±0  388 💤 ±0  0 ❌ ±0 
4 470 runs  ±0  4 080 ✅ ±0  390 💤 ±0  0 ❌ ±0 

Results for commit 49adf9d. ± Comparison against base commit 433aeff.

♻️ This comment has been updated with latest results.

@joe4dev joe4dev (Member) left a comment

Love it ❤️ Thank you @tiurin for making this happen 🚀

I can confirm that time reporting works as expected, both in localstack and localstack-ext. I also tested that failing tests or non-AWS test executions are ignored.

if not is_aws_cloud() or outcome.excinfo:
    return
# For json.dump, sorted test entries enable consistent diffs.
# But test execution data is more readable in insert order for each step (setup, call, teardown).

praise: neat attention to detail ✨
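
As an aside, a minimal illustration of the ordering that comment describes (the data is invented):

```python
import json

# Two hypothetical test entries, recorded out of alphabetical order.
validation = {
    "test_two": {"durations_in_seconds": {"setup": 1.2, "call": 3.4, "teardown": 0.5}},
    "test_one": {"durations_in_seconds": {"setup": 0.1, "call": 2.0, "teardown": 4.1}},
}

# Sort the top-level test entries for stable diffs, but leave each inner dict
# in insert order so the phases read setup -> call -> teardown.
ordered = dict(sorted(validation.items()))
print(json.dumps(ordered, indent=2))  # no sort_keys=True, which would also sort the phases
```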

@tiurin tiurin force-pushed the tests/test-duration branch from 95dd620 to dd929d6 on May 23, 2025 10:49
@tiurin tiurin requested review from thrau and HarshCasper as code owners May 23, 2025 10:49
@tiurin tiurin force-pushed the tests/test-duration branch from dd929d6 to a1477d0 on May 23, 2025 10:51

github-actions bot commented May 23, 2025

Test Results - Preflight, Unit

21 579 tests  ±0   19 927 ✅ ±0   6m 16s ⏱️ ±0s
     1 suites ±0    1 652 💤 ±0 
     1 files   ±0        0 ❌ ±0 

Results for commit 49adf9d. ± Comparison against base commit 433aeff.

♻️ This comment has been updated with latest results.


github-actions bot commented May 23, 2025

Test Results (amd64) - Acceptance

7 tests  ±0   5 ✅ ±0   3m 5s ⏱️ -2s
1 suites ±0   2 💤 ±0 
1 files   ±0   0 ❌ ±0 

Results for commit 49adf9d. ± Comparison against base commit 433aeff.

♻️ This comment has been updated with latest results.


github-actions bot commented May 23, 2025

Test Results - Alternative Providers

597 tests  ±0   420 ✅ ±0   14m 56s ⏱️ +4s
  4 suites ±0   177 💤 ±0 
  4 files   ±0     0 ❌ ±0 

Results for commit 49adf9d. ± Comparison against base commit 433aeff.

♻️ This comment has been updated with latest results.


github-actions bot commented May 23, 2025

Test Results (amd64) - Integration, Bootstrap

    5 files  ±0      5 suites  ±0   2h 22m 38s ⏱️ -12s
4 823 tests ±0  4 282 ✅ ±0  541 💤 ±0  0 ❌ ±0 
4 829 runs  ±0  4 282 ✅ ±0  547 💤 ±0  0 ❌ ±0 

Results for commit 49adf9d. ± Comparison against base commit 433aeff.

♻️ This comment has been updated with latest results.

@tiurin tiurin added the review: merge when ready label May 27, 2025
@dominikschubert dominikschubert (Member) left a comment

Gave it a quick glance and just had some minor suggestions.

Thanks for tackling this! One somewhat immediate improvement I could think of is having a pytest flag that would allow you to print statistics of these execution times or at least collect them for each known pytest node id (i.e. for each test). We could use this to track how far along we are in collecting execution data over the whole test suite as well. 🤔

Just some thoughts though, nothing immediately actionable besides the import comment 👍
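
For what it's worth, a hypothetical sketch of such a flag; the option name --validation-stats and the tests/ search root are invented:

```python
import json
from pathlib import Path


def pytest_addoption(parser):
    parser.addoption(
        "--validation-stats",
        action="store_true",
        help="print durations recorded in *.validation.json files",
    )


def pytest_sessionfinish(session):
    if not session.config.getoption("--validation-stats"):
        return
    # Walk the test tree and print whatever durations have been recorded so far.
    for path in Path("tests").rglob("*.validation.json"):
        for nodeid, entry in json.loads(path.read_text()).items():
            print(nodeid, entry.get("durations_in_seconds"))
```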


When a test runs successfully against AWS, its last validation date and duration are recorded in a corresponding *.validation.json file.
The validation date is recorded precisely, while test durations can vary between runs.
For example, test setup time may differ depending on whether a test runs in isolation or as part of a class test suite with class-level fixtures.

For example, test setup time may differ depending on whether a test runs in isolation or as part of a class test suite with class-level fixtures.

Can't we track that anyway and capture it? That way we would avoid flipping between potentially minutes of setup time and only a few microseconds otherwise 🤔

@tiurin tiurin (Contributor, Author) replied:

One thing I thought about was getting the test collection name via item.session.config.args, e.g. ['aws/services/lambda_/event_source_mapping']. It could be used as part of a key, or as a unique property, so that a test's duration is only updated if the test has been validated within the same test collection.
However, that would mean recording durations for each new test collection, which is confusing. Or recording them only for predefined collections, e.g. only for individual runs, or only for class runs, which is also confusing and can be opaque ("why haven't my durations been updated?"). Also, test ordering might come into play for collections. Plus, the args would need to be sanitized, as they may contain full paths and reveal local setup details. Quite hard to factor in, given the many unknown details and their unknown impact.

I'd bet on simplicity for now, see whether durations actually flip a lot (a somewhat good sign, since it means tests are being re-validated, hehe), and learn how we can adapt the format if needed.

tiurin added 7 commits May 28, 2025 10:15
validation entry is generated with durations

TODO
- don't sort keys in json.dumps - have phases ordered from first to last
tests should still be ordered by name (nodeid)

- format floats
- run on existing AWS tests
- remove dummy test
Alphabetic sorting between tests as before but insert order for data inside.
setup and teardown are always successful
tiurin added 13 commits May 28, 2025 10:15
Writing the file 3 different times, as before, could lead to inconsistencies if the test failed or was interrupted.
Only write validation data once, and only when sure the test has passed.

Use test item's stash mechanism to store data between phases.
Another wrapper for pytest_runtest_makereport hook
is defined in localstack-snapshot using "old-style" hookwrapper.

It is not recommended to mix new and old-style wrappers in the same plugin, see a note here:
https://pluggy.readthedocs.io/en/latest/index.html#wrappers
- Add time unit to key name
- Move total to the same object as phases
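
For context on the pluggy note above, a minimal illustration of the two wrapper styles (simplified, not the PR's code):

```python
import pytest


# Old-style hookwrapper, as used in localstack-snapshot: the yield produces an
# opaque Result object that has to be unwrapped (and exposes outcome.excinfo).
@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()  # the TestReport for the current phase

# New-style wrapper, for comparison; pluggy advises against mixing the two
# styles in one plugin:
#
# @pytest.hookimpl(wrapper=True)
# def pytest_runtest_makereport(item, call):
#     report = yield  # the result itself is yielded directly
#     return report
```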
@tiurin tiurin force-pushed the tests/test-duration branch from fe12552 to 6b207eb on May 28, 2025 08:15
Co-authored-by: Dominik Schubert <dominik.schubert91@gmail.com>
@tiurin tiurin force-pushed the tests/test-duration branch from 6b207eb to 49adf9d on May 28, 2025 08:34
@tiurin tiurin (Contributor, Author) left a comment

One somewhat immediate improvement I could think of is having a pytest flag that would allow you to print statistics of these execution times or at least collect them for each known pytest node id (i.e. for each test). We could use this to track how far along we are in collecting execution data over the whole test suite as well.

@dominikschubert I think this calls for reporting rather than printing, maybe a dashboard? I'll think about it.



Labels
  • area: testing (Testing Localstack)
  • review: merge when ready (Signals to the reviewer that a PR can be merged if accepted)
  • semver: minor (Non-breaking changes which can be included in minor releases, but not in patch releases)

6 participants