Record validated tests duration #12638


Open · tiurin wants to merge 20 commits into master from tests/test-duration

Conversation

@tiurin (Contributor) commented May 19, 2025

Motivation

Gain more insight into how long it takes to run a parity test against AWS, broken down by test execution phase (setup, call, teardown).

Changes

  • Add a pytest hook that collects the duration of each test execution phase - setup, call, and teardown - and writes the result into *.validation.json, alongside the already existing last validation date (a rough sketch of such a hook is shown after this list).
  • Remove the old hook and some unused code.
  • Re-validate one existing test to demonstrate the result. It is especially interesting to see that teardown can account for almost half of the total duration in some cases - this is what happens when a test shows as passed in PyCharm but the wheels keep spinning for a while:
    (screenshot of the recorded durations omitted)
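
A rough sketch of what such a hook could look like (illustrative only, not the code from this PR; the is_aws_cloud() helper, the file naming, and the JSON field names are assumptions):

```python
# Sketch of a duration-recording hook; names and file layout are assumptions,
# not the exact implementation from this PR.
import json
from datetime import datetime, timezone

import pytest

DURATIONS_KEY = pytest.StashKey[dict]()  # per-test storage shared across phases


def is_aws_cloud() -> bool:
    # Placeholder for the real "running against AWS" check used by the test suite.
    return False


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if not report.passed:
        return  # only record fully successful executions

    durations = item.stash.setdefault(DURATIONS_KEY, {})
    durations[report.when] = round(report.duration, 2)  # "setup", "call", or "teardown"

    # Write once, after a successful teardown, and only for tests that ran against AWS.
    if report.when != "teardown" or not is_aws_cloud():
        return
    if set(durations) != {"setup", "call", "teardown"}:
        return  # a phase is missing (it failed or was skipped), so skip recording

    # e.g. tests/aws/test_example.py -> tests/aws/test_example.validation.json (assumed naming)
    validation_file = item.path.parent / f"{item.path.stem}.validation.json"
    data = json.loads(validation_file.read_text()) if validation_file.exists() else {}
    data[item.nodeid] = {
        "last_validated_date": datetime.now(timezone.utc).isoformat(),
        "durations_in_seconds": {**durations, "total": round(sum(durations.values()), 2)},
    }
    # Sort test entries for stable diffs, but keep phase order (setup, call, teardown) inside.
    validation_file.write_text(json.dumps(dict(sorted(data.items())), indent=2) + "\n")
```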

Considerations

Machines on which tests are executed can vary significantly in processing power. However, in the case of AWS-validated tests, a significant amount of time is spent in I/O, waiting for AWS endpoint responses. That can also be affected by geographic location. A quick test on a small sample showed under 2% difference for the test in this PR across 3 different machines located between Spain and South Africa (all runs executed against the us-east-1 region). Repeated runs at different times of day on one machine were similarly stable.

In any case, durations are meant to give an idea of how long a test runs rather than exact numbers.

@tiurin tiurin requested review from simonrw and anisaoshafi May 19, 2025 08:40
@tiurin tiurin added the "semver: minor" (Non-breaking changes which can be included in minor releases, but not in patch releases) and "area: testing" (Testing LocalStack) labels May 19, 2025
@anisaoshafi (Contributor) left a comment

Looks good to me, tried it out and works like a charm ✨
Thanks for tackling this, Misha. 👏🏼

github-actions bot commented May 19, 2025

LocalStack Community integration with Pro

    2 files  ±0      2 suites  ±0   1h 43m 37s ⏱️ +50s
4 464 tests ±0  4 075 ✅ ±0  389 💤 ±0  0 ❌ ±0 
4 466 runs  ±0  4 075 ✅ ±0  391 💤 ±0  0 ❌ ±0 

Results for commit a1477d0d. ± Comparison against base commit 893018f.

♻️ This comment has been updated with latest results.

@joe4dev (Member) left a comment

Love it ❤️ Thank you @tiurin for making this happen 🚀

I can confirm that time reporting works as expected, both in localstack and localstack-ext. I also tested that failing tests or non-AWS test executions are ignored.

    if not is_aws_cloud() or outcome.excinfo:
        return
    # For json.dump sorted test entries enable consistent diffs.
    # But test execution data is more readable in insert order for each step (setup, call, teardown).

praise: neat attention to detail ✨

tiurin added 18 commits May 23, 2025 12:49
validation entry is generated with durations

TODO
- don't sort keys in json.dumps - have phases ordered from first to last
tests should still be ordered by name (nodeid)

- format floats
- run on existing AWS tests
- remove dummy test
Alphabetic sorting between tests as before, but insert order for data inside.
setup and teardown are always successful
Writing file 3 different times as before could lead to inconsistencies if test failed or was interrupted.
Only write validation data once and when sure the test has passed.

Use test item's stash mechanism to store data between phases.
Another wrapper for pytest_runtest_makereport hook
is defined in localstack-snapshot using "old-style" hookwrapper.

It is not recommended to mix new and old-style wrappers in the same plugin, see a note here:
https://pluggy.readthedocs.io/en/latest/index.html#wrappers
- Add time unit to key name
- Move total to the same object as phases
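
For reference, the two wrapper styles mentioned in the commit note above look roughly like this (a minimal sketch with placeholder bodies; in practice the two styles come from different plugins, e.g. the existing one in localstack-snapshot vs. a new one):

```python
import pytest


# Old-style wrapper (hookwrapper=True), as used by the existing localstack-snapshot plugin:
# the hook result arrives wrapped in an outcome object.
@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    # inspect or adjust the TestReport here


# New-style wrapper (wrapper=True, available in newer pluggy/pytest versions):
# `yield` returns the result directly, and the wrapper must return (or re-raise) it.
@pytest.hookimpl(wrapper=True)  # noqa: F811 -- redefinition is intentional, for comparison only
def pytest_runtest_makereport(item, call):
    report = yield
    # inspect or adjust the TestReport here
    return report
```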
@tiurin tiurin force-pushed the tests/test-duration branch from 95dd620 to dd929d6 May 23, 2025 10:49
@tiurin tiurin requested review from thrau and HarshCasper as code owners May 23, 2025 10:49
@tiurin tiurin force-pushed the tests/test-duration branch from dd929d6 to a1477d0 May 23, 2025 10:51

Test Results - Preflight, Unit

21 601 tests  ±0   19 955 ✅ ±0   6m 53s ⏱️ +45s
     1 suites ±0    1 646 💤 ±0 
     1 files   ±0        0 ❌ ±0 

Results for commit a1477d0d. ± Comparison against base commit 893018f.


Test Results (amd64) - Acceptance

7 tests  ±0   5 ✅ ±0   3m 7s ⏱️ -2s
1 suites ±0   2 💤 ±0 
1 files   ±0   0 ❌ ±0 

Results for commit a1477d0d. ± Comparison against base commit 893018f.


Test Results - Alternative Providers

597 tests   419 ✅  15m 3s ⏱️
  4 suites  178 💤
  4 files      0 ❌

Results for commit a1477d0d.


Test Results (amd64) - Integration, Bootstrap

    5 files      5 suites   2h 21m 21s ⏱️
4 819 tests 4 277 ✅ 542 💤 0 ❌
4 825 runs  4 277 ✅ 548 💤 0 ❌

Results for commit a1477d0d.

@tiurin tiurin added the "review: merge when ready" label (Signals to the reviewer that a PR can be merged if accepted) May 27, 2025
@dominikschubert (Member) left a comment

Gave it a quick glance and just had some minor suggestions.

Thanks for tackling this! One somewhat immediate improvement I could think of is having a pytest flag that would allow you to print statistics of these execution times or at least collect them for each known pytest node id (i.e. for each test). We could use this to track how far along we are in collecting execution data over the whole test suite as well. 🤔

Just some thoughts though, nothing immediately actionable besides the import comment 👍
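
A rough sketch of what such a flag could look like (purely hypothetical; the option name, the *.validation.json discovery, and the field names are assumptions, not part of this PR):

```python
# conftest.py sketch of the suggested flag -- hypothetical, not part of this PR.
import json


def pytest_addoption(parser):
    parser.addoption(
        "--validation-durations",
        action="store_true",
        default=False,
        help="Print recorded AWS validation durations per test node id.",
    )


def pytest_terminal_summary(terminalreporter, exitstatus, config):
    if not config.getoption("--validation-durations"):
        return
    terminalreporter.section("AWS validation durations")
    # Walk all recorded *.validation.json files under the test root and print totals.
    for validation_file in sorted(config.rootpath.rglob("*.validation.json")):
        for nodeid, entry in sorted(json.loads(validation_file.read_text()).items()):
            total = entry.get("durations_in_seconds", {}).get("total", "n/a")
            terminalreporter.write_line(f"{nodeid}: {total}s")
```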


When a test runs successfully against AWS, its last validation date and duration are recorded in a corresponding *.validation.json file.
The validation date is recorded precisely, while test durations can vary between runs.
For example, test setup time may differ depending on whether a test runs in isolation or as part of a class test suite with class-level fixtures.
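
Purely as an illustration of the recorded format (the node id and field names below are assumptions and may differ from the actual implementation), an entry could look like:

```json
{
  "tests/aws/services/sqs/test_sqs.py::TestSqs::test_send_receive_message": {
    "last_validated_date": "2025-05-23T10:49:00+00:00",
    "durations_in_seconds": {
      "setup": 1.37,
      "call": 4.82,
      "teardown": 3.95,
      "total": 10.14
    }
  }
}
```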

For example, test setup time may differ depending on whether a test runs in isolation or as part of a class test suite with class-level fixtures.

Can't we track that anyway and capture it? That way we would avoid flipping between potentially minutes of setup time and only a few microseconds otherwise 🤔

Comment on lines +15 to +16:

    from _pytest.reports import TestReport
    from _pytest.stash import StashKey

Suggested change:

    - from _pytest.reports import TestReport
    - from _pytest.stash import StashKey
    + from pytest import TestReport, StashKey

These are exposed as public API.

Labels: area: testing · review: merge when ready · semver: minor