Record validated tests duration #12638
Conversation
tests/aws/services/lambda_/event_source_mapping/test_cfn_resource.validation.json (outdated, resolved)
Looks good to me, tried it out and works like a charm ✨
Thanks for tackling this, Misha. 👏🏼
Love it ❤️ Thank you @tiurin for making this happen 🚀
I can confirm that time reporting works as expected, both in localstack and localstack-ext. I also tested that failing tests or non-AWS test executions are ignored.
if not is_aws_cloud() or outcome.excinfo:
    return
# For json.dump, sorted test entries enable consistent diffs.
# But test execution data is more readable in insert order for each step (setup, call, teardown).
praise: neat attention to detail ✨
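To illustrate the ordering trade-off mentioned in the snippet above, here is a minimal sketch (not the plugin's actual code) of keeping test entries sorted for stable diffs while preserving insert order for each test's phase data, assuming a dict keyed by test name:

```python
import json

# Hypothetical validation data: phase keys were inserted in execution order.
content = {
    "test_two": {"setup": 1.25, "call": 3.40, "teardown": 0.10},
    "test_one": {"setup": 0.80, "call": 2.50, "teardown": 0.12},
}

# Sort only the top-level test entries; json.dump(..., sort_keys=True) would
# also sort the inner phase keys, losing the setup/call/teardown order.
ordered = dict(sorted(content.items()))
print(json.dumps(ordered, indent=2))
```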
Force-pushed from 95dd620 to dd929d6, then from dd929d6 to a1477d0.
Gave it a quick glance and just had some minor suggestions.
Thanks for tackling this! One somewhat immediate improvement I could think of is having a pytest flag that would allow you to print statistics of these execution times or at least collect them for each known pytest node id (i.e. for each test). We could use this to track how far along we are in collecting execution data over the whole test suite as well. 🤔
Just some thoughts though, nothing immediately actionable besides the import comment 👍
When a test runs successfully against AWS, its last validation date and duration are recorded in a corresponding *.validation.json file.
The validation date is recorded precisely, while test durations can vary between runs.
For example, test setup time may differ depending on whether a test runs in isolation or as part of a class test suite with class-level fixtures.
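For illustration, a recorded entry might look roughly like the sketch below; the key names here (last_validated_date, durations_in_seconds) are assumptions for this example, not necessarily the exact format the plugin writes:

```python
# Hypothetical shape of one entry in a *.validation.json file (key names assumed).
example_entry = {
    "tests/aws/services/lambda_/test_lambda.py::TestLambda::test_invoke": {
        "last_validated_date": "2025-01-15T12:34:56+00:00",
        "durations_in_seconds": {
            "setup": 12.34,
            "call": 45.67,
            "teardown": 1.23,
            "total": 59.24,
        },
    }
}
```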
For example, test setup time may differ depending on whether a test runs in isolation or as part of a class test suite with class-level fixtures.
Can't we track that anyway and capture it? That way we would avoid flipping between potentially minutes of setup time and only a few microseconds otherwise 🤔
One thing I thought about was to get the test collection name via `item.session.config.args`, e.g. `['aws/services/lambda_/event_source_mapping']`. It could be used as part of the key, or as a unique property, so that the test duration is updated if the test has been validated within the same test collection.
However, such usage means we would record durations for each new test collection, which is confusing. Or we could record only for predefined collections, e.g. only for individual runs, or only for class runs, which is also confusing and can be opaque ("why haven't my durations been updated?"). Also, test ordering might come into play for collections. Plus, we would need to sanitize the args, as they may contain full paths and reveal local setup details. Quite hard to factor in due to many unknown details and their unknown impact.
I'd bet on simplicity for now, see if durations actually flip a lot (somewhat a good sign, it means tests are re-validated, hehe), and learn how we can adapt the format if needed.
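As a rough illustration of the idea above (not something this PR implements), the collection arguments could be read from any hook that receives the test item, for example:

```python
import pytest

def pytest_runtest_teardown(item: pytest.Item):
    # The session's invocation arguments, e.g.
    # ['aws/services/lambda_/event_source_mapping']. These may contain
    # absolute paths, so they would need sanitizing before being recorded.
    collection_args = [str(arg) for arg in item.session.config.args]
    print(f"{item.nodeid} collected via {collection_args}")
```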
localstack-core/localstack/testing/pytest/validation_tracking.py (outdated, resolved)
validation entry is generated with durations
TODO:
- don't sort keys in json.dumps: have phases ordered from first to last, tests should still be ordered by name (nodeid)
- format floats
- run on existing AWS tests
- remove dummy test
Alphabetic sorting between tests as before, but insert order for the data inside.
Better readability.
setup and teardown are always successful
Writing the file 3 different times, as before, could lead to inconsistencies if a test failed or was interrupted. Only write validation data once, when we are sure the test has passed. Use the test item's stash mechanism to store data between phases.
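A minimal sketch of the stash approach described above, with illustrative names (durations_key, failed_key, persist) that are not the plugin's actual identifiers:

```python
import pytest

# Illustrative stash keys; the real plugin may use different names.
durations_key = pytest.StashKey[dict]()
failed_key = pytest.StashKey[bool]()

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item: pytest.Item, call: pytest.CallInfo):
    outcome = yield
    report = outcome.get_result()
    # Accumulate per-phase durations on the item instead of writing the file
    # after every phase.
    item.stash.setdefault(durations_key, {})[call.when] = round(report.duration, 2)
    if not report.passed:
        item.stash[failed_key] = True
    # Persist once, after teardown, and only if every phase succeeded, so an
    # interrupted or failing test never leaves a partially written entry.
    if call.when == "teardown" and not item.stash.get(failed_key, False):
        persist(item.nodeid, item.stash[durations_key])

def persist(nodeid: str, durations: dict) -> None:  # hypothetical writer
    print(nodeid, durations)
```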
This reverts commit e2ced0a.
Another wrapper for the pytest_runtest_makereport hook is defined in localstack-snapshot using an "old-style" hookwrapper. It is not recommended to mix new- and old-style wrappers in the same plugin; see the note here: https://pluggy.readthedocs.io/en/latest/index.html#wrappers
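For context, the two wrapper styles that the linked pluggy note warns against mixing look roughly like this (a generic sketch; each wrapper would normally live in a different plugin module):

```python
import pytest

# Old-style hook wrapper: `yield` returns a pluggy Result object.
@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()  # inspect the TestReport here

# New-style hook wrapper (newer pluggy/pytest): `yield` returns the result
# directly, and the wrapper must return it (or raise) to pass it on.
@pytest.hookimpl(wrapper=True)
def pytest_runtest_makereport(item, call):  # noqa: F811 - shown side by side for comparison
    report = yield
    return report
```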
- Add time unit to key name
- Move total to the same object as phases
Force-pushed from fe12552 to 6b207eb.
Co-authored-by: Dominik Schubert <dominik.schubert91@gmail.com>
Force-pushed from 6b207eb to 49adf9d.
One somewhat immediate improvement I could think of is having a pytest flag that would allow you to print statistics of these execution times or at least collect them for each known pytest node id (i.e. for each test). We could use this to track how far along we are in collecting execution data over the whole test suite as well.
@dominikschubert I think this calls for reporting rather than printing, maybe a dashboard? I'll think about it.
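A rough sketch of what such a flag could look like (purely illustrative; the option name, file locations, and data layout below are assumptions, and no such option is part of this PR):

```python
# conftest.py sketch: an opt-in flag that prints recorded validation data.
import json
from pathlib import Path

def pytest_addoption(parser):
    parser.addoption(
        "--validation-durations",
        action="store_true",
        help="Print recorded AWS validation data per test node id.",
    )

def pytest_terminal_summary(terminalreporter, exitstatus, config):
    if not config.getoption("--validation-durations"):
        return
    # Assumes *.validation.json files live next to the tests under tests/.
    for path in Path("tests").rglob("*.validation.json"):
        for nodeid, data in json.loads(path.read_text()).items():
            terminalreporter.write_line(f"{nodeid}: {data}")
```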
Motivation
Gain more insight into how long it takes to run a parity test against AWS, and into each test execution phase.
Changes
Record test durations in *.validation.json, along with the already existing last validation date.
Considerations
Machines on which tests are executed can vary significantly in processing power. However, in the case of AWS-validated tests, a significant amount of time is spent in I/O, waiting for AWS endpoint responses. That can also be affected by geographic location. A quick test on a small sample showed under 2% difference for the test in this PR across 3 different machines between Spain and South Africa (all tests were executed against the us-east-1 region). Repeated runs at different times of day on one machine were also similarly stable.
In any case, durations are meant to give an idea of how long a test runs rather than exact numbers.