Document design decisions for new test framework #3214

Open
wants to merge 5 commits into
base: main

fwilhe
Member

@fwilhe fwilhe commented Jul 25, 2025

Based on #3156 and #3159

This PR documents the design decisions for the new testing framework.

@github-actions github-actions bot added the docs label Jul 25, 2025
@fwilhe fwilhe requested a review from nkraetzschmar July 25, 2025 11:06
Contributor

@nkraetzschmar nkraetzschmar left a comment

We should also include a section somewhere on running tests manually/locally. That should be as straightforward as possible.

Ideally, both running the tests in a local chroot/VM and running them on a cloud VM should be straightforward, as long as one brings their own credentials. So the OpenTofu logic etc. must be easily usable outside of the GitHub automations, without complex local setups (on both Linux and macOS dev workstations).


These approaches were not adopted due to the following limitations:
- **Software Availability:** Container runtimes and systemd are not present in all target environments.
- **Permission Requirements:** Both methods require elevated privileges, which may not be feasible or desirable in production or restricted systems.
Contributor

The permissions/privileges concern is not just, and not even primarily, about production systems. One of the main reasons to aim for privilege-less test setup and execution is to allow developers to run tests (against a locally built artifact) without requiring any privileges beyond what is needed to build (so unprivileged user namespaces only).


## Decision

The redesigned test framework will treat the system under test as strictly read-only. Tests must not modify system state, install packages, enable services, or change configuration. The framework itself will not require SSH setup or any other mutation of the target system. All test logic must operate without side effects, ensuring that the system remains unchanged before, during, and after test execution.
Contributor

While in principle, yes, we do want to avoid having tests modify the system state, this cannot be avoided for all tests (e.g. testing that loading kernel modules works). So instead we should build the framework such that tests that modify global system state MUST be clearly marked via a pytest marker and are skipped unless the test framework is run with an explicit argument that allows modifications.
These tests would then only ever be run on ephemeral targets, such as local test-only VMs, or during the platform tests.
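
A minimal sketch of how such gating could look in a `conftest.py`. The marker name `modifies_system` and the flag `--allow-system-modifications` are hypothetical and not taken from this PR; only the pytest hooks themselves are standard.

```python
# conftest.py (sketch; marker and option names are hypothetical)

MARKER = "modifies_system"
OPTION = "--allow-system-modifications"


def must_skip(marker_names, modifications_allowed):
    """Return True if a test carrying these markers must be skipped."""
    return MARKER in marker_names and not modifications_allowed


def pytest_addoption(parser):
    # Opt-in flag: without it, state-modifying tests never run.
    parser.addoption(OPTION, action="store_true", default=False,
                     help="run tests that modify global system state")


def pytest_configure(config):
    # Register the marker so that --strict-markers accepts it.
    config.addinivalue_line(
        "markers", f"{MARKER}: test modifies global system state")


def pytest_collection_modifyitems(config, items):
    import pytest  # local import keeps must_skip() usable without pytest

    allowed = config.getoption(OPTION)
    skip = pytest.mark.skip(reason=f"needs {OPTION}")
    for item in items:
        names = {mark.name for mark in item.iter_markers()}
        if must_skip(names, allowed):
            item.add_marker(skip)
```

Under these assumptions, a test that loads a kernel module would be decorated with `@pytest.mark.modifies_system`, and the suite would be invoked with `--allow-system-modifications` only on ephemeral targets.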


- **Distribution Mechanisms:** The test suite may be delivered via scp, cloud-init/user_data, OCI registry/artifact, image attach, or other platform-specific methods. The suite will be packaged as a relocatable tarball or directory, and may be built on demand or pulled as a build artifact.
- **Cloud Provider Support:** Image formats and deployment workflows will be adapted for each provider (e.g., raw, vhd, qcow2), with research into automation and API integration for disk/image attachment.
- **Reporting:** Test results will be exported in a [diki](https://github.com/gardener/diki)-compatible format as part of the MVP, enabling integration with external systems and dashboards. Additional formats (e.g., JUnit XML) may be supported as needed.
Contributor

Did we already discuss/agree on the test result output? My last understanding was that we would primarily aim for human-readable test run logs, with other formats as a side-channel output, but I guess these details are still open for discussion.

Member Author

The diki requirement is mentioned here: #3156 (comment)

That said, it could be designed so that new output formats can be added without rewriting the tests or the framework, so maybe this should not be specified here at all.
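
As an illustration of that idea, here is a minimal sketch of a formatter registry in which tests produce plain result records and output formats are registered separately. All names (`Result`, `FORMATTERS`, `formatter`, `render`) are hypothetical, and the JUnit-style XML is deliberately simplified. No diki format is shown, since its schema is not described in this thread.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
import xml.etree.ElementTree as ET


@dataclass
class Result:
    name: str
    passed: bool
    message: str = ""


Formatter = Callable[[List[Result]], str]
FORMATTERS: Dict[str, Formatter] = {}


def formatter(name: str):
    """Decorator that registers an output format under `name`."""
    def register(func: Formatter) -> Formatter:
        FORMATTERS[name] = func
        return func
    return register


@formatter("text")
def format_text(results: List[Result]) -> str:
    # Human-readable default: one line per test.
    lines = []
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        suffix = f": {r.message}" if r.message else ""
        lines.append(f"{status} {r.name}{suffix}")
    return "\n".join(lines)


@formatter("junit")
def format_junit(results: List[Result]) -> str:
    # Simplified JUnit-style XML, not a complete schema.
    failures = sum(not r.passed for r in results)
    suite = ET.Element("testsuite", name="sketch",
                       tests=str(len(results)), failures=str(failures))
    for r in results:
        case = ET.SubElement(suite, "testcase", name=r.name)
        if not r.passed:
            ET.SubElement(case, "failure", message=r.message)
    return ET.tostring(suite, encoding="unicode")


def render(results: List[Result], fmt: str = "text") -> str:
    """Render results with the formatter registered as `fmt`."""
    return FORMATTERS[fmt](results)
```

With this shape, adding a new format means registering one more function. Neither the tests nor `render` need to change.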

@fwilhe
Member Author

fwilhe commented Aug 1, 2025

> We should also include a section somewhere on running tests manually/locally. That should be as straightforward as possible.
>
> Ideally, both running the tests in a local chroot/VM and running them on a cloud VM should be straightforward, as long as one brings their own credentials. So the OpenTofu logic etc. must be easily usable outside of the GitHub automations, without complex local setups (on both Linux and macOS dev workstations).

I tried to address this here https://github.com/gardenlinux/gardenlinux/pull/3214/files#diff-a22547a436b6499cc9954c40fe0b182df42a88cef14f1e1a89fab59a84105d69R28

@fwilhe
Member Author

fwilhe commented Aug 1, 2025

@nkraetzschmar thanks for your review, I've addressed the comments

@fwilhe fwilhe marked this pull request as ready for review August 8, 2025 07:46
@fwilhe fwilhe requested a review from a team as a code owner August 8, 2025 07:46
@fwilhe
Member Author

fwilhe commented Aug 8, 2025

@nkraetzschmar this should be reviewable now that the initial framework is merged

@NotTheEvilOne NotTheEvilOne self-requested a review August 8, 2025 07:52
- **Distribution Mechanisms:** The test suite may be delivered via scp, cloud-init/user_data, OCI registry/artifact, image attach, or other platform-specific methods. The suite will be packaged as a relocatable tarball or directory, and may be built on demand or pulled as a build artifact.
- **Cloud Provider Support:** Image formats and deployment workflows will be adapted for each provider (e.g., raw, vhd, qcow2), with research into automation and API integration for disk/image attachment.
- **Reporting:** Test output will be flexible and allow custom formats through a plugin-based system, so that new formats are easy to add. The default output will be a human-readable text format; machine-readable outputs such as a [diki](https://github.com/gardener/diki)-compatible format or JUnit XML may be added later.
- **Backchannel for Logs:** Mechanisms such as scp, custom APIs, or direct S3 uploads will be explored for retrieving logs and results from the system under test.
Contributor

I think this part should be formulated more generally, for example: "As with reporting itself, the delivery of logs and results will be implemented using a plugin-based approach, to best suit the different execution scenarios."

- **Portability:** The test suite can run in containers, chroots, VMs, bare metal, and production systems.
- **Flexibility:** Multiple deployment mechanisms are supported; the framework is not tied to a specific transport or runtime.
- **Maintainability:** The framework is easier to reason about and maintain, as tests run in a predictable, local context.
- **Reporting:** Output can be collected via stdout/stderr, persisted as JUnit XML, or exported in other formats.
Contributor

It is not clear to me how the output is collected via stdout/stderr. Sure, there must be a command dispatch to run the tests, but to me it seems like the new framework minimizes shell interaction. I'm probably missing something here, but maybe you could add some details to make this clear.
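
One possible reading, purely as an illustration and not something this PR specifies: whatever dispatches the test run on the target captures the process's standard streams and hands them back to the caller. The helper name `run_and_collect` and the stand-in command are assumptions.

```python
import subprocess
import sys


def run_and_collect(command):
    """Run `command` and return (exit code, stdout, stderr) as text."""
    proc = subprocess.run(command, capture_output=True, text=True, check=False)
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Stand-in for the real test entry point, which is not defined here.
    code, out, err = run_and_collect(
        [sys.executable, "-c", "print('1 passed')"])
    print(code, out.strip())
```

How the dispatch itself happens (scp plus a remote command, cloud-init, or something else) is left open by the design document, and this sketch does not settle it.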

4 participants