Document design decisions for new test framework #3214

fwilhe · 2025-07-25T11:06:28Z

This PR documents the design decisions for the new testing framework.

nkraetzschmar

We should also include a section somewhere on running tests manually/locally. That should be as straightforward as possible.

Ideally, both running the tests in a local chroot/VM and running them on a cloud VM should be straightforward, as long as one brings their own credentials. So the OpenTofu logic etc. must be easily usable outside of the GitHub automations without complex local setups (on both Linux and macOS dev workstations).

nkraetzschmar · 2025-07-29T12:14:44Z

docs/architecture/decisions/0006-new-test-framework-in-place-self-contained-test-execution.md

+
+These approaches were not adopted due to the following limitations:
+- **Software Availability:** Container runtimes and systemd are not present in all target environments.
+- **Permission Requirements:** Both methods require elevated privileges, which may not be feasible or desirable in production or restricted systems.


The permissions/privileges concern is not only, or even primarily, about production systems. One of the main reasons to aim for privilege-less test setup and execution is to allow running tests as a developer (against a locally built artifact) without requiring any privileges beyond what is needed to build (so unprivileged user namespaces only).

nkraetzschmar · 2025-07-29T12:19:58Z

docs/architecture/decisions/0007-non-invasive-read-only-testing.md

+
+## Decision
+
+The redesigned test framework will treat the system under test as strictly read-only. Tests must not modify system state, install packages, enable services, or change configuration. The framework itself will not require SSH setup or any other mutation of the target system. All test logic must operate without side effects, ensuring that the system remains unchanged before, during, and after test execution.


In principle, yes, we do want to avoid having tests modify the system state, but this cannot be avoided for every test (e.g. testing that loading kernel modules works). So instead we should build the framework such that tests that modify global system state MUST be clearly marked via a pytest marker and are skipped unless the test framework is run with an explicit argument to allow modifications.
These tests would then only ever be run on ephemeral targets, such as local test-only VMs or during the platform tests.
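
A minimal sketch of how this could look in a `conftest.py`, following the usual pytest pattern for opt-in tests. The marker name `modifies_system_state` and the option `--allow-system-modifications` are placeholders I made up for illustration, not names agreed on anywhere in this PR.

```python
# conftest.py (sketch; marker and option names are hypothetical)
import pytest

MARKER = "modifies_system_state"
OPTION = "--allow-system-modifications"


def pytest_addoption(parser):
    # Explicit opt-in flag; without it, state-modifying tests are skipped.
    parser.addoption(
        OPTION,
        action="store_true",
        default=False,
        help="run tests that modify global system state (ephemeral targets only)",
    )


def pytest_configure(config):
    # Register the marker so it shows up in `pytest --markers`.
    config.addinivalue_line(
        "markers", f"{MARKER}: test changes global system state"
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption(OPTION):
        return  # modifications explicitly allowed, run everything
    skip = pytest.mark.skip(reason=f"needs {OPTION} to run")
    for item in items:
        if MARKER in item.keywords:
            item.add_marker(skip)
```

A test such as the kernel module one would then carry `@pytest.mark.modifies_system_state` and only run when the flag is passed on an ephemeral target.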

nkraetzschmar · 2025-07-29T12:23:13Z

docs/architecture/decisions/0009-flexible-distribution-and-reporting.md

+
+- **Distribution Mechanisms:** The test suite may be delivered via scp, cloud-init/user_data, OCI registry/artifact, image attach, or other platform-specific methods. The suite will be packaged as a relocatable tarball or directory, and may be built on demand or pulled as a build artifact.
+- **Cloud Provider Support:** Image formats and deployment workflows will be adapted for each provider (e.g., raw, vhd, qcow2), with research into automation and API integration for disk/image attachment.
+- **Reporting:** Test results will be exported in a [diki](https://github.com/gardener/diki)-compatible format as part of the MVP, enabling integration with external systems and dashboards. Additional formats (e.g., JUnit XML) may be supported as needed.


Did we already discuss/agree on the test result output? My last state was that we would primarily aim for human-readable test run logs, with other formats as a side-channel output, but I guess these details are still open for discussion.

Diki requirement is mentioned here #3156 (comment)

I mean, it could be designed in a way that new output formats can be added without rewriting the tests/framework, so maybe this should really not be specified here.
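
To illustrate what "extensible without rewriting the tests" could mean, here is a rough sketch of a formatter registry. Everything in it (`Result`, `register_format`, `render`, the format names) is hypothetical and only meant to show the shape of the idea; the XML is JUnit-style and not validated against any schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
from xml.etree import ElementTree as ET


@dataclass
class Result:
    name: str
    passed: bool
    message: str = ""


Formatter = Callable[[List[Result]], str]
FORMATTERS: Dict[str, Formatter] = {}


def register_format(name: str):
    """Decorator that adds an output format without touching the tests."""
    def decorator(func: Formatter) -> Formatter:
        FORMATTERS[name] = func
        return func
    return decorator


@register_format("text")
def format_text(results: List[Result]) -> str:
    # Default: human-readable log, one line per test plus a summary.
    lines = []
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        lines.append(f"{status} {r.name}" + (f": {r.message}" if r.message else ""))
    failed = sum(not r.passed for r in results)
    lines.append(f"{len(results)} tests, {failed} failed")
    return "\n".join(lines)


@register_format("junit")
def format_junit(results: List[Result]) -> str:
    # Machine-readable side channel, added later as a plugin.
    failed = sum(not r.passed for r in results)
    suite = ET.Element(
        "testsuite", name="sketch", tests=str(len(results)), failures=str(failed)
    )
    for r in results:
        case = ET.SubElement(suite, "testcase", name=r.name)
        if not r.passed:
            ET.SubElement(case, "failure", message=r.message)
    return ET.tostring(suite, encoding="unicode")


def render(results: List[Result], fmt: str = "text") -> str:
    return FORMATTERS[fmt](results)
```

With something like this, a diki-compatible exporter would be one more registered function, and the ADR would not need to name specific formats.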

fwilhe · 2025-08-01T13:10:02Z

We should also include a section somewhere on running tests manually/locally. That should be as straightforward as possible.

Ideally, both running the tests in a local chroot/VM and running them on a cloud VM should be straightforward, as long as one brings their own credentials. So the OpenTofu logic etc. must be easily usable outside of the GitHub automations without complex local setups (on both Linux and macOS dev workstations).

I tried to address this here https://github.com/gardenlinux/gardenlinux/pull/3214/files#diff-a22547a436b6499cc9954c40fe0b182df42a88cef14f1e1a89fab59a84105d69R28

fwilhe · 2025-08-01T13:34:22Z

@nkraetzschmar thanks for your review, I've addressed the comments.

fwilhe · 2025-08-08T07:46:29Z

@nkraetzschmar this should be reviewable now that the initial framework is merged.

NotTheEvilOne · 2025-08-08T09:13:55Z

docs/architecture/decisions/0009-flexible-distribution-and-reporting.md

+- **Distribution Mechanisms:** The test suite may be delivered via scp, cloud-init/user_data, OCI registry/artifact, image attach, or other platform-specific methods. The suite will be packaged as a relocatable tarball or directory, and may be built on demand or pulled as a build artifact.
+- **Cloud Provider Support:** Image formats and deployment workflows will be adapted for each provider (e.g., raw, vhd, qcow2), with research into automation and API integration for disk/image attachment.
+- **Reporting:** Test output will be flexible and allow custom formats in a plugin-based system, so that new formats are easy to add. The default output will be a human-readable text format, machine readable outputs such as a [diki](https://github.com/gardener/diki)-compatible format or JUnit xml output may be added later.
+- **Backchannel for Logs:** Mechanisms such as scp, custom APIs, or direct S3 uploads will be explored for retrieving logs and results from the system under test.


I think this part should be formulated more generally, for example: As with reporting itself, the delivery of logs and results will be implemented using a plugin-based approach, to best suit the different execution scenarios.

Leon-hk · 2025-08-08T09:51:52Z

docs/architecture/decisions/0006-new-test-framework-in-place-self-contained-test-execution.md

+- **Portability:** The test suite can run in containers, chroots, VMs, bare metal, and production systems.
+- **Flexibility:** Multiple deployment mechanisms are supported; the framework is not tied to a specific transport or runtime.
+- **Maintainability:** The framework is easier to reason about and maintain, as tests run in a predictable, local context.
+- **Reporting:** Output can be collected via stdout/stderr, persisted as JUnit XML, or exported in other formats.


It is not clear to me how the output is collected via stdout/stderr. Sure, there must be a command dispatch to run the tests, but to me it seems like the new framework minimizes shell interaction. I'm probably missing something here, but maybe you could add some details to make this clear.
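
For context, this is the kind of thing I would have guessed "collected via stdout/stderr" means: whatever dispatches the test run (locally, in a chroot, over some transport) captures the two streams and the exit code of the test process. The sketch below is only my assumption of the mechanism, not how the framework does it; `run_and_collect` is a made-up helper, and the inline Python command merely stands in for the real test-suite entry point.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_collect(cmd: List[str]) -> Tuple[int, str, str]:
    """Run a command and return (exit code, stdout, stderr) as text."""
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Stand-in for the test-suite entry point: results go to stdout,
    # diagnostics to stderr, and the exit code signals overall success.
    fake_suite = [
        sys.executable,
        "-c",
        "import sys; print('1 passed'); print('note: demo', file=sys.stderr)",
    ]
    code, out, err = run_and_collect(fake_suite)
    print(f"exit={code}")
    print(f"stdout={out.strip()}")
    print(f"stderr={err.strip()}")
```

If that is roughly the intent, a sentence in the ADR saying who captures the streams would already answer my question.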

Document design decisions for new test framework

356d8d5

Based on #3156 and #3159

github-actions bot added the docs label Jul 25, 2025

fwilhe requested a review from nkraetzschmar July 25, 2025 11:06

fwilhe added 2 commits July 25, 2025 13:16

[no ci] document containers/sysext

05bdfb8

[no ci] link main test framework adr

552c00e

nkraetzschmar requested changes Jul 29, 2025

fwilhe added 2 commits August 1, 2025 15:30

Address PR comments

c1a8f25

[no ci] output formats

c44a00f

fwilhe marked this pull request as ready for review August 8, 2025 07:46

fwilhe requested a review from a team as a code owner August 8, 2025 07:46

NotTheEvilOne self-requested a review August 8, 2025 07:52

NotTheEvilOne reviewed Aug 8, 2025

Leon-hk reviewed Aug 8, 2025
