chore: cache terraform providers between CI test runs #17373
Conversation
Force-pushed from 9a5b5a1 to 8885000
Force-pushed from 9d41243 to 2ab657c
Force-pushed from 2ab657c to 3a9e0b3
file := templateFiles[fileName]
_, err := hasher.Write([]byte(fileName))
require.NoError(t, err)
_, err = hasher.Write([]byte(file))
You need to include some sort of delimiter, or `dogfood` + `stuff` will hash the same as `dog` + `foodstuff`.
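For illustration, a minimal sketch of one way to make the chunk boundaries unambiguous, assuming a SHA-256 hasher (the `hashChunks` helper is made up for this example, not code from the PR):

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// hashChunks writes a length prefix before each chunk, so the boundaries
// between chunks are unambiguous: ("dogfood", "stuff") and
// ("dog", "foodstuff") produce different digests.
func hashChunks(chunks ...string) string {
	h := sha256.New()
	for _, c := range chunks {
		var lenBuf [8]byte
		binary.LittleEndian.PutUint64(lenBuf[:], uint64(len(c)))
		h.Write(lenBuf[:]) // the length acts as the delimiter
		h.Write([]byte(c))
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	fmt.Println(hashChunks("dogfood", "stuff") == hashChunks("dog", "foodstuff")) // false
}
```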
for fileName, file := range templateFiles {
	filePath := filepath.Join(filesDir, fileName)
	if _, err := os.Stat(filePath); os.IsNotExist(err) {
didn't we just check that the directory doesn't already exist? How could the file already exist if the directory doesn't?
You're right, this check is redundant. I'll remove it.
// Each test gets a unique cache dir based on its name and template files.
// This ensures that tests can download providers in parallel and that they
// will redownload providers if the template files change.
hash := hashTemplateFilesAndTestName(t, t.Name(), templateFiles)
Providers are versioned, so I don't think each test or each version of each test needs its own unique mirror.
It results in multiple copies of the providers being downloaded on each system, and there doesn't seem to be any mechanism to clean up old versions of tests. The CI caches expire after 14 days, but people's personal development environments won't get cleaned up.
`terraform providers mirror` can be run multiple times on the same target directory and it will merge the results:
> You can run terraform providers mirror again on an existing mirror directory to update it with new packages.
https://developer.hashicorp.com/terraform/cli/commands/providers/mirror
So, what do you think of doing two passes over the test cases: one serial pass that sets up the mirror with the totality of the providers we need, then a second, parallel pass that runs the tests?
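For illustration, a rough sketch of that two-pass idea, assuming a `TestMain` that serially populates one shared mirror before the parallel tests run; the package name, template directories, and mirror location are made up for this example:

```go
package terraform_test

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"testing"
)

// TestMain is the serial first pass: it mirrors providers from every
// template directory into one shared target, then runs the (parallel) tests.
func TestMain(m *testing.M) {
	mirrorDir := filepath.Join(os.TempDir(), "tf-provider-mirror") // illustrative location
	for _, templateDir := range []string{"testdata/simple", "testdata/docker"} { // illustrative paths
		// `terraform providers mirror` merges into an existing target
		// directory, so repeated runs only add missing packages.
		cmd := exec.Command("terraform", "providers", "mirror", mirrorDir)
		cmd.Dir = templateDir
		if out, err := cmd.CombinedOutput(); err != nil {
			fmt.Fprintf(os.Stderr, "mirror %s: %v\n%s", templateDir, err, out)
			os.Exit(1)
		}
	}
	os.Exit(m.Run())
}
```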
Doing a serial pass is pretty slow if you haven't cached the providers beforehand. I just tried it, and it takes 27 seconds to cache the providers serially in my Coder workspace. In contrast, doing it in parallel only takes 5 seconds.
The cache size across all tests is currently around 70 MB - probably not a big deal. That said, it could balloon if someone starts iterating on a test locally. To avoid that, maybe we should add cleanup logic instead of doing a serial pass?
We already know the paths to provider directories based on test hashes, so at the start of `TestProvision`, we could check which ones are needed and remove the rest.
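A sketch of what that cleanup could look like, assuming the per-test provider dirs sit directly under one cache root and the set of needed hashes can be computed from the current test cases (the function and parameter names are illustrative, not code from the PR):

```go
package terraform_test

import (
	"os"
	"path/filepath"
)

// cleanStaleProviderDirs removes cached provider directories whose hash is
// not referenced by any current test case.
func cleanStaleProviderDirs(cacheRoot string, neededHashes map[string]bool) error {
	entries, err := os.ReadDir(cacheRoot)
	if err != nil {
		if os.IsNotExist(err) {
			return nil // nothing cached yet, nothing to clean
		}
		return err
	}
	for _, entry := range entries {
		if entry.IsDir() && !neededHashes[entry.Name()] {
			if err := os.RemoveAll(filepath.Join(cacheRoot, entry.Name())); err != nil {
				return err
			}
		}
	}
	return nil
}
```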
Cleanup logic would also be fine here, so we don't blow up people's cache.
with:
  path: ${{ inputs.cache-path }}
  # The key doesn't need to include an OS name. The action already takes
  # that into account: https://github.com/actions/cache/tree/5a3ec84eff668545956fd18022155c47e93e2684?tab=readme-ov-file#cache-version
I don't think we're entirely out of the woods here. While the different paths on Windows vs Linux might mean that those caches are separate, different architectures on the same OS don't seem like they'd have a different cache version.
Also, the paths might not be so different on macOS vs Linux, for example.
Hmm, you're right about different architectures on the same OS; I hadn't thought about that. I'll generate unique cache key prefixes based on OS and architecture.
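For example, something along these lines (illustrative only: `runner.os` and `runner.arch` are the standard runner context values, while the rest of the key shape is assumed to match the existing `dates` step):

```yaml
- uses: actions/cache@v4
  with:
    path: ${{ inputs.cache-path }}
    # Bake OS and architecture into the key so runners on different
    # platforms never share a cache entry.
    key: ${{ inputs.key-prefix }}-${{ runner.os }}-${{ runner.arch }}-${{ steps.dates.outputs.year-month }}
```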
# job is running on the main branch.
# Without it, PRs would create one-use caches that would linger until
# expiration and we'd be charged for them. I evaluated a couple of options
# for limiting the cache to the main branch, and forking was the simplest.
Can't we handle this use case by using explicit `restore` and `save` steps? I.e. we restore in all cases, but only save on `main`?
I considered it, but the GitHub Actions syntax makes it unwieldy. I wanted to package all the cache steps into a single action to avoid code duplication across the 5 workflows that need the cache. I tried doing it with `restore` and `save`, but found that I cannot add a `defer`-like `save` statement in a composite action. So I'd need to add 3 additional steps to each of the workflows: one to generate the cache keys, one to `restore`, and one to `save`. That's a lot of code duplication - I believe the current approach is cleaner.
Forking the entire repo seems like a lot of code duplication! It's just better hidden, and means we have yet another fork to manage.
That's a good point. I'll refactor it to use the more verbose restore and save approach.
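For reference, a sketch of that split (the step IDs and the key-generating `keys` step are assumptions): restore runs unconditionally, while the save step is gated to `main`.

```yaml
- name: Restore Terraform provider cache
  uses: actions/cache/restore@v4
  with:
    path: ${{ inputs.cache-path }}
    key: ${{ steps.keys.outputs.key }}
    restore-keys: ${{ steps.keys.outputs.restore-keys }}

# ... build and test steps run here ...

- name: Save Terraform provider cache
  if: github.ref == 'refs/heads/main'
  uses: actions/cache/save@v4
  with:
    path: ${{ inputs.cache-path }}
    key: ${{ steps.keys.outputs.key }}
```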
# This prevents PRs from rebuilding the cache on the first day of the month.
restore-keys: |
  ${{ inputs.key-prefix }}-${{ steps.dates.outputs.year-month }}-
  ${{ github.ref != 'refs/heads/main' && format('{0}-{1}-', inputs.key-prefix, steps.dates.outputs.prev-year-month) || '' }}
I don't get why we need this special case for PRs. If we are preventing PRs from saving to the cache anyway, why do we need to restrict only PRs from restoring last month's cache if this month's is not available?
Also, why are we even bothering to match on month? If today's cache isn't available, doesn't the cache action automatically match the most recent available cache based on the `key-prefix` alone?
This approach lets us reset the cache completely at the start of each month. We upload an updated cache daily, and each update is based on the previous day's cache. Without the monthly cut-off on `main`, any file added to the cache would persist indefinitely. The monthly reset ensures that old or obsolete files do not linger in the cache forever.
If you implement cleanup in the test code, which I think is a good idea, then we'll only cache what we need daily, and a monthly reset won't be required.
I'm hesitant to get rid of the monthly clean-up for two reasons:
- If other tests start using the cache and don't clean up properly after themselves, the cache will grow.
- If a Terraform provider that one of the tests depends on becomes permanently unavailable, we won't be able to detect it without a cache reset.
I don't think matching on the month complicates the code too much, so I'd be for keeping it.
Force-pushed from 2c226f2 to d56aaa7
@spikecurtis I addressed your feedback:
Addresses coder/internal#322.
This PR starts caching Terraform providers used by `TestProvision` in `provisioner/terraform/provision_test.go`. The goal is to improve the reliability of this test by cutting down on the number of network calls to external services. It leverages GitHub Actions cache, which on depot runners is persisted for 14 days by default.
Other than the aforementioned `TestProvision`, I couldn't find any other tests which depend on external Terraform providers.