Skip to content

feat: implement expiration policy logic for prebuilds #17996

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
May 26, 2025

Conversation

ssncferreira
Copy link
Contributor

@ssncferreira ssncferreira commented May 22, 2025

Summary

This PR introduces support for expiration policies in prebuilds. The TTL (time-to-live) is retrieved from the Terraform configuration (terraform-provider-coder PR):

prebuilds = {
	  instances = 2
	  expiration_policy {
		  ttl = 86400
	  }
  }

Note: Since there is no need for precise TTL enforcement down to the second, in this implementation expired prebuilds are handled in a single reconciliation cycle: they are deleted, and new instances are created only if needed to match the desired count.

Changes

  • The outcome of a reconciliation cycle is now expressed as a slice of reconciliation actions, instead of a single aggregated action.
  • Adjusted reconciliation logic to delete expired prebuilds and guarantee that the number of desired instances is correct.
  • Updated relevant data structures and methods to support expiration policies parameters.
  • Added documentation to Prebuilt workspaces page
  • Update terraform-provider-coder to version 2.5.0: https://github.com/coder/terraform-provider-coder/releases/tag/v2.5.0

Depends on: coder/terraform-provider-coder#404
Fixes: #17916

// filterExpiredWorkspaces splits running workspaces into expired and non-expired
// based on the preset's InvalidateAfterSecs TTL. If TTL is missing or zero,
// all workspaces are considered non-expired.
func filterExpiredWorkspaces(preset database.GetTemplatePresetsWithPrebuildsRow, runningWorkspaces []database.GetRunningPrebuiltWorkspacesRow) (nonExpired []database.GetRunningPrebuiltWorkspacesRow, expired []database.GetRunningPrebuiltWorkspacesRow) {
Copy link
Contributor Author

@ssncferreira ssncferreira May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s an alternative to perform this filtering directly in the database, but for now, opted for an in-memory approach to keep the implementation simpler and the PR smaller. The main trade-off here is performance at scale, while in-memory filtering is sufficient for the current scale, we may want to revisit a DB-based solution if we encounter issues at higher scale. Let me know if you think a database solution would be a better fit now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, the un-filtered data still comes directly from the database but is filtered in-memory.

From what I can tell, the changes required to move this logic inside the database would involve passing in the preset TTL and doing the TTL calculation inside the database. Is that correct?

If so, I agree that it can be kept in-memory for the moment, but we should add a follow-up issue to track this.

Copy link
Contributor Author

@ssncferreira ssncferreira May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, the un-filtered data still comes directly from the database but is filtered in-memory.

Yes, that's correct, the filtering currently happens in-memory after fetching all the running prebuilds (expired and non-expired) from the database.

From what I can tell, the changes required to move this logic inside the database would involve passing in the preset TTL and doing the TTL calculation inside the database. Is that correct?

Right, moving it into the database would require doing the TTL check within the query. We don’t need to pass the preset TTL manually since it's already stored in the database, so the query can join against the template_version_presets table and compute the expiration inline.
We’d also need to update the current GetRunningPrebuiltWorkspacesRow function to return only the non-expired running prebuilds.

If so, I agree that it can be kept in-memory for the moment, but we should add a follow-up issue to track this.

I’ll create a follow-up issue to track moving it into the database for better efficiency/scalability 👍

// - Running: prebuilds running and non-expired
// - Expired: prebuilds running and expired due to the preset's TTL
// - InProgress: prebuilds currently in progress
// - Backoff: holds failure info to decide if prebuild creation should be backed off
type PresetSnapshot struct {
Preset database.GetTemplatePresetsWithPrebuildsRow
Running []database.GetRunningPrebuiltWorkspacesRow
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not a fan of the Running field name here, it’s a bit misleading since now it only includes non-expired running prebuilds. Maybe something like RunningValid would make that clearer. Open to other naming ideas too if anyone has suggestions.
I can make this change in a follow-up PR to avoid adding more entropy here.

@ssncferreira ssncferreira force-pushed the ssncferreira/feat-prebuilds-cache-invalidation branch from 5a27ee4 to ef33f48 Compare May 22, 2025 18:54
@ssncferreira ssncferreira force-pushed the ssncferreira/feat-prebuilds-cache-invalidation branch from ef33f48 to 5538be5 Compare May 22, 2025 18:59
@ssncferreira ssncferreira force-pushed the ssncferreira/feat-prebuilds-cache-invalidation branch from 5538be5 to a38ee7c Compare May 22, 2025 19:20
@ssncferreira ssncferreira changed the title feat: implement cache invalidation logic for prebuilds feat: implement expiration policy logic for prebuilds May 26, 2025
@ssncferreira ssncferreira marked this pull request as ready for review May 26, 2025 11:58
@SasSwart
Copy link
Contributor

LGTM, but given my proximity to the project, I'll defer formal approval to an independent reviewer.

// filterExpiredWorkspaces splits running workspaces into expired and non-expired
// based on the preset's InvalidateAfterSecs TTL. If TTL is missing or zero,
// all workspaces are considered non-expired.
func filterExpiredWorkspaces(preset database.GetTemplatePresetsWithPrebuildsRow, runningWorkspaces []database.GetRunningPrebuiltWorkspacesRow) (nonExpired []database.GetRunningPrebuiltWorkspacesRow, expired []database.GetRunningPrebuiltWorkspacesRow) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, the un-filtered data still comes directly from the database but is filtered in-memory.

From what I can tell, the changes required to move this logic inside the database would involve passing in the preset TTL and doing the TTL calculation inside the database. Is that correct?

If so, I agree that it can be kept in-memory for the moment, but we should add a follow-up issue to track this.

@evgeniy-scherbina
Copy link
Contributor

Despite TestExpiredPrebuilds contains good test coverage, I noticed that we don't have new tests in reconcile_test.go which tests ReconcileAll API? So we don't test how ReconcilePreset handles reconciliation based on multiple actions?

@ssncferreira
Copy link
Contributor Author

Despite TestExpiredPrebuilds contains good test coverage, I noticed that we don't have new tests in reconcile_test.go which tests ReconcileAll API? So we don't test how ReconcilePreset handles reconciliation based on multiple actions?

You’re right that TestExpiredPrebuilds covers the lower-level logic, but we currently don’t have tests directly targeting ReconcileAll or ReconcilePreset to verify the full reconciliation flow with multiple actions.

I agree it would be valuable to add tests at the ReconcileAll level to ensure the end-to-end behavior. As discussed internally, I will address those in a follow-up PR.

@ssncferreira ssncferreira merged commit 6f6e73a into main May 26, 2025
38 of 39 checks passed
@ssncferreira ssncferreira deleted the ssncferreira/feat-prebuilds-cache-invalidation branch May 26, 2025 19:31
@github-actions github-actions bot locked and limited conversation to collaborators May 26, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prebuild TTL Configuration Support
4 participants