Skip to content

feat: implement autoscaling mechanism for prebuilds #18126

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

evgeniy-scherbina
Copy link
Contributor

@evgeniy-scherbina evgeniy-scherbina commented May 30, 2025

Closes coder/internal#312
Depends on coder/terraform-provider-coder#408

This PR adds support for defining an autoscaling block for prebuilds, allowing number of desired instances to scale dynamically based on a schedule.

Example usage:

data "coder_workspace_preset" "us-nix" {
  ...
  
  prebuilds = {
    instances = 0                  # default to 0 instances
    
    autoscaling = {
      timezone = "UTC"             # a single timezone is used for simplicity
      
      # Scale to 3 instances during the work week
      schedule {
        cron = "* 8-18 * * 1-5"    # from 8AM–6:59PM, Mon–Fri, UTC
        instances = 3              # scale to 3 instances
      }
      
      # Scale to 1 instance on Saturdays for urgent support queries
      schedule {
        cron = "* 8-14 * * 6"      # from 8AM–2:59PM, Sat, UTC
        instances = 1              # scale to 1 instance
      }
    }
  }
}

Behavior

  • Multiple schedule blocks per prebuilds block are supported.
  • If the current time matches any defined autoscaling schedule, the corresponding number of instances is used.
  • If no schedule matches, the default instance count (prebuilds.instances) is used as a fallback.

Why

This feature allows prebuild instance capacity to adapt to predictable usage patterns, such as:

  • Scaling up during business hours or high-demand periods
  • Reducing capacity during off-hours to save resources

Cron specification

The cron specification is interpreted as a continuous time range.

For example, the expression:

* 9-18 * * 1-5

is intended to represent a continuous range from 09:00 to 18:59, Monday through Friday.

However, due to minor implementation imprecision, it is currently interpreted as a range from 08:59:00 to 18:58:59, Monday through Friday.

This slight discrepancy arises because the evaluation is based on whether a specific point in time falls within the range, using the github.com/coder/coder/v2/coderd/schedule/cron library, which performs per-minute matching rather than strict range evaluation.

@evgeniy-scherbina evgeniy-scherbina force-pushed the yevhenii/prebuilds-autoscaling-mechanism branch from 606894f to ff0e813 Compare May 30, 2025 12:25
@evgeniy-scherbina evgeniy-scherbina changed the title Implement autoscaling mechanism for prebuilds feat: implement autoscaling mechanism for prebuilds May 30, 2025
@evgeniy-scherbina evgeniy-scherbina force-pushed the yevhenii/prebuilds-autoscaling-mechanism branch 5 times, most recently from 9af5e02 to bcfbb04 Compare June 6, 2025 19:53
@evgeniy-scherbina evgeniy-scherbina force-pushed the yevhenii/prebuilds-autoscaling-mechanism branch 4 times, most recently from 3a25178 to e0d1de7 Compare June 11, 2025 19:05
})
require.NoError(t, err, "insert preset")
return preset
}

func PresetPrebuildSchedule(t testing.TB, db database.Store, seed database.InsertPresetPrebuildScheduleParams) database.TemplateVersionPresetPrebuildSchedule {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically dbgen help funcs define default values when seed is a zero value. Can you add those please?

CronExpression: seed.CronExpression,
Instances: seed.Instances,
})
require.NoError(t, err, "insert preset")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
require.NoError(t, err, "insert preset")
require.NoError(t, err, "insert preset prebuild schedule")

@@ -0,0 +1,12 @@
-- Add autoscaling_timezone column to template_version_presets table
ALTER TABLE template_version_presets
ADD COLUMN autoscaling_timezone TEXT DEFAULT 'UTC' NOT NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a default here, if the provider is not defining a default?

id UUID PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,
preset_id UUID NOT NULL,
cron_expression TEXT NOT NULL,
instances INTEGER NOT NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: to match template_version_presets' definition.

Suggested change
instances INTEGER NOT NULL,
desired_instances INTEGER NOT NULL,

@@ -69,3 +83,6 @@ SELECT tvp.*, tv.template_id, tv.organization_id FROM
template_version_presets tvp
INNER JOIN template_versions tv ON tvp.template_version_id = tv.id
WHERE tvp.id = @preset_id;

-- name: GetPresetPrebuildSchedules :many
SELECT * FROM template_version_preset_prebuild_schedules;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to constrain this to schedules that are only associated to active template versions?

}

// Look for a schedule whose cron expression matches the provided time
for _, schedule := range p.PrebuildSchedules {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my main reason for blocking this review. We have to deal with conflicting schedules here, explicitly.

If we take this current approach, and conflicting schedules are evaluated in different orders on each reconciliation loop, then that will cause a lot of churn by spinning up and down prebuilds if those schedules' instance counts differ.

Comment on lines 1069 to 1070
spec: "* 9-18 * * 1-5",
at: mustParseTime(t, time.RFC1123, "Mon, 02 Jun 2025 8:59:00 UTC"),
Copy link
Contributor

@dannykopping dannykopping Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would violate my expectations as an operator. If the autoscaling schedule is going to kick in 1m before my given cron expression, that would be quite surprising and unintuitive.

I think we should try address this shortcoming if we can.
Edit: I looked into the implementation and made some suggestions.

This test could also be more explicit about why 8:59:00 matches 9-18.

Comment on lines +533 to +534

//replace github.com/coder/terraform-provider-coder/v2 => /home/coder/terraform-provider-coder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//replace github.com/coder/terraform-provider-coder/v2 => /home/coder/terraform-provider-coder

@@ -101,7 +101,8 @@ require (
github.com/coder/quartz v0.2.1
github.com/coder/retry v1.5.1
github.com/coder/serpent v0.10.0
github.com/coder/terraform-provider-coder/v2 v2.5.3
// TODO: release new provider version and set it here
github.com/coder/terraform-provider-coder/v2 v2.5.4-0.20250611180058-d61894db8492
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember TODO.

@spikecurtis
Copy link
Contributor

The "continuous time" interpretation of the crontab makes it, I guess, relatively simple to specify spans that start and end on the hour, but consider attempting to program a span like 7:45am - 9:10am. You'd need 3 spans

45-59 7 * * 1-5
* 8 * * 1-5
0-10 9 * * 1-5

Are we just kind of assuming that operators likely only want hourly precision?

@@ -0,0 +1,12 @@
-- Add autoscaling_timezone column to template_version_presets table
ALTER TABLE template_version_presets
ADD COLUMN autoscaling_timezone TEXT DEFAULT 'UTC' NOT NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to co-locate the timezone with the cron expression in the template_version_preset_prebuild_schedules table?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking, but validating the overlap of two cron expressions is complex, and is part of the reason that workspaces have an autostart schedule and a TTL instead of a second autostop schedule.

An alternative option here would be to have a schedule for starting a given number of prebuild instances and a time.Duration for which they should be active.

Comment on lines +82 to +95
message Schedule {
string cron = 1;
int32 instances = 2;
}

message Autoscaling {
string timezone = 1;
repeated Schedule schedule = 2;
}

message Prebuild {
int32 instances = 1;
ExpirationPolicy expiration_policy = 2;
int32 instances = 1;
ExpirationPolicy expiration_policy = 2;
Autoscaling autoscaling = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#18268 already incremented the minor version in provisionerd/proto/version.go so you don't need to do this again. But do please update provisionerd/proto/version.go with a comment indicating the additional fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

coderd: implement autoscaling mechanism
4 participants