feat: implement scheduling mechanism for prebuilds #18126

Merged
evgeniy-scherbina merged 46 commits into main from yevhenii/prebuilds-autoscaling-mechanism on Jun 19, 2025

Contributor

evgeniy-scherbina commented May 30, 2025

Closes coder/internal#312
Depends on coder/terraform-provider-coder#408

This PR adds support for defining a scheduling block for prebuilds, allowing the number of desired instances to scale dynamically based on a schedule.

Example usage:

data "coder_workspace_preset" "us-nix" {
  ...
  
  prebuilds {
    instances = 0                  # default to 0 instances

    scheduling {
      timezone = "UTC"             # a single timezone is used for simplicity
      
      # Scale to 3 instances during the work week
      schedule {
        cron = "* 8-18 * * 1-5"    # from 8AM–6:59PM, Mon–Fri, UTC
        instances = 3              # scale to 3 instances
      }
      
      # Scale to 1 instance on Saturdays for urgent support queries
      schedule {
        cron = "* 8-14 * * 6"      # from 8AM–2:59PM, Sat, UTC
        instances = 1              # scale to 1 instance
      }
    }
  }
}

Behavior

  • Multiple schedule blocks per prebuilds block are supported.
  • If the current time matches any defined autoscaling schedule, the corresponding number of instances is used.
  • If no schedule matches, the default instance count (prebuilds.instances) is used as a fallback.
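
The fallback rule above can be sketched in a few lines of Go. This is a minimal illustration, not the code from this PR: the `Schedule` type, its fields, and `DesiredInstances` are hypothetical names, schedules are reduced to hour and weekday ranges, and picking the first matching schedule is an assumption (the description only says a matching schedule's count is used).

```go
package main

// Schedule is a hypothetical, simplified stand-in for one schedule block.
// Days follow the cron convention: 0 = Sunday ... 6 = Saturday.
type Schedule struct {
	FromHour, ToHour int // inclusive hour range, e.g. 8-18
	FromDay, ToDay   int // inclusive weekday range, e.g. 1-5
	Instances        int32
}

// Matches reports whether the given weekday and hour fall inside the schedule.
func (s Schedule) Matches(weekday, hour int) bool {
	return hour >= s.FromHour && hour <= s.ToHour &&
		weekday >= s.FromDay && weekday <= s.ToDay
}

// DesiredInstances returns the instance count of the first matching schedule
// (an assumption), or def when no schedule matches.
func DesiredInstances(def int32, schedules []Schedule, weekday, hour int) int32 {
	for _, s := range schedules {
		if s.Matches(weekday, hour) {
			return s.Instances
		}
	}
	return def
}
```

With the two schedules from the example usage, a Tuesday at 10:00 yields 3 instances, a Saturday at 09:00 yields 1, and a Sunday falls back to the default of 0.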

Why

This feature allows prebuild instance capacity to adapt to predictable usage patterns, such as:

  • Scaling up during business hours or high-demand periods
  • Reducing capacity during off-hours to save resources

Cron specification

The cron specification is interpreted as a continuous time range.

For example, the expression:

* 9-18 * * 1-5

is intended to represent a continuous range from 09:00 to 18:59, Monday through Friday.

However, due to minor implementation imprecision, it is currently interpreted as a range from 08:59:00 to 18:58:59, Monday through Friday.

This slight discrepancy arises because the evaluation is based on whether a specific point in time falls within the range, using the github.com/coder/coder/v2/coderd/schedule/cron library, which performs per-minute matching rather than strict range evaluation.
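
To make the intended range concrete, here is a small sketch of per-minute matching using only the Go standard library. It does not use the coder cron library and does not reproduce the one-minute offset described above; `matchesMinute` and `activeWindow` are hypothetical helpers, and the predicate is a hand-written equivalent of `* 9-18 * * 1-5`.

```go
package main

import "time"

// matchesMinute is a hand-written stand-in for "* 9-18 * * 1-5":
// any minute, hours 9 through 18, Monday through Friday.
func matchesMinute(t time.Time) bool {
	wd := t.Weekday()
	return t.Hour() >= 9 && t.Hour() <= 18 &&
		wd >= time.Monday && wd <= time.Friday
}

// activeWindow scans one UTC day minute by minute and returns the first and
// last matching minutes as HH:MM. ok is false when no minute matches.
func activeWindow(year int, month time.Month, day int) (first, last string, ok bool) {
	start := time.Date(year, month, day, 0, 0, 0, 0, time.UTC)
	for i := 0; i < 24*60; i++ {
		t := start.Add(time.Duration(i) * time.Minute)
		if !matchesMinute(t) {
			continue
		}
		if !ok {
			first, ok = t.Format("15:04"), true
		}
		last = t.Format("15:04")
	}
	return first, last, ok
}
```

Because every minute between the first and last match also matches, the per-minute predicate behaves like one continuous window on a weekday and matches nothing on a weekend.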

evgeniy-scherbina force-pushed the yevhenii/prebuilds-autoscaling-mechanism branch from 606894f to ff0e813 on May 30, 2025 12:25
evgeniy-scherbina changed the title from "Implement autoscaling mechanism for prebuilds" to "feat: implement autoscaling mechanism for prebuilds" on May 30, 2025
evgeniy-scherbina force-pushed the yevhenii/prebuilds-autoscaling-mechanism branch 5 times, most recently from 9af5e02 to bcfbb04 on June 6, 2025 19:53
evgeniy-scherbina force-pushed the yevhenii/prebuilds-autoscaling-mechanism branch 4 times, most recently from 3a25178 to e0d1de7 on June 11, 2025 19:05

@@ -0,0 +1,12 @@
-- Add autoscaling_timezone column to template_version_presets table
ALTER TABLE template_version_presets
ADD COLUMN autoscaling_timezone TEXT DEFAULT 'UTC' NOT NULL;
Contributor

Why do we need a default here, if the provider is not defining a default?

Contributor Author

The column is NOT NULL, so it requires a default. Are you suggesting DEFAULT ''?

Contributor Author

It was a defensive mechanism to make sure we don't fail here on time.LoadLocation(p.Preset.SchedulingTimezone):

func (p PresetSnapshot) CalculateDesiredInstances(at time.Time) int32 {
	if len(p.PrebuildSchedules) == 0 {
		// If no schedules are defined, fall back to the default desired instance count
		return p.Preset.DesiredInstances.Int32
	}

	// Validate that the provided timezone is valid
	_, err := time.LoadLocation(p.Preset.SchedulingTimezone)
	if err != nil {
		p.logger.Error(context.Background(), "invalid timezone in prebuild scheduling configuration",
			slog.F("preset_id", p.Preset.ID),
			slog.F("timezone", p.Preset.SchedulingTimezone),
			slog.Error(err))

		// If timezone is invalid, fall back to the default desired instance count
		return p.Preset.DesiredInstances.Int32
	}

	// Schedule evaluation continues here (elided in this excerpt).
}

This shouldn't happen: if len(p.PrebuildSchedules) > 0, SchedulingTimezone should be set and valid according to the validation rules in terraform-provider-coder. Even if it does happen, we return the default instance count, so we should be safe.

But I added DEFAULT 'UTC' as another layer of defense.

Contributor Author

Final decision: changed the default to '' instead of 'UTC'.
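
For context on how Go treats these values: the standard library documents that `time.LoadLocation` returns UTC for both `""` and `"UTC"`, and returns an error for names it cannot resolve. The sketch below shows that behavior; `locationName` is a hypothetical helper and not part of this PR.

```go
package main

import "time"

// locationName resolves a stored timezone string and returns the name of the
// resulting location, or an error if the timezone cannot be loaded.
func locationName(tz string) (string, error) {
	loc, err := time.LoadLocation(tz)
	if err != nil {
		return "", err
	}
	return loc.String(), nil
}
```

So an empty default still loads successfully in Go, while a misspelled zone name takes the error path that falls back to the default instance count.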

@spikecurtis
Contributor

The "continuous time" interpretation of the crontab makes it, I guess, relatively simple to specify spans that start and end on the hour, but consider attempting to program a span like 7:45am - 9:10am. You'd need 3 spans

45-59 7 * * 1-5
* 8 * * 1-5
0-10 9 * * 1-5

Are we just kind of assuming that operators likely only want hourly precision?

@evgeniy-scherbina
Contributor Author

The "continuous time" interpretation of the crontab makes it, I guess, relatively simple to specify spans that start and end on the hour, but consider attempting to program a span like 7:45am - 9:10am. You'd need 3 spans

45-59 7 * * 1-5
* 8 * * 1-5
0-10 9 * * 1-5

Are we just kind of assuming that operators likely only want hourly precision?

@spikecurtis

Minutes must always be * according to our spec and validation rules.

If we allowed specific minute ranges (e.g., 0-30 8-9 * * *), they would be interpreted as multiple disjoint intervals (e.g., 08:00–08:30 and 09:00–09:30), which is unintuitive and likely unexpected for operators.

Therefore, the minimum supported granularity is one hour.

There was a long discussion, and the majority voted for this approach. Alternative approaches are described here: https://www.notion.so/coderhq/Implement-autoscaling-mechanism-201d579be5928054837dceb8358bfed3?source=copy_link#207d579be5928059b5ded1778841fba2
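
The disjoint-interval effect and the "minutes must be *" rule can be illustrated with a small standalone sketch. It is not the provider's validation code: `inField`, `validateMinuteWildcard`, and `matchesHM` are hypothetical helpers that only understand `*`, a single number, or an `N-M` range, and `matchesHM` looks at the minute and hour fields only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// inField reports whether v is matched by one cron field that is "*", a
// number "N", or a range "N-M". Lists and step values are not handled.
func inField(field string, v int) (bool, error) {
	if field == "*" {
		return true, nil
	}
	lo, hi, isRange := strings.Cut(field, "-")
	from, err := strconv.Atoi(lo)
	if err != nil {
		return false, fmt.Errorf("bad field %q: %w", field, err)
	}
	to := from
	if isRange {
		if to, err = strconv.Atoi(hi); err != nil {
			return false, fmt.Errorf("bad field %q: %w", field, err)
		}
	}
	return v >= from && v <= to, nil
}

// validateMinuteWildcard enforces the rule that the minute field must be "*".
func validateMinuteWildcard(expr string) error {
	fields := strings.Fields(expr)
	if len(fields) != 5 {
		return fmt.Errorf("expected 5 fields, got %d", len(fields))
	}
	if fields[0] != "*" {
		return fmt.Errorf("minute field must be *, got %q", fields[0])
	}
	return nil
}

// matchesHM applies per-minute matching to the minute and hour fields only.
func matchesHM(expr string, hour, minute int) (bool, error) {
	fields := strings.Fields(expr)
	if len(fields) != 5 {
		return false, fmt.Errorf("expected 5 fields, got %d", len(fields))
	}
	m, err := inField(fields[0], minute)
	if err != nil {
		return false, err
	}
	h, err := inField(fields[1], hour)
	if err != nil {
		return false, err
	}
	return m && h, nil
}
```

Under per-minute matching, `0-30 8-9 * * *` matches 08:15 and 09:10 but not 08:45, which is the gap between the two intervals. Rejecting any minute field other than `*` rules this case out up front.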

Contributor

dannykopping left a comment

Excellent work @evgeniy-scherbina!
I suspect coder/terraform-provider-coder#408 might prompt a naming change, but I'm happy to approve as-is.

RunningPrebuilds []database.GetRunningPrebuiltWorkspacesRow
PrebuildsInProgress []database.CountInProgressPrebuildsRow
Backoffs []database.GetPresetsBackoffRow
HardLimitedPresetsMap map[uuid.UUID]database.GetPresetsAtFailureLimitRow
clock quartz.Clock
Contributor

I don't mind either approach, but let's pick one 👍

evgeniy-scherbina changed the title from "feat: implement autoscaling mechanism for prebuilds" to "feat: implement scheduling mechanism for prebuilds" on Jun 18, 2025
evgeniy-scherbina merged commit 0f6ca55 into main on Jun 19, 2025
35 of 37 checks passed
evgeniy-scherbina deleted the yevhenii/prebuilds-autoscaling-mechanism branch on June 19, 2025 15:08
github-actions bot locked and limited conversation to collaborators on Jun 19, 2025
Development

Successfully merging this pull request may close these issues.

coderd: implement autoscaling mechanism
4 participants