
Automatically retry failed workspace builds #14031

@stirby

Description


Background

Customers' workspace builds frequently fail because of flaky, transient issues. This costs significant developer time: most users expect to open the dashboard to a workspace that autostart has already started. When that build fails, they gain no benefit from the scheduling tool and must wait while the workspace builds again, sometimes succeeding only after multiple attempts.

Workspace builds often take 10 to 30 minutes. Because almost all templates depend on external services that cannot guarantee 100% uptime, this issue can easily cost hours of developer time each week.

From the developer's perspective, the time before they open the dashboard is idle. The control plane could use that period to retry the build and give the transient issue a chance to resolve on its own. Admins and users could later audit the failed builds via asynchronous Notifications.

Many customers have requested an automated retry system to reduce developer time lost to this issue.

The problem is most impactful for failed autostarts, but we should consider automated retries for all builds, since some templates take an hour or more to build. A developer may kick off a start build and expect to come back to a ready environment. Template admins should be able to minimize this idle time by enabling auto-retries for more than just automated builds. Retries could also help with build cleanup.

Proposal

We propose adding a "Retry Failed Builds" template option that applies either to all builds or only to automated starts. It would be opt-in, configured in template settings by setting the maximum number of retries to attempt.

When the option is enabled, the control plane automatically retries a failed start build up to N times.

Metadata


Assignees

No one assigned

    Labels

    customer-requested: Features requested by enterprise customers.
    roadmap: https://coder.com/roadmap
    roadmap-maybe: Ideas we're considering!

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
