-
Notifications
You must be signed in to change notification settings - Fork 881
fix: Allow terraform provisions to be gracefully cancelled #3526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This change allows terraform commands to be gracefully cancelled on Unix-like platforms by signaling interrupt on provision cancellation. One implementation detail to note is that we do not necessarily kill a running terraform command immediately even if the stream is closed. The reason for this is to allow for graceful cancellation even in such an event. Currently the timeout is set to one minute, which was chosen arbitrarily. Also note that the `force` flag was added to `provisioner.proto` and is handled in the `terraform` package, however, it is not used by the `provisionerd/runner`. The reason is that the `runner` would need to be refactored. Currently the "force stop" context is used as a base for streams, and this would mean that in a force stop event, the stream would close and we can't send out our force stop cancellation message. Related: #2683 The above issue may be partially of fully fixed by this change.
Should we remove force? If it's unused, it seems like there's little harm. Could we add some tests for cancelation too? I'm not entirely sure how we would but just asking. |
From proto? I'm open to do both approaches (keep it or remove it). This is how it'd be used: diff --git provisionerd/runner/runner.go provisionerd/runner/runner.go
index f26808c0..d9f68583 100644
--- provisionerd/runner/runner.go
+++ provisionerd/runner/runner.go
@@ -619,6 +619,13 @@ func (r *Runner) runTemplateImportProvision(values []*sdkproto.ParameterValue, m
go func() {
select {
case <-r.notStopped.Done():
+ _ = stream.Send(&sdkproto.Provision_Request{
+ Type: &sdkproto.Provision_Request_Cancel{
+ Cancel: &sdkproto.Provision_Cancel{
+ Force: true,
+ },
+ },
+ })
return
case <-r.notCanceled.Done():
_ = stream.Send(&sdkproto.Provision_Request{
@@ -773,6 +780,13 @@ func (r *Runner) runWorkspaceBuild() (
go func() {
select {
case <-r.notStopped.Done():
+ _ = stream.Send(&sdkproto.Provision_Request{
+ Type: &sdkproto.Provision_Request_Cancel{
+ Cancel: &sdkproto.Provision_Cancel{
+ Force: true,
+ },
+ },
+ })
return
case <-r.notCanceled.Done():
_ = stream.Send(&sdkproto.Provision_Request{ But like I mentioned in the PR description, needs a bit of refactor in the
It's a bit tricky, but I can try to add some. Essentially we'd need to fake a terraform binary that we can tweak the behavior for in our test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm down to remove Force
in entirety if we don't see a great use-case for now!
The only use-case currently, as I see it, is to ensure |
return stream.Send(&proto.Provision_Response{ | ||
Type: &proto.Provision_Response_Complete{ | ||
Complete: &proto.Provision_Complete{ | ||
Error: err.Error(), | ||
}, | ||
}, | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kylecarbs yay or nay on this change? I'm thinking it'd be OK to return provision complete on cancellation, similar to plan
and apply
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with it!
Cancellation tests have been added for |
This change allows terraform commands to be gracefully cancelled on
Unix-like platforms by signaling interrupt on provision cancellation.
One implementation detail to note is that we do not necessarily kill a
running terraform command immediately even if the stream is closed. The
reason for this is to allow for graceful cancellation even in such an
event. Currently the timeout is set to 5 minutes by default.
Related: #2683
The above issue may be partially or fully fixed by this change.