"Graceful" shutdown with SIGTERM appears to interrupt Terraform provider #14433

Closed
aaronlehmann opened this issue Aug 26, 2024 · 2 comments · Fixed by #14466
Labels
must-do Issues that must be completed by the end of the Sprint. Or else. Only humans may set this.

Comments

@aaronlehmann
Contributor

aaronlehmann commented Aug 26, 2024

Sending SIGTERM to the coder server is supposed to trigger a graceful shutdown that drains build jobs before exiting. However, it seems like when a build job is running at the time SIGTERM is received, the job gets interrupted anyway:

Stop caught, waiting for provisioner jobs to complete and gracefully exiting. Use ctrl+\ to force quit
Shutting down API server...

2024-08-23 15:12:17.146 [info]  provisionerd-40d0ef3f-5f61-40ea-838a-45d20073363d-3.runner: workspace provisioner job logged  job_id=4b457a13-609f-413b-bf61-fd29bf86bebd  template_name=workspace-v1  template_version=zealous_borg5  workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151  workspace_id=d8b32732-8313-47a1-b12e-61a5be6ea289  workspace_name=[redacted]  workspace_owner=[redacted]  workspace_transition=start  level=INFO  workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151 ...
    output= Interrupt received.
            Please wait for Terraform to exit or data loss may occur.
            Gracefully shutting down...
2024-08-23 15:12:17.146 [info]  provisionerd-40d0ef3f-5f61-40ea-838a-45d20073363d-3.runner: workspace provisioner job logged  job_id=4b457a13-609f-413b-bf61-fd29bf86bebd  template_name=workspace-v1  template_version=zealous_borg5  workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151  workspace_id=d8b32732-8313-47a1-b12e-61a5be6ea289  workspace_name=[redacted]  workspace_owner=[redacted]  workspace_transition=start  level=INFO  output="Stopping operation..."  workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151
2024-08-23 15:12:17.146 [info]  provisionerd-40d0ef3f-5f61-40ea-838a-45d20073363d-3.runner: workspace provisioner job logged  job_id=4b457a13-609f-413b-bf61-fd29bf86bebd  template_name=workspace-v1  template_version=zealous_borg5  workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151  workspace_id=d8b32732-8313-47a1-b12e-61a5be6ea289  workspace_name=[redacted]  workspace_owner=[redacted]  workspace_transition=start  level=INFO  output="netflix_ec2.dev: Modifications errored after 24s"  workspace_build_id=60a52a9c-e60b-4a0a-85f8-7eb3a1775151

This was a result of configuring systemd to send the coder server SIGTERM and wait 10 minutes before following up with a kill signal. However, the interrupt and "Stopping operation..." log messages appear to be immediate. The provider log also showed that its operation was cancelled partway through.

KillSignal=SIGTERM
SendSIGKILL=yes
TimeoutStopSec=10min

This is a high priority issue for us as it limits our ability to safely deploy updates.

coder-labeler bot added the bug and must-do labels Aug 26, 2024
@ethanndickson
Member

ethanndickson commented Aug 28, 2024

Thanks for reporting!

This should be fixed in #14466, but only if you're running external provisioners started using coder provisionerd start (or via Helm), and then sending a SIGTERM to that process. If that's not the case, please re-open.

@aaronlehmann
Contributor Author

I dug into this a bit more and it appears the problem was a missing KillMode=mixed in my systemd unit, so systemd was sending SIGTERM to all processes in the control group, including the terraform processes.

However, I'm glad this turned up an unrelated issue you were able to fix!
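
For anyone hitting the same symptom, here is a minimal sketch of the [Service] section with that setting added to the options quoted earlier in the thread. The ExecStart path is a hypothetical placeholder, not taken from the actual unit. With systemd's default KillMode=control-group, the stop signal goes to every process in the unit's control group. With KillMode=mixed, SIGTERM goes only to the main process, and the follow-up SIGKILL after TimeoutStopSec goes to whatever processes remain in the group.

```ini
[Service]
# Hypothetical path, shown for illustration only; use the real install location.
ExecStart=/usr/bin/coder server
# Signal sent on stop.
KillSignal=SIGTERM
# mixed: deliver KillSignal to the main process only, so child terraform
# processes are not interrupted while the server drains its jobs.
KillMode=mixed
# After TimeoutStopSec expires, SIGKILL any processes still left in the control group.
SendSIGKILL=yes
TimeoutStopSec=10min
```

After editing the unit, systemctl daemon-reload followed by a restart of the service should apply the change.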
