Investigate possible solutions for "Text file busy" errors during builds #14726
Comments
Terraform plugins use https://github.com/hashicorp/go-plugin to communicate with Terraform. Plugins themselves are gRPC services that listen on a Unix domain socket in a temp directory (or on localhost on Windows). Terraform is the "client" of the gRPC service, but manages the lifecycle of the "service".
At the end of the day, Terraform is supposed to call a special API to tell the provider to shut down, and/or forcibly kill the child process. However, it seems that sometimes that's not happening, and the provider process can linger, just listening on its domain socket.
Some ideas: If we get "text file busy" error:
We could fix Terraform and/or OpenTofu such that they don't reinstall the provider binary if it already exists (possibly including hashing the contents). This would sidestep the issue, since we would no longer write to the file. However, the underlying issue of leaking provider processes would remain.
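A minimal sketch of the "skip the reinstall if the contents already match" idea, assuming a SHA-256 comparison. This is not how Terraform or OpenTofu install providers today; the function names (`digest`, `fileDigest`, `needsInstall`) and the path in `main` are hypothetical and exist only for illustration. The point is that the existing binary is only ever opened for reading, and a read-only open does not fail with "text file busy" even while the binary is executing.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"io/fs"
	"os"
)

// digest returns the hex-encoded SHA-256 of data.
func digest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// fileDigest streams the file at path through SHA-256.
// The file is opened read-only, so a running executable can be hashed.
func fileDigest(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// needsInstall reports whether the binary at path is missing or has
// different contents than wantDigest. Only when it returns true would
// the (hypothetical) installer open the file for writing.
func needsInstall(path, wantDigest string) (bool, error) {
	got, err := fileDigest(path)
	if errors.Is(err, fs.ErrNotExist) {
		return true, nil // nothing installed yet
	}
	if err != nil {
		return false, err
	}
	return got != wantDigest, nil
}

func main() {
	// Stand-in for the provider package contents; hypothetical data.
	pkg := []byte("pretend provider binary")
	// Hypothetical install location, not a real Terraform path.
	path := "/tmp/hypothetical-provider-binary"

	need, err := needsInstall(path, digest(pkg))
	fmt.Println("needs install:", need, "err:", err)
}
```

As noted above, this only avoids the write; a leaked provider process would still be running.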
I'll also check whether Terraform is setting Pdeathsig in https://pkg.go.dev/syscall#SysProcAttr UPDATE: Terraform doesn't set any special
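For reference, a minimal sketch of what setting the parent-death signal looks like in Go. It is Linux-only (the `Pdeathsig` field does not exist in `syscall.SysProcAttr` on other platforms), the helper name `newChildCommand` is hypothetical, and nothing here is taken from Terraform's source. One known caveat: on Linux the signal is tied to the parent thread that created the child, so a Go parent may need extra care around OS threads for this to behave as expected.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// newChildCommand builds a command whose process is sent SIGKILL by the
// kernel when its parent dies (Linux only). The command is not started.
func newChildCommand(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Pdeathsig: syscall.SIGKILL,
	}
	return cmd
}

func main() {
	// "sleep 60" stands in for a provider binary; it is never started here.
	cmd := newChildCommand("sleep", "60")
	fmt.Println("parent-death signal:", cmd.SysProcAttr.Pdeathsig)
}
```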
I don't think I have the full story yet, but I've just confirmed that if you specify a
https://developer.hashicorp.com/terraform/cli/config/config-file#filesystem_mirror Pretty sure at least one customer who is seeing this issue is using a
One of the customers seeing this is using the zipped layout for their filesystem mirror. That explains why the provider gets opened for writing (unzipping the provider package), but doesn't explain why the provider is still executing, since the symlink stuff I mentioned only applies to the unpacked layout. I've confirmed that the output of I'm following up with the other customers who have seen the issue, to see what kind of layout they are using. I'm back to thinking that our provider must still be running from a previous build on the same provisionerd. Not sure how this could happen.
Another customer has said they aren't using a
I also get this error:
this is the tf file: and here are the coderd provisioner logs (busy error):
Well, I guess that means coder/terraform-provider-coder#290 didn't help, since that was released in v1.0.3 |
@datapedd what was the workspace ID of the build that failed with "text file busy"? Is there any chance you could exec into the coderd pod and see if
In the coderd pod, the provisioner process is running (per ps aux). I can't create a dump because it says read-only. After I killed the provisioner process, it worked again.
@datapedd what was the exact time the build that failed started? You should be able to see this by clicking the builds icon on the left. Then, click the failed build.
re: #14726 If we see "text file busy" in the errors while initializing terraform, attempt to query the pprof endpoint set up by coder/terraform-provider-coder#295 and log at CRITICAL. --------- Signed-off-by: Spike Curtis <spike@coder.com>
re: #14726 If we see "text file busy" in the errors while initializing terraform, attempt to query the pprof endpoint set up by coder/terraform-provider-coder#295 and log at CRITICAL. --------- Signed-off-by: Spike Curtis <spike@coder.com> (cherry picked from commit d676ad5)
re: #14726 If we see "text file busy" in the errors while initializing terraform, attempt to query the pprof endpoint set up by coder/terraform-provider-coder#295 and log at CRITICAL. --------- Signed-off-by: Spike Curtis <spike@coder.com> (cherry picked from commit d676ad5) Co-authored-by: Spike Curtis <spike@coder.com>
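The detection step described in these commits can be sketched roughly as follows. This is not the code from the referenced change: the function names, the sample init output, and the pprof URL are all assumptions made for illustration (how the provider actually exposes its pprof endpoint is not stated in this thread), and "CRITICAL" is just printed as a prefix here.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// textFileBusy is the message Go reports for ETXTBSY on Linux.
const textFileBusy = "text file busy"

// sawTextFileBusy reports whether terraform init output mentions the error.
func sawTextFileBusy(initOutput string) bool {
	return strings.Contains(strings.ToLower(initOutput), textFileBusy)
}

// fetchDump retrieves up to 1 MiB from a pprof-style HTTP endpoint.
// The URL is supplied by the caller; its real location is an assumption.
func fetchDump(url string) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	// Made-up sample output, not a real log from this issue.
	out := "Error: failed to install provider: open /path/to/provider: text file busy"
	if !sawTextFileBusy(out) {
		return
	}
	// Hypothetical address; the path is the usual net/http/pprof one.
	dump, err := fetchDump("http://127.0.0.1:6060/debug/pprof/goroutine?debug=2")
	if err != nil {
		fmt.Println("CRITICAL: text file busy; could not fetch provider dump:", err)
		return
	}
	fmt.Println("CRITICAL: text file busy; provider goroutine dump:")
	fmt.Println(dump)
}
```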
This is unfortunately an issue with Terraform (and largely just filesystems in general). Some background:
"Text file busy" (ETXTBSY) can occur on Linux when a process tries to write to or modify an executable that is currently running. In Terraform's case, this typically happens when multiple processes try to write and use plugins from the same cache directory.

Regardless, we should do our best to work around this when we detect it during a build. I don't know how long the process lingers, but it could potentially linger indefinitely, which would render that particular runner dead for subsequent runs and force the end user to rebuild a number of times before getting a successful build. That is not acceptable.
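One possible workaround, sketched under assumptions: detect ETXTBSY with `errors.Is` and retry the failing operation a bounded number of times. This is an illustration, not what Coder or Terraform does; `retryOnTextFileBusy` and `simulatedBusy` are hypothetical names, and `simulatedBusy` only fakes the error so the example runs without a real busy executable. Retrying only helps if the lingering process eventually exits, which, as noted above, may not happen.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
	"time"
)

// isTextFileBusy reports whether err wraps ETXTBSY.
func isTextFileBusy(err error) bool {
	return errors.Is(err, syscall.ETXTBSY)
}

// retryOnTextFileBusy calls op up to attempts times, sleeping delay between
// tries, and retries only when op fails with ETXTBSY. Any other result
// (success or a different error) is returned immediately.
func retryOnTextFileBusy(attempts int, delay time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = op()
		if err == nil || !isTextFileBusy(err) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return err
}

// simulatedBusy returns an operation that fails with ETXTBSY the first
// `failures` times it is called, then succeeds. Demo helper only.
func simulatedBusy(failures int) func() error {
	return func() error {
		if failures > 0 {
			failures--
			return &os.PathError{
				Op:   "open",
				Path: "hypothetical-provider-binary",
				Err:  syscall.ETXTBSY,
			}
		}
		return nil
	}
}

func main() {
	err := retryOnTextFileBusy(3, 10*time.Millisecond, simulatedBusy(2))
	fmt.Println("result after retries:", err)
}
```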
e.g.