
fix: use raw syscalls to write binary we execute #11684


Merged
merged 1 commit into from
Jan 18, 2024

Conversation

spikecurtis
Contributor

I think this fixes the flake seen here:

https://github.com/coder/coder/actions/runs/7565915337/job/20602500818

Go's file handling is complex, and in at least some cases a file.Close() call can return without having actually closed the file descriptor.

If we're still holding open the file descriptor of an executable we just wrote and then try to execute it, the exec fails with "text file busy", which is what we have seen.

So, to be extra sure, I've avoided the standard library and directly called the syscalls to open, write, and close the file we intend to use in the test.

I've also added some more logging, so if the issue is multiple tests writing to the same location, then we might have a chance to see it.


Comment on lines +170 to +172
// golang's standard OS library can sometimes leave the file descriptor open even after
// "Closing" the file (which can then lead to a "text file busy" error), so we bypass this
// and use syscall directly.
Member


Is this a known / documented issue in the stdlib?
Would be nice to remove this workaround if it gets fixed in a later version.

Contributor Author


TBH, I think what's supposed to happen is that it only leaves the fd open if there is some pending I/O, like on another goroutine, which wouldn't apply here. But, behavior is different on different OSes, so it's hard to be sure, and I'm at a loss for another explanation of "text file busy".

I figured since we don't need any non-blocking IO or polling, it would be better to just simplify the syscalls we make, and see if we still see the flake.

@spikecurtis spikecurtis merged commit 1f0e6ba into main Jan 18, 2024
@spikecurtis spikecurtis deleted the spike/flake-test-provision-cancel branch January 18, 2024 12:21

@github-actions github-actions bot locked and limited conversation to collaborators Jan 18, 2024