-
Notifications
You must be signed in to change notification settings - Fork 892
fix: make handleManifest always signal dependents #13141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @spikecurtis and the rest of your teammates on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the checkpoint implementation, just some minor stuff but otherwise LGTM (no need to re-review unless you want.)
|
||
// complete the checkpoint. Pass nil to indicate the checkpoint was ok. | ||
func (c *checkpoint) complete(err error) { | ||
c.err = err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using sync once here to simplify usage an allowing multiple calls to complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That introduces the possibility of multiple calls racing success vs failure. Better to ensure that there is only one call to complete()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what way would it be racy? Sync once is atomic AFAIK and first come-first served, the other remains blocked waiting for the first to complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps another way to put it: I think it would be preferable for this to be safe to use incorrectly and document how it should be used, vs incorrect use (or carelessness) resulting in a runtime panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's racy in the sense that if you have two callers, one saying "success" and one saying "failure", then they can race each other to determine which gets reported to things waiting on the checkpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand your concern but I have a hard time understanding how it's relevant here. If someone introduces that race right now the program will panic instead. So it needs to be avoided in either case, i.e. a non-issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be preferable for this to be safe to use incorrectly and document how it should be used, vs incorrect use (or carelessness) resulting in a runtime panic.
I think that kind of defensive programming is a disservice in this case. Silently absorbing incorrect/careless use in a non-deterministic way is a source of frustrating and hard to diagnose bugs.
There is a genuine case to be made for a central service like Coderd needing to avoid panicking because of programming bugs, but here in the Agent, I think it's preferable to panic, print a stack trace, and exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can definitely see the benefit of that approach as well, but I do feel we should have a high tolerance towards introducing sharp edges that can result in decreased developer flow. If we feel this is important enough to panic, wouldn’t another approach be to detect a second call and log it as an error + stack trace?
We don’t know how well the logs from a panic will be persisted either, the user may attempt to rebuild their workspace and ultimately we caused an inconvenience and are never the wiser.
92f0f66
to
094ce97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the critical log handling, LGTM!
2a73bb4
to
49cee21
Compare
094ce97
to
4e4c469
Compare
4e4c469
to
4ad81b0
Compare
Merge activity
|
Fixes #13139
Using a bare channel to signal dependent goroutines means that we can only signal success, not failure, which leads to deadlock if we fail in a way that doesn't cause the whole
apiConnRoutineManager
to tear down routines.Instead, we use a new object called a
checkpoint
that signals success or failure, so that dependent routines get unblocked if the routine they depend on fails.