fix: make handleManifest always signal dependents #13141

spikecurtis · 2024-05-03T09:36:12Z

Using a bare channel to signal dependent goroutines means that we can only signal success, not failure, which leads to deadlock if we fail in a way that doesn't cause the whole apiConnRoutineManager to tear down routines.

Instead, we use a new object called a checkpoint that signals success or failure, so that dependent routines get unblocked if the routine they depend on fails.
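A minimal sketch of the idea, assuming a checkpoint backed by a done channel plus a stored error. Only `complete(err error)` and the `c.err` field appear in the diff under review; `newCheckpoint`, `wait`, `runDependent`, and `errUpstream` are hypothetical names used for illustration, and the actual code in agent/checkpoint.go may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// checkpoint lets one goroutine report success or failure to any number of
// waiters. The done field and the constructor are assumptions for this sketch.
type checkpoint struct {
	done chan struct{}
	err  error
}

func newCheckpoint() *checkpoint {
	return &checkpoint{done: make(chan struct{})}
}

// complete records the outcome and unblocks every waiter. Pass nil for success.
// It must be called exactly once; a second call panics on the closed channel.
func (c *checkpoint) complete(err error) {
	c.err = err
	close(c.done)
}

// wait blocks until complete is called or ctx is done, whichever comes first.
func (c *checkpoint) wait(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-c.done:
		return c.err
	}
}

// errUpstream is a hypothetical failure of the routine being depended on.
var errUpstream = errors.New("upstream routine failed")

// runDependent starts a dependent goroutine, completes the checkpoint with
// upstreamErr, and returns what the dependent observed.
func runDependent(upstreamErr error) error {
	c := newCheckpoint()
	result := make(chan error, 1)
	go func() { result <- c.wait(context.Background()) }()
	c.complete(upstreamErr)
	return <-result
}

func main() {
	fmt.Println(runDependent(nil))
	fmt.Println(runDependent(errUpstream))
}
```

With a bare channel, the failure case would leave the dependent goroutine blocked forever unless something else cancelled it; here the dependent is released in both cases and can tell them apart.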

mafredri

I like the checkpoint implementation, just some minor stuff but otherwise LGTM (no need to re-review unless you want.)

mafredri · 2024-05-03T09:54:58Z

agent/checkpoint.go

+
+// complete the checkpoint.  Pass nil to indicate the checkpoint was ok.
+func (c *checkpoint) complete(err error) {
+	c.err = err


How about using sync.Once here to simplify usage and allow multiple calls to complete()?

That introduces the possibility of multiple calls racing success vs failure. Better to ensure that there is only one call to complete().

In what way would it be racy? sync.Once is atomic AFAIK and first-come, first-served; any other caller remains blocked waiting for the first to complete.

Perhaps another way to put it: I think it would be preferable for this to be safe to use incorrectly and document how it should be used, vs incorrect use (or carelessness) resulting in a runtime panic.

It's racy in the sense that if you have two callers, one saying "success" and one saying "failure", then they can race each other to determine which gets reported to things waiting on the checkpoint.

I think I understand your concern but I have a hard time understanding how it's relevant here. If someone introduces that race right now the program will panic instead. So it needs to be avoided in either case, i.e. a non-issue.

> I think it would be preferable for this to be safe to use incorrectly and document how it should be used, vs incorrect use (or carelessness) resulting in a runtime panic.

I think that kind of defensive programming is a disservice in this case. Silently absorbing incorrect or careless use in a non-deterministic way is a source of frustrating, hard-to-diagnose bugs.

There is a genuine case to be made for a central service like Coderd needing to avoid panicking because of programming bugs, but here in the Agent, I think it's preferable to panic, print a stack trace, and exit.
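To make the concern concrete, here is a minimal sketch of the sync.Once variant under discussion. It is not code from this PR: `onceCheckpoint`, `newOnceCheckpoint`, `result`, and `errFailed` are hypothetical names. The calls are made sequentially so the outcome is deterministic; with two concurrent callers, whichever reaches `Do` first would win.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// onceCheckpoint is a hypothetical checkpoint whose complete method is
// guarded by sync.Once, so repeated calls never panic.
type onceCheckpoint struct {
	once sync.Once
	done chan struct{}
	err  error
}

func newOnceCheckpoint() *onceCheckpoint {
	return &onceCheckpoint{done: make(chan struct{})}
}

// complete records the first outcome it is given; later calls are no-ops.
func (c *onceCheckpoint) complete(err error) {
	c.once.Do(func() {
		c.err = err
		close(c.done)
	})
}

// result blocks until the checkpoint completes and returns the recorded outcome.
func (c *onceCheckpoint) result() error {
	<-c.done
	return c.err
}

// errFailed is a hypothetical failure reported by a second caller.
var errFailed = errors.New("routine failed")

func main() {
	c := newOnceCheckpoint()
	c.complete(nil)       // first caller reports success
	c.complete(errFailed) // second caller's failure is dropped without a trace
	fmt.Println(c.result())
}
```

Waiters see success even though a failure was reported, and nothing records that the second call ever happened.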

I can definitely see the benefit of that approach as well, but I do feel we should have a high tolerance towards introducing sharp edges that can result in decreased developer flow. If we feel this is important enough to panic, wouldn’t another approach be to detect a second call and log it as an error + stack trace?

We also don’t know how well the logs from a panic will be persisted. The user may simply rebuild their workspace, in which case we have caused an inconvenience and are never the wiser.
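A rough sketch of that alternative, assuming a mutex-guarded flag and an injectable log function. This is an illustration only, not the handling that was eventually added to the PR: `loggedCheckpoint`, `newLoggedCheckpoint`, `logf`, `result`, and `errLate` are hypothetical names, and the standard library `log` package stands in for whatever logger the agent uses.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"runtime/debug"
	"sync"
)

// loggedCheckpoint is a hypothetical checkpoint that reports a second call
// to complete through logf instead of panicking.
type loggedCheckpoint struct {
	logf   func(format string, args ...any)
	mu     sync.Mutex
	called bool
	done   chan struct{}
	err    error
}

func newLoggedCheckpoint(logf func(format string, args ...any)) *loggedCheckpoint {
	return &loggedCheckpoint{logf: logf, done: make(chan struct{})}
}

// complete records the first outcome. Any later call is logged together with
// the caller's stack trace and otherwise ignored.
func (c *loggedCheckpoint) complete(err error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.called {
		c.logf("checkpoint completed more than once (ignored err=%v)\n%s", err, debug.Stack())
		return
	}
	c.called = true
	c.err = err
	close(c.done)
}

// result blocks until the checkpoint completes and returns the recorded outcome.
func (c *loggedCheckpoint) result() error {
	<-c.done
	return c.err
}

// errLate is a hypothetical failure reported after the checkpoint completed.
var errLate = errors.New("late failure")

func main() {
	c := newLoggedCheckpoint(log.Printf)
	c.complete(nil)
	c.complete(errLate) // logged with a stack trace, no panic
	fmt.Println(c.result())
}
```

The first outcome still wins, as with sync.Once, but the misuse leaves a log entry pointing at the offending call site.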

mafredri

Thanks for adding the critical log handling, LGTM!

spikecurtis · 2024-05-06T10:34:28Z

Merge activity

May 6, 6:34 AM EDT: Graphite rebased this pull request as part of a merge.
May 6, 6:47 AM EDT: @spikecurtis merged this pull request with Graphite.

github-actions bot assigned spikecurtis May 3, 2024

This was referenced May 3, 2024

fix: use a native websocket.NetConn for agent RPC client #13142

Merged

chore: remove superfluous context.Canceled handling #13140

Merged

spikecurtis requested a review from mafredri May 3, 2024 09:40

spikecurtis marked this pull request as ready for review May 3, 2024 09:40

mafredri approved these changes May 3, 2024

spikecurtis force-pushed the spike/13139-handleOK-promise branch 2 times, most recently from 92f0f66 to 094ce97 Compare May 6, 2024 05:19

mafredri approved these changes May 6, 2024

spikecurtis force-pushed the spike/13139-superfluous-ctx branch from 2a73bb4 to 49cee21 Compare May 6, 2024 10:22

spikecurtis force-pushed the spike/13139-handleOK-promise branch from 094ce97 to 4e4c469 Compare May 6, 2024 10:22

Base automatically changed from spike/13139-superfluous-ctx to main May 6, 2024 10:33

fix: make handleManifest always signal dependents

4ad81b0

spikecurtis force-pushed the spike/13139-handleOK-promise branch from 4e4c469 to 4ad81b0 Compare May 6, 2024 10:33

spikecurtis merged commit d51c691 into main May 6, 2024
25 checks passed

spikecurtis deleted the spike/13139-handleOK-promise branch May 6, 2024 10:47

github-actions bot locked and limited conversation to collaborators May 6, 2024
