fix: make handleManifest always signal dependents #13141
Merged
New file (package agent, checkpoint implementation):

```go
package agent

import (
	"context"
	"runtime"
	"sync"

	"cdr.dev/slog"
)

// checkpoint allows a goroutine to communicate when it is OK to proceed beyond some async condition
// to other dependent goroutines.
type checkpoint struct {
	logger slog.Logger
	mu     sync.Mutex
	called bool
	done   chan struct{}
	err    error
}

// complete the checkpoint. Pass nil to indicate the checkpoint was ok. It is an error to call this
// more than once.
func (c *checkpoint) complete(err error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.called {
		b := make([]byte, 2048)
		n := runtime.Stack(b, false)
		c.logger.Critical(context.Background(), "checkpoint complete called more than once", slog.F("stacktrace", b[:n]))
		return
	}
	c.called = true
	c.err = err
	close(c.done)
}

// wait blocks until the checkpoint is complete or the context is done, returning the completion
// error (nil on success) or the context error.
func (c *checkpoint) wait(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-c.done:
		return c.err
	}
}

// newCheckpoint returns an incomplete checkpoint that logs to the given logger.
func newCheckpoint(logger slog.Logger) *checkpoint {
	return &checkpoint{
		logger: logger,
		done: make(chan struct{}),
	}
}
```
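To show how a checkpoint is meant to be used, here is a minimal hypothetical sketch, not from this PR's diff: `runExample`, `fetchManifest`, and `startSubsystems` are invented stand-ins for the real manifest handling. The key point is that the producing goroutine defers `complete`, so dependents are always signaled, even on failure.

```go
package agent

import (
	"context"

	"cdr.dev/slog"
)

// runExample is a hypothetical illustration of the checkpoint pattern; it is
// not code from this PR. fetchManifest and startSubsystems stand in for the
// real manifest handling.
func runExample(
	ctx context.Context,
	logger slog.Logger,
	fetchManifest func(context.Context) error,
	startSubsystems func(context.Context),
) {
	manifestOK := newCheckpoint(logger)

	go func() {
		var err error
		// Deferring complete guarantees dependents are signaled exactly once,
		// whether the handler succeeds, fails, or returns early.
		defer func() { manifestOK.complete(err) }()
		err = fetchManifest(ctx)
	}()

	go func() {
		// Dependents block until the manifest work signals or ctx expires.
		if err := manifestOK.wait(ctx); err != nil {
			logger.Error(ctx, "manifest not available", slog.Error(err))
			return
		}
		startSubsystems(ctx)
	}()
}
```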
New test file (package agent):

```go
package agent

import (
	"testing"

	"github.com/stretchr/testify/require"
	"golang.org/x/xerrors"

	"cdr.dev/slog/sloggers/slogtest"
	"github.com/coder/coder/v2/testutil"
)

func TestCheckpoint_CompleteWait(t *testing.T) {
	t.Parallel()
	logger := slogtest.Make(t, nil)
	ctx := testutil.Context(t, testutil.WaitShort)
	uut := newCheckpoint(logger)
	err := xerrors.New("test")
	uut.complete(err)
	got := uut.wait(ctx)
	require.Equal(t, err, got)
}

func TestCheckpoint_CompleteTwice(t *testing.T) {
	t.Parallel()
	logger := slogtest.Make(t, &slogtest.Options{IgnoreErrors: true})
	ctx := testutil.Context(t, testutil.WaitShort)
	uut := newCheckpoint(logger)
	err := xerrors.New("test")
	uut.complete(err)
	uut.complete(nil) // drops CRITICAL log
	got := uut.wait(ctx)
	require.Equal(t, err, got)
}

func TestCheckpoint_WaitComplete(t *testing.T) {
	t.Parallel()
	logger := slogtest.Make(t, nil)
	ctx := testutil.Context(t, testutil.WaitShort)
	uut := newCheckpoint(logger)
	err := xerrors.New("test")
	errCh := make(chan error, 1)
	go func() {
		errCh <- uut.wait(ctx)
	}()
	uut.complete(err)
	got := testutil.RequireRecvCtx(ctx, t, errCh)
	require.Equal(t, err, got)
}
```
Review discussion:
How about using sync.Once here to simplify usage and allow multiple calls to complete?
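For context, a minimal sketch of the sync.Once-based variant being suggested might look like the following. This is purely illustrative, not code from this PR; `onceCheckpoint` is a hypothetical name.

```go
package agent

import (
	"context"
	"sync"
)

// onceCheckpoint is a hypothetical sketch of the suggested sync.Once-based
// variant; it is not code from this PR. Extra calls to complete are silently
// ignored: the first caller wins.
type onceCheckpoint struct {
	once sync.Once
	done chan struct{}
	err  error
}

func newOnceCheckpoint() *onceCheckpoint {
	return &onceCheckpoint{done: make(chan struct{})}
}

func (c *onceCheckpoint) complete(err error) {
	c.once.Do(func() {
		c.err = err
		close(c.done) // the close happens-before any receive in wait, so reading err is safe
	})
}

func (c *onceCheckpoint) wait(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-c.done:
		return c.err
	}
}
```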
That introduces the possibility of multiple calls racing success vs. failure. Better to ensure that there is only one call to complete().
In what way would it be racy? sync.Once is atomic AFAIK and first come, first served; the other caller remains blocked waiting for the first to complete.
Perhaps another way to put it: I think it would be preferable for this to be safe even when used incorrectly, and to document how it should be used, vs. incorrect use (or carelessness) resulting in a runtime panic.
It's racy in the sense that if you have two callers, one saying "success" and one saying "failure", then they can race each other to determine which gets reported to things waiting on the checkpoint.
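A contrived sketch of that race, building on the hypothetical onceCheckpoint above (again, not code from the PR):

```go
package agent

import (
	"context"

	"golang.org/x/xerrors"
)

// raceExample is a contrived, hypothetical illustration of the race described
// above; it is not code from this PR. Both goroutines contend to be the
// sync.Once winner, so waiters observe either nil or the error,
// non-deterministically.
func raceExample(ctx context.Context) error {
	cp := newOnceCheckpoint()
	go cp.complete(nil)                         // one caller reports success
	go cp.complete(xerrors.New("fetch failed")) // another reports failure
	return cp.wait(ctx)
}
```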
I think I understand your concern, but I have a hard time seeing how it's relevant here. If someone introduces that race right now, the program will panic instead. So it needs to be avoided in either case; i.e., it's a non-issue.
I think that kind of defensive programming is a disservice in this case. Silently absorbing incorrect/careless use in a non-deterministic way is a source of frustrating, hard-to-diagnose bugs.
There is a genuine case to be made for a central service like Coderd needing to avoid panicking because of programming bugs, but here in the Agent, I think it's preferable to panic, print a stack trace, and exit.
I can definitely see the benefit of that approach as well, but I do feel we should have a high bar for introducing sharp edges that can result in decreased developer flow. If we feel this is important enough to panic, wouldn't another approach be to detect a second call and log it as an error plus a stack trace?
We don't know how well the logs from a panic will be persisted either; the user may attempt to rebuild their workspace, and ultimately we've caused an inconvenience and are none the wiser.