fix: Avoid panic in ServerSentEventSender by keeping handler alive #4821

Merged: mafredri merged 3 commits into main from mafredri/fix-sse-flush-after-handler-end on Nov 1, 2022

Conversation

@mafredri (Member) commented Nov 1, 2022

The goroutine launched by `ServerSentEventSender` can perform a write
and flush after the calling http handler has exited; at that point the
resources (e.g. `http.ResponseWriter`) are no longer safe to use.

To work around this issue, heartbeats and event sends are now handled
by the goroutine, which signals its closure via a channel. This allows
the calling handler to remain alive until it's safe to exit.

Fixes #4807

NOTE: This was a relatively quick fix; we may want to consider
rewriting the sender as a service to clean up the implementation.
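
For illustration, a minimal sketch of how a calling handler might use the new signature. The handler name, the updates channel, the import paths, and the codersdk event-type constant are assumptions made for this example, not code from the PR:

```go
package example

import (
	"net/http"

	"github.com/coder/coder/coderd/httpapi"
	"github.com/coder/coder/codersdk"
)

// watchSomething is a hypothetical SSE handler. It blocks until the sender
// goroutine has exited so that nothing touches rw after the handler returns.
func watchSomething(rw http.ResponseWriter, r *http.Request, updates <-chan any) {
	sendEvent, closed, err := httpapi.ServerSentEventSender(rw, r)
	if err != nil {
		http.Error(rw, "failed to set up server-sent events", http.StatusInternalServerError)
		return
	}
	// Keep the handler alive until the sender goroutine signals closure.
	defer func() {
		<-closed
	}()

	for {
		select {
		case <-r.Context().Done():
			return
		case <-closed:
			return
		case update := <-updates:
			if err := sendEvent(r.Context(), codersdk.ServerSentEvent{
				Type: codersdk.ServerSentEventTypeData, // assumed constant name
				Data: update,
			}); err != nil {
				return
			}
		}
	}
}
```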

@mafredri self-assigned this on Nov 1, 2022
@mafredri requested a review from a team on November 1, 2022 at 11:18
Comment on lines 207 to 208
case <-r.Context().Done():
return
Member

Why doesn't this prevent the write/flush after finish? We're not hijacking the connection, so the context should still be cancelled on finish, right?

Member Author

Because this code had the following race condition (sketched below):

  1. Say watchWorkspace encountered an error and exited (this begins the process of all handlers and middleware exiting; r.Context() is not yet cancelled)
  2. The timer is triggered
  3. We write to rw, which still succeeds as the teardown is still happening
  4. Teardown finishes and the context is cancelled, but we're not listening to this signal at that point
  5. We hit Flush after teardown, so we get a nil pointer deref
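
Roughly, the pre-fix shape of the problem looks like this. This is an approximation for illustration only (handler name, interval, and payload are made up), not the original code:

```go
package example

import (
	"io"
	"net/http"
	"sync"
	"time"
)

// sseHandlerOld sketches the old pattern: nothing ties the heartbeat
// goroutine's lifetime to the handler, so the steps above can interleave
// with the handler's teardown.
func sseHandlerOld(rw http.ResponseWriter, r *http.Request) {
	var mu sync.Mutex
	rw.Header().Set("Content-Type", "text/event-stream")

	go func() {
		ticker := time.NewTicker(15 * time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-r.Context().Done():
				return
			case <-ticker.C:
				mu.Lock()
				// The write can still succeed while teardown is in progress...
				_, _ = io.WriteString(rw, "event: ping\ndata: \n\n")
				// ...but flushing after teardown is where the nil pointer
				// dereference was observed.
				rw.(http.Flusher).Flush()
				mu.Unlock()
			}
		}
	}()

	// The handler can return here (e.g. after an error), racing with the
	// ticker above; r.Context() is only cancelled once teardown completes.
}
```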

Contributor

Awesome explanation, thank you.

@@ -174,8 +173,7 @@ func WebsocketCloseSprintf(format string, vars ...any) string {
 	return msg
 }

-func ServerSentEventSender(rw http.ResponseWriter, r *http.Request) (func(ctx context.Context, sse codersdk.ServerSentEvent) error, error) {
-	var mu sync.Mutex
+func ServerSentEventSender(rw http.ResponseWriter, r *http.Request) (sendEvent func(ctx context.Context, sse codersdk.ServerSentEvent) error, closed chan struct{}, err error) {
Member

It seems like we can eliminate the error return from this as it's never set to a non-nil value

Member Author

Not really relevant for this PR, but that's a good point. I'd like to see it kept and remove the panic in favor of an error, though.

Comment on lines 246 to 249
event := sseEvent{
	payload: buf.Bytes(),
	errC:    make(chan error, 1),
}
Member

Did you have to move the writing to the goroutine to fix the bug or is this just a separate change you did? The old flow was simpler to understand as it didn't involve extra write channels and response channels, but if it helps to fix the bug then 👍

Member Author

It's not mandatory per se, but I think we'd want to do it if we service-ified this implementation anyway. That said, asynchronous code will always introduce complexity somewhere; this change moves the core logic into one place, which IMO simplifies it. It's easier to reason about where and in what order mutations/writes are happening.

This change also has the benefit of allowing us to listen to both r.Context() and ctx completion, which we didn't do before, and which is more true to the given API (function signature).

Member Author

Oh yeah, forgot to mention that this logic change also stops writes after an error is encountered, allowing the caller (if errors are checked) to stop using the sender.

Previously we exited the goroutine on write error, meaning keepalives were no longer sent, so it makes sense to disable writes at this point.
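
For readers following along, a minimal sketch of the send path being described. The sseEvent struct matches the diff hunk above; the function name and everything else approximate the flow discussed in this thread rather than the exact PR code:

```go
package example

import (
	"context"
	"errors"
	"net/http"
)

// sseEvent carries a serialized event plus a reply channel for the result
// of the write/flush performed by the single writer goroutine.
type sseEvent struct {
	payload []byte
	errC    chan error
}

// makeSendEvent (hypothetical name) returns a sendEvent-style function that
// hands payloads to the writer goroutine (not shown), which owns rw, writes
// and flushes, replies on errC, and closes `closed` when it exits.
func makeSendEvent(r *http.Request, events chan<- sseEvent, closed <-chan struct{}) func(ctx context.Context, payload []byte) error {
	return func(ctx context.Context, payload []byte) error {
		event := sseEvent{
			payload: payload,
			errC:    make(chan error, 1), // buffered so the writer never blocks on the reply
		}

		// Hand off the event; every wait also watches closed and both
		// contexts so a caller can never hang on a goroutine that is gone.
		select {
		case <-r.Context().Done():
			return r.Context().Err()
		case <-ctx.Done():
			return ctx.Err()
		case <-closed:
			return errors.New("server-sent event sender closed")
		case events <- event:
		}

		// Wait for the result of the write/flush done by the goroutine.
		select {
		case <-r.Context().Done():
			return r.Context().Err()
		case <-ctx.Done():
			return ctx.Err()
		case <-closed:
			return errors.New("server-sent event sender closed")
		case err := <-event.errC:
			return err
		}
	}
}
```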

Member

The asynchronous logic introduces channel reads which can result in the goroutine hanging waiting for data indefinitely. For example, if you wrote an event to the events channel and the sender goroutine got closed before reading that event, you would be permanently stuck waiting for a response from the response channel as far as I can tell.

I think this logic should be kept the way it was to avoid the complexities and extra bugs introduced by having it be asynchronous.

Member

Wait, never mind, I forgot it's an unbuffered channel.

Member

Case in point: it's hard to reason about this logic since it's a chain of passing messages around when it could just be a simple write to the writer. The goroutine could be refactored to use this same function for writing messages so that the safety checks are all in the same place and duplication is reduced.

Member Author

I don't see why that's a problem? The previous implementation was already asynchronous and required mutex locking and taking care not to use the resources after close. Here I've fixed those issues and ensured there are no hangs by listening on e.g. <-closed, which is closed when there is no longer a listener for events. If you're considering a future developer making changes, I'd venture those changes would be "dangerous" in the previous implementation as well?

A motivation to keep this change is that AFAIK we'll be using SSEs more in the future, and we'll want to send different types of events on a single channel (due to browser limitations). At that point we'll need a service (sendEvent called from multiple places), and the previous implementation of sendEvent becomes unsafe (the same bug we're fixing here).

Member

If we're rewriting this soon like you say, then I think this is fine. I still feel like a safe implementation can be achieved in a less complicated way and that we should investigate it in the future, but I won't block your PR for it since this is to fix a panic.

Member Author

I honestly don't know when we're rewriting/refactoring this; @f0ssel may have some idea. Personally, I don't feel channels are any more dangerous than mutexes (you have deadlocks and missing unlocks to think about there), but I guess this is in the eye of the beholder. I'll certainly consider your input if I'm the one rewriting this 👍🏻.

@mafredri force-pushed the mafredri/fix-sse-flush-after-handler-end branch from e16843c to 87a10aa on November 1, 2022 at 13:06
@mafredri merged commit e508057 into main on Nov 1, 2022
@mafredri deleted the mafredri/fix-sse-flush-after-handler-end branch on November 1, 2022 at 14:57
@github-actions bot locked and limited the conversation to collaborators on Nov 1, 2022
Development

Successfully merging this pull request may close these issues.

Unexpected restarts of the coder pod