feat: Move from datadog to generic otel #1567


Merged
merged 10 commits into main from f0ssel/otel2 on May 19, 2022

Conversation

f0ssel
Contributor

@f0ssel f0ssel commented May 18, 2022

This replaces the existing datadog tracing library with OpenTelemetry, an open standard that more customers can integrate with.

I took a look at the middleware packages they provide and they were pretty simple, so I've implemented a tracing middleware that records the basic HTTP information.

@f0ssel
Contributor Author

f0ssel commented May 18, 2022

@kylecarbs looking for initial thoughts

Member

@kylecarbs kylecarbs left a comment

Just a few q's. How does this play into database tracing stuff with DataDog?

Comment on lines 29 to 32
tracerProvider := sdktrace.NewTracerProvider(
	sdktrace.WithBatcher(exporter),
	sdktrace.WithResource(res),
)
Member

Where does this actually send data?

Contributor Author

I can add a comment, but traces get shipped to a local otel collector by default. You can change where they get shipped with otel env vars like OTEL_COLLECTOR_ENDPOINT and such, so we don't need to write any code to support the configuration. The datadog agent we have deployed on dev.coder.com has a collector built in, so this will automatically start sending the otel data there.
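
For reference, a minimal sketch of what that default behavior looks like with the OTLP gRPC exporter; the function name and package layout are illustrative, not necessarily what this PR ships. The exporter honors the standard OTEL_* environment variables (such as OTEL_EXPORTER_OTLP_ENDPOINT) and otherwise targets a collector on localhost, which is why no Coder-specific configuration code is needed:

package tracing

import (
	"context"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// newTracerProvider wires an OTLP exporter into a tracer provider. The
// exporter reads the standard OTEL_* env vars and defaults to a local
// collector endpoint when none are set.
func newTracerProvider(ctx context.Context) (*sdktrace.TracerProvider, error) {
	exporter, err := otlptracegrpc.New(ctx)
	if err != nil {
		return nil, err
	}
	return sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
	), nil
}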

Member

Ahh I see! Even just an external link would help. I was expecting --trace to accept an endpoint to write traces.

Contributor Author

done

@@ -0,0 +1,59 @@
package telemetry
Member

I think the name telemetry might be confusing; tracing would be very reasonable. We might want to place this in coderd/tracing, since we'll probably only trace there for now.

Contributor Author

I was on the fence between the two

Contributor Author

I put this at a higher level because I didn't want to exclude the provisioner apps. Those could reasonably be traced as well, and those distributed traces would be connected in otel with the right glue code.

That being said, we are not currently using any tracing in the other apps, so I'm also fine to move to coderd if we want to isolate it for now.

Member

I think we isolate for now. Nothing else has an HTTP server, and if that's our primary point of tracing I'd be hesitant to say we'll do it for other things soon.

Contributor Author

done

)

// HTTPMW adds tracing to http routes.
func HTTPMW(tracerProvider *sdktrace.TracerProvider, name string) func(http.Handler) http.Handler {
Member

I like how you didn't wrap the *sdktrace.TracerProvider type. Very simple to process!
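
To make the wiring concrete, a hedged sketch of how this middleware plugs into a chi router; the import path, function name, and tracer name are assumptions for illustration rather than the PR's exact code:

package coderd

import (
	"github.com/go-chi/chi/v5"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"

	"github.com/coder/coder/coderd/tracing" // assumed package path per the naming discussion above
)

// newRouter registers the tracing middleware. Because HTTPMW returns
// func(http.Handler) http.Handler, it slots into chi's Use directly with
// no adapter around *sdktrace.TracerProvider.
func newRouter(tracerProvider *sdktrace.TracerProvider) *chi.Mux {
	r := chi.NewRouter()
	r.Use(tracing.HTTPMW(tracerProvider, "coderd"))
	return r
}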

@f0ssel
Contributor Author

f0ssel commented May 18, 2022

This doesn't do DB tracing yet; that never got merged with DD either. I've taken a look and it seems pretty simple though. This keeps parity with datadog (for the important info) so it doesn't disrupt the monitoring.
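
For context, a hypothetical sketch of what hand-rolled DB tracing could look like with the same tracer provider; the package, function, and query are illustrative and not part of this PR (a database/sql wrapper could do this automatically for every query):

package database

import (
	"context"
	"database/sql"

	"go.opentelemetry.io/otel/attribute"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// countUsers wraps a single query in a child span by hand.
func countUsers(ctx context.Context, tp *sdktrace.TracerProvider, db *sql.DB) (int, error) {
	ctx, span := tp.Tracer("coderd").Start(ctx, "db.query")
	defer span.End()
	span.SetAttributes(attribute.String("db.statement", "SELECT count(*) FROM users"))

	var n int
	err := db.QueryRowContext(ctx, "SELECT count(*) FROM users").Scan(&n)
	return n, err
}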

@f0ssel f0ssel requested a review from kylecarbs May 18, 2022 21:52
@f0ssel f0ssel marked this pull request as ready for review May 18, 2022 21:52
Member

@mafredri mafredri left a comment

Looks good! Had a few thoughts in addition to the comments:

  • Do we currently inform the user where traces are sent (when --trace is enabled), should we?
  • (Not this PR) I'm not seeing a panic handler, should we add one and make sure it plays nice with tracing?

func HTTPMW(tracerProvider *sdktrace.TracerProvider, name string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
			// // do not trace if exporter has not be initialized
Member

Suggested change
// // do not trace if exporter has not be initialized
// do not trace if exporter has not be initialized

			}

			// start span with default span name. Span name will be updated once request finishes
			_, span := tracerProvider.Tracer(name).Start(r.Context(), "http.request")
Member

Suggestion: Not ideal, but we could consider the following fallback:

Suggested change
_, span := tracerProvider.Tracer(name).Start(r.Context(), "http.request")
_, span := tracerProvider.Tracer(name).Start(r.Context(), fmt.Sprintf("%s %s", r.Method, r.RequestURI))

My thought here is that if we want to combine logging with otel, either by using otel directly or by integrating via a logging library, having the name set can provide additional context for logs that may be output before the handler completes.
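
A sketch of how that fallback could sit alongside the rename-on-finish behavior described in the code comment above, assuming a chi router; span attributes and status handling are omitted for brevity, and this is illustrative rather than the PR's final code:

package tracing

import (
	"fmt"
	"net/http"

	"github.com/go-chi/chi/v5"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// HTTPMW starts the span under a method+URI fallback name, then renames it
// to the resolved chi route pattern once the handler returns.
func HTTPMW(tracerProvider *sdktrace.TracerProvider, name string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
			// Fallback name, so logs emitted mid-request still carry context.
			ctx, span := tracerProvider.Tracer(name).Start(r.Context(),
				fmt.Sprintf("%s %s", r.Method, r.RequestURI))
			defer span.End()

			next.ServeHTTP(rw, r.WithContext(ctx))

			// Prefer the low-cardinality route pattern (e.g. "GET /api/v2/users/{user}")
			// once chi has resolved it.
			if rctx := chi.RouteContext(r.Context()); rctx != nil && rctx.RoutePattern() != "" {
				span.SetName(fmt.Sprintf("%s %s", r.Method, rctx.RoutePattern()))
			}
		})
	}
}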

cli/server.go Outdated
	logger.Warn(cmd.Context(), "failed to start telemetry exporter", slog.Error(err))
} else {
	defer func() {
		_ = tracerProvider.Shutdown(cmd.Context())
Member

We probably don't want to use cmd.Context() here since it is quite likely to already be cancelled, which would prevent Shutdown from waiting for flush. Perhaps we should create a new context with a timeout instead, say 5 seconds?
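
A minimal sketch of that suggestion, assuming a small helper (name illustrative) is called from the deferred block instead of passing cmd.Context():

package tracing

import (
	"context"
	"time"

	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// shutdown flushes any buffered spans with its own deadline, independent of
// the command context, which is likely already cancelled by this point.
func shutdown(tracerProvider *sdktrace.TracerProvider) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	_ = tracerProvider.Shutdown(ctx)
}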

Contributor Author

Good catch, that makes sense

@f0ssel
Contributor Author

f0ssel commented May 19, 2022

  • Do we currently inform the user where traces are sent (when --trace is enabled), should we?
  • (Not this PR) I'm not seeing a panic handler, should we add one and make sure it plays nice with tracing?

Yes, I want to have both of these things by community MVP.

@f0ssel f0ssel merged commit 376c681 into main May 19, 2022
@f0ssel f0ssel deleted the f0ssel/otel2 branch May 19, 2022 22:43