-
Notifications
You must be signed in to change notification settings - Fork 881
feat: add support package #12289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add support package #12289
Conversation
89b4d9e
to
f60dcc6
Compare
05a48b4
to
99ad0db
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, my suggestions are mainly to add safety rails to prevent overly high memory usage. Then again, if this is only running on the client, maybe we don't need to be so careful (I'd be more worried if coderd was assembling the bundle).
log.Error(ctx, "fetch experiments", slog.Error(err)) | ||
} else { | ||
d.Experiments = exp | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do these concurrently for better speed? The data snapshot is "closer together" as well, in case that matters (same for the other functions too?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Will address in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be good to go with single-threaded workflow. It is a support bundle not really a time-critical function.
log.Error(ctx, "fetch coordinator debug page", slog.Error(err)) | ||
} else { | ||
defer coordResp.Body.Close() | ||
bs, err := io.ReadAll(coordResp.Body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use a limit reader here (and elsewhere)? Protective measure just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I think about it, the less I like limiting the amount of data read.
A support bundle with truncated information is likely to be useless.
log.Error(ctx, "fetch provisioner job logs", slog.Error(err), slog.F("job_id", ws.LatestBuild.Job.ID.String())) | ||
} else { | ||
defer closer.Close() | ||
for log := range buildLogCh { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a max size, only pick the tail-end if too big?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can know the number of logs ahead of time.
log.Error(ctx, "fetch agent startup logs", slog.Error(err), slog.F("agent_id", agentID.String())) | ||
} else { | ||
defer closer.Close() | ||
for logChunk := range agentLogCh { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as with workspace build, agent logs are limited to 1MB currently, but that assumption could change.
|
||
b.Deployment = DeploymentInfo(ctx, d.Client, d.Log) | ||
b.Workspace = WorkspaceInfo(ctx, d.Client, d.Log, d.WorkspaceID, d.AgentID) | ||
b.Network = NetworkInfo(ctx, d.Client, d.Log, d.AgentID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fetch concurrently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR is 👍
// Bundle is a set of information discovered about a deployment. | ||
// Even though we do attempt to sanitize data, it may still contain | ||
// sensitive information and should thus be treated as secret. | ||
type Bundle struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking loud: don't need to move it to codersdk
? I'm thinking about some scripts or our site reviewing these files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd defer doing this until it's absolutely necessary.
type Network struct { | ||
CoordinatorDebug string `json:"coordinator_debug"` | ||
TailnetDebug string `json:"tailnet_debug"` | ||
NetcheckLocal *codersdk.WorkspaceAgentConnectionInfo `json:"netcheck_local"` | ||
NetcheckRemote *codersdk.WorkspaceAgentConnectionInfo `json:"netcheck_remote"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it might be useful to annotate these fields and give some examples what exactly can a user/developer expect. Might be nice to highlight fields with sensitive data.
type Workspace struct { | ||
Workspace codersdk.Workspace `json:"workspace"` | ||
BuildLogs []codersdk.ProvisionerJobLog `json:"build_logs"` | ||
Agent codersdk.WorkspaceAgent `json:"agent"` | ||
AgentStartupLogs []codersdk.WorkspaceAgentLog `json:"startup_logs"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we include last N-workspace builds? User may retry building the workspace a few times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave N=1 for now and add more if we find a use-case. I would generally always look at the latest build when troubleshooting an issue.
log.Error(ctx, "fetch experiments", slog.Error(err)) | ||
} else { | ||
d.Experiments = exp | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be good to go with single-threaded workflow. It is a support bundle not really a time-critical function.
support/support.go
Outdated
// Get /api/v2/debug/coordinator | ||
coordResp, err := client.Request(ctx, http.MethodGet, "/api/v2/debug/coordinator", nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I believe this line of code is self-descriptive :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can add debug logging instead comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments were more of a reminder to myself :-) Will remove.
var d Deployment | ||
bi, err := client.BuildInfo(ctx) | ||
if err != nil { | ||
log.Error(ctx, "fetch build info", slog.Error(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the approach you implemented here. It is appropriate, but maybe we could keep a field for bundling errors within the bundle? It would make debugging easier when coderd returns HTTP 500, and the client error is saved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if there's a nice generic solution to this in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely 👍 This is something that can be done in a follow-up.
codersdk/health.go
Outdated
@@ -44,6 +44,19 @@ type UpdateHealthSettings struct { | |||
DismissedHealthchecks []HealthSection `json:"dismissed_healthchecks"` | |||
} | |||
|
|||
func (c *Client) GetDebugHealth(ctx context.Context) (HealthcheckReport, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably remove the Get
here since other API methods don't have it.
Part of #12161
More to be added in later PRs.