-
Notifications
You must be signed in to change notification settings - Fork 887
chore: track terraform modules in telemetry #15450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d406ef6
to
67795fc
Compare
wooo!!! |
provisioner/terraform/provision.go
Outdated
if err != nil { | ||
return provisionersdk.PlanErrorf("get modules: %s", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if we fail to get the list of modules for the build that we only need for telemetry purposes, we fail to build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, it'd probably be better to only log an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, after the discussion regarding using modules in places other than telemetry, it may make sense to return an error rather than just log. Otherwise any code using module info would have to take into account that it may be missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so we're clear, this means that any failure to fetch the modules in the working directory will result in a failure of the whole plan stage.
Right now, as far as I can tell, the only scenario that will trigger this is a malformed modules file. Are there any other possible scenarios I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pair review (blocking): as this is currently only for telemetry, we should not fail to plan if anything goes wrong with extracting modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any other possible scenarios I'm missing?
I don't think there are any other scenarios - barring disk read failures and such. In normal operation this should only happen if terraform changes how it stores modules on disk and we upgrade to that version.
CREATE TABLE workspace_modules ( | ||
id uuid NOT NULL, | ||
job_id uuid NOT NULL REFERENCES provisioner_jobs (id) ON DELETE CASCADE, | ||
transition workspace_transition NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to capture the module for all transitions? What's the value of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mirrored workspace_resources
. I think for template import jobs you get resources that reference the same provisioner job but have different workspace transitions. My thinking was if you'd like to associate a specific resource from a template import job with its module, you'd need to know which transition it came from.
@@ -0,0 +1,64 @@ | |||
package terraform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about this.
We extract all other resources from the template by executing terraform graph
(see provisioner/terraform/executor.go
-> graph()
) and then pass that to provisioner/terraform/resources.go
-> ConvertState()
. From there we retrieve all the template's resources, and they're then persisted.
Did this approach not work for modules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find either module versions or their sources inside the state. This was the first approach I tried, and I switched to reading files only when I couldn't make it work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the output from a terraform graph
run locally with a module specified:
$ terraform graph
digraph G {
...
"module.jetbrains_gateway.coder_app.gateway" -> "coder_agent.main";
"module.jetbrains_gateway.coder_app.gateway" -> "module.jetbrains_gateway.data.coder_parameter.jetbrains_ide";
"module.jetbrains_gateway.coder_app.gateway" -> "module.jetbrains_gateway.data.coder_workspace.me";
"module.jetbrains_gateway.coder_app.gateway" -> "module.jetbrains_gateway.data.coder_workspace_owner.me";
"module.jetbrains_gateway.coder_app.gateway" -> "module.jetbrains_gateway.data.http.jetbrains_ide_versions";
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the graph only displays the module names specified by the user in the template. It doesn’t show the source, like registry.coder.com/modules/jetbrains-gateway/coder
, or the version, like 1.0.23
. That information is available in the files on disk.
Thought: I can imagine a valid use-case for storing module data separate to telemetry -- showing in the UI which module a resource came from is probably useful from a template editor / admin perspective when trying to answer the question "Which module did this resource come from?" |
@johnstcn I addressed your comments:
|
175c10e
to
b72b8ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few more comments and issues I'd like to see clarified before merging this.
provisionerd/runner/runner.go
Outdated
@@ -1033,6 +1038,7 @@ func (r *Runner) runWorkspaceBuild(ctx context.Context) (*proto.CompletedJob, *p | |||
State: applyComplete.State, | |||
Resources: applyComplete.Resources, | |||
Timings: applyComplete.Timings, | |||
Modules: planComplete.Modules, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using modules from the plan response here? Will these be different from the apply response?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The module files on disk are created by terraform init
. init
is called only by plan
here, so modules at the time of apply should never be different from those at the time of plan. I admit this is highly non-obvious. I'll add a comment about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, a comment would definitely help. From someone else's perspective, this simply looks like a typo.
provisioner/terraform/resources.go
Outdated
// the database, a null value in WorkspaceResource's ModulePath | ||
// indicates "this resource was created before module paths | ||
// were tracked." | ||
modulePath = "FAILED_TO_PARSE_TERRAFORM_ADDRESS" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we extract this to a constant we can refer to elsewhere?
EDIT: it might actually be even better as a sentinel error ErrInvalidTerraformAddr
Files: map[string]string{ | ||
"main.tf": `module "hello" { | ||
source = "./module" | ||
}`, | ||
"module/module.tf": ` | ||
resource "null_resource" "example" {} | ||
|
||
module "there" { | ||
source = "./inner_module" | ||
} | ||
`, | ||
"module/inner_module/inner_module.tf": ` | ||
resource "null_resource" "inner_example" {} | ||
`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test with a malformed or otherwise invalid module?
provisioner/terraform/provision.go
Outdated
if err != nil { | ||
return provisionersdk.PlanErrorf("get modules: %s", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so we're clear, this means that any failure to fetch the modules in the working directory will result in a failure of the whole plan stage.
Right now, as far as I can tell, the only scenario that will trigger this is a malformed modules file. Are there any other possible scenarios I'm missing?
err = InsertWorkspaceModule(ctx, s.Database, jobID, transition, module, telemetrySnapshot) | ||
if err != nil { | ||
return nil, xerrors.Errorf("insert module: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you can combine assignment and conditional here:
err = InsertWorkspaceModule(ctx, s.Database, jobID, transition, module, telemetrySnapshot) | |
if err != nil { | |
return nil, xerrors.Errorf("insert module: %w", err) | |
} | |
if err := InsertWorkspaceModule(ctx, s.Database, jobID, transition, module, telemetrySnapshot); err != nil { | |
return nil, xerrors.Errorf("insert module: %w", err) | |
} |
@@ -2666,6 +2666,20 @@ func (q *querier) GetWorkspaceByWorkspaceAppID(ctx context.Context, workspaceApp | |||
return fetch(q.log, q.auth, q.db.GetWorkspaceByWorkspaceAppID)(ctx, workspaceAppID) | |||
} | |||
|
|||
func (q *querier) GetWorkspaceModulesByJobID(ctx context.Context, jobID uuid.UUID) ([]database.WorkspaceModule, error) { | |||
if err := q.authorizeContext(ctx, policy.ActionRead, rbac.ResourceSystem); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: rbac.ResourceSystem
is fine just for telemetry, but we'll probably want to create a separate RBAC resource in future for some of the other use-cases here. Can you create a follow-up issue for this and add a comment referencing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting changes based on pair review with @dannykopping.
The major worry we have is that there is the possibility that a failure to read the modules on-disk will result in a failure to Plan()
. While a malformed or otherwise problematic modules.json
will probably cause Terraform to bail, we shouldn't.
provisioner/terraform/provision.go
Outdated
if err != nil { | ||
return provisionersdk.PlanErrorf("get modules: %s", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pair review (blocking): as this is currently only for telemetry, we should not fail to plan if anything goes wrong with extracting modules.
coderd/telemetry/telemetry_test.go
Outdated
@@ -119,6 +123,28 @@ func TestTelemetry(t *testing.T) { | |||
require.Len(t, snapshot.Users, 1) | |||
require.Equal(t, snapshot.Users[0].EmailHashed, "bb44bf07cf9a2db0554bba63a03d822c927deae77df101874496df5a6a3e896d@coder.com") | |||
}) | |||
t.Run("HashedModule", func(t *testing.T) { | |||
t.Parallel() | |||
db := dbmem.New() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we used dbtestutil.NewDB
here instead? Also, less work for you later when dbmem goes away ;-)
coderd/telemetry/telemetry.go
Outdated
@@ -661,6 +676,26 @@ func ConvertWorkspaceResourceMetadata(metadata database.WorkspaceResourceMetadat | |||
} | |||
} | |||
|
|||
func ConvertWorkspaceModule(module database.WorkspaceModule) WorkspaceModule { | |||
isCoderModule := strings.Contains(module.Source, "registry.coder.com") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should extract this to a top-level function. If we have multiple registries in the future we may need to add to this exclusion inclusion list.
job_id = $1; | ||
|
||
-- name: GetWorkspaceModulesCreatedAfter :many | ||
SELECT * FROM workspace_modules WHERE created_at > $1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking: If we're going to be querying on created_at
, do we need an index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't hurt, I'll add it. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
b72b8ef
to
780730f
Compare
@johnstcn @dannykopping I made changes based on your feedback. List of changesEach point references the source comment.
|
writer := tar.NewWriter(&buffer) | ||
|
||
addedDirs := make(map[string]bool) | ||
for name, content := range files { | ||
// Add parent directories if they don't exist | ||
dir := filepath.Dir(name) | ||
if dir != "." && !addedDirs[dir] { | ||
err := writer.WriteHeader(&tar.Header{ | ||
Name: dir + "/", // Directory names must end with / | ||
Mode: 0o755, | ||
Typeflag: tar.TypeDir, | ||
}) | ||
require.NoError(t, err) | ||
addedDirs[dir] = true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found many instances of this function sprinkled around our codebase so I added a helper function for this in #15540
I'll wait for this PR to be merged and then update these tests to use testutil.CreateTar
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
Addresses https://github.com/coder/nexus/issues/35.
This PR:
workspace_modules
table to track modules used by the Terraform provisioner in provisioner jobs.module_path
column to theworkspace_resources
table, allowing to identify which module a resource originates from.For the person reviewing this PR, do not fret about the 1,500 new lines - ~1,000 of them are auto-generated.