Skip to content

chore: track terraform modules in telemetry #15450

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
Nov 16, 2024

Conversation

hugodutka
Copy link
Contributor

@hugodutka hugodutka commented Nov 8, 2024

Addresses https://github.com/coder/nexus/issues/35.

This PR:

  • Adds a workspace_modules table to track modules used by the Terraform provisioner in provisioner jobs.
  • Adds a module_path column to the workspace_resources table, allowing to identify which module a resource originates from.
  • Starts pushing this new information into telemetry.

For the person reviewing this PR, do not fret about the 1,500 new lines - ~1,000 of them are auto-generated.

@hugodutka hugodutka force-pushed the hugodutka/track-workspace-modules branch from d406ef6 to 67795fc Compare November 8, 2024 11:44
@bpmct
Copy link
Member

bpmct commented Nov 8, 2024

wooo!!!

@hugodutka hugodutka marked this pull request as ready for review November 8, 2024 14:04
@matifali matifali changed the title feat: track terraform modules in telemetry chore: track terraform modules in telemetry Nov 8, 2024
Comment on lines 146 to 148
if err != nil {
return provisionersdk.PlanErrorf("get modules: %s", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if we fail to get the list of modules for the build that we only need for telemetry purposes, we fail to build?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, it'd probably be better to only log an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, after the discussion regarding using modules in places other than telemetry, it may make sense to return an error rather than just log. Otherwise any code using module info would have to take into account that it may be missing.

Copy link
Member

@johnstcn johnstcn Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so we're clear, this means that any failure to fetch the modules in the working directory will result in a failure of the whole plan stage.

Right now, as far as I can tell, the only scenario that will trigger this is a malformed modules file. Are there any other possible scenarios I'm missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pair review (blocking): as this is currently only for telemetry, we should not fail to plan if anything goes wrong with extracting modules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any other possible scenarios I'm missing?

I don't think there are any other scenarios - barring disk read failures and such. In normal operation this should only happen if terraform changes how it stores modules on disk and we upgrade to that version.

CREATE TABLE workspace_modules (
id uuid NOT NULL,
job_id uuid NOT NULL REFERENCES provisioner_jobs (id) ON DELETE CASCADE,
transition workspace_transition NOT NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to capture the module for all transitions? What's the value of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mirrored workspace_resources. I think for template import jobs you get resources that reference the same provisioner job but have different workspace transitions. My thinking was if you'd like to associate a specific resource from a template import job with its module, you'd need to know which transition it came from.

@@ -0,0 +1,64 @@
package terraform
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about this.

We extract all other resources from the template by executing terraform graph (see provisioner/terraform/executor.go -> graph()) and then pass that to provisioner/terraform/resources.go -> ConvertState(). From there we retrieve all the template's resources, and they're then persisted.

Did this approach not work for modules?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find either module versions or their sources inside the state. This was the first approach I tried, and I switched to reading files only when I couldn't make it work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the output from a terraform graph run locally with a module specified:

$ terraform graph
digraph G {
  ...
  "module.jetbrains_gateway.coder_app.gateway" -> "coder_agent.main";
  "module.jetbrains_gateway.coder_app.gateway" -> "module.jetbrains_gateway.data.coder_parameter.jetbrains_ide";
  "module.jetbrains_gateway.coder_app.gateway" -> "module.jetbrains_gateway.data.coder_workspace.me";
  "module.jetbrains_gateway.coder_app.gateway" -> "module.jetbrains_gateway.data.coder_workspace_owner.me";
  "module.jetbrains_gateway.coder_app.gateway" -> "module.jetbrains_gateway.data.http.jetbrains_ide_versions";
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the graph only displays the module names specified by the user in the template. It doesn’t show the source, like registry.coder.com/modules/jetbrains-gateway/coder, or the version, like 1.0.23. That information is available in the files on disk.

@johnstcn
Copy link
Member

Thought: I can imagine a valid use-case for storing module data separate to telemetry -- showing in the UI which module a resource came from is probably useful from a template editor / admin perspective when trying to answer the question "Which module did this resource come from?"

@hugodutka
Copy link
Contributor Author

@johnstcn I addressed your comments:

  • changed the invalid module path string to a static sentinel string
  • obfuscated module source and version for modules outside of registry.coder.com
  • changed the error message when we fail to parse a terraform address

@hugodutka hugodutka force-pushed the hugodutka/track-workspace-modules branch from 175c10e to b72b8ef Compare November 14, 2024 10:00
Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few more comments and issues I'd like to see clarified before merging this.

@@ -1033,6 +1038,7 @@ func (r *Runner) runWorkspaceBuild(ctx context.Context) (*proto.CompletedJob, *p
State: applyComplete.State,
Resources: applyComplete.Resources,
Timings: applyComplete.Timings,
Modules: planComplete.Modules,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using modules from the plan response here? Will these be different from the apply response?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module files on disk are created by terraform init. init is called only by plan here, so modules at the time of apply should never be different from those at the time of plan. I admit this is highly non-obvious. I'll add a comment about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, a comment would definitely help. From someone else's perspective, this simply looks like a typo.

// the database, a null value in WorkspaceResource's ModulePath
// indicates "this resource was created before module paths
// were tracked."
modulePath = "FAILED_TO_PARSE_TERRAFORM_ADDRESS"
Copy link
Member

@johnstcn johnstcn Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we extract this to a constant we can refer to elsewhere?

EDIT: it might actually be even better as a sentinel error ErrInvalidTerraformAddr

Comment on lines +764 to +777
Files: map[string]string{
"main.tf": `module "hello" {
source = "./module"
}`,
"module/module.tf": `
resource "null_resource" "example" {}

module "there" {
source = "./inner_module"
}
`,
"module/inner_module/inner_module.tf": `
resource "null_resource" "inner_example" {}
`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test with a malformed or otherwise invalid module?

Comment on lines 146 to 148
if err != nil {
return provisionersdk.PlanErrorf("get modules: %s", err)
}
Copy link
Member

@johnstcn johnstcn Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so we're clear, this means that any failure to fetch the modules in the working directory will result in a failure of the whole plan stage.

Right now, as far as I can tell, the only scenario that will trigger this is a malformed modules file. Are there any other possible scenarios I'm missing?

Comment on lines 1282 to 1285
err = InsertWorkspaceModule(ctx, s.Database, jobID, transition, module, telemetrySnapshot)
if err != nil {
return nil, xerrors.Errorf("insert module: %w", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can combine assignment and conditional here:

Suggested change
err = InsertWorkspaceModule(ctx, s.Database, jobID, transition, module, telemetrySnapshot)
if err != nil {
return nil, xerrors.Errorf("insert module: %w", err)
}
if err := InsertWorkspaceModule(ctx, s.Database, jobID, transition, module, telemetrySnapshot); err != nil {
return nil, xerrors.Errorf("insert module: %w", err)
}

@@ -2666,6 +2666,20 @@ func (q *querier) GetWorkspaceByWorkspaceAppID(ctx context.Context, workspaceApp
return fetch(q.log, q.auth, q.db.GetWorkspaceByWorkspaceAppID)(ctx, workspaceAppID)
}

func (q *querier) GetWorkspaceModulesByJobID(ctx context.Context, jobID uuid.UUID) ([]database.WorkspaceModule, error) {
if err := q.authorizeContext(ctx, policy.ActionRead, rbac.ResourceSystem); err != nil {
Copy link
Member

@johnstcn johnstcn Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: rbac.ResourceSystem is fine just for telemetry, but we'll probably want to create a separate RBAC resource in future for some of the other use-cases here. Can you create a follow-up issue for this and add a comment referencing it?

Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes based on pair review with @dannykopping.

The major worry we have is that there is the possibility that a failure to read the modules on-disk will result in a failure to Plan(). While a malformed or otherwise problematic modules.json will probably cause Terraform to bail, we shouldn't.

Comment on lines 146 to 148
if err != nil {
return provisionersdk.PlanErrorf("get modules: %s", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pair review (blocking): as this is currently only for telemetry, we should not fail to plan if anything goes wrong with extracting modules.

@@ -119,6 +123,28 @@ func TestTelemetry(t *testing.T) {
require.Len(t, snapshot.Users, 1)
require.Equal(t, snapshot.Users[0].EmailHashed, "bb44bf07cf9a2db0554bba63a03d822c927deae77df101874496df5a6a3e896d@coder.com")
})
t.Run("HashedModule", func(t *testing.T) {
t.Parallel()
db := dbmem.New()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we used dbtestutil.NewDB here instead? Also, less work for you later when dbmem goes away ;-)

@@ -661,6 +676,26 @@ func ConvertWorkspaceResourceMetadata(metadata database.WorkspaceResourceMetadat
}
}

func ConvertWorkspaceModule(module database.WorkspaceModule) WorkspaceModule {
isCoderModule := strings.Contains(module.Source, "registry.coder.com")
Copy link
Member

@johnstcn johnstcn Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should extract this to a top-level function. If we have multiple registries in the future we may need to add to this exclusion inclusion list.

job_id = $1;

-- name: GetWorkspaceModulesCreatedAfter :many
SELECT * FROM workspace_modules WHERE created_at > $1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking: If we're going to be querying on created_at, do we need an index?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't hurt, I'll add it. 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ You'll need to rename the migration number. I think you may want 276, but check just before you merge.

@hugodutka hugodutka force-pushed the hugodutka/track-workspace-modules branch from b72b8ef to 780730f Compare November 15, 2024 16:59
Comment on lines 84 to +98
writer := tar.NewWriter(&buffer)

addedDirs := make(map[string]bool)
for name, content := range files {
// Add parent directories if they don't exist
dir := filepath.Dir(name)
if dir != "." && !addedDirs[dir] {
err := writer.WriteHeader(&tar.Header{
Name: dir + "/", // Directory names must end with /
Mode: 0o755,
Typeflag: tar.TypeDir,
})
require.NoError(t, err)
addedDirs[dir] = true
}
Copy link
Member

@johnstcn johnstcn Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found many instances of this function sprinkled around our codebase so I added a helper function for this in #15540

I'll wait for this PR to be merged and then update these tests to use testutil.CreateTar 👍

Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

@hugodutka hugodutka merged commit aa0dc2d into main Nov 16, 2024
29 checks passed
@hugodutka hugodutka deleted the hugodutka/track-workspace-modules branch November 16, 2024 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants