Skip to content

Use postgres views to reduce database calls in api (allow JOINs in pg queries) #2201

Closed
@Emyrk

Description

@Emyrk

This is just tracking a suggestion and making a place for discussion.

Branch that implements this suggestion: stevenmasley/build_initiator_username

PR that started conversation: #2137


Enable use of JOIN in sqlc queries

Currently the UI requires additional information regarding data structures returned by the database. An example is the workspace_build table does not include the following fields that the UI wants and uses convertWorkspaceBuild to add these fields.

  • workspace owner (id & name)
  • initiator name
  • workspace name
  • provisioner job (although included, going to not JOIN this data in right now)

This requires 2 extra database calls (fetch users, and workspace). Ideally we could fetch this information with a JOIN.

-- Make the SELECT a VIEW
CREATE VIEW workspace_build_with_names AS
SELECT
    -- coalesce is used because technically the joins do not guarantee a value.
    -- If we setup proper foreign keys, we can remove the coalesce.
	coalesce(initiator_user.username, 'unknown') AS initiator_username,
	coalesce(workspaces.owner_id, '00000000-00000000-00000000-00000000') AS owner_id,
	coalesce(owner_user.username, 'unknown') AS owner_name,
	coalesce(workspaces.name, 'unknown') AS workspace_name,
        -- This information also seems helpful for UI purposes and is used in some areas of the UI.
	coalesce(templates.id, '00000000-00000000-00000000-00000000') AS template_id,
	coalesce(templates.name, 'unknown') AS template_name,
	coalesce(templates.active_version_id, '00000000-00000000-00000000-00000000') AS template_active_version,
	workspace_builds.*
FROM workspace_builds
		 LEFT JOIN users AS initiator_user ON workspace_builds.initiator_id = initiator_user.id
		 LEFT JOIN workspaces ON workspaces.id = workspace_builds.workspace_id
		 LEFT JOIN users AS owner_user ON workspaces.owner_id = owner_user.id
		 LEFT JOIN templates ON workspaces.template_id = templates.id;

The solution to using joins is to make a migration that makes the above a VIEW in our database that can be reused. This simplifies all the queries around workspace_builds as we do not need to copy the JOIN more than once (DRY).

Downsides to views

Maintaining the view

If a column is added to workspace_build, that column is not automatically added to the view. So your migration will look like this:

-- We must drop the old view first, then recreate it. We cannot just update the old view
-- as the new view must have the same columns.
DROP VIEW workspace_build_with_initiator;

-- This alter table could be adding a column OR removing a column. In both cases, we will 
-- need to drop the view first.
-- You cannot delete a column that has a dependency on it, and a view is a dependency.
ALTER TABLE workspace_builds
    ADD extra_column int DEFAULT 1 NOT NULL;

CREATE VIEW workspace_build_with_names AS
SELECT -- ... Copy the Select query again ...

The biggest downside is that is adds some complexity to our migrations. The developer writing the migration needs to know which views to also update. Postgres will fail if a view is a dependency and the underlying table is changed. They will likely have to copy the CREATE VIEW ... code block from migration to migration.

This is just the cost of the views.

Con: Defaulting to using joins

One downside is that all our queries now use joins. In the example above, the data being joined is only used by the UI. So internal usage doesn't require the join (at the moment). However, having the additional context could be nice for logging purposes, so it might not be a total waste.

I don't think join performance will be so bad that this is really an issue, but it does cause over-fetching.

If we use the sqlc.embed fork of sqlc, we can make db calls with the join, and some db calls without the join. Having this mix can prevent overfetching if we hit performance issues. Using the embed means we can nest data-structs and have the UI data-structure be something like:

type WorkspaceBuildWithNames struct {
  // The backend can work exclusively with 'WorkspaceBuild' type. So calling the FE version of the call
  // is still usable for BE purposes.
  WorkspaceBuild
  OwnerName string
  // .. The rest of the fields
}

The upside

  • Less db calls as we can merge them into 1 with a join
  • Still 1 database.Type shared by all sqlc queries
  • Join queries are reused, and only typed once as a view.
  • Filter options can use names rather than ids. (owner_name vs owner_id from a user perspective)

Alternative (Status Quo)

The alternative is to do the status quo, where we make multiple db calls and merge the resulting structs.

  • Query for workspace build
  • Query for owner & initiator user's
  • Merge in a go convertWorkspaceBuild function.
    • Handling errors in convert is currently inconsistent. The SQL joins handle these more consistently.

Extra links

SQLc implementations that could help. But gated since these features do not exist yet :(.

Metadata

Metadata

Assignees

No one assigned

    Labels

    apiArea: HTTP APIdesign neededRequest for more beautyneeds decisionNeeds a higher-level decision to be unblocked.staleThis issue is like stale bread.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions