Description
This is just tracking a suggestion and making a place for discussion.
Branch that implements this suggestion: stevenmasley/build_initiator_username
PR that started conversation: #2137
Enable use of JOIN
in sqlc queries
Currently the UI requires additional information regarding data structures returned by the database. An example is the workspace_build
table does not include the following fields that the UI wants and uses convertWorkspaceBuild
to add these fields.
- workspace owner (id & name)
- initiator name
- workspace name
provisioner job(although included, going to not JOIN this data in right now)
This requires 2 extra database calls (fetch users, and workspace). Ideally we could fetch this information with a JOIN
.
-- Make the SELECT a VIEW
CREATE VIEW workspace_build_with_names AS
SELECT
-- coalesce is used because technically the joins do not guarantee a value.
-- If we setup proper foreign keys, we can remove the coalesce.
coalesce(initiator_user.username, 'unknown') AS initiator_username,
coalesce(workspaces.owner_id, '00000000-00000000-00000000-00000000') AS owner_id,
coalesce(owner_user.username, 'unknown') AS owner_name,
coalesce(workspaces.name, 'unknown') AS workspace_name,
-- This information also seems helpful for UI purposes and is used in some areas of the UI.
coalesce(templates.id, '00000000-00000000-00000000-00000000') AS template_id,
coalesce(templates.name, 'unknown') AS template_name,
coalesce(templates.active_version_id, '00000000-00000000-00000000-00000000') AS template_active_version,
workspace_builds.*
FROM workspace_builds
LEFT JOIN users AS initiator_user ON workspace_builds.initiator_id = initiator_user.id
LEFT JOIN workspaces ON workspaces.id = workspace_builds.workspace_id
LEFT JOIN users AS owner_user ON workspaces.owner_id = owner_user.id
LEFT JOIN templates ON workspaces.template_id = templates.id;
The solution to using joins is to make a migration that makes the above a VIEW in our database that can be reused. This simplifies all the queries around workspace_builds
as we do not need to copy the JOIN
more than once (DRY).
Downsides to views
Maintaining the view
If a column is added to workspace_build
, that column is not automatically added to the view. So your migration will look like this:
-- We must drop the old view first, then recreate it. We cannot just update the old view
-- as the new view must have the same columns.
DROP VIEW workspace_build_with_initiator;
-- This alter table could be adding a column OR removing a column. In both cases, we will
-- need to drop the view first.
-- You cannot delete a column that has a dependency on it, and a view is a dependency.
ALTER TABLE workspace_builds
ADD extra_column int DEFAULT 1 NOT NULL;
CREATE VIEW workspace_build_with_names AS
SELECT -- ... Copy the Select query again ...
The biggest downside is that is adds some complexity to our migrations. The developer writing the migration needs to know which views to also update. Postgres will fail if a view is a dependency and the underlying table is changed. They will likely have to copy the CREATE VIEW ...
code block from migration to migration.
This is just the cost of the views.
Con: Defaulting to using joins
One downside is that all our queries now use joins. In the example above, the data being joined is only used by the UI. So internal usage doesn't require the join (at the moment). However, having the additional context could be nice for logging purposes, so it might not be a total waste.
I don't think join performance will be so bad that this is really an issue, but it does cause over-fetching.
If we use the sqlc.embed fork of sqlc, we can make db calls with the join, and some db calls without the join. Having this mix can prevent overfetching if we hit performance issues. Using the embed
means we can nest data-structs and have the UI data-structure be something like:
type WorkspaceBuildWithNames struct {
// The backend can work exclusively with 'WorkspaceBuild' type. So calling the FE version of the call
// is still usable for BE purposes.
WorkspaceBuild
OwnerName string
// .. The rest of the fields
}
The upside
- Less db calls as we can merge them into 1 with a join
- Still 1
database.Type
shared by all sqlc queries - Join queries are reused, and only typed once as a view.
- Filter options can use names rather than ids. (owner_name vs owner_id from a user perspective)
Alternative (Status Quo)
The alternative is to do the status quo, where we make multiple db calls and merge the resulting structs.
- Query for workspace build
- Query for owner & initiator user's
- Merge in a go
convertWorkspaceBuild
function.- Handling errors in convert is currently inconsistent. The SQL joins handle these more consistently.
Extra links
SQLc implementations that could help. But gated since these features do not exist yet :(.
sqlc.embed()
fork: feat: addsqlc.embed
to allow model re-use sqlc-dev/sqlc#1615.- This was a design sqlc dropped, but
template: workspace_build_initiator
support for putting the views in the queries.sql- Downside is that we have to copy the view's SELECT query with the join to every query.