Skip to content

SQLite: Coerce jsonb columns to json before returning to Go code #3968

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

brandur
Copy link

@brandur brandur commented May 16, 2025

This one follows up the discussion in #3953 to try and make the jsonb
data type in SQLite usable (see discussion there, but I believe that
it's currently not).

According the SQLite docs on JSONB [1], it's considered a format that's
internal to the database itself, and no attempt should be made to parse
it elsewhere:

JSONB is not intended as an external format to be used by
applications. JSONB is designed for internal use by SQLite only.
Programmers do not need to understand the JSONB format in order to use
it effectively. Applications should access JSONB only through the JSON
SQL functions, not by looking at individual bytes of the BLOB.

Currently, when trying to use a jsonb column in SQLite, sqlc ends up
returning the internal binary data, which ends up being unparsable in Go:

riverdrivertest.go:3030:
            Error Trace:    /Users/brandur/Documents/projects/river/internal/riverinternaltest/riverdrivertest/riverdrivertest.go:3030
            Error:          Not equal:
                            expected: []byte{0x7b, 0x22, 0x66, 0x6f, 0x6f, 0x22, 0x3a, 0x20, 0x22, 0x62, 0x61, 0x72, 0x22, 0x7d}
                            actual  : []byte{0x8c, 0x37, 0x66, 0x6f, 0x6f, 0x37, 0x62, 0x61, 0x72}

                            Diff:
                            --- Expected
                            +++ Actual
                            @@ -1,3 +1,3 @@
                            -([]uint8) (len=14) {
                            - 00000000  7b 22 66 6f 6f 22 3a 20  22 62 61 72 22 7d        |{"foo": "bar"}|
                            +([]uint8) (len=9) {
                            + 00000000  8c 37 66 6f 6f 37 62 61  72                       |.7foo7bar|
                             }
            Test:           TestDriverRiverSQLite/QueueCreateOrSetUpdatedAt/InsertsANewQueueWithDefaultUpdatedAt

The fix is that we should make sure to coerce jsonb columns back to
json before returning. That's what this pull request does, intercepting
SELECT * and wrapping jsonb columns with a json(...) invocation.

I also assign json and jsonb a []byte data type by default so they
don't end up as any, which isn't very useful. []byte is consistent
with the default for pgx/v5.

[1] https://sqlite.org/jsonb.html

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label May 16, 2025
@@ -56,6 +56,9 @@ func sqliteType(req *plugin.GenerateRequest, options *opts.Options, col *plugin.
}
return "sql.NullTime"

case "json", "jsonb":
return "[]byte"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is json.RawMessage preferred over []byte?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's what we do for PostgreSQL.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, that makes sense. Changed!

@dosubot dosubot bot added the 🔧 golang label May 16, 2025
@@ -149,6 +150,9 @@ func (c *Compiler) expandStmt(qc *QueryCatalog, raw *ast.RawStmt, node ast.Node)
if counts[cname] > 1 {
cname = tableName + "." + cname
}
if _, ok := c.parser.(*sqlite.Parser); ok && column.DataType == "jsonb" {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I left the *sqlite.Parser in here as a dummy implementation for detecting when we're running as SQLite, but I could use some help determining what the preferred convention should be. It feels a little heavy handed to make it part of one of the interfaces like Parser given it's such an unusual corner case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked through the code to see if there was a better place to put this. The ideal solution is to create a SelectExpr function on the compiler.Column which adds the json(). That said, we'd have to pipe it through the catalog and the ast, all of which are engine-agnostic.

Instead, let's create a new Selector interface with a single method Expr(name, column) string. In NewCompiler we already switch on the engine so we can pick the correct one to use. Unlike the parser and the catalog, the selector structs shouldn't be exported and should live in the compiler package.

What do you think?

Copy link
Author

@brandur brandur May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, works for me. I just pushed a change that does roughly what you described here. The one tweak I made is that I made Expr -> ColumnExpr to make the interface function a little more clear that it's for use with columns specifically.

I just wanted to call this out just in case: while I was going back through this code, I found a couple places in expand.go that special case based on particular engine name already. e.g.

https://github.com/sqlc-dev/sqlc/blob/c62c6e78843ada0f137960d455baf638390b5c7b/internal/compiler/expand.go/#L60-L68

There is an argument to be made that a new interface makes the code a little more indirect, and given it's already existing convention in a couple places, it might be okay to just make an if statement instead. Either way though, the selector work is already done so I'm good with either.

This one follows up the discussion in sqlc-dev#3953 to try and make the `jsonb`
data type in SQLite usable (see discussion there, but I believe that
it's currently not).

According the SQLite docs on JSONB [1], it's considered a format that's
internal to the database itself, and no attempt should be made to parse
it elsewhere:

> JSONB is not intended as an external format to be used by
> applications. JSONB is designed for internal use by SQLite only.
> Programmers do not need to understand the JSONB format in order to use
> it effectively. Applications should access JSONB only through the JSON
> SQL functions, not by looking at individual bytes of the BLOB.

Currently, when trying to use a `jsonb` column in SQLite, sqlc ends up
returning the internal binary data, which ends up being unparsable in Go:

    riverdrivertest.go:3030:
                Error Trace:    /Users/brandur/Documents/projects/river/internal/riverinternaltest/riverdrivertest/riverdrivertest.go:3030
                Error:          Not equal:
                                expected: []byte{0x7b, 0x22, 0x66, 0x6f, 0x6f, 0x22, 0x3a, 0x20, 0x22, 0x62, 0x61, 0x72, 0x22, 0x7d}
                                actual  : []byte{0x8c, 0x37, 0x66, 0x6f, 0x6f, 0x37, 0x62, 0x61, 0x72}

                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1,3 +1,3 @@
                                -([]uint8) (len=14) {
                                - 00000000  7b 22 66 6f 6f 22 3a 20  22 62 61 72 22 7d        |{"foo": "bar"}|
                                +([]uint8) (len=9) {
                                + 00000000  8c 37 66 6f 6f 37 62 61  72                       |.7foo7bar|
                                 }
                Test:           TestDriverRiverSQLite/QueueCreateOrSetUpdatedAt/InsertsANewQueueWithDefaultUpdatedAt

The fix is that we should make sure to coerce `jsonb` columns back to
`json` before returning. That's what this pull request does, intercepting
`SELECT *` and wrapping `jsonb` columns with a `json(...)` invocation.

I also assign `json` and `jsonb` a `[]byte` data type by default so they
don't end up as `any`, which isn't very useful. `[]byte` is consistent
with the default for `pgx/v5`.

[1] https://sqlite.org/jsonb.html
@brandur brandur force-pushed the brandur-coerce-jsonb-to-json branch from 14bca9e to 3695ee3 Compare May 16, 2025 06:18
@brandur
Copy link
Author

brandur commented May 16, 2025

@kyleconroy I'm sure this could use some work, but I wanted to pitch something to prove out the concept. Thoughts?

@kyleconroy kyleconroy changed the base branch from main to kyle/brandur-coerce-jsonb-to-json May 22, 2025 04:46
@kyleconroy kyleconroy changed the base branch from kyle/brandur-coerce-jsonb-to-json to main May 22, 2025 04:47
brandur added 2 commits May 25, 2025 17:41
This in response to code review feedback, it's a little more correct to
make json/jsonb values `json.RawMessage` instead of `[]byte`. The former
is a form of the latter, but better represents that the value is meant
to be JSON.
This is response to code review feedback, add a new `Selector `interface
whose job it is to provide an engine-agnostic way of generating output
expressions for when selecting column values with `SELECT ...` or
`RETURNING ...`. This is exclusively needed for SQLite for the time
being, which uses it to wrap all output `jsonb` column values with a
call to `json(...)` so that values are coerced to a publicly usable
format before being returned.

[1] sqlc-dev#3968 (comment)
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label May 25, 2025
This is based on original code review feedback that I'd misread
initially. The selector interface doesn't need to be an outside package
for any reason (it's used only internal to the compiler), and this lets
us improve it somewhat by taking a full `*Column` struct rather than
having to make it a `dataType string` (because `Column` is internal to
`compiler` and it would otherwise introduce dependency cycles).
@brandur
Copy link
Author

brandur commented May 26, 2025

@kyleconroy Thx! One tweak I made is to make the selector interface internal to the compiler package as requested in your original comment.

In terms of review: is there a particular process I should know about/follow? i.e. Do you go through all open PRs with lgtms periodically and merge them? Do I need to do anything?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files. 🔧 golang
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants