-
Notifications
You must be signed in to change notification settings - Fork 138
Add support for array and struct literals #67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Simple select example: import sqlalchemy as sa
from pybigquery import array, struct
from pybigquery.sqlalchemy_bigquery import BigQueryDialect
engine = sa.create_engine("bigquery://<your connstr here>")
engine.execute(
"""
create table tmp.test_struct as (
select 'a' as id, struct(1 as x__count, 2 as y__count, 3 as z__count) as dimensions
)
"""
)
table = sa.Table("tmp.test_struct", sa.MetaData(bind=engine), autoload=True)
# This would obviously be cleaner if the RECORD built these natively.
dimensions = struct([], table.c.dimensions.name)
query = sa.select([table.c.id, (dimensions["x__count"] + 5).label("x")])
print(query.compile(dialect=BigQueryDialect(), compile_kwargs={"literal_binds": True})) Example doing a pivot: import sqlalchemy as sa
from pybigquery import array, struct
from pybigquery.sqlalchemy_bigquery import BigQueryDialect
engine = sa.create_engine("bigquery://<your connstr here>")
engine.execute(
"""
create table tmp.test as (
select 1 as x__count, 2 as y__count, 3 as z__count
)
"""
)
table = sa.Table("tmp.test", sa.MetaData(bind=engine), autoload=True)
pivot_unnest = sa.func.unnest(
array(
[
struct([sa.literal(dim).label("dimension"), col.label("count")])
for dim, col in {
"x": table.c.x__count,
"y": table.c.y__count,
"z": table.c.z__count,
}.items()
]
)
).alias("pivot")
pivot = struct([sa.column("dimension"), sa.column("count")], field=pivot_unnest.name)
query = sa.select(
[pivot["dimension"].label("dimension"), pivot["count"].label("count")]
).select_from(table.join(pivot_unnest, sa.literal(True)))
print(query.compile(dialect=BigQueryDialect(), compile_kwargs={"literal_binds": True})) It's a little bit awkward since they're not table objects (so no A few other things that'd be nice:
|
It might make sense to support ARRAY first, as there's already a data type for it in SQLAlchemy. https://docs.sqlalchemy.org/en/13/core/type_basics.html#sql-standard-and-multiple-vendor-types Have you seen any discussion on adding STRUCT/RECORD types to SQLAlchemy for other backends? I'd want to make sure we do something that aligns with the direction of the rest of the community. |
@tswast makes sense - the Re a STRUCT/RECORD datatype - I didn't find anything on a cursory look at the sqla Google Group, they might be even more rare than Perhaps I can start a discussion in the Google Groups about these things. |
@JacobHayes wrt array literal, why not just I guess this gets more exciting with structs, and especially arrays of structs. |
@jimfulton oh yeah, hadn't thought of import sqlalchemy as sa
from pybigquery import array, struct
from pybigquery.sqlalchemy_bigquery import BigQueryDialect
engine = sa.create_engine("bigquery://")
engine.execute(
"""
create or replace table tmp.test_struct as (
select 'a' as id, struct(1 as x__count, 2 as y__count, 3 as z__count) as dimensions
)
"""
)
table = sa.Table("tmp.test_struct", sa.MetaData(bind=engine), autoload=True)
print(list(engine.execute(sa.select([sa.literal(["a"])]))))
# [(['a'],)]
engine.execute(sa.select([sa.literal([table.c.id])]))
# ProgrammingError: (google.cloud.bigquery.dbapi.exceptions.ProgrammingError) Encountered unexpected first array element of parameter param_1, cannot determine array elements type.
# [SQL: SELECT %(param_1)s AS `anon_1`]
# [parameters: {'param_1': [Column('id', String(), table=<tmp.test_struct>)]}]
# (Background on this error at: http://sqlalche.me/e/13/f405)
print(list(engine.execute(sa.select([array(["a"])]))))
# [(['a'],)]
print(list(engine.execute(sa.select([array([table.c.id])]))))
# [(['a'],)] |
Thanks. It's really helpful to have specific examples! |
FTR, IMO a big (surprising) part of the value proposition of SQLAlchemy is as an abstraction layer over disparate database engines. For that reason, I'd like to enable as much as possible without introducing dialect-specific APIs. |
Absolutely agree there! The array literal side at least matches postgres, but structs seem a bit more unique (though not so much for data warehouses). Perhaps upstream can be convinced to move the Additionally, a proper |
@JacobHayes please checkout https://github.com/googleapis/python-bigquery-sqlalchemy/pull/318/files#diff-ee8781d1be11dbad05f4936052659eed168b4ebeaef166b9e2684f74293e30f8R94-R183 and let me know if I satisfied what you were trying to do. I didn't really follow your pivot example, but I'm hoping I covered it in |
Superseded by #318 |
This allows building
array
s andstruct
s up insa.select
statements from other columns. Thearray
is just ripped right from the postgres dialect with a customvisit_array
. Thestruct
have their own type and column expression. I didn't touch the existingRECORD
type (which is justJSON
) but instead made a customSTRUCT
type that supports lookups to access subfields - not sure how to reconcile this with the existingRECORD
type. Is it as easy as just changing_type_map
's value toSTRUCT
instead ofJSON
?These have been useful for pivots (ie: array of struct("{X}" as name, {col} as value)).
NOTE: due to struct fields requiring labels, things break when not using#47 was mergedliteral_binds
due to #39, but work fine after #47. I have an integration branch here if it is useful in the meantime.