-
Notifications
You must be signed in to change notification settings - Fork 138
Unnest Function In BigQuery SQLAlchemy Plugin Joins Twice When Selecting Other Columns in CTE or Subquery #368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
PR #369 adds a fix for the CTE case. I tried to write a failing test using I have a feeling that this is going to be whack-a-mole salad (e.g., nesting CTEs is likely still broken). |
I took a closer looks at this logic. The multiple "FROM" logic is technically a cross-join https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#comma_cross_join That is probably what we want, since it "flattens" the array, but I wonder if we can make that cross-join more explicit to avoid issues like this? Perhaps we make |
Hi, I originally submitted the request, let me know if I can do anything to help. The linked PR fixed the issue as reported: q = select(table.c.foo, F.unnest(table.c.bars).column_valued("bar"))
q = q.cte("cte")
q = select(*q.columns)
print(q.compile(engine)) WITH `cte` AS
(SELECT `table1`.`foo` AS `foo`, `bar`
FROM `table1`, unnest(`table1`.`bars`) AS `bar`)
SELECT `cte`.`foo`, `cte`.`bar`
FROM `cte` but...
q = q.cte("cte2")
q = select(*q.columns)
print(q.compile(engine)) WITH `cte` AS
(SELECT `table1`.`foo` AS `foo`, `bar`
FROM `table1`, `table1` `table1_1`, unnest(`table1_1`.`bars`) AS `bar`),
`cte2` AS
(SELECT `cte`.`foo` AS `foo`, `cte`.`bar` AS `bar`
FROM `cte`)
SELECT `cte2`.`foo`, `cte2`.`bar`
FROM `cte2` yup.
It's definitely possible that I'm using this wrong, and if there is a more explicit way to express this I'm happy to use that. FWIW the Postgres backend seems to be OK with this though: pg_engine = create_engine("postgresql://", future=True)
print(q.compile(pg_engine)) WITH cte AS
(SELECT table1.foo AS foo, bar
FROM table1, unnest(table1.bars) AS bar),
cte2 AS
(SELECT cte.foo AS foo, cte.bar AS bar
FROM cte)
SELECT cte2.foo, cte2.bar
FROM cte2 |
@kwohlfahrt thanks for the examples and the note regarding Postgres. |
Hello team! This issue has been open for some time now. I just wanted to make a comment to bring some visibility to the issue. @kwohlfahrt you mentioned that the linked PR fixed the issue as reported but you have another use case where you are uncertain whether you are using the product wrong. @tseaver how could we resolve this situation? |
👋 any progress on this one? sadly running into the nested cte issue 😢 |
Since this look completely stagnant, someone knows some workaround it? Is possible to write the I tried using the solution of @kwohlfahrt but didn't work because I need to know the values inside the struct, so I had to use table_valued(). But it returns
which basically duplicates the data. |
I am getting this issue when trying to unnest an array of structs, similar to this issue: #376 |
@lazjake oh hi - still an issue 🫠 |
IMHO the original problematic code was fixed by #839, though there is still a scenario that is not handled: When creating a statement using JOIN + UNNEST and then using it as a subquery for a new SELECT statement, it causes the table used in the UNNEST to be specified twice in the FROM statement, thus creating implicit JOIN between that table to itself. Example code:
Which prints the following:
Thanks to @yuval-bavli for helping with pinpointing the scenario |
From a customer support request:
Issue summary: This issue is closely related to Unnest function in BigQuery SQLAlchemy plugin joins twice when selecting other columns #353. After the previous issue was resolved, queries using the UNNEST function that are not in Common Table Expressions (CTE) or subqueries work as expected. See the following minimal example:
This outputs the following, which is expected:
However, if we use the table as a subquery or CTE, the issue manifests again. The following code (continuing from above):
Produces the next output:
Note that
table1
is duplicated in theFROM
clause again.Expected behavior: It is expected to generate the following query instead (note the different FROM clause):
The text was updated successfully, but these errors were encountered: