Skip to content

Nested labeled expressions are omitted #39

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
JacobHayes opened this issue Sep 23, 2019 · 0 comments · Fixed by #47
Closed

Nested labeled expressions are omitted #39

JacobHayes opened this issue Sep 23, 2019 · 0 comments · Fixed by #47
Labels
type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@JacobHayes
Copy link
Contributor

The pybigquery driver replaces nested labeled expression with the label instead of the actual expression. The result is an "Unrecognized name" error. Otherwise, when the expression label matches the inner column name, the expression will be silently omitted for a plain column reference.

When the query is compiled and printed with literal_binds, the query looks correct (ie: the full expression is included), but for plain print, compile, and the actual execution, the expression is missing.

It seems like labeled expressions are only compiled one step past the first inner label instead of recursively.

Test case:

from contextlib import contextmanager

from sqlalchemy import MetaData, Table, case, create_engine, func, literal, select

engine = create_engine("bigquery://<your connstr here>")
# engine = create_engine("postgresql:/<your connstr here>")


@contextmanager
def test_query():
    engine.execute(
        """
        create table test as (
            select 1 as val
            union all select 2
            union all select 3
        )
    """
    )
    val = Table("test", MetaData(bind=engine), autoload=True).c.val
    sum = func.sum
    try:
        yield select(
            [
                # References `inner` instead of `val`.
                sum(sum(val.label("inner")).label("outer")).over(),
                # Also references `inner` instead of `val`.
                sum(case([[literal(True), val.label("inner")]]).label("outer")),
                # Completely omits the case expression and references `middle`.
                sum(
                    sum(
                        case([[literal(True), val.label("inner")]]).label("middle")
                    ).label("outer")
                ).over(),
            ]
        )
    finally:
        engine.execute("drop table test")


with test_query() as q:
    print("Print:")
    print(q)

    print("\nCompile:")
    print(q.compile(engine))

    print("\nLiteral compile:")
    print(q.compile(engine, compile_kwargs={"literal_binds": True}))

    print("\nResult:")
    print(tuple(engine.execute(q)))

The output from the Bigquery Engine is below - note that the query in the exception and all prints except the literal compile show the labels instead of inner expressions.

Print:
SELECT sum(sum(`inner`)) OVER () AS `anon_1`, sum(CASE WHEN %(param_1)s THEN `inner` END) AS `sum_1`, sum(sum(`middle`)) OVER () AS `anon_2`
FROM `test`

Compile:
SELECT sum(sum(`inner`)) OVER () AS `anon_1`, sum(CASE WHEN %(param_1)s THEN `inner` END) AS `sum_1`, sum(sum(`middle`)) OVER () AS `anon_2`
FROM `test`

Literal compile:
SELECT sum(sum(`test`.`val`)) OVER () AS `anon_1`, sum(CASE WHEN true THEN `test`.`val` END) AS `sum_1`, sum(sum(CASE WHEN true THEN `test`.`val` END)) OVER () AS `anon_2`
FROM `test`

Result:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 172, in execute
    self._query_job.result()
  File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 2939, in result
    super(QueryJob, self).result(timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 734, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/google/api_core/future/polling.py", line 127, in result
    raise self._exception
google.api_core.exceptions.BadRequest: 400 Unrecognized name: `inner` at [1:16]

(job ID: <omitted>)

                                                         -----Query Job SQL Follows-----                                                       

    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:SELECT sum(sum(`inner`)) OVER () AS `anon_1`, sum(CASE WHEN @`param_1` THEN `inner` END) AS `sum_1`, sum(sum(`middle`)) OVER () AS `anon_2`
   2:FROM `test`
    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1249, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 174, in execute
    raise exceptions.DatabaseError(exc)
google.cloud.bigquery.dbapi.exceptions.DatabaseError: 400 Unrecognized name: `inner` at [1:16]

(job ID: <omitted>)

                                                         -----Query Job SQL Follows-----                                                       

    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:SELECT sum(sum(`inner`)) OVER () AS `anon_1`, sum(CASE WHEN @`param_1` THEN `inner` END) AS `sum_1`, sum(sum(`middle`)) OVER () AS `anon_2`
   2:FROM `test`
    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "src/test.py", line 54, in <module>
    print(tuple(engine.execute(q)))
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2179, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 988, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement
    distilled_params,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1253, in _execute_context
    e, statement, parameters, cursor, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1473, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1249, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 174, in execute
    raise exceptions.DatabaseError(exc)
sqlalchemy.exc.DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 Unrecognized name: `inner` at [1:16]

(job ID: <omitted>)

                                                         -----Query Job SQL Follows-----                                                       

    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:SELECT sum(sum(`inner`)) OVER () AS `anon_1`, sum(CASE WHEN @`param_1` THEN `inner` END) AS `sum_1`, sum(sum(`middle`)) OVER () AS `anon_2`
   2:FROM `test`
    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
[SQL: SELECT sum(sum(`inner`)) OVER () AS `anon_1`, sum(CASE WHEN %(param_1)s THEN `inner` END) AS `sum_1`, sum(sum(`middle`)) OVER () AS `anon_2`
FROM `test`]
[parameters: {'param_1': True}]
(Background on this error at: http://sqlalche.me/e/4xp6)

For context (and correct behavior I think), when I change the engine to a Postgres database, the query looks correct and runs fine:

SELECT sum(sum(test.val)) OVER () AS anon_1, sum(CASE WHEN %(param_1)s THEN test.val END) AS sum_1, sum(sum(CASE WHEN %(param_2)s THEN test.val END)) OVER () AS anon_2
FROM test

Compile:
SELECT sum(sum(test.val)) OVER () AS anon_1, sum(CASE WHEN %(param_1)s THEN test.val END) AS sum_1, sum(sum(CASE WHEN %(param_2)s THEN test.val END)) OVER () AS anon_2
FROM test

Literal compile:
SELECT sum(sum(test.val)) OVER () AS anon_1, sum(CASE WHEN true THEN test.val END) AS sum_1, sum(sum(CASE WHEN true THEN test.val END)) OVER () AS anon_2
FROM test

Result:
((Decimal('6'), 6, Decimal('6')),)
@tswast tswast added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Oct 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants