Skip to content

Unable compare value from records array #233

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
elsiniestra opened this issue Jul 29, 2021 · 9 comments · Fixed by #318
Closed

Unable compare value from records array #233

elsiniestra opened this issue Jul 29, 2021 · 9 comments · Fixed by #318
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@elsiniestra
Copy link

elsiniestra commented Jul 29, 2021

I need to create a join of 2 BQ tables where one of them has an id field that has to be compared with the record id in the repeated field (array of records) of another one. I'm struggling with SQLAlchemy implementation of the SQL query that works fine. Tried to solve this problem in different ways, but nothing was successful, most relevant and similar to sql-like solution below. Probably, there are some troubles with sqlalchemy interpretation of SQL queries or result fetch either, aren't?

Environment details

  • OS type and version: MacOS, 12.0 beta
  • Python version: 3.9.1
  • pip version: 21.1.2
  • google-cloud-bigquery version: 2.22.1

Steps to reproduce

  1. Create the BQ table Mock with fields:

    • name: mock_id | type: STRING | mode: NULLABLE
  2. Create the BQ table AnotherMock with fields:

    • name: objects | type: RECORD | mode: REPEATED
      • name: object_id | type: STRING | mode: NULLABLE
  3. Run code below (It gonna fail).

  4. Run SQL below (it gonna success).

SQL example

SELECT *
FROM db.dataset.mock mock
INNER JOIN (SELECT another_mock_objects FROM db.dataset.another_mock, UNNEST(objects) as another_mock_objects) another_mock
ON mock.mock_id = another_mock.another_mock_objects.object_id

Code example

 subquery = session.query(
    func.unnest(another_mock_model.c.objects).label('another_mock_objects')
).subquery()

session.execute(
    select([]) \
        .select_from(
            mock_model
                .join(
                    another_mock_model,
                    subquery.c.another_mock_objects.object_id == mock_model.c.mock_id,
                )
        )
)

Stack trace

AttributeError: 'Comparator' object has no attribute 'object_id'

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. label Jul 29, 2021
@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Aug 1, 2021
@jimfulton jimfulton self-assigned this Aug 9, 2021
@tswast tswast added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Aug 9, 2021
@jimfulton
Copy link
Contributor

@fallncrlss could you please share the SQLAlchemy model definitions you used for the code examples?

@elsiniestra
Copy link
Author

@fallncrlss could you please share the SQLAlchemy model definitions you used for the code examples?

In my case, due to some limitations of business logic, I should define SQLAlchemy BQ models dynamically like the example below:

from sqlalchemy import create_engine, Table, MetaData


engine = create_engine('BQ_CONNECTION_URL', pool_size=POOL_SIZE)
SomeTable = Table('some_table_name', MetaData(bind=engine), autoload=True)

@jimfulton
Copy link
Contributor

@fallncrlss please share the code you used to define Mock and AnotherMock (or whatever you actually named them). I think I know what's going on, but being able to reproduce what you did exactly would help to be sure and reduce the chance of wasting everyone's time.

@jimfulton
Copy link
Contributor

jimfulton commented Aug 19, 2021

There are two issues.

  1. Currently, record/struct data are exposed to SQLAlchemy as JSON. So, to access your record fields, you need to use mapping syntax.

    For example, given a table containing structs with an object_id field, to select records with an object id of 'x':

    select([mock_structs], mock_structs.c.object['object_id']=='x')

  2. To use unnest, you have to use alias, rather than label:

    subquery = session.query(
        func.unnest(another_mock.c.objects).alias('another_mock_objects')
    ).subquery()
    

    However, there's a bug in handling of unnest: unnest isn't handled correctly #283

@elsiniestra
Copy link
Author

@fallncrlss please share the code you used to define Mock and AnotherMock (or whatever you actually named them). I think I know what's going on, but being able to reproduce what you did exactly would help to be sure and reduce the chance of wasting everyone's time.

Mock and AnotherMock model schemas define directly in BigQuery, and in the code I define them dynamically as in the example above using Table and the table names.

@jimfulton
Copy link
Contributor

@fallncrlss please share the code you used to define Mock and AnotherMock (or whatever you actually named them). I think I know what's going on, but being able to reproduce what you did exactly would help to be sure and reduce the chance of wasting everyone's time.

Mock and AnotherMock model schemas define directly in BigQuery, and in the code I define them dynamically as in the example above using Table and the table names.

OK, fair enough. Then I think I reproduced your scenario.

@jimfulton
Copy link
Contributor

More info on what I said about using alias:

Some references invoking functions like unnest:

See: https://docs.sqlalchemy.org/en/14/tutorial/data_select.html#tutorial-functions-table-valued

@jimfulton
Copy link
Contributor

1. Currently, record/struct data are exposed to SQLAlchemy as JSON.  So, to access your record fields, you need to use mapping syntax.

Well, that doesn't work either. :( #293

@jimfulton
Copy link
Contributor

@fallncrlss see

subquery = select(
func.unnest(another_mock_table.c.objects).alias("another_mock_objects").column
).subquery()
join = mock_table.join(
subquery, subquery.c.another_mock_objects["object_id"] == mock_table.c.mock_id,
)
query = select(mock_table).select_from(join)

for an example of how to do what you want to do with sqlalchemy-bigquery 1.2.0 or later.

Note that dot notation now works if the name doesn't conflict with an internal name, so:

subquery.c.another_mock_objects.object_id == mock_table.c.mock_id

and if there's a chance of a conflict, you can use upper case to disambiguate:

subquery.c.another_mock_objects.OBJECT_ID == mock_table.c.mock_id

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants