Allow a use_query_cache override directly in create_engine or execution_options #1112

Closed
@galloviolet

Description

Is your feature request related to a problem? Please describe.
If I want to execute a particular query that bypasses the BQ query cache, I have to instantiate a new client object, pass it a QueryJobConfig with use_query_cache=False, and then pass that client into create_engine, as below:

import sqlalchemy
from google.cloud import bigquery

# Build a job config that disables the query cache
job_config_kwargs = {}
job_config_kwargs["use_query_cache"] = False
job_config = bigquery.QueryJobConfig(**job_config_kwargs)
...
bq_client = bigquery.Client(
    default_query_job_config=job_config,
    ...
)

create_engine_kwargs = {}
# Note: "connect_args" must be created before assigning into it
create_engine_kwargs["connect_args"] = {"client": bq_client}
...
bq_engine = sqlalchemy.create_engine(url=..., **create_engine_kwargs)
with bq_engine.connect() as conn:
    conn.execute(sqlalchemy.text("SELECT ... FROM ..."))

The main issue is that create_engine is slow for BigQuery compared to most other dialects, perhaps because oauth2 validates the credentials passed into the BQ client. It also means that calling code has to keep track of the job_config, bq_client, and bq_engine objects.

For these reasons I would rather pass a parameter, perhaps a QueryJobConfig object, and let the dialect and the DBAPI handle the client object.

Describe the solution you'd like
I see that BigQueryDialect inherits from DefaultDialect and that Cursor.execute in the BigQuery DBAPI repo does include a parameter for job_config. I wonder if it is as easy as overriding do_execute in the dialect within this repo to pass along that job_config?
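A minimal sketch of that override, assuming context.execution_options carries the per-connection options. The class name mirrors the real dialect, but this is illustrative pseudocode of the idea, not the actual implementation:

```python
# Hypothetical sketch only: the real BigQueryDialect subclasses
# sqlalchemy.engine.default.DefaultDialect; it is shown standalone here
# so the idea is self-contained.
class BigQueryDialect:
    def do_execute(self, cursor, statement, parameters, context=None):
        # Look for a job_config set via .execution_options(job_config=...)
        job_config = None
        if context is not None:
            job_config = context.execution_options.get("job_config")
        if job_config is not None:
            # The BigQuery DBAPI Cursor.execute already accepts a
            # job_config keyword, so the dialect just forwards it.
            cursor.execute(statement, parameters, job_config=job_config)
        else:
            cursor.execute(statement, parameters)
```

With something like this in place, a per-statement QueryJobConfig would flow from the connection's execution options down to the DBAPI cursor without touching the client.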

The caller could then use it like this:

import sqlalchemy
from google.cloud import bigquery

bq_engine = sqlalchemy.create_engine(url=...)  # Just once! I can reuse it whether I want to use the cache or not!
job_config = bigquery.QueryJobConfig(use_query_cache=False)
with bq_engine.connect().execution_options(job_config=job_config) as conn:
    conn.execute(sqlalchemy.text("query bypassing the cache"))
with bq_engine.connect() as conn:
    conn.execute(sqlalchemy.text("query potentially hitting the cache"))

Describe alternatives you've considered
See the pseudocode above. It works for us, but it's cumbersome and causes a performance hit.

Metadata
Labels

api: bigquery: Issues related to the googleapis/python-bigquery-sqlalchemy API.
type: feature request: 'Nice-to-have' improvement, new feature or different behavior or design.
