
Allow a use_query_cache override directly in create_engine or execution_options #1112


Closed
gallodustin opened this issue Aug 30, 2024 · 3 comments · Fixed by #1167
Labels
api: bigquery (Issues related to the googleapis/python-bigquery-sqlalchemy API.)
type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)

Comments

gallodustin commented Aug 30, 2024

Is your feature request related to a problem? Please describe.
If I want to execute a particular query that bypasses the BQ query cache, I have to instantiate a new client object, pass it a QueryJobConfig with use_query_cache=False, and then pass that client into create_engine, as below:

import sqlalchemy
from google.cloud import bigquery

job_config_kwargs = {}
job_config_kwargs["use_query_cache"] = False
job_config = bigquery.QueryJobConfig(**job_config_kwargs)
...
bq_client = bigquery.Client(
    default_query_job_config=job_config,
    ...
)

create_engine_kwargs = {}
create_engine_kwargs["connect_args"] = {"client": bq_client}  # the client must be nested under connect_args
...
bq_engine = sqlalchemy.create_engine(url=..., **create_engine_kwargs)
with bq_engine.connect() as conn:
    conn.execute("SELECT ... FROM ...")

The main issue with this is that create_engine is slow for BigQuery compared to most other dialects, perhaps because oauth2 validates the credentials passed into the BQ client. It also means that the calling code has to keep track of the job_config, bq_client, and bq_engine objects.

For these reasons I would rather pass a parameter, perhaps a QueryJobConfig object, and let the dialect and the DBAPI handle the client object.

Describe the solution you'd like
I see that BigQueryDialect inherits from DefaultDialect and that Cursor.execute in the BigQuery DBAPI repo already accepts a job_config parameter. I wonder if it is as easy as overriding do_execute in the dialect within this repo to pass that job_config along.
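
A hypothetical sketch of that override (not the shipped implementation): read a job_config execution option in do_execute and forward it to the DBAPI cursor, which already accepts a job_config argument.

from sqlalchemy_bigquery import BigQueryDialect


class JobConfigDialect(BigQueryDialect):
    def do_execute(self, cursor, statement, parameters, context=None):
        # Execution options set on the connection are visible to the dialect
        # through the execution context.
        job_config = None
        if context is not None:
            job_config = context.execution_options.get("job_config")
        if job_config is not None:
            cursor.execute(statement, parameters, job_config=job_config)
        else:
            cursor.execute(statement, parameters)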

The caller could then use it like this:

import sqlalchemy
from google.cloud import bigquery

bq_engine = sqlalchemy.create_engine(url=...)  # Just once! I can reuse it whether I want to use the cache or not!
job_config = bigquery.QueryJobConfig(use_query_cache=False)
with bq_engine.connect().execution_options(job_config=job_config) as conn:
    conn.execute("query bypassing the cache")
with bq_engine.connect() as conn:
    conn.execute("query potentially hitting the cache")

Describe alternatives you've considered
See the first snippet above: it works for us, but it's cumbersome and incurs a performance hit from re-creating the client and engine.

product-auto-label bot added the api: bigquery label on Aug 30, 2024
tswast added the type: feature request label on Sep 3, 2024
tswast (Collaborator) commented Feb 25, 2025

Apologies for the late reply. What you describe should already be supported. There are already two ways to pass default job config parameters:

  1. Via the connection string: https://github.com/googleapis/python-bigquery-sqlalchemy/blob/6bee2e40e00b2779e1c6a8de19287a11c9c6a5a6/README.rst#connection-string-parameters
  2. Via passing in a Client object: https://github.com/googleapis/python-bigquery-sqlalchemy/blob/6bee2e40e00b2779e1c6a8de19287a11c9c6a5a6/README.rst#supplying-your-own-bigquery-client
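
If the connection string route (option 1) fits your case, a minimal sketch might look like the following, assuming use_query_cache is among the query-job-config parameters documented in that README section (the project name is a placeholder):

import sqlalchemy

# Placeholder project; this setting applies to every query run through the engine.
engine = sqlalchemy.create_engine("bigquery://my-project?use_query_cache=false")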

tswast (Collaborator) commented Feb 25, 2025

For your use case, you could follow option 2. The Client.default_query_job_config property is intentionally mutable for this sort of use case. (See the setter here: https://github.com/googleapis/python-bigquery/blob/fea49ffbf8aa1d53451864ceb7fd73189b6661cb/google/cloud/bigquery/client.py#L298)
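
A sketch of that workaround, assuming the connect_args/user_supplied_client form from the README section linked in the previous comment and using placeholder project and dataset names: create the client and engine once, then mutate the default job config around the queries that should skip the cache.

import sqlalchemy
from google.cloud import bigquery

bq_client = bigquery.Client()
bq_engine = sqlalchemy.create_engine(
    "bigquery://my-project/my_dataset?user_supplied_client=True",  # placeholders
    connect_args={"client": bq_client},
)

# Bypass the cache for this query by swapping the client's default job config.
bq_client.default_query_job_config = bigquery.QueryJobConfig(use_query_cache=False)
with bq_engine.connect() as conn:
    conn.execute(sqlalchemy.text("SELECT ... FROM ..."))

# Re-enable caching for subsequent queries.
bq_client.default_query_job_config = bigquery.QueryJobConfig(use_query_cache=True)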

tswast (Collaborator) commented Feb 25, 2025

I'm not familiar with execution_options, but this seems like a reasonable request as an alternative, more narrowly scoped mechanism than changing the default options.
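
For context, execution_options is SQLAlchemy's general mechanism for per-connection (or per-statement) settings that a dialect can read at execute time; a generic, non-BigQuery illustration with a built-in option:

import sqlalchemy

engine = sqlalchemy.create_engine("sqlite://")
# Built-in example: run this connection in autocommit isolation.
with engine.connect().execution_options(isolation_level="AUTOCOMMIT") as conn:
    conn.execute(sqlalchemy.text("SELECT 1"))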
