Description
Is your feature request related to a problem? Please describe.
If I want to execute a particular query that bypasses the BQ query cache, I have to instantiate a new client object, pass it a `QueryJobConfig` with `use_query_cache=False`, and then pass that client into `create_engine`, like below:
```python
import sqlalchemy
from google.cloud import bigquery

job_config_kwargs = {}
job_config_kwargs["use_query_cache"] = False
job_config = bigquery.QueryJobConfig(**job_config_kwargs)
...
bq_client = bigquery.Client(
    default_query_job_config=job_config,
    ...
)
create_engine_kwargs = {}
# The nested dict must be created before assigning into it
create_engine_kwargs["connect_args"] = {"client": bq_client}
...
bq_engine = sqlalchemy.create_engine(url=..., **create_engine_kwargs)
with bq_engine.connect() as conn:
    conn.execute("SELECT ... FROM ...")
```
The main issue with this is that `create_engine` is slow for BigQuery compared to most other dialects, perhaps because of oauth2 checking the credentials passed into the BQ client. It also means that calling code has to keep track of the `job_config`, `bq_client`, and `bq_engine` objects.
For these reasons I would rather pass a parameter, perhaps a `QueryJobConfig` object, and let the dialect and the DBAPI handle the client object.
Describe the solution you'd like
I see that `BigQueryDialect` inherits from `DefaultDialect`, and that `Cursor.execute` in the BigQuery DBAPI repo does include a parameter for `job_config`. I wonder if it is as easy as overriding `do_execute` in the dialect within this repo to pass along that `job_config`?
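A minimal sketch of what that forwarding might look like, using stub classes rather than the real `sqlalchemy-bigquery` or DBAPI code (the `context` shape and attribute names here are assumptions for illustration only):

```python
class FakeCursor:
    """Stand-in for the BigQuery DBAPI cursor, whose execute()
    accepts a job_config keyword."""

    def __init__(self):
        self.last_job_config = None

    def execute(self, statement, parameters=None, job_config=None):
        # Record what the dialect forwarded to us
        self.last_job_config = job_config


def do_execute(cursor, statement, parameters, context=None):
    """Sketch of a dialect-level do_execute override: pull a
    'job_config' execution option (if present) out of the context
    and hand it to the DBAPI cursor."""
    job_config = None
    if context is not None:
        job_config = context.get("job_config")
    cursor.execute(statement, parameters, job_config=job_config)
```

The idea is just that the execution option travels from the connection down to `Cursor.execute` without the caller ever touching the client object.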
The caller could then use it like this:
```python
import sqlalchemy
from google.cloud import bigquery

bq_engine = sqlalchemy.create_engine(url=...)  # Just once! I can reuse it whether I want to use the cache or not!
job_config = bigquery.QueryJobConfig(use_query_cache=False)

with bq_engine.connect().execution_options(job_config=job_config) as conn:
    conn.execute("query bypassing the cache")

with bq_engine.connect() as conn:
    conn.execute("query potentially hitting the cache")
```
Describe alternatives you've considered
See the pseudocode above. It works for us, but it's cumbersome and causes a performance hit.