Skip to content

Table creation partition/cluster support #395

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jlynchMicron opened this issue Jan 12, 2022 · 3 comments · Fixed by #928
Closed

Table creation partition/cluster support #395

jlynchMicron opened this issue Jan 12, 2022 · 3 comments · Fixed by #928
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@jlynchMicron
Copy link

Is your feature request related to a problem? Please describe.
Quickly looking through the source code, it looks like there is no support for partition/cluster configuration at time of table creation. The best reference I see anywhere close to this is in the function base.py -> get_indexes. So it looks like we can just get indexing information but not set indexing.

Describe the solution you'd like
Added support for index configuration.

Describe alternatives you've considered
Will probably need to make another SQL query to check metadata if indexing is in place, and if not, set the indexing via SQLalchemy text() function.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. label Jan 12, 2022
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Jan 12, 2022
@tswast tswast added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed triage me I really want to be triaged. labels Jan 13, 2022
@pykenny
Copy link

pykenny commented Sep 26, 2022

I think one of the challenges is that current DDL only supports defining clustering/partitioning on table creation, but doesn't support altering them afterwards. So I'm not sure if overriding Index class would work, or we should add a new class such as Clustering and Partitioning instead.

Current API allows modifying clustering settings in-place, but not for partitioning.

Also there are several constraints compared to regular relational DB indexing:

  • At most one clusterings/partitioning is allowed per table
  • Only top-level columns and specific data types are supported
    • Clustering does not allow sorting on derived column (e.g.: result from applying some function on a column, or sum of two numerical columns)
    • In most cases, partitioning relies on derived column to specify binning (e.g.: Truncated result given specified precision on (date)time values; Applying RANGE_BUCKET on integers)
  • Clustering has limit on number of columns for assigning multi-column sorting (maximum 4 columns for now)
  • Can only specify one column for partitioning
  • Optional filtering requirement on partitioned column, which also means type checking (or runtime?) should fail on query statements without specifying filtering condition against these columns

@jlynchMicron
Copy link
Author

Thanks @chalmerlowe !!!

@chalmerlowe
Copy link
Collaborator

Thank @nlenepveu
They did all the heavy lifting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
5 participants