-
Notifications
You must be signed in to change notification settings - Fork 138
Fix partitioning by DATE column #1057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if time_partitioning.field is not None: | ||
field = time_partitioning.field | ||
if isinstance( | ||
table.columns[time_partitioning.field].type, | ||
sqlalchemy.sql.sqltypes.DATE, | ||
): | ||
return f"PARTITION BY {field}" | ||
elif isinstance( | ||
table.columns[time_partitioning.field].type, | ||
sqlalchemy.sql.sqltypes.TIMESTAMP, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tswast this PR is looking at adding logic to also handle partitioning on a date above and beyond using just DATE_TRUNC and TIMESTAMP_TRUNC functions.
Looking at the breakdown :TIMESTAMP_TRUNC: and :DATE_TRUNC: of how bigquery accepts time partitioning, it feels like we are missing a variety of the options/possibilities.
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chalmerlowe can you clarify what you think is missing? it looks like the API expects one of
so it seems like TIMESTAMP_TRUNC
would work for all of those and DATE
would work for MONTH
and YEAR
, so all that's missing is just plain DATE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Column Data Type | HOUR | DAY | MONTH | YEAR |
---|---|---|---|---|
DATE | N/A | Fixed by #1057 | via DATE_TRUNC | via DATE_TRUNC |
DATETIME | Currently via DATE_TRUNC TODO: use DATETIME_TRUNC | Currently via DATE_TRUNC TODO: use DATETIME_TRUNC | Currently via DATE_TRUNC TODO: use DATETIME_TRUNC | Currently via DATE_TRUNC TODO: use DATETIME_TRUNC |
TIMESTAMP | via TIMESTAMP_TRUNC | via TIMESTAMP_TRUNC | via TIMESTAMP_TRUNC | via TIMESTAMP_TRUNC |
_PARTITIONDATE | N/A | via DATE_TRUNC | via DATE_TRUNC | via DATE_TRUNC |
_PARTITIONTIME | Not supported TODO: USE TIMESTAMP_TRUNC | Not supported TODO: USE TIMESTAMP_TRUNC | Not supported TODO: USE TIMESTAMP_TRUNC | Not supported TODO: USE TIMESTAMP_TRUNC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bnaul For clarity, my question was less about whether this PR is correct and more focused on the bigger question:
Since we have the function _process_time_partitioning()
and it handles two use cases (soon to be three based on this PR), does it cover all the use cases that it ought to?
Tim and I looked at this. I had a matrix to help me ID what the function does/doesn't do/should do.
Tim cleaned it up and posted it above. There are capabilities that are satisfied via an existing BigQuery function (i.e. DATE_TRUNC). But there are use cases that are not fully satisfied OR not supported at all. Tim provides suggestions for how to resolve those gaps.
NOTE: In no way am I saying you need to do all this work. This is more for me as a maintainer to figure out what I want the end state to be either through this PR or a follow-on PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes that matrix is very helpful, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious @chalmerlowe @tswast, any more thoughts on whether this PR should go ahead or wait on additional changes? To me it's distinct enough from the other "Not supported" cases above that it makes sense to keep them separate, but obviously I'm a bit biased 🙂
The reason the Kokoro check is failing is because we do not detect full test coverage. Because the test we have included here is in the I recommend adding a test to the |
there is a PR on your branch in your Fork. Can you take a look? |
Thanks @chalmerlowe, I just merged. The conventionalcommits.org check got offended by something (maybe I should have squash merged?) so I just squashed your change into mine and force pushed here |
gonna run the tests again to see if those tweaks fixed the linting failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Github won't let me merge in We have two paths:
= ) We are so close. And we can mark this down as a success. |
@chalmerlowe I looked for that option but am not seeing it anywhere...the instructions suggest it should be in this part of the UI. Maybe it would be easier if you recreated the PR from a branch in this repo? |
Ah it looks like this is not supported for PRs from organizations, just from users: https://github.com/orgs/community/discussions/5634 |
I am gonna recreate the PR. Will see. Thanks for trying. |
Thanks, I will survive either way 😅 |
Closing in favor of #1074 which @chalmerlowe should now have permissions to edit! |
Fixes #1056