-
Notifications
You must be signed in to change notification settings - Fork 138
Feat: Update partitioning by DATE, DATETIME, TIMESTAMP, _PARTITIONDATE #1113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🤖 I detect that the PR title and the commit message differ and there's only one commit. To use the PR title for the commit history, you can use Github's automerge feature with squashing, or use -- conventional-commit-lint bot |
tests/unit/test_table_options.py
Outdated
@@ -132,83 +188,89 @@ def test_table_require_partition_filter_dialect_option(faux_conn): | |||
bigquery_require_partition_filter=True, | |||
) | |||
|
|||
assert " ".join(faux_conn.test_data["execute"][-1][0].strip().split()) == ( | |||
result = " ".join(faux_conn.test_data["execute"][-1][0].strip().split()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
It seems this test is covered in test_table_time_partitioning_date_timestamp_and_datetime_dialect_option
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deduplicated.
# is DAY. This case overwrites that to avoid making a breaking change in | ||
# python-bigquery. | ||
# https://github.com/googleapis/python-bigquery/blob/a4d9534a900f13ae7355904cda05097d781f27e3/google/cloud/bigquery/table.py#L2916 | ||
if trunc_fn == "DATE_TRUNC" and partitioning_period == "DAY": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this would be added when you complete the draft, I think we need to set the default value of partitioning_period
somewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unclear on why we might need to create a default value of partitioning_period
. It is set based on the value of time_partitioning.type_
on line 873
.
time_partitioning.type_
is set in the python-bigquery
library in the TimePartitioning class.
When that class is instantiated, the default for type_
in the __init__
is None
but it is then processed and immediately overwritten by DAY
as shown at the link above.
So by the time .type_
gets to us, it will either be DAY
or will be set to something else that we use to set the variable named partitioning_period
.
NOTE: I rename the variable to partitioning_period
simply to make this code more readable. type_
seemed less intuitive than partitioning_period
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I missed the part in __init__()
that sets the default value.
* chore(deps): update all dependencies * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * Update protobuf to 5.28.3 * pin google-crc32c for python 3.7/3.8 * Pin mako for python 3.8 * Pin markupsafe for Python 3.8 * Pin pyparsing for python 3.8 * Pin pyparsing for Python 3.8 --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> Co-authored-by: Anthonios Partheniou <partheniou@google.com>
* chore(deps): update all dependencies * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
@Linchin Please take a look. I have taken this out of draft. As far as I recall, this should be finished and ready for final review/approval. |
assert " ".join(faux_conn.test_data["execute"][-1][0].strip().split()) == ( | ||
"CREATE TABLE `some_table` ( `id` INT64, `createdAt` TIMESTAMP )" | ||
" PARTITION BY TIMESTAMP_TRUNC(createdAt, DAY)" | ||
result = " ".join(faux_conn.test_data["execute"][-1][0].strip().split()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the second half of this test does the following, as the docstring suggests:
Confirms that if the column datatype is DATETIME but no TimePartitioning.type_
has been supplied, the system will default to DAY.
Maybe I'm missing something, but I think it's a duplicate of the test case at line 136.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference between this test and the parametrized test above is minor.
Essentially, in the parameters on line 136 we explicitly provide a value of None
as a type_
argument to the TimePartitioning class.
We know the default is None
in python-bigquery
but we set the value deliberately.
In line 193, we do not set the value of type_
explicitly, we allow the default value (which should be None
) come through and thus confirm that everything works as expected whether we provide a value to type_
or not.
assert " ".join(faux_conn.test_data["execute"][-1][0].strip().split()) == ( | ||
"CREATE TABLE `some_table_2` ( `id` INT64, `createdAt` DATE )" | ||
" PARTITION BY createdAt" | ||
result = " ".join(faux_conn.test_data["execute"][-1][0].strip().split()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this test, we do NOT provide a field
argument to the TimePartitioning class whereas in the tests above, we do.
This is needed to confirm that in the absence of a field
argument the SQL statement will not use a function, etc but will instead use _PARTITIONDATE
Both the tests above end up including a truncation function but this on does not (and should not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
Two small suggestions, could we:
- rename the tests to reflect what they are intended to test
- the table in the first comment seems a little outdated, could we update it?
This adds some additional functionality to properly handle partitioning of columns with the following datatypes.
Where appropriate, ensures the following functions can be used with/or without the following TimePartitioningTypes (HOUR, DAY, MONTH, YEAR).
This is a nearly complete fix for #1072. A revised table from #1072 is included here:
NOTES: