Fix partitioning by DATE column #1057

bnaul · 2024-04-05T16:45:24Z

chalmerlowe · 2024-04-15T18:10:39Z

sqlalchemy_bigquery/base.py

        if time_partitioning.field is not None:
            field = time_partitioning.field
            if isinstance(
+                table.columns[time_partitioning.field].type,
+                sqlalchemy.sql.sqltypes.DATE,
+            ):
+                return f"PARTITION BY {field}"
+            elif isinstance(
                table.columns[time_partitioning.field].type,
                sqlalchemy.sql.sqltypes.TIMESTAMP,


@tswast this PR is looking at adding logic to also handle partitioning on a date above and beyond using just DATE_TRUNC and TIMESTAMP_TRUNC functions.

Looking at the breakdown :TIMESTAMP_TRUNC: and :DATE_TRUNC: of how bigquery accepts time partitioning, it feels like we are missing a variety of the options/possibilities.

Am I missing something?

@chalmerlowe can you clarify what you think is missing? it looks like the API expects one of

Supported values are: * HOUR * DAY * MONTH * YEAR

so it seems like TIMESTAMP_TRUNC would work for all of those and DATE would work for MONTH and YEAR, so all that's missing is just plain DATE

Column Data Type HOUR DAY MONTH YEAR

DATE N/A Fixed by #1057 via DATE_TRUNC via DATE_TRUNC

DATETIME Currently via DATE_TRUNC TODO: use DATETIME_TRUNC Currently via DATE_TRUNC TODO: use DATETIME_TRUNC Currently via DATE_TRUNC TODO: use DATETIME_TRUNC Currently via DATE_TRUNC TODO: use DATETIME_TRUNC

TIMESTAMP via TIMESTAMP_TRUNC via TIMESTAMP_TRUNC via TIMESTAMP_TRUNC via TIMESTAMP_TRUNC

_PARTITIONDATE N/A via DATE_TRUNC via DATE_TRUNC via DATE_TRUNC

_PARTITIONTIME Not supported TODO: USE TIMESTAMP_TRUNC Not supported TODO: USE TIMESTAMP_TRUNC Not supported TODO: USE TIMESTAMP_TRUNC Not supported TODO: USE TIMESTAMP_TRUNC

@bnaul For clarity, my question was less about whether this PR is correct and more focused on the bigger question:

Since we have the function _process_time_partitioning() and it handles two use cases (soon to be three based on this PR), does it cover all the use cases that it ought to?

Tim and I looked at this. I had a matrix to help me ID what the function does/doesn't do/should do.

Tim cleaned it up and posted it above. There are capabilities that are satisfied via an existing BigQuery function (i.e. DATE_TRUNC). But there are use cases that are not fully satisfied OR not supported at all. Tim provides suggestions for how to resolve those gaps.

NOTE: In no way am I saying you need to do all this work. This is more for me as a maintainer to figure out what I want the end state to be either through this PR or a follow-on PR.

ah yes that matrix is very helpful, thanks!

Just curious @chalmerlowe @tswast, any more thoughts on whether this PR should go ahead or wait on additional changes? To me it's distinct enough from the other "Not supported" cases above that it makes sense to keep them separate, but obviously I'm a bit biased 🙂

chalmerlowe · 2024-05-02T19:22:21Z

@bnaul

Updated: Copied the matrix to a new issue. Beyond that I intend to accept this PR.
Will need to run it through the kokoro test suite and will likely take a last look at it as it stands to make sure I am not missing anything in regards to adding this nugget of functionality.

chalmerlowe · 2024-05-02T20:31:00Z

The reason the Kokoro check is failing is because we do not detect full test coverage.
The coverage tool is set to monitor our overall unit test suite and confirm that the unit tests cover all the paths through the code. We do NOT apply coverage to assessing the tests that run during system testing and are found in the system folder.

Because the test we have included here is in the system folder and we do not get credit for testing the new branch.

I recommend adding a test to the unit folder that covers this pathway.

chalmerlowe · 2024-05-06T17:46:02Z

@bnaul

I was looking at the possibility of adding a unit test or two to this and run into a snag.
Checking with @tswast offline to see if I can get some clarity on what I am seeing.

More to come.

chalmerlowe · 2024-05-09T13:42:30Z

@bnaul

there is a PR on your branch in your Fork. Can you take a look?

bnaul · 2024-05-09T13:51:12Z

Thanks @chalmerlowe, I just merged. The conventionalcommits.org check got offended by something (maybe I should have squash merged?) so I just squashed your change into mine and force pushed here

tests/system/test_sqlalchemy_bigquery.py

tests/unit/test_table_options.py

chalmerlowe · 2024-05-09T15:23:49Z

gonna run the tests again to see if those tweaks fixed the linting failure.

Fixes googleapis#1056.

chalmerlowe

LGTM

chalmerlowe · 2024-05-09T18:53:56Z

@bnaul

Github won't let me merge in main cause this is not a PR I created but one created by a third party.

We have two paths:

You can allow the maintainers of python-bigquery-sqlalchemy permission on your Fork so that I or my colleagues can merge PRs to your PR (including merging in main). Instructions here.
OR
We can go back and forth like this until we achieve mutual success.

= )

We are so close. And we can mark this down as a success.

bnaul · 2024-05-09T20:01:28Z

@chalmerlowe I looked for that option but am not seeing it anywhere...the instructions suggest it should be in this part of the UI. Maybe it would be easier if you recreated the PR from a branch in this repo?

bnaul · 2024-05-09T20:02:41Z

Ah it looks like this is not supported for PRs from organizations, just from users: https://github.com/orgs/community/discussions/5634

chalmerlowe · 2024-05-10T00:04:09Z

@bnaul

I am gonna recreate the PR.
I will do what I can to ensure that you get recognition for the contribution, but the commit history may get mangled.

Will see.

Thanks for trying.

bnaul · 2024-05-10T01:06:29Z

Thanks, I will survive either way 😅

bnaul · 2024-05-13T15:06:55Z

Closing in favor of #1074 which @chalmerlowe should now have permissions to edit!

bnaul requested review from a team as code owners April 5, 2024 16:45

bnaul requested a review from farhan0102 April 5, 2024 16:45

product-auto-label bot added size: s Pull request size is small. api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. size: xl Pull request size is extra large. and removed size: s Pull request size is small. labels Apr 5, 2024

bnaul force-pushed the date_partition branch from 1490a8a to fd9e6f1 Compare April 5, 2024 18:18

product-auto-label bot added size: s Pull request size is small. and removed size: xl Pull request size is extra large. labels Apr 5, 2024

bnaul force-pushed the date_partition branch from fd9e6f1 to 43ac9e8 Compare April 5, 2024 18:20

chalmerlowe self-assigned this Apr 8, 2024

chalmerlowe mentioned this pull request Apr 8, 2024

Error when creating tables time partitioned by DATE column with type DAY #1050

Closed

chalmerlowe reviewed Apr 15, 2024

View reviewed changes

chalmerlowe added kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels May 2, 2024

yoshi-kokoro removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels May 2, 2024

chalmerlowe mentioned this pull request May 2, 2024

Add capability to partition tables by the full spectrum of supported capabilities #1072

Open

bnaul force-pushed the date_partition branch from de07f7b to 8c13404 Compare May 9, 2024 13:50

bnaul requested review from tswast and chalmerlowe May 9, 2024 13:51

chalmerlowe added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 9, 2024

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 9, 2024

chalmerlowe reviewed May 9, 2024

View reviewed changes

tests/system/test_sqlalchemy_bigquery.py Outdated Show resolved Hide resolved

chalmerlowe reviewed May 9, 2024

View reviewed changes

tests/unit/test_table_options.py Show resolved Hide resolved

chalmerlowe added the owlbot:run Add this label to trigger the Owlbot post processor. label May 9, 2024

gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label May 9, 2024

chalmerlowe added kokoro:force-run Add this label to force Kokoro to re-run the tests. owlbot:run Add this label to trigger the Owlbot post processor. labels May 9, 2024

gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label May 9, 2024

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 9, 2024

fix: Fix partitioning by DATE column

3da9bee

Fixes googleapis#1056.

bnaul force-pushed the date_partition branch from 44cb08a to 3da9bee Compare May 9, 2024 15:54

chalmerlowe added kokoro:force-run Add this label to force Kokoro to re-run the tests. owlbot:run Add this label to trigger the Owlbot post processor. labels May 9, 2024

gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label May 9, 2024

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 9, 2024

chalmerlowe approved these changes May 9, 2024

View reviewed changes

bnaul mentioned this pull request May 13, 2024

feat: enable partitioning by DATE column #1074

Merged

bnaul closed this May 13, 2024

chalmerlowe mentioned this pull request Sep 11, 2024

Feat: Update partitioning by DATE, DATETIME, TIMESTAMP, _PARTITIONDATE #1113

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix partitioning by DATE column #1057

Fix partitioning by DATE column #1057

bnaul commented Apr 5, 2024

chalmerlowe Apr 15, 2024 •

edited

Loading

bnaul Apr 15, 2024

tswast Apr 18, 2024

chalmerlowe Apr 19, 2024

bnaul Apr 19, 2024

bnaul Apr 26, 2024

chalmerlowe commented May 2, 2024 •

edited

Loading

chalmerlowe commented May 2, 2024

chalmerlowe commented May 6, 2024

chalmerlowe commented May 9, 2024 •

edited

Loading

bnaul commented May 9, 2024

chalmerlowe commented May 9, 2024

chalmerlowe left a comment

chalmerlowe commented May 9, 2024

bnaul commented May 9, 2024 •

edited

Loading

bnaul commented May 9, 2024

chalmerlowe commented May 10, 2024

bnaul commented May 10, 2024

bnaul commented May 13, 2024

Column Data Type	HOUR	DAY	MONTH	YEAR
DATE	N/A	Fixed by #1057	via DATE_TRUNC	via DATE_TRUNC
DATETIME	Currently via DATE_TRUNC TODO: use DATETIME_TRUNC	Currently via DATE_TRUNC TODO: use DATETIME_TRUNC	Currently via DATE_TRUNC TODO: use DATETIME_TRUNC	Currently via DATE_TRUNC TODO: use DATETIME_TRUNC
TIMESTAMP	via TIMESTAMP_TRUNC	via TIMESTAMP_TRUNC	via TIMESTAMP_TRUNC	via TIMESTAMP_TRUNC
_PARTITIONDATE	N/A	via DATE_TRUNC	via DATE_TRUNC	via DATE_TRUNC
_PARTITIONTIME	Not supported TODO: USE TIMESTAMP_TRUNC	Not supported TODO: USE TIMESTAMP_TRUNC	Not supported TODO: USE TIMESTAMP_TRUNC	Not supported TODO: USE TIMESTAMP_TRUNC

Fix partitioning by DATE column #1057

Fix partitioning by DATE column #1057

Conversation

bnaul commented Apr 5, 2024

chalmerlowe Apr 15, 2024 • edited Loading

Choose a reason for hiding this comment

bnaul Apr 15, 2024

Choose a reason for hiding this comment

tswast Apr 18, 2024

Choose a reason for hiding this comment

chalmerlowe Apr 19, 2024

Choose a reason for hiding this comment

bnaul Apr 19, 2024

Choose a reason for hiding this comment

bnaul Apr 26, 2024

Choose a reason for hiding this comment

chalmerlowe commented May 2, 2024 • edited Loading

chalmerlowe commented May 2, 2024

chalmerlowe commented May 6, 2024

chalmerlowe commented May 9, 2024 • edited Loading

bnaul commented May 9, 2024

chalmerlowe commented May 9, 2024

chalmerlowe left a comment

Choose a reason for hiding this comment

chalmerlowe commented May 9, 2024

bnaul commented May 9, 2024 • edited Loading

bnaul commented May 9, 2024

chalmerlowe commented May 10, 2024

bnaul commented May 10, 2024

bnaul commented May 13, 2024

chalmerlowe Apr 15, 2024 •

edited

Loading

chalmerlowe commented May 2, 2024 •

edited

Loading

chalmerlowe commented May 9, 2024 •

edited

Loading

bnaul commented May 9, 2024 •

edited

Loading