Skip to content

Fix partitioning by DATE column #1057

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

bnaul
Copy link
Contributor

@bnaul bnaul commented Apr 5, 2024

Fixes #1056

@bnaul bnaul requested review from a team as code owners April 5, 2024 16:45
@bnaul bnaul requested a review from farhan0102 April 5, 2024 16:45
@product-auto-label product-auto-label bot added size: s Pull request size is small. api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. size: xl Pull request size is extra large. and removed size: s Pull request size is small. labels Apr 5, 2024
@product-auto-label product-auto-label bot added size: s Pull request size is small. and removed size: xl Pull request size is extra large. labels Apr 5, 2024
@chalmerlowe chalmerlowe self-assigned this Apr 8, 2024
Comment on lines 831 to 840
if time_partitioning.field is not None:
field = time_partitioning.field
if isinstance(
table.columns[time_partitioning.field].type,
sqlalchemy.sql.sqltypes.DATE,
):
return f"PARTITION BY {field}"
elif isinstance(
table.columns[time_partitioning.field].type,
sqlalchemy.sql.sqltypes.TIMESTAMP,
Copy link
Collaborator

@chalmerlowe chalmerlowe Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tswast this PR is looking at adding logic to also handle partitioning on a date above and beyond using just DATE_TRUNC and TIMESTAMP_TRUNC functions.

Looking at the breakdown :TIMESTAMP_TRUNC: and :DATE_TRUNC: of how bigquery accepts time partitioning, it feels like we are missing a variety of the options/possibilities.

Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chalmerlowe can you clarify what you think is missing? it looks like the API expects one of

Supported values are: * HOUR * DAY * MONTH * YEAR

so it seems like TIMESTAMP_TRUNC would work for all of those and DATE would work for MONTH and YEAR, so all that's missing is just plain DATE

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Column Data Type HOUR DAY MONTH YEAR
DATE N/A Fixed by #1057 via DATE_TRUNC via DATE_TRUNC
DATETIME Currently via DATE_TRUNC TODO: use  DATETIME_TRUNC Currently via DATE_TRUNC TODO: use  DATETIME_TRUNC Currently via DATE_TRUNC TODO: use  DATETIME_TRUNC Currently via DATE_TRUNC TODO: use  DATETIME_TRUNC
TIMESTAMP via TIMESTAMP_TRUNC via TIMESTAMP_TRUNC via TIMESTAMP_TRUNC via TIMESTAMP_TRUNC
_PARTITIONDATE N/A via DATE_TRUNC via DATE_TRUNC via DATE_TRUNC
_PARTITIONTIME Not supported TODO: USE TIMESTAMP_TRUNC Not supported TODO: USE TIMESTAMP_TRUNC Not supported TODO: USE TIMESTAMP_TRUNC Not supported TODO: USE TIMESTAMP_TRUNC

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnaul For clarity, my question was less about whether this PR is correct and more focused on the bigger question:

Since we have the function _process_time_partitioning() and it handles two use cases (soon to be three based on this PR), does it cover all the use cases that it ought to?

Tim and I looked at this. I had a matrix to help me ID what the function does/doesn't do/should do.

Tim cleaned it up and posted it above. There are capabilities that are satisfied via an existing BigQuery function (i.e. DATE_TRUNC). But there are use cases that are not fully satisfied OR not supported at all. Tim provides suggestions for how to resolve those gaps.

NOTE: In no way am I saying you need to do all this work. This is more for me as a maintainer to figure out what I want the end state to be either through this PR or a follow-on PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes that matrix is very helpful, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious @chalmerlowe @tswast, any more thoughts on whether this PR should go ahead or wait on additional changes? To me it's distinct enough from the other "Not supported" cases above that it makes sense to keep them separate, but obviously I'm a bit biased 🙂

@chalmerlowe
Copy link
Collaborator

chalmerlowe commented May 2, 2024

@bnaul

Updated: Copied the matrix to a new issue. Beyond that I intend to accept this PR.
Will need to run it through the kokoro test suite and will likely take a last look at it as it stands to make sure I am not missing anything in regards to adding this nugget of functionality.

@chalmerlowe chalmerlowe added kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels May 2, 2024
@yoshi-kokoro yoshi-kokoro removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels May 2, 2024
@chalmerlowe
Copy link
Collaborator

The reason the Kokoro check is failing is because we do not detect full test coverage.
The coverage tool is set to monitor our overall unit test suite and confirm that the unit tests cover all the paths through the code. We do NOT apply coverage to assessing the tests that run during system testing and are found in the system folder.

Because the test we have included here is in the system folder and we do not get credit for testing the new branch.

I recommend adding a test to the unit folder that covers this pathway.

@chalmerlowe
Copy link
Collaborator

@bnaul

I was looking at the possibility of adding a unit test or two to this and run into a snag.
Checking with @tswast offline to see if I can get some clarity on what I am seeing.

More to come.

@chalmerlowe
Copy link
Collaborator

chalmerlowe commented May 9, 2024

@bnaul

there is a PR on your branch in your Fork. Can you take a look?

@bnaul bnaul force-pushed the date_partition branch from de07f7b to 8c13404 Compare May 9, 2024 13:50
@bnaul
Copy link
Contributor Author

bnaul commented May 9, 2024

Thanks @chalmerlowe, I just merged. The conventionalcommits.org check got offended by something (maybe I should have squash merged?) so I just squashed your change into mine and force pushed here

@bnaul bnaul requested review from tswast and chalmerlowe May 9, 2024 13:51
@chalmerlowe chalmerlowe added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 9, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 9, 2024
@chalmerlowe chalmerlowe added the owlbot:run Add this label to trigger the Owlbot post processor. label May 9, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label May 9, 2024
@chalmerlowe chalmerlowe added kokoro:force-run Add this label to force Kokoro to re-run the tests. owlbot:run Add this label to trigger the Owlbot post processor. labels May 9, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label May 9, 2024
@chalmerlowe
Copy link
Collaborator

gonna run the tests again to see if those tweaks fixed the linting failure.

@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 9, 2024
@bnaul bnaul force-pushed the date_partition branch from 44cb08a to 3da9bee Compare May 9, 2024 15:54
@chalmerlowe chalmerlowe added kokoro:force-run Add this label to force Kokoro to re-run the tests. owlbot:run Add this label to trigger the Owlbot post processor. labels May 9, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label May 9, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 9, 2024
Copy link
Collaborator

@chalmerlowe chalmerlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chalmerlowe
Copy link
Collaborator

@bnaul

Github won't let me merge in main cause this is not a PR I created but one created by a third party.

We have two paths:

  • You can allow the maintainers of python-bigquery-sqlalchemy permission on your Fork so that I or my colleagues can merge PRs to your PR (including merging in main). Instructions here.
    OR
  • We can go back and forth like this until we achieve mutual success.

= )

We are so close. And we can mark this down as a success.

@bnaul
Copy link
Contributor Author

bnaul commented May 9, 2024

@chalmerlowe I looked for that option but am not seeing it anywhere...the instructions suggest it should be in this part of the UI. Maybe it would be easier if you recreated the PR from a branch in this repo?
image

@bnaul
Copy link
Contributor Author

bnaul commented May 9, 2024

Ah it looks like this is not supported for PRs from organizations, just from users: https://github.com/orgs/community/discussions/5634

@chalmerlowe
Copy link
Collaborator

@bnaul

I am gonna recreate the PR.
I will do what I can to ensure that you get recognition for the contribution, but the commit history may get mangled.

Will see.

Thanks for trying.

@bnaul
Copy link
Contributor Author

bnaul commented May 10, 2024

Thanks, I will survive either way 😅

@bnaul
Copy link
Contributor Author

bnaul commented May 13, 2024

Closing in favor of #1074 which @chalmerlowe should now have permissions to edit!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. size: s Pull request size is small.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Time partitioning on DATE column fails with type_="DAY"
4 participants