feat: allow to set clustering and time partitioning options at table creation #928

nlenepveu · 2023-12-04T21:51:05Z

I suggest a way to pass more options at table creation to specify table partitioning and clustering, as well as options that are not yet supported.

import datetime

import sqlalchemy

metadata = sqlalchemy.MetaData()

table = sqlalchemy.Table(
    "some_table",
    metadata,  # Table requires a MetaData collection as its second argument
    sqlalchemy.Column("id", sqlalchemy.Integer),
    sqlalchemy.Column("country", sqlalchemy.Text),
    sqlalchemy.Column("town", sqlalchemy.Text),
    sqlalchemy.Column("createdAt", sqlalchemy.DateTime),
    bigquery_expiration_timestamp=datetime.datetime.fromisoformat("2038-01-01T00:00:00+00:00"),
    bigquery_partition_expiration_days=30,
    bigquery_require_partition_filter=True,
    bigquery_default_rounding_mode="ROUND_HALF_EVEN",
    bigquery_clustering_fields=["country", "town"],
    bigquery_partitioning="DATE(createdAt)",
)

Which generates this DDL script:

CREATE TABLE `some_table` (
    `id` INT64,
    `country` STRING,
    `town` STRING,
    `createdAt` DATETIME
)
PARTITION BY DATE(createdAt)
CLUSTER BY country, town
OPTIONS(expiration_timestamp=TIMESTAMP '2038-01-01 00:00:00+00:00', partition_expiration_days=30, require_partition_filter=true, default_rounding_mode='ROUND_HALF_EVEN')
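
To make the mapping from keyword arguments to DDL clauses concrete, here is a minimal standalone sketch that reproduces the suffix shown above. It is not the dialect's implementation: `render_option_value` and `render_table_suffix` are hypothetical helpers written for illustration, and the literal formatting only covers the value types used in this example.

```python
import datetime


def render_option_value(value):
    """Render a Python value as a BigQuery literal (illustrative subset only)."""
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, datetime.datetime):
        return f"TIMESTAMP '{value.isoformat(sep=' ')}'"
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    raise TypeError(f"unsupported option value: {value!r}")


def render_table_suffix(partitioning=None, clustering_fields=None, **options):
    """Build the clauses that follow the column list of CREATE TABLE."""
    clauses = []
    if partitioning:
        clauses.append(f"PARTITION BY {partitioning}")
    if clustering_fields:
        clauses.append("CLUSTER BY " + ", ".join(clustering_fields))
    if options:
        rendered = ", ".join(
            f"{name}={render_option_value(value)}" for name, value in options.items()
        )
        clauses.append(f"OPTIONS({rendered})")
    return "\n".join(clauses)


# Same values as the table definition above, without the "bigquery_" prefix.
suffix = render_table_suffix(
    partitioning="DATE(createdAt)",
    clustering_fields=["country", "town"],
    expiration_timestamp=datetime.datetime.fromisoformat("2038-01-01T00:00:00+00:00"),
    partition_expiration_days=30,
    require_partition_filter=True,
    default_rounding_mode="ROUND_HALF_EVEN",
)
print(suffix)
```

Checking `bool` before `int` matters here, because `True` would otherwise be rendered as `1` instead of `true`.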

After reading the contribution guidelines, I realized, and agree, that the design should have been discussed before submitting this change request. That said, the proposal is quite straightforward and sticks to the Pythonic way of writing a programming interface, so I hope it gets your attention.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

Fixes #395 🦕

nlenepveu · 2023-12-04T22:27:57Z

This is also pretty close to what has been done in the dialect for Snowflake.
https://github.com/snowflakedb/snowflake-sqlalchemy/blob/669f6e4fdc218ea554d7cea04ff9324e7a0ef0e2/src/snowflake/sqlalchemy/base.py?#L492-L528

…xes leads to bad autogenerated version file

def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_index('clustering', table_name='dataset.some_table')
    op.drop_index('partition', table_name='dataset.some_table')
    # ### end Alembic commands ###


def downgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_index('partition', 'dataset.some_table', ['createdAt'], unique=False)
    op.create_index('clustering', 'dataset.some_table', ['id', 'createdAt'], unique=False)
    # ### end Alembic commands ###

parthea · 2023-12-08T17:12:17Z

@nlenepveu,
Could you please also update the system test

python-bigquery-sqlalchemy/tests/system/test_sqlalchemy_bigquery.py, line 594 in 8952f02:

def test_get_indexes(inspector, inspector_using_test_dataset, bigquery_dataset):

to prepare this PR for review?

=================================== FAILURES ===================================
_______________________________ test_get_indexes _______________________________
Traceback (most recent call last):
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/_pytest/runner.py", line 341, in from_call
    result: Optional[TResult] = func()
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/_pytest/runner.py", line 262, in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/pluggy/_hooks.py", line 493, in __call__
    return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/pluggy/_manager.py", line 115, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/pluggy/_callers.py", line 152, in _multicall
    return outcome.get_result()
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/pluggy/_result.py", line 114, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/pluggy/_callers.py", line 77, in _multicall
    res = hook_impl.function(*args)
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/_pytest/runner.py", line 177, in pytest_runtest_call
    raise e
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/_pytest/runner.py", line 169, in pytest_runtest_call
    item.runtest()
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/_pytest/python.py", line 1792, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/pluggy/_hooks.py", line 493, in __call__
    return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/pluggy/_manager.py", line 115, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/pluggy/_callers.py", line 113, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/pluggy/_callers.py", line 77, in _multicall
    res = hook_impl.function(*args)
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/.nox/system-3-8/lib/python3.8/site-packages/_pytest/python.py", line 194, in pytest_pyfunc_call
    result = testfunction(**testargs)
  File "/tmpfs/src/github/python-bigquery-sqlalchemy/tests/system/test_sqlalchemy_bigquery.py", line 597, in test_get_indexes
    assert len(indexes) == 2
AssertionError: assert 0 == 2
 +  where 0 = len([])

nlenepveu · 2023-12-09T10:45:09Z

@parthea sure, I'll get this done.

chalmerlowe · 2023-12-19T20:55:23Z

@nlenepveu
I am reviewing this PR and will provide some feedback shortly.
I appreciate the hard work and effort.

nlenepveu · 2024-01-09T15:46:03Z

@chalmerlowe We should be good

chalmerlowe · 2024-01-09T15:53:13Z

> @chalmerlowe We should be good

@nlenepveu I am looking through the code again and will be focused on this for a portion of my day. I may have additional changes or questions. Just FYI.

nlenepveu · 2024-01-09T15:57:09Z

> > @chalmerlowe We should be good
>
> @nlenepveu I am looking through the code again and will be focused on this for a portion of my day. I may have additional changes or questions. Just FYI.

Ok great. I'm ready to jump in again.

chalmerlowe · 2024-01-09T16:15:14Z

sqlalchemy_bigquery/base.py

+            raise ValueError(
+                "bigquery_range_partitioning expects field data type to be INTEGER"
+            )


Suggested change

-            raise ValueError(
-                "bigquery_range_partitioning expects field data type to be INTEGER"
-            )
+            raise ValueError(
+                "bigquery_range_partitioning expects field (i.e. column) data type to be INTEGER"
+            )

I feel like we are raising the wrong type of error here.

TypeError: Raised when an operation or function is applied to an object of inappropriate type.

ValueError: Raised when an operation or function receives an argument that has the right type but an inappropriate value...

I have not yet checked whether any tests look for a specific kind of error in this location. If so, those tests will need an update as well.

I may disagree here, since we do not raise because the user provided the wrong Python type: we expect the field to be a string, and here the field attribute is a string. We raise because the value of the field is wrong: it does not reference an acceptable column name. We expect the name of a column whose data type is INTEGER.

I think raising a TypeError would mislead the user. If we had a specific exception type, it would be something like "ColumnDataTypeError".

I see what you are saying. How about the above verbiage change to the error message, to help future me remember what this error means?

chalmerlowe · 2024-01-09T16:17:53Z

sqlalchemy_bigquery/base.py

+            raise ValueError(
+                "bigquery_range_partitioning expects range_.start to be an int,"
+                f" provided {repr(range_.start)}"
+            )


Suggested change

-            raise ValueError(
-                "bigquery_range_partitioning expects range_.start to be an int,"
-                f" provided {repr(range_.start)}"
-            )
+            raise TypeError(
+                "bigquery_range_partitioning expects range_.start to be an int,"
+                f" provided {repr(range_.start)}"
+            )

Also correct the type check and the term INTEGER for range_.end in the next expression, below.

I have not yet checked whether any tests look for a specific kind of error in this location. If so, those tests will need an update as well.

You're right about the error type. I'll get that fixed.

Regarding the error message, I chose to distinguish INTEGER (the column data type) from int (the Python type). However, I'm not aware of a standard convention for this kind of error message in Python, so let me know if it should still be adjusted.

I am fine with using int, now that I have a better understanding of what we are referring to here.
Thanks for explaining and for your patience.
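
The distinction settled on in the two threads above can be sketched in isolation: a `TypeError` when a Python argument has the wrong type, and a `ValueError` when a well-typed argument names a column with the wrong data type. The function and its flat arguments below are hypothetical stand-ins, not the code in sqlalchemy_bigquery/base.py; only the two error messages are taken from the review.

```python
def check_range_partitioning(column_types, field, start):
    """Validate a range-partitioning request (illustrative only).

    column_types: mapping of column name -> SQL type name, e.g. {"id": "INTEGER"}
    field: name of the partitioning column (a str)
    start: lower bound of the range (expected to be a Python int)
    """
    # Right Python type (str) but wrong value: the name does not refer to
    # an INTEGER column, so this is a ValueError.
    if column_types.get(field) != "INTEGER":
        raise ValueError(
            "bigquery_range_partitioning expects field (i.e. column) data type"
            " to be INTEGER"
        )
    # Wrong Python type for the argument itself, so this is a TypeError.
    if not isinstance(start, int):
        raise TypeError(
            "bigquery_range_partitioning expects range_.start to be an int,"
            f" provided {repr(start)}"
        )


columns = {"id": "INTEGER", "country": "STRING"}
check_range_partitioning(columns, "id", 0)  # valid input, returns silently

# One call per failure mode: a STRING column, then a str where an int is expected.
for args in [(columns, "country", 0), (columns, "id", "0")]:
    try:
        check_range_partitioning(*args)
    except (TypeError, ValueError) as exc:
        print(type(exc).__name__, "-", exc)
```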

chalmerlowe · 2024-01-09T23:07:30Z

tests/unit/test_table_options.py

+    # expect ValueError when bigquery_range_partitioning field is not an INTEGER
+    with pytest.raises(
+        ValueError,
+        match="bigquery_range_partitioning expects field \(i\.e\. column\) data type to be INTEGER",


Flake8 is giving a linter error related to the backslash escapes:
W605 invalid escape sequence '\)'

I believe that making the string a raw string (using the r prefix) will solve the problem.

Suggested change

-        match="bigquery_range_partitioning expects field \(i\.e\. column\) data type to be INTEGER",
+        match=r"bigquery_range_partitioning expects field \(i\.e\. column\) data type to be INTEGER",

Yes, just made the change! I should have run the full test suite locally.

I am running the test suite here. Hopefully this all works out.
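
As a quick standalone check of why the escapes (and therefore the raw string) are needed: `pytest.raises(match=...)` applies the pattern with `re.search`, so unescaped parentheses would be read as a regex group rather than as literal characters. The sketch below uses only the standard `re` module; the variable names are illustrative.

```python
import re

message = "bigquery_range_partitioning expects field (i.e. column) data type to be INTEGER"

# Raw string: the backslashes reach the regex engine unchanged, and the
# literal no longer contains the invalid escape sequences reported as W605.
escaped = r"bigquery_range_partitioning expects field \(i\.e\. column\) data type to be INTEGER"

# Without escapes, "(...)" is a capturing group, so the literal
# parentheses in the message are never matched.
unescaped = "bigquery_range_partitioning expects field (i.e. column) data type to be INTEGER"

print(re.search(escaped, message) is not None)    # True
print(re.search(unescaped, message) is not None)  # False

# re.escape builds a literal-matching pattern without hand-written backslashes.
print(re.search(re.escape(message), message) is not None)  # True
```

Using `re.escape` on the expected message is an alternative to hand-escaping when the whole message should match literally.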

chalmerlowe

@nlenepveu

I appreciate all your hard work.
I appreciate your time and your patience as we worked through the review process! Sorry it took so long. While I am the maintainer of this product, I am new to this responsibility and I spend most of my time maintaining python-bigquery so my familiarity with the ins and outs of this library is limited (I am still learning and this was quite the learning experience for me).

I am gonna merge this despite the failing Kokoro SQLAlchemy compliance check.

Other PRs have the same failing Kokoro SQLAlchemy compliance tests, which implies that some external change is affecting our tests, not the changes in this PR.

nlenepveu · 2024-01-10T21:10:19Z

> I appreciate all your hard work.
> I appreciate your time and your patience as we worked through the review process! Sorry it took so long. While I am the maintainer of this product, I am new to this responsibility and I spend most of my time maintaining python-bigquery so my familiarity with the ins and outs of this library is limited (I am still learning and this was quite the learning experience for me).

@chalmerlowe

Thank you for your message! I also really appreciate your help and guidance in improving the quality of my contribution! It was a pleasure to get this done with such a seasoned Python developer.

leblancfg · 2024-01-12T20:00:52Z

Thank you for this @nlenepveu 🙇🏻

nlenepveu added 2 commits December 4, 2023 18:28

refactor: standardize bigquery options handling to manage more options

077dc3b

feat: handle table partitioning, table clustering and more table options (expiration_timestamp, expiration_timestamp, require_partition_filter, default_rounding_mode) via create_table dialect options

aae6359

nlenepveu requested review from a team as code owners December 4, 2023 21:51

nlenepveu requested a review from PhongChuong December 4, 2023 21:51

product-auto-label bot added the size: m and api: bigquery labels Dec 4, 2023

nlenepveu changed the title ~~Allow to set clustering and partitioning options at table creation~~ feat: allow to set clustering and partitioning options at table creation Dec 4, 2023

product-auto-label bot added the size: l label and removed the size: m label Dec 4, 2023

docs: update README to describe how to create clustered and partitioned table as well as other newly supported table options

16c63e9

parthea added the kokoro:run, kokoro:force-run and owlbot:run labels Dec 8, 2023

gcf-owl-bot bot removed the owlbot:run label Dec 8, 2023

yoshi-kokoro removed the kokoro:run and kokoro:force-run labels Dec 8, 2023

nlenepveu and others added 4 commits December 9, 2023 12:06

test: adjust system tests since indexes are no longer populated from table partitions and clustering info

2cae630

test: alembic now supports creating partitioned tables

913c4fc

test: run integration tests with all the new create_table options

39bbd56

Merge branch 'main' into main

9d00844

kiraksi added the kokoro:force-run label Dec 18, 2023

yoshi-kokoro removed the kokoro:force-run label Dec 18, 2023

chalmerlowe mentioned this pull request Dec 19, 2023

feat: Add "time_partitioning" and "clustering_fields" in table creation process #891

Closed

chalmerlowe self-assigned this Dec 19, 2023

chalmerlowe reviewed Jan 9, 2024

chalmerlowe reviewed Jan 9, 2024

nlenepveu added 2 commits January 9, 2024 16:41

chore: no magic numbers

995d1e5

chore: consistency in docstrings

a9b8d27

chalmerlowe reviewed Jan 9, 2024

chalmerlowe reviewed Jan 9, 2024

nlenepveu added 2 commits January 9, 2024 17:34

chore: no magic number

37c1eb0

chore: better error types

badece4

nlenepveu requested a review from chalmerlowe January 9, 2024 17:59

chalmerlowe added the kokoro:force-run label Jan 9, 2024

yoshi-kokoro removed the kokoro:force-run label Jan 9, 2024

chalmerlowe reviewed Jan 9, 2024

chore: fix W605 invalid escape sequence

8184c38

chalmerlowe added the kokoro:force-run label Jan 9, 2024

yoshi-kokoro removed the kokoro:force-run label Jan 9, 2024

chalmerlowe added the kokoro:force-run label Jan 10, 2024

yoshi-kokoro removed the kokoro:force-run label Jan 10, 2024

chalmerlowe added the owlbot:run label Jan 10, 2024

gcf-owl-bot bot removed the owlbot:run label Jan 10, 2024

chalmerlowe approved these changes Jan 10, 2024

chalmerlowe merged commit c2c2958 into googleapis:main Jan 10, 2024

release-please bot mentioned this pull request Jan 10, 2024

chore(main): release 1.10.0 #936

Merged
