-
Notifications
You must be signed in to change notification settings - Fork 138
feat: allow to set clustering and time partitioning options at table creation #928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ons (expiration_timestamp, expiration_timestamp, require_partition_filter, default_rounding_mode) via create_table dialect options
This is also pretty close to what have been done in the dialect for Snowflake. |
…xes leads to bad autogenerated version file def upgrade() -> None: # ### commands auto generated by Alembic - please adjust! ### op.drop_index('clustering', table_name='dataset.some_table') op.drop_index('partition', table_name='dataset.some_table') # ### end Alembic commands ### def downgrade() -> None: # ### commands auto generated by Alembic - please adjust! ### op.create_index('partition', 'dataset.some_table', ['createdAt'], unique=False) op.create_index('clustering', 'dataset.some_table', ['id', 'createdAt'], unique=False) # ### end Alembic commands ###
…ed table as well as other newly supported table options
@nlenepveu,
|
@parthea sure, I'll get this done. |
@nlenepveu |
@chalmerlowe We should be good |
@nlenepveu I am looking through the code again and will be focused on this for a portion of my day. I may have additional changes or questions. Just FYI. |
Ok great. I'm ready to jump in again. |
sqlalchemy_bigquery/base.py
Outdated
raise ValueError( | ||
"bigquery_range_partitioning expects field data type to be INTEGER" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise ValueError( | |
"bigquery_range_partitioning expects field data type to be INTEGER" | |
) | |
raise ValueError( | |
"bigquery_range_partitioning expects field (i.e. column) data type | |
to be INTEGER" | |
) |
I feel like we are raising the wrong type of error here.
TypeError: Raised when an operation or function is applied to an object of inappropriate type.
ValueError: Raised when an operation or function receives an argument that has
the right type but an inappropriate value...
I have not checked yet to see if any tests look for a specific kind of error in this location. IF yes, those tests will need an update, as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may disagree here since we do not raise because the user provides the wrong data type: we expect the field to be a string and here the field attribute is a string. We raise an error because the value of the field is wrong, it does not reference an acceptable column name. We expect the name of a column whose data type is an INTEGER.
I think it would be misleading the user if we raise a TypeError. If we had a specific exception type, it would be something like "ColumnDataTypeError".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you are saying. How about the above verbiage change to the error message to help future me remember what this error means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok!
sqlalchemy_bigquery/base.py
Outdated
raise ValueError( | ||
"bigquery_range_partitioning expects range_.start to be an int," | ||
f" provided {repr(range_.start)}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise ValueError( | |
"bigquery_range_partitioning expects range_.start to be an int," | |
f" provided {repr(range_.start)}" | |
) | |
raise TypeError( | |
"bigquery_range_partitioning expects range_.start to be an int," | |
f" provided {repr(range_.start)}" | |
) |
Also correct the type check and the term INTEGER for range_.end
in the next expression, below.
I have not checked yet to see if any tests look for a specific kind of Error in this location. IF yes, those tests will need an update, as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right about the error type. I'll get that fixed.
Regarding the error message, I choose to differentiate INTEGER (the column data type) and int (the Python variable data type). However I'm not aware of the standard way for this kind of error message in Python, so let me know if this should still be adjusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with using int
, now that I have a better understanding of what we are referring to here.
Thanks for explaining and for your patience.
tests/unit/test_table_options.py
Outdated
# expect ValueError when bigquery_range_partitioning field is not an INTEGER | ||
with pytest.raises( | ||
ValueError, | ||
match="bigquery_range_partitioning expects field \(i\.e\. column\) data type to be INTEGER", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flake8 is giving a linter error related to the backslash escapes.
W605 invalid escape sequence '\)'
I believe if we set set the string as a raw
string (using the r
prefix) it will solve the problem.
match="bigquery_range_partitioning expects field \(i\.e\. column\) data type to be INTEGER", | |
match=r"bigquery_range_partitioning expects field \(i\.e\. column\) data type to be INTEGER", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, just made the change! I should have run the full test suite locally..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am running the test suite here. Hopefully this all works out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate all your hard work.
I appreciate your time and your patience as we worked through the review process! Sorry it took so long. While I am the maintainer of this product, I am new to this responsibility and I spend most of my time maintaining python-bigquery
so my familiarity with the ins and outs of this library is limited (I am still learning and this was quite the learning experience for me).
I am gonna merge this despite the failing Kokoro SQLAlchemy compliance
check.
Other PRs have the same failing Kokoro SQLAlchemy compliance
tests, which implies that it is some form of external change that is affecting our tests not the changes in this PR.
Thank you for your message! I also really appreciate your help and guidance in improving the quality of my contribution! It was a pleasure to get this done with such a seasoned Python developer. |
Thank you for this @nlenepveu 🙇🏻 |
I suggest a way to pass more options at table creation to specify table partitioning and clustering, as well as options that are not yet supported.
Which generates this DDL script:
After reading the contribution guidelines, I realized and agreed that the design should have been discussed before submitting this change request.. Anyway, this proposal is quite straightforward and stick to the python way to write programming interface so I hope it could get your attention.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #395 🦕