-
Notifications
You must be signed in to change notification settings - Fork 48
feat: Add deferred data uploading #1720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
): | ||
mapping = { | ||
local_data.schema.items[i].column: bq_data.table.physical_schema[i].name | ||
for i in range(len(local_data.schema)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add any safeguards here in case the lengths don't match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an assert for exact schema length
# Step 1: Upload all previously un-uploaded data | ||
for leaf in original_root.unique_nodes(): | ||
if isinstance(leaf, nodes.ReadLocalNode): | ||
if leaf.local_data_source.metadata.total_bytes > 5000: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's create a constant for this 5000.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if node.offsets_col is not None: | ||
scan_list = scan_list.append( | ||
bq_source.table.physical_schema[-1].name, | ||
bigframes.dtypes.INT_DTYPE, | ||
node.offsets_col, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason that offsets_col isn't in source_mapping
, already? Could you add a comment explaining it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a bit clumsy due to offsets_col only really being a thing for the local node, so when we upload, we will set the final physical column to be those offsets, but then we just have to add to the end of the scan_list for the ReadTableNode. Added a comment to explain. Probably some room to respec the leaf nodes for a bit more of a 1:1 mapping.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕