fix: Self-join optimization doesn't needlessly invalidate caching #797

Merged
TrevorBergeron merged 8 commits into main from safer_rewrite
Jun 21, 2024

Conversation

TrevorBergeron
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@product-auto-label (bot) added the labels "size: m" (Pull request size is medium.) and "api: bigquery" (Issues related to the googleapis/python-bigquery-dataframes API.) on Jun 17, 2024
@TrevorBergeron marked this pull request as ready for review June 18, 2024 17:18
@TrevorBergeron requested review from a team as code owners June 18, 2024 17:18
Contributor

@chelsea-lin left a comment

LGTM, but I left comments suggesting renames and additional code comments to describe the optimization strategy.

    if self.root != right.root:
-       return None
+       raise ValueError("Cannot merge expressions with different roots")
Contributor

Would it be better to raise an InternalError and ask the customer to share their use case?

Contributor Author

So far we don't do this for other similar errors - I couldn't find InternalError anywhere in the code base. Maybe we should automatically wrap all exceptions with a feedback link? That seems out of scope for this PR.

Contributor

Could it be SystemError, from Python's built-in exceptions?
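
For readers following this thread, the idea of wrapping exceptions with a feedback link can be sketched roughly as below. This is only an illustration, not code from the PR: `FEEDBACK_URL`, `with_feedback_link`, and `merge_roots` are hypothetical names, and the URL is a placeholder.

```python
import functools

# Hypothetical placeholder; not a real bigframes constant.
FEEDBACK_URL = "https://example.com/feedback"


def with_feedback_link(func):
    """Re-raise any ValueError from func with a feedback hint appended."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ValueError as exc:
            # Keep the exception type and chain the original as the cause.
            raise ValueError(f"{exc} (please report this at {FEEDBACK_URL})") from exc

    return wrapper


@with_feedback_link
def merge_roots(left_root, right_root):
    """Stand-in for the merge check: fail loudly on mismatched roots."""
    if left_root != right_root:
        raise ValueError("Cannot merge expressions with different roots")
    return left_root
```

Keeping ValueError as the raised type matches what the diff above does; whether a dedicated internal error type would be preferable is the open question in this thread.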

@@ -311,3 +326,25 @@ def get_node_column_ids(node: nodes.BigFrameNode) -> Tuple[str, ...]:
    import bigframes.core

    return tuple(bigframes.core.ArrayValue(node).column_ids)


def common_subtree(
Contributor

nit: maybe call it common_rewritable_subtree?

Contributor Author

Renamed to common_selection_root.
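
As a rough illustration of what a "common selection root" could mean, the sketch below walks down each side through nodes that a selection rewrite can see through and returns the first node both sides share. It is an assumption-laden toy, not the bigframes implementation: the `Node` class, its `rewritable` flag, and `selection_chain` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a plan node; not the bigframes class."""

    name: str
    child: Optional["Node"] = None
    # True for nodes a selection rewrite can see through
    # (for example projections, filters, or order-by nodes).
    rewritable: bool = False


def selection_chain(node: Node) -> List[Node]:
    """Return node plus the descendants reachable through rewritable nodes only."""
    chain = [node]
    while node.rewritable and node.child is not None:
        node = node.child
        chain.append(node)
    return chain


def common_selection_root(left: Node, right: Node) -> Optional[Node]:
    """Return the topmost node shared by both chains, or None if there is none."""
    right_chain = selection_chain(right)
    for candidate in selection_chain(left):
        if candidate in right_chain:
            return candidate
    return None
```

Returning None when no shared node exists lets the caller leave the join untouched instead of forcing a rewrite.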

        elif isinstance(node, nodes.OrderByNode):
            return cls.from_node(node.child, target).order_with(node.by)
        else:
            raise ValueError(f"Cannot rewrite node {node}")
Contributor

Would it be better to raise an InternalError, or to add an assert that the target node must be a subtree of the given node? Or rename from_node to from_subnode?

Contributor Author

It should be entirely impossible to reach this error through any public API. Renamed to from_node_span.
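
The "span" naming can be illustrated with a small sketch: the constructor folds every node between `node` and `target` into one flattened record, and raises if it hits a node it cannot fold before reaching the target. The `PlanNode` and `Squashed` classes below are hypothetical simplifications and do not mirror the real `SquashedSelect` fields.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PlanNode:
    """Hypothetical plan node; kind is 'source', 'filter', or 'order_by'."""

    kind: str
    arg: str = ""
    child: Optional["PlanNode"] = None


@dataclass(frozen=True)
class Squashed:
    """Hypothetical flattened view of the nodes between a node and its target."""

    root: PlanNode
    filters: Tuple[str, ...] = ()
    ordering: Tuple[str, ...] = ()

    @classmethod
    def from_node_span(cls, node: PlanNode, target: PlanNode) -> "Squashed":
        if node == target:
            # Reached the shared root: nothing left to fold in.
            return cls(root=target)
        if node.child is None:
            # Ran out of tree without meeting the target.
            raise ValueError(f"Cannot rewrite node {node}")
        if node.kind == "filter":
            inner = cls.from_node_span(node.child, target)
            return cls(inner.root, inner.filters + (node.arg,), inner.ordering)
        if node.kind == "order_by":
            inner = cls.from_node_span(node.child, target)
            return cls(inner.root, inner.filters, inner.ordering + (node.arg,))
        raise ValueError(f"Cannot rewrite node {node}")
```

In this toy version the error is reachable only when the caller passes a target that is not below `node`, or a node kind the rewrite does not understand, which is consistent with the author's point that public APIs should never trigger it.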

    if rewrite_common_node is None:
        return join_node
    left_side = SquashedSelect.from_node(join_node.left_child, rewrite_common_node)
    right_side = SquashedSelect.from_node(join_node.right_child, rewrite_common_node)
    if left_side.can_join(right_side, join_node.join):
Contributor

Maybe rename it to can_merge; I thought it was checking whether the two trees are joinable.

Contributor Author

done
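
The distinction behind the rename can be sketched as follows: the check asks whether two squashed selections over the same root can be collapsed into a single selection, not whether a join between them is valid. The `Side` class and `rewrite_self_join` function are hypothetical, and the real check also takes the join definition into account, which this toy version omits.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Side:
    """Hypothetical squashed side of a join: a shared root plus selected columns."""

    root: str
    columns: Tuple[str, ...]

    def can_merge(self, other: "Side") -> bool:
        # Merging only makes sense when both sides select from the same root.
        return self.root == other.root

    def merge(self, other: "Side") -> "Side":
        if not self.can_merge(other):
            raise ValueError("Cannot merge expressions with different roots")
        return Side(self.root, self.columns + other.columns)


def rewrite_self_join(left: Side, right: Side) -> Optional[Side]:
    """Return one merged selection, or None to signal 'keep the original join'."""
    if left.can_merge(right):
        return left.merge(right)
    return None
```

Guarding `merge` with `can_merge` means a mismatched pair falls back to the ordinary join path rather than raising.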

@TrevorBergeron enabled auto-merge (squash) June 20, 2024 21:53
@TrevorBergeron merged commit 1b96b80 into main Jun 21, 2024
23 checks passed
@TrevorBergeron deleted the safer_rewrite branch June 21, 2024 00:00