-
Notifications
You must be signed in to change notification settings - Fork 49
fix: Self-join optimization doesn't needlessly invalidate caching #797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but left comments to rename or add more comments to describe the optimization strategy.
if self.root != right.root: | ||
return None | ||
raise ValueError("Cannot merge expressions with different roots") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to raise InternalError
and ask customer to share their user case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far we don't do this for other similar errors - couldn't find InternalError
anywhere in the code base. Maybe we should automatically wrap all exceptions with feedback link? Seems out of scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be SystemError
in the python exceptions?
bigframes/core/rewrite.py
Outdated
@@ -311,3 +326,25 @@ def get_node_column_ids(node: nodes.BigFrameNode) -> Tuple[str, ...]: | |||
import bigframes.core | |||
|
|||
return tuple(bigframes.core.ArrayValue(node).column_ids) | |||
|
|||
|
|||
def common_subtree( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe call common_rewritable_substree
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reanamed to common_selection_root
elif isinstance(node, nodes.OrderByNode): | ||
return cls.from_node(node.child, target).order_with(node.by) | ||
else: | ||
raise ValueError(f"Cannot rewrite node {node}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to raise InternalError
or add an assert that the target node must be the subtree of the given node
? Or rename from_node
into from_subnode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be entirely impossible to reach this error through any public api. renamed to from_node_span
bigframes/core/rewrite.py
Outdated
if rewrite_common_node is None: | ||
return join_node | ||
left_side = SquashedSelect.from_node(join_node.left_child, rewrite_common_node) | ||
right_side = SquashedSelect.from_node(join_node.right_child, rewrite_common_node) | ||
if left_side.can_join(right_side, join_node.join): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rename it as can_merge
, I thought it was checking if two trees are joinable or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
3660668
to
5fcde75
Compare
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕