Skip to content

perf: Automatically condense internal expression representation #516

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 29, 2024

Conversation

TrevorBergeron
Copy link
Contributor

… schema system.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@product-auto-label product-auto-label bot added size: s Pull request size is small. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Mar 26, 2024
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: s Pull request size is small. labels Mar 26, 2024
@TrevorBergeron TrevorBergeron changed the title perf: Automatically squash internal projection nodes and use internal… perf: Automatically condense internal expression representation Apr 24, 2024
@TrevorBergeron TrevorBergeron marked this pull request as ready for review April 24, 2024 18:07
@TrevorBergeron TrevorBergeron requested review from a team as code owners April 24, 2024 18:07
@TrevorBergeron TrevorBergeron requested review from chelsea-lin and removed request for junyazhang April 24, 2024 18:09
@GarrettWu GarrettWu removed their assignment Apr 24, 2024
@@ -183,7 +183,7 @@ def project_to_id(self, expression: ex.Expression, output_id: str):
child=self.node,
assignments=tuple(exprs),
)
)
).rewrite_projection()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to pick up these functions and ask for rewriting the projection? Can we add the rewrite_projection to the ArrayValue constructor? In case other developers who will add new function here and miss the rewrite_projection while missing context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to constructor in new revision.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on second thought, this isn't really possible, as ArrayValue is an immutable dataclass, and mutation (by rewriting) is blocked even in the initializer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks for trying that out! Maybe adding some comments for why we can add the rewrite_projection here? So other developers can have more context when they add something similar?

if join_type == "right":
new_ordering = right.ordering
reverse_root = right.reverse_root
elif join_type == "outer":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to redefine the reverse_root for the outer join case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outer join we treat as left order taking precedence, so it inherits reverse_root from the self.

return OrderedIR(
self._table,
columns=self.columns,
hidden_ordering_columns=[*self._hidden_ordering_columns, *new_baked_cols],
Copy link
Contributor

@chelsea-lin chelsea-lin Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the intention to omit the existing _hidden_ordering_columns here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we are rewriting the hidden columns here, so we are recreating the set of hidden columns and discarding the old set.

Copy link
Contributor

@chelsea-lin chelsea-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -183,7 +183,7 @@ def project_to_id(self, expression: ex.Expression, output_id: str):
child=self.node,
assignments=tuple(exprs),
)
)
).rewrite_projection()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks for trying that out! Maybe adding some comments for why we can add the rewrite_projection here? So other developers can have more context when they add something similar?

@TrevorBergeron TrevorBergeron enabled auto-merge (squash) April 29, 2024 16:37
@TrevorBergeron TrevorBergeron merged commit 03c1b0d into main Apr 29, 2024
15 of 16 checks passed
@TrevorBergeron TrevorBergeron deleted the auto_merge_project branch April 29, 2024 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants