perf: Improve axis=1 aggregation performance #2036
@@ -353,6 +354,36 @@ def _(self, op: ops.ScalarOp, input: pl.Expr) -> pl.Expr:
    assert isinstance(op, json_ops.JSONDecode)
    return input.str.json_decode(_DTYPE_MAPPING[op.to_type])

@compile_op.register(arr_ops.ToArrayOp)
Do we have any engine or system tests for these two new ops?
Mostly relying on the existing aggregate axis=1 tests, though I have now added some engine tests for the new ops.
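The full `ToArrayOp` lowering and the array-reduce op's definition are not shown above. Assuming the two ops pack a row's column values into one array and then collapse each array to a scalar, an axis=1 aggregation can be pictured as "pack, then reduce". The plain-Python sketch below models only those semantics, not the performance change; `to_array` and `array_reduce` are hypothetical helpers, not bigframes or Polars APIs.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def to_array(*columns: Sequence[T]) -> List[List[T]]:
    """Hypothetical stand-in for a ToArrayOp: one array per row, built from every column."""
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) yields the i-th value of each column together, i.e. one row.
    return [list(row) for row in zip(*columns)]


def array_reduce(arrays: Iterable[List[T]], reducer: Callable[[List[T]], R]) -> List[R]:
    """Hypothetical stand-in for an array-reduce op: collapse each array to one scalar."""
    return [reducer(arr) for arr in arrays]


# An axis=1 sum over three columns, expressed as "pack, then reduce".
a = [1, 2, 3]
b = [10, 20, 30]
c = [100, 200, 300]
packed = to_array(a, b, c)
row_sums = array_reduce(packed, sum)
print(packed)    # [[1, 10, 100], [2, 20, 200], [3, 30, 300]]
print(row_sums)  # [111, 222, 333]
```

Because the reducer is just a callable here, the same packed arrays could feed `max`, `min`, or any other per-row aggregate.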
@@ -699,6 +699,9 @@ def visit_ArrayFilter(self, op, *, arg, body, param):
def visit_ArrayMap(self, op, *, arg, body, param):
    return self.f.array(sg.select(body).from_(self._unnest(arg, as_=param)))

def visit_ArrayReduce(self, op, *, arg, body, param):
    return sg.select(body).from_(self._unnest(arg, as_=param)).subquery()
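The new `visit_ArrayReduce` compiles to `SELECT <body> FROM <unnest(arg)>` wrapped as a subquery, so when the body is an aggregate, each input array yields exactly one value. The sketch below mimics that query shape in SQLite, which has no native arrays: it assumes arrays stored as JSON text, uses `json_each` in place of `UNNEST`, and uses `SUM` and `MAX` as example reduce bodies. It illustrates the shape only and is not the SQL the compiler emits for any real backend.

```python
import sqlite3

# Assumptions: arrays are JSON text, json_each() plays the role of UNNEST, and
# SUM/MAX play the role of the reduce "body". Needs an SQLite build with JSON support.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, arr TEXT)")
conn.executemany(
    "INSERT INTO t (id, arr) VALUES (?, ?)",
    [(1, "[1, 10, 100]"), (2, "[2, 20, 200]"), (3, "[3, 30, 300]")],
)

# Shape: (SELECT <body> FROM unnest(arr) AS <param>), correlated on the outer row.
# Each correlated scalar subquery returns a single value per row of t.
query = """
SELECT
  t.id,
  (SELECT SUM(elem.value) FROM json_each(t.arr) AS elem) AS row_sum,
  (SELECT MAX(elem.value) FROM json_each(t.arr) AS elem) AS row_max
FROM t
ORDER BY t.id
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)  # [(1, 111, 100), (2, 222, 200), (3, 333, 300)]
```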
Maybe use `self.f.array(...)` rather than `subquery()`, similar to the other `visit_Array*` methods in this file?
Those other methods output an array, whereas ArrayReduce produces a single scalar for each input array, so I don't think we want `self.f.array()` here.
Thanks for checking!
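To make the distinction in this thread concrete, the following sketch contrasts a map-shaped subquery, a reduce-shaped scalar subquery, and the reduce wrapped in an array constructor. It assumes SQLite with arrays stored as JSON text, `json_each` standing in for `UNNEST`, and `json_group_array`/`json_array` standing in for an `ARRAY(...)` constructor. Under these assumptions the wrapped form returns a one-element array per row instead of the scalar, which is the concern raised in the reply; it does not show what `self.f.array(...)` compiles to on any specific backend.

```python
import json
import sqlite3

# Assumptions: JSON-text arrays + json_each() stand in for native arrays + UNNEST;
# json_group_array()/json_array() stand in for an ARRAY(...) constructor.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, arr TEXT)")
conn.executemany("INSERT INTO t (id, arr) VALUES (?, ?)", [(1, "[1, 2, 3]"), (2, "[4, 5, 6]")])


def run(select_expr: str) -> list:
    """Evaluate one per-row expression over t, in id order."""
    sql = f"SELECT {select_expr} FROM t ORDER BY t.id"
    return [row[0] for row in conn.execute(sql)]


# Map-style (like visit_ArrayMap): one output element per input element -> an array.
mapped = [json.loads(v) for v in run(
    "(SELECT json_group_array(e.value * 2) FROM json_each(t.arr) AS e)")]
# Reduce-style (like visit_ArrayReduce): one aggregate per input array -> a scalar.
reduced = run("(SELECT SUM(e.value) FROM json_each(t.arr) AS e)")
# Reduce wrapped in an array constructor: the scalar lands inside a 1-element array.
wrapped = [json.loads(v) for v in run(
    "json_array((SELECT SUM(e.value) FROM json_each(t.arr) AS e))")]
conn.close()

print(mapped)   # [[2, 4, 6], [8, 10, 12]]
print(reduced)  # [6, 15]
print(wrapped)  # [[6], [15]]
```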
Fixes #<issue_number_goes_here> 🦕