[SPARK-11864] [SQL] Improve performance of max/min #9846

Closed
wants to merge 4 commits into from

davies
Contributor

davies commented Nov 19, 2015

This PR makes the following optimizations:

  1. Greatest/Least already perform the null check, so the If and IsNull are not necessary.

  2. In Greatest/Least, the result should be initialized from the first child (removing one block).

  3. For primitive types, the generated greater-than expression is too complicated: (a > b ? 1 : (a < b ? -1 : 0)) > 0. It should be as simple as a > b.
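The three points above can be illustrated with a small hand-written Java sketch. This is not Spark's generated code: the class `GreatestSketch` and its methods are hypothetical names, and the sketch only models `int` values boxed as `Integer` so that SQL null can be represented. It shows a greatest that skips nulls itself (so a caller needs no extra If/IsNull guard), initializes its result from the first child, and uses a plain `a > b` instead of a three-way comparison tested against 0.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the shape of code described in the PR; NOT Spark's
 * actual generated code. Only int values (boxed, so null is possible) are modeled.
 */
public class GreatestSketch {

  // Item 3, "before": a generic three-way compare whose result is tested against 0.
  static boolean greaterViaCompare(int a, int b) {
    return (a > b ? 1 : (a < b ? -1 : 0)) > 0;
  }

  // Item 3, "after": for a primitive int, a direct comparison gives the same answer.
  static boolean greaterDirect(int a, int b) {
    return a > b;
  }

  /**
   * Returns the greatest non-null child, or null if every child is null.
   * Assumes at least one child.
   */
  static Integer greatest(List<Integer> children) {
    // Item 2: initialize the result from the first child.
    Integer result = children.get(0);
    for (int i = 1; i < children.size(); i++) {
      Integer v = children.get(i);
      // Item 1: nulls are skipped here, so callers need no separate null guard.
      if (v != null && (result == null || greaterDirect(v, result))) {
        result = v;
      }
    }
    return result;
  }

  // Item 1: a max-style update can call greatest directly, without If(IsNull(...)).
  static Integer updateMax(Integer currentMax, Integer input) {
    return greatest(Arrays.asList(currentMax, input));
  }

  public static void main(String[] args) {
    System.out.println(greatest(Arrays.asList(null, 3, 7, null, 5)));  // prints 7
    System.out.println(greatest(Arrays.asList((Integer) null, null))); // prints null
    System.out.println(updateMax(null, 4));                            // prints 4
    System.out.println(greaterViaCompare(2, 1) == greaterDirect(2, 1)); // prints true
  }
}
```

The sketch deliberately sticks to `int`: for floating-point types, a plain `>` treats NaN differently from a comparator-based ordering, so the simplification in item 3 might not carry over unchanged.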

Combining these optimizations could improve the performance of the ss_max query by 30%.

@nongli
Contributor

nongli commented Nov 19, 2015

LGTM

@SparkQA

SparkQA commented Nov 20, 2015

Test build #46358 has finished for PR 9846 at commit 7f7e33d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 20, 2015

Test build #46362 has finished for PR 9846 at commit 593a361.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Nov 20, 2015

Thanks - merging this.

asfgit closed this in ee21407 Nov 20, 2015
asfgit pushed a commit that referenced this pull request Nov 20, 2015
This PR makes the following optimizations:

1) `Greatest`/`Least` already perform the null check, so the `If` and `IsNull` are not necessary.

2) In `Greatest`/`Least`, the result should be initialized from the first child (removing one block).

3) For primitive types, the generated greater-than expression is too complicated: `(a > b ? 1 : (a < b ? -1 : 0)) > 0`. It should be as simple as `a > b`.

Combining these optimizations could improve the performance of the `ss_max` query by 30%.

Author: Davies Liu <davies@databricks.com>

Closes #9846 from davies/improve_max.

(cherry picked from commit ee21407)
Signed-off-by: Reynold Xin <rxin@databricks.com>
4 participants