Skip to content

Commit b592422

Browse files
committed
Relax overly strict rules in select_outer_pathkeys_for_merge()
The select_outer_pathkeys_for_merge function made an attempt to build the merge join pathkeys in the same order as query_pathkeys. This was done as it may have led to no sort being required for an ORDER BY or GROUP BY clause in the upper planner. However, this restriction seems overly strict as it required that we match the query_pathkeys entirely or we don't bother putting the merge join pathkeys in that order. Here we relax this rule so that we use a prefix of the query_pathkeys providing that prefix matches all of the join quals. This may provide the upper planner with partially sorted input which will allow the use of incremental sorts instead of full sorts. Author: David Rowley Reviewed-by: Richard Guo Discussion: https://postgr.es/m/CAApHDvrtZu0PHVfDPFM4Yx3jNR2Wuwosv+T2zqa7LrhhBr2rRg@mail.gmail.com
1 parent 2865b40 commit b592422

File tree

3 files changed

+82
-7
lines changed

3 files changed

+82
-7
lines changed

src/backend/optimizer/path/pathkeys.c

Lines changed: 33 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1890,11 +1890,13 @@ find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
18901890
* Since we assume here that a sort is required, there is no particular use
18911891
* in matching any available ordering of the outerrel. (joinpath.c has an
18921892
* entirely separate code path for considering sort-free mergejoins.) Rather,
1893-
* it's interesting to try to match the requested query_pathkeys so that a
1894-
* second output sort may be avoided; and failing that, we try to list "more
1895-
* popular" keys (those with the most unmatched EquivalenceClass peers)
1896-
* earlier, in hopes of making the resulting ordering useful for as many
1897-
* higher-level mergejoins as possible.
1893+
* it's interesting to try to match, or match a prefix of the requested
1894+
* query_pathkeys so that a second output sort may be avoided or an
1895+
* incremental sort may be done instead. We can get away with just a prefix
1896+
* of the query_pathkeys when that prefix covers the entire join condition.
1897+
* Failing that, we try to list "more popular" keys (those with the most
1898+
* unmatched EquivalenceClass peers) earlier, in hopes of making the resulting
1899+
* ordering useful for as many higher-level mergejoins as possible.
18981900
*/
18991901
List *
19001902
select_outer_pathkeys_for_merge(PlannerInfo *root,
@@ -1964,11 +1966,16 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
19641966

19651967
/*
19661968
* Find out if we have all the ECs mentioned in query_pathkeys; if so we
1967-
* can generate a sort order that's also useful for final output. There is
1968-
* no percentage in a partial match, though, so we have to have 'em all.
1969+
* can generate a sort order that's also useful for final output. If we
1970+
* only have a prefix of the query_pathkeys, and that prefix is the entire
1971+
* join condition, then it's useful to use the prefix as the pathkeys as
1972+
* this increases the chances that an incremental sort will be able to be
1973+
* used by the upper planner.
19691974
*/
19701975
if (root->query_pathkeys)
19711976
{
1977+
int matches = 0;
1978+
19721979
foreach(lc, root->query_pathkeys)
19731980
{
19741981
PathKey *query_pathkey = (PathKey *) lfirst(lc);
@@ -1981,6 +1988,8 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
19811988
}
19821989
if (j >= necs)
19831990
break; /* didn't find match */
1991+
1992+
matches++;
19841993
}
19851994
/* if we got to the end of the list, we have them all */
19861995
if (lc == NULL)
@@ -2003,6 +2012,23 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
20032012
}
20042013
}
20052014
}
2015+
2016+
/*
2017+
* If we didn't match to all of the query_pathkeys, but did match to
2018+
* all of the join clauses then we'll make use of these as partially
2019+
* sorted input is better than nothing for the upper planner as it may
2020+
* lead to incremental sorts instead of full sorts.
2021+
*/
2022+
else if (matches == nClauses)
2023+
{
2024+
pathkeys = list_copy_head(root->query_pathkeys, matches);
2025+
2026+
/* we have all of the join pathkeys, so nothing more to do */
2027+
pfree(ecs);
2028+
pfree(scores);
2029+
2030+
return pathkeys;
2031+
}
20062032
}
20072033

20082034
/*

src/test/regress/expected/join.out

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2437,6 +2437,37 @@ select count(*) from
24372437
10000
24382438
(1 row)
24392439

2440+
set enable_hashjoin = 0;
2441+
set enable_nestloop = 0;
2442+
set enable_hashagg = 0;
2443+
--
2444+
-- Check that we use the pathkeys from a prefix of the group by / order by
2445+
-- clause for the join pathkeys when that prefix covers all join quals. We
2446+
-- expect this to lead to an incremental sort for the group by / order by.
2447+
--
2448+
explain (costs off)
2449+
select x.thousand, x.twothousand, count(*)
2450+
from tenk1 x inner join tenk1 y on x.thousand = y.thousand
2451+
group by x.thousand, x.twothousand
2452+
order by x.thousand desc, x.twothousand;
2453+
QUERY PLAN
2454+
----------------------------------------------------------------------------------
2455+
GroupAggregate
2456+
Group Key: x.thousand, x.twothousand
2457+
-> Incremental Sort
2458+
Sort Key: x.thousand DESC, x.twothousand
2459+
Presorted Key: x.thousand
2460+
-> Merge Join
2461+
Merge Cond: (y.thousand = x.thousand)
2462+
-> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
2463+
-> Sort
2464+
Sort Key: x.thousand DESC
2465+
-> Seq Scan on tenk1 x
2466+
(11 rows)
2467+
2468+
reset enable_hashagg;
2469+
reset enable_nestloop;
2470+
reset enable_hashjoin;
24402471
--
24412472
-- Clean up
24422473
--

src/test/regress/sql/join.sql

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -472,6 +472,24 @@ select count(*) from
472472
(select * from tenk1 y order by y.unique2) y
473473
on x.thousand = y.unique2 and x.twothousand = y.hundred and x.fivethous = y.unique2;
474474

475+
set enable_hashjoin = 0;
476+
set enable_nestloop = 0;
477+
set enable_hashagg = 0;
478+
479+
--
480+
-- Check that we use the pathkeys from a prefix of the group by / order by
481+
-- clause for the join pathkeys when that prefix covers all join quals. We
482+
-- expect this to lead to an incremental sort for the group by / order by.
483+
--
484+
explain (costs off)
485+
select x.thousand, x.twothousand, count(*)
486+
from tenk1 x inner join tenk1 y on x.thousand = y.thousand
487+
group by x.thousand, x.twothousand
488+
order by x.thousand desc, x.twothousand;
489+
490+
reset enable_hashagg;
491+
reset enable_nestloop;
492+
reset enable_hashjoin;
475493

476494
--
477495
-- Clean up

0 commit comments

Comments
 (0)