
Commit 99efd8d

Fix array size allocation for HashAggregate hash keys.
When there were duplicate columns in the hash key list, the array sizes could be miscomputed, resulting in access off the end of the array. Adjust the computation to ensure the array is always large enough.

(I considered whether the duplicates could be removed in planning, but I can't rule out the possibility that duplicate columns might have different hash functions assigned. Simpler to just make sure it works at execution time regardless.)

Bug apparently introduced in fc4b3de as part of narrowing down the tuples stored in the hashtable. Reported by Colm McHugh of Salesforce, though I didn't use their patch.

Backpatch back to version 10 where the bug was introduced.

Discussion: https://postgr.es/m/CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
1 parent 2ccebcd commit 99efd8d

File tree

3 files changed: +47, -7 lines


src/backend/executor/nodeAgg.c

Lines changed: 22 additions & 7 deletions
@@ -1932,9 +1932,14 @@ build_hash_table(AggState *aggstate)
  * by themselves, and secondly ctids for row-marks.
  *
  * To eliminate duplicates, we build a bitmapset of the needed columns, and
- * then build an array of the columns included in the hashtable. Note that
- * the array is preserved over ExecReScanAgg, so we allocate it in the
- * per-query context (unlike the hash table itself).
+ * then build an array of the columns included in the hashtable. We might
+ * still have duplicates if the passed-in grpColIdx has them, which can happen
+ * in edge cases from semijoins/distinct; these can't always be removed,
+ * because it's not certain that the duplicate cols will be using the same
+ * hash function.
+ *
+ * Note that the array is preserved over ExecReScanAgg, so we allocate it in
+ * the per-query context (unlike the hash table itself).
  */
 static void
 find_hash_columns(AggState *aggstate)
@@ -1954,6 +1959,7 @@ find_hash_columns(AggState *aggstate)
     AttrNumber *grpColIdx = perhash->aggnode->grpColIdx;
     List       *hashTlist = NIL;
     TupleDesc   hashDesc;
+    int         maxCols;
     int         i;
 
     perhash->largestGrpColIdx = 0;
@@ -1978,15 +1984,24 @@ find_hash_columns(AggState *aggstate)
             colnos = bms_del_member(colnos, attnum);
         }
     }
-    /* Add in all the grouping columns */
-    for (i = 0; i < perhash->numCols; i++)
-        colnos = bms_add_member(colnos, grpColIdx[i]);
+
+    /*
+     * Compute maximum number of input columns accounting for possible
+     * duplications in the grpColIdx array, which can happen in some edge
+     * cases where HashAggregate was generated as part of a semijoin or a
+     * DISTINCT.
+     */
+    maxCols = bms_num_members(colnos) + perhash->numCols;
 
     perhash->hashGrpColIdxInput =
-        palloc(bms_num_members(colnos) * sizeof(AttrNumber));
+        palloc(maxCols * sizeof(AttrNumber));
     perhash->hashGrpColIdxHash =
         palloc(perhash->numCols * sizeof(AttrNumber));
 
+    /* Add all the grouping columns to colnos */
+    for (i = 0; i < perhash->numCols; i++)
+        colnos = bms_add_member(colnos, grpColIdx[i]);
+
     /*
      * First build mapping for columns directly hashed. These are the
      * first, because they'll be accessed when computing hash values and
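To make the arithmetic behind the overflow concrete, here is a minimal, self-contained sketch. It is illustration only, not PostgreSQL code: the bool array standing in for the bitmapset, the column numbers, and the helper count_members() are all invented for the example. It mimics the order of operations in the find_hash_columns() hunk above: with a duplicated grpColIdx entry, the number of distinct columns (the pre-fix allocation) is smaller than the number of slots actually written, while maxCols = bms_num_members(colnos) + numCols, taken before the keys are merged into the set, is always sufficient.

/*
 * Illustration only -- NOT PostgreSQL code.  A toy model of the sizing
 * logic in find_hash_columns(), with invented column numbers and a bool
 * array standing in for the bitmapset.
 */
#include <stdio.h>
#include <stdbool.h>

#define NCOLS 10                /* width of the toy input tuple */

static int
count_members(const bool *set)
{
    int         n = 0;

    for (int c = 0; c < NCOLS; c++)
        if (set[c])
            n++;
    return n;
}

int
main(void)
{
    /* duplicate hash key, as in "Group Key: onek.twothousand, onek.twothousand" */
    int         grpColIdx[] = {2, 2};
    int         numKeys = 2;
    /* one extra non-key column that is also needed */
    int         otherCols[] = {5};
    int         numOther = 1;

    bool        colnos[NCOLS] = {false};

    /* step 1: collect the non-key columns (the "bitmapset") */
    for (int i = 0; i < numOther; i++)
        colnos[otherCols[i]] = true;

    /* new upper bound, taken before the keys are merged in (cf. maxCols) */
    int         maxCols = count_members(colnos) + numKeys;

    /* step 2: merge the grouping keys; duplicates collapse in the set */
    for (int i = 0; i < numKeys; i++)
        colnos[grpColIdx[i]] = true;

    /* the pre-fix allocation: distinct columns only */
    int         oldAlloc = count_members(colnos);

    /*
     * step 3: simulate the writes -- one slot per grouping key (duplicates
     * included), then one slot per remaining needed column.
     */
    int         written = 0;

    for (int i = 0; i < numKeys; i++)
    {
        written++;
        colnos[grpColIdx[i]] = false;
    }
    written += count_members(colnos);

    printf("slots written = %d, old allocation = %d (overflow), maxCols = %d (safe)\n",
           written, oldAlloc, maxCols);     /* prints 3, 2, 3 */
    return 0;
}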

src/test/regress/expected/aggregates.out

Lines changed: 18 additions & 0 deletions
@@ -2100,3 +2100,21 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
  ba | 0 | 1
 (2 rows)
 
+-- Make sure that generation of HashAggregate for uniqification purposes
+-- does not lead to array overflow due to unexpected duplicate hash keys
+-- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+explain (costs off)
+  select 1 from tenk1
+  where (hundred, thousand) in (select twothousand, twothousand from onek);
+                          QUERY PLAN
+-------------------------------------------------------------
+ Hash Join
+   Hash Cond: (tenk1.hundred = onek.twothousand)
+   ->  Seq Scan on tenk1
+         Filter: (hundred = thousand)
+   ->  Hash
+         ->  HashAggregate
+               Group Key: onek.twothousand, onek.twothousand
+               ->  Seq Scan on onek
+(8 rows)
+
src/test/regress/sql/aggregates.sql

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -926,3 +926,10 @@ select v||'a', case v||'a' when 'aa' then 1 else 0 end, count(*)
926926
select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
927927
from unnest(array['a','b']) u(v)
928928
group by v||'a' order by 1;
929+
930+
-- Make sure that generation of HashAggregate for uniqification purposes
931+
-- does not lead to array overflow due to unexpected duplicate hash keys
932+
-- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
933+
explain (costs off)
934+
select 1 from tenk1
935+
where (hundred, thousand) in (select twothousand, twothousand from onek);

0 commit comments
