
Commit f7da492

Fix array size allocation for HashAggregate hash keys.
When there were duplicate columns in the hash key list, the array sizes could
be miscomputed, resulting in access off the end of the array.  Adjust the
computation to ensure the array is always large enough.

(I considered whether the duplicates could be removed in planning, but I can't
rule out the possibility that duplicate columns might have different hash
functions assigned.  Simpler to just make sure it works at execution time
regardless.)

Bug apparently introduced in fc4b3de as part of narrowing down the tuples
stored in the hashtable.  Reported by Colm McHugh of Salesforce, though I
didn't use their patch.

Backpatch back to version 10 where the bug was introduced.

Discussion: https://postgr.es/m/CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
1 parent a7b2fca commit f7da492
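
As a quick illustration of the arithmetic behind the fix, here is a minimal,
self-contained C sketch. It is not PostgreSQL source: the ColSet type stands in
for the backend's Bitmapset, and the column numbers (a duplicated grouping
column 2 plus an extra input column 7) are invented for the example. It shows
why sizing the input-column array by the number of distinct columns falls one
short as soon as the grouping-key list contains a duplicate, while the new
bound (distinct extra columns plus the full grouping-key count) is always
sufficient.

/*
 * Minimal, self-contained sketch of the sizing bug (NOT PostgreSQL source:
 * the tiny ColSet type stands in for the backend's Bitmapset, and the column
 * numbers are made up for illustration).
 */
#include <stdio.h>
#include <stdbool.h>

#define MAX_COLS 64

typedef struct
{
    bool        member[MAX_COLS];
} ColSet;

static void
colset_add(ColSet *s, int col)
{
    s->member[col] = true;
}

static int
colset_count(const ColSet *s)
{
    int         n = 0;

    for (int i = 0; i < MAX_COLS; i++)
        if (s->member[i])
            n++;
    return n;
}

int
main(void)
{
    /* Grouping columns as handed over by the planner: note the duplicate. */
    int         grpColIdx[] = {2, 2};
    int         numCols = 2;

    /* Extra (non-grouping) input column the node also needs, say column 7. */
    ColSet      colnos = {{false}};

    colset_add(&colnos, 7);

    /*
     * The executor writes one mapping entry per grpColIdx element, duplicates
     * included, plus one per remaining member of colnos.
     */
    int         entries_written = numCols + colset_count(&colnos); /* 3 */

    /* Old sizing: distinct columns only -> |{2, 7}| = 2 slots: too small. */
    ColSet      all = colnos;

    for (int i = 0; i < numCols; i++)
        colset_add(&all, grpColIdx[i]);

    int         old_alloc = colset_count(&all);                /* 2 */

    /* New sizing: distinct extra columns + every grouping column. */
    int         new_alloc = colset_count(&colnos) + numCols;   /* 3 */

    printf("entries written = %d, old allocation = %d, new allocation = %d\n",
           entries_written, old_alloc, new_alloc);
    return 0;
}

Built with any C99 compiler, the sketch prints entries written = 3, old
allocation = 2, new allocation = 3. The new bound can overallocate slightly
when grouping columns also appear among the extra input columns, which is
harmless; it just can never be too small.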

3 files changed (+47, -7 lines)

src/backend/executor/nodeAgg.c

Lines changed: 22 additions & 7 deletions
@@ -1311,9 +1311,14 @@ build_hash_table(AggState *aggstate)
  * by themselves, and secondly ctids for row-marks.
  *
  * To eliminate duplicates, we build a bitmapset of the needed columns, and
- * then build an array of the columns included in the hashtable. Note that
- * the array is preserved over ExecReScanAgg, so we allocate it in the
- * per-query context (unlike the hash table itself).
+ * then build an array of the columns included in the hashtable. We might
+ * still have duplicates if the passed-in grpColIdx has them, which can happen
+ * in edge cases from semijoins/distinct; these can't always be removed,
+ * because it's not certain that the duplicate cols will be using the same
+ * hash function.
+ *
+ * Note that the array is preserved over ExecReScanAgg, so we allocate it in
+ * the per-query context (unlike the hash table itself).
  */
 static void
 find_hash_columns(AggState *aggstate)
@@ -1334,6 +1339,7 @@ find_hash_columns(AggState *aggstate)
         AttrNumber *grpColIdx = perhash->aggnode->grpColIdx;
         List       *hashTlist = NIL;
         TupleDesc   hashDesc;
+        int         maxCols;
         int         i;
 
         perhash->largestGrpColIdx = 0;
@@ -1358,15 +1364,24 @@ find_hash_columns(AggState *aggstate)
                     colnos = bms_del_member(colnos, attnum);
             }
         }
-        /* Add in all the grouping columns */
-        for (i = 0; i < perhash->numCols; i++)
-            colnos = bms_add_member(colnos, grpColIdx[i]);
+
+        /*
+         * Compute maximum number of input columns accounting for possible
+         * duplications in the grpColIdx array, which can happen in some edge
+         * cases where HashAggregate was generated as part of a semijoin or a
+         * DISTINCT.
+         */
+        maxCols = bms_num_members(colnos) + perhash->numCols;
 
         perhash->hashGrpColIdxInput =
-            palloc(bms_num_members(colnos) * sizeof(AttrNumber));
+            palloc(maxCols * sizeof(AttrNumber));
         perhash->hashGrpColIdxHash =
             palloc(perhash->numCols * sizeof(AttrNumber));
 
+        /* Add all the grouping columns to colnos */
+        for (i = 0; i < perhash->numCols; i++)
+            colnos = bms_add_member(colnos, grpColIdx[i]);
+
         /*
          * First build mapping for columns directly hashed. These are the
          * first, because they'll be accessed when computing hash values and

src/test/regress/expected/aggregates.out

Lines changed: 18 additions & 0 deletions
@@ -2140,3 +2140,21 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
  ba | 0 | 1
 (2 rows)
 
+-- Make sure that generation of HashAggregate for uniqification purposes
+-- does not lead to array overflow due to unexpected duplicate hash keys
+-- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+explain (costs off)
+  select 1 from tenk1
+   where (hundred, thousand) in (select twothousand, twothousand from onek);
+                          QUERY PLAN
+-------------------------------------------------------------
+ Hash Join
+   Hash Cond: (tenk1.hundred = onek.twothousand)
+   ->  Seq Scan on tenk1
+         Filter: (hundred = thousand)
+   ->  Hash
+         ->  HashAggregate
+               Group Key: onek.twothousand, onek.twothousand
+               ->  Seq Scan on onek
+(8 rows)
+
src/test/regress/sql/aggregates.sql

Lines changed: 7 additions & 0 deletions
@@ -944,3 +944,10 @@ select v||'a', case v||'a' when 'aa' then 1 else 0 end, count(*)
 select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
   from unnest(array['a','b']) u(v)
  group by v||'a' order by 1;
+
+-- Make sure that generation of HashAggregate for uniqification purposes
+-- does not lead to array overflow due to unexpected duplicate hash keys
+-- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+explain (costs off)
+  select 1 from tenk1
+   where (hundred, thousand) in (select twothousand, twothousand from onek);
