Skip to content

Commit 37450f2

Browse files
committed
Fix incorrect hash table resizing code in simplehash.h
This fixes a bug in simplehash.h which caused an incorrect size mask to be used when the hash table grew to SH_MAX_SIZE (2^32). The code was incorrectly setting the size mask to 0 when the hash tables reached the maximum possible number of buckets. This would result always trying to use the 0th bucket causing an infinite loop of trying to grow the hash table due to there being too many collisions. Seemingly it's not that common for simplehash tables to ever grow this big as this bug dates back to v10 and nobody seems to have noticed it before. However, probably the most likely place that people would notice it would be doing a large in-memory Hash Aggregate with something close to at least 2^31 groups. After this fix, the code now works correctly with up to within 98% of 2^32 groups and will fail with the following error when trying to insert any more items into the hash table: ERROR: hash table size exceeded However, the work_mem (or hash_mem_multiplier in newer versions) settings will generally cause Hash Aggregates to spill to disk long before reaching that many groups. The minimal test case I did took a work_mem setting of over 192GB to hit the bug. simplehash hash tables are used in a few other places such as Bitmap Index Scans, however, again the size that the hash table can become there is also limited to work_mem and it would take a relation of around 16TB (2^31) pages and a very large work_mem setting to hit this. With smaller work_mem values the table would become lossy and never grow large enough to hit the problem. Author: Yura Sokolov Reviewed-by: David Rowley, Ranier Vilela Discussion: https://postgr.es/m/b1f7f32737c3438136f64b26f4852b96@postgrespro.ru Backpatch-through: 10, where simplehash.h was added
1 parent 88cbbbf commit 37450f2

File tree

1 file changed

+5
-9
lines changed

1 file changed

+5
-9
lines changed

src/include/lib/simplehash.h

Lines changed: 5 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -198,8 +198,8 @@ SH_SCOPE void SH_DESTROY(SH_TYPE * tb);
198198
/* void <prefix>_reset(<prefix>_hash *tb) */
199199
SH_SCOPE void SH_RESET(SH_TYPE * tb);
200200

201-
/* void <prefix>_grow(<prefix>_hash *tb) */
202-
SH_SCOPE void SH_GROW(SH_TYPE * tb, uint32 newsize);
201+
/* void <prefix>_grow(<prefix>_hash *tb, uint64 newsize) */
202+
SH_SCOPE void SH_GROW(SH_TYPE * tb, uint64 newsize);
203203

204204
/* <element> *<prefix>_insert(<prefix>_hash *tb, <key> key, bool *found) */
205205
SH_SCOPE SH_ELEMENT_TYPE *SH_INSERT(SH_TYPE * tb, SH_KEY_TYPE key, bool *found);
@@ -302,7 +302,7 @@ SH_SCOPE void SH_STAT(SH_TYPE * tb);
302302
* the hashtable.
303303
*/
304304
static inline void
305-
SH_COMPUTE_PARAMETERS(SH_TYPE * tb, uint32 newsize)
305+
SH_COMPUTE_PARAMETERS(SH_TYPE * tb, uint64 newsize)
306306
{
307307
uint64 size;
308308

@@ -322,11 +322,7 @@ SH_COMPUTE_PARAMETERS(SH_TYPE * tb, uint32 newsize)
322322

323323
/* now set size */
324324
tb->size = size;
325-
326-
if (tb->size == SH_MAX_SIZE)
327-
tb->sizemask = 0;
328-
else
329-
tb->sizemask = tb->size - 1;
325+
tb->sizemask = (uint32) (size - 1);
330326

331327
/*
332328
* Compute the next threshold at which we need to grow the hash table
@@ -476,7 +472,7 @@ SH_RESET(SH_TYPE * tb)
476472
* performance-wise, when known at some point.
477473
*/
478474
SH_SCOPE void
479-
SH_GROW(SH_TYPE * tb, uint32 newsize)
475+
SH_GROW(SH_TYPE * tb, uint64 newsize)
480476
{
481477
uint64 oldsize = tb->size;
482478
SH_ELEMENT_TYPE *olddata = tb->data;

0 commit comments

Comments
 (0)