Commit 9094eb2

Fix some performance issues in GIN query startup.

If a GIN index search had a lot of search keys (for example, "jsonbcol ?| array[]" with tens of thousands of array elements), both ginFillScanKey() and startScanKey() took O(N^2) time. Worse, those loops were uncancelable for lack of CHECK_FOR_INTERRUPTS.

The problem in ginFillScanKey() is the brute-force search key de-duplication done in ginFillScanEntry(). The most expedient solution seems to be to just stop trying to de-duplicate once there are "too many" search keys. We could imagine working harder, say by using a sort-and-unique algorithm instead of brute force compare-all-the-keys. But it seems unlikely to be worth the trouble. There is no correctness issue here, since the code already allowed duplicate keys if any extra_data is present.

The problem in startScanKey() is the loop that attempts to identify the first non-required search key. In the submitted test case, that vainly tests all the key positions, and each iteration takes O(N) time. One part of that is that it's reinitializing the entryRes[] array from scratch each time, which is entirely unnecessary given that the triConsistentFn isn't supposed to scribble on its input. We can easily adjust the array contents incrementally instead. The other part of it is that the triConsistentFn may itself take O(N) time (and does in this test case). This is all extremely brute force: in simple cases with AND or OR semantics, we could know without any looping whatever that all or none of the keys are required. But GIN opclasses don't have any API for exposing that knowledge, so at least in the short run there is little to be done about that. Put in a CHECK_FOR_INTERRUPTS so that at least the loop is cancelable.

These two changes together resolve the primary complaint that the test query doesn't respond promptly to cancel interrupts. Also, while they don't completely eliminate the O(N^2) behavior, they do provide quite a nice speedup for mid-sized examples.

Bug: #18831
Reported-by: Niek <niek.brasa@hitachienergy.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18831-e845ac44ebc5dd36@postgresql.org
Backpatch-through: 13
1 parent 3f4c5e3 commit 9094eb2
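
The commit message mentions a sort-and-unique algorithm as the alternative that was considered and rejected. For readers who want to see what that would look like in principle, here is a minimal sketch, assuming plain `int` keys. The names `cmp_int` and `sort_and_unique` are hypothetical and do not appear in PostgreSQL. Real GIN keys are Datums compared through the opclass's comparison function, and ginFillScanEntry() must also hand back the existing entry for a duplicate, so an actual implementation would be more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparison for qsort(); avoids the overflow risk of "a - b". */
static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: sort keys[0..n-1] in place and squeeze out duplicates.
 * Returns the number of distinct keys, which end up in keys[0..result-1].
 * Typically O(N log N), versus O(N^2) for comparing every pair of keys.
 */
static size_t
sort_and_unique(int *keys, size_t n)
{
    size_t out = 0;             /* index of the last distinct key kept */
    size_t i;

    assert(keys != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(keys, n, sizeof(int), cmp_int);

    for (i = 1; i < n; i++)
    {
        /* After sorting, duplicates are adjacent; keep only new values. */
        if (keys[i] != keys[out])
            keys[++out] = keys[i];
    }
    return out + 1;
}
```

Note that sorting discards the original key order, which is one reason this is not a drop-in replacement for the per-entry linear search.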

2 files changed, +12 -5 lines changed

src/backend/access/gin/ginget.c

Lines changed: 6 additions & 4 deletions
@@ -556,16 +556,18 @@ startScanKey(GinState *ginstate, GinScanOpaque so, GinScanKey key)
         qsort_arg(entryIndexes, key->nentries, sizeof(int),
                   entryIndexByFrequencyCmp, key);
 
+        for (i = 1; i < key->nentries; i++)
+            key->entryRes[entryIndexes[i]] = GIN_MAYBE;
         for (i = 0; i < key->nentries - 1; i++)
         {
             /* Pass all entries <= i as FALSE, and the rest as MAYBE */
-            for (j = 0; j <= i; j++)
-                key->entryRes[entryIndexes[j]] = GIN_FALSE;
-            for (j = i + 1; j < key->nentries; j++)
-                key->entryRes[entryIndexes[j]] = GIN_MAYBE;
+            key->entryRes[entryIndexes[i]] = GIN_FALSE;
 
             if (key->triConsistentFn(key) == GIN_FALSE)
                 break;
+
+            /* Make this loop interruptible in case there are many keys */
+            CHECK_FOR_INTERRUPTS();
         }
         /* i is now the last required entry. */
 
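
The hunk above replaces a full rebuild of entryRes[] on every iteration with a one-time setup plus a single store per iteration. The following standalone sketch checks that the two strategies yield the same array contents at each step. Everything in it is a toy stand-in: `TOY_FALSE`, `TOY_MAYBE`, and the function names are hypothetical, and a plain `unsigned char` array replaces `key->entryRes`. As the commit message notes, the equivalence also relies on the consistent function not modifying its input, which this sketch simply assumes.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins for the ternary values; not the PostgreSQL definitions. */
enum { TOY_FALSE = 0, TOY_MAYBE = 2 };

/* Old strategy: rebuild the whole array for iteration i, O(n) per call. */
static void
fill_from_scratch(unsigned char *res, const int *order, int n, int i)
{
    int j;

    for (j = 0; j <= i; j++)
        res[order[j]] = TOY_FALSE;
    for (j = i + 1; j < n; j++)
        res[order[j]] = TOY_MAYBE;
}

/* New strategy, one-time setup: everything after the first entry is MAYBE. */
static void
init_incremental(unsigned char *res, const int *order, int n)
{
    int i;

    for (i = 1; i < n; i++)
        res[order[i]] = TOY_MAYBE;
}

/* New strategy, per iteration: flip only entry i to FALSE, O(1) per call. */
static void
step_incremental(unsigned char *res, const int *order, int i)
{
    res[order[i]] = TOY_FALSE;
}

/*
 * Returns 1 if both strategies produce identical arrays at every iteration.
 * "order" must be a permutation of 0..n-1, like entryIndexes in the diff.
 */
static int
approaches_agree(const int *order, int n)
{
    unsigned char a[64];
    unsigned char b[64];
    int i;

    assert(n >= 1 && n <= 64);
    memset(a, 0xFF, sizeof(a));     /* poison so stale slots would show up */
    memset(b, 0xFF, sizeof(b));

    init_incremental(b, order, n);
    for (i = 0; i < n - 1; i++)
    {
        fill_from_scratch(a, order, n, i);
        step_incremental(b, order, i);
        if (memcmp(a, b, (size_t) n) != 0)
            return 0;
    }
    return 1;
}
```

The incremental form removes one O(N) factor from the loop body; the call to the consistent function can still cost O(N) per iteration, which is why the commit message says the quadratic behavior is reduced but not eliminated.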

src/backend/access/gin/ginscan.c

Lines changed: 6 additions & 1 deletion
@@ -68,8 +68,13 @@ ginFillScanEntry(GinScanOpaque so, OffsetNumber attnum,
      *
      * Entries with non-null extra_data are never considered identical, since
      * we can't know exactly what the opclass might be doing with that.
+     *
+     * Also, give up de-duplication once we have 100 entries. That avoids
+     * spending O(N^2) time on probably-fruitless de-duplication of large
+     * search-key sets. The threshold of 100 is arbitrary but matches
+     * predtest.c's threshold for what's a large array.
      */
-    if (extra_data == NULL)
+    if (extra_data == NULL && so->totalentries < 100)
     {
         for (i = 0; i < so->totalentries; i++)
         {
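
To illustrate the effect of the new `so->totalentries < 100` condition, here is a small self-contained sketch of capped de-duplication. It is a simplified model, not the PostgreSQL code: `ToyScan`, `toy_fill_entry`, `DEDUP_LIMIT`, and `MAX_ENTRIES` are hypothetical names, keys are plain `int` values, and the extra_data check is omitted.

```c
#include <assert.h>

#define DEDUP_LIMIT 100     /* mirrors the threshold used in the diff */
#define MAX_ENTRIES 1024    /* arbitrary capacity for this toy model */

/* Toy stand-in for the scan state; only what the sketch needs. */
typedef struct ToyScan
{
    int nentries;
    int keys[MAX_ENTRIES];
} ToyScan;

/*
 * Add a key and return the index of its entry.  While fewer than
 * DEDUP_LIMIT entries exist, an equal existing key is reused; past that
 * point the linear search is skipped and duplicates are simply appended.
 */
static int
toy_fill_entry(ToyScan *so, int key)
{
    int i;

    assert(so->nentries < MAX_ENTRIES);

    if (so->nentries < DEDUP_LIMIT)
    {
        for (i = 0; i < so->nentries; i++)
        {
            if (so->keys[i] == key)
                return i;       /* reuse the existing entry */
        }
    }

    so->keys[so->nentries] = key;
    return so->nentries++;      /* index of the newly added entry */
}
```

With this cap, the search performed for each new key covers at most DEDUP_LIMIT - 1 existing entries, so the total de-duplication work is bounded by a constant rather than growing quadratically with the number of keys. The cost is that duplicates arriving after the cap occupy extra entries, which the commit message argues is already a permitted state.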