Commit 17d787a

Items on GIN data pages are no longer always 6 bytes; update gincostestimate.
Also improve the comments a bit.
1 parent 588fb50 commit 17d787a

1 file changed

+16
-17
lines changed

src/backend/utils/adt/selfuncs.c

Lines changed: 16 additions & 17 deletions
@@ -7291,31 +7291,30 @@ gincostestimate(PG_FUNCTION_ARGS)
     *indexStartupCost = (entryPagesFetched + dataPagesFetched) * spc_random_page_cost;
 
     /*
-     * Now we compute the number of data pages fetched while the scan
-     * proceeds.
+     * Now compute the number of data pages fetched during the scan.
+     *
+     * We assume every entry to have the same number of items, and that there
+     * is no overlap between them. (XXX: tsvector and array opclasses collect
+     * statistics on the frequency of individual keys; it would be nice to
+     * use those here.)
      */
-
-    /* data pages scanned for each exact (non-partial) matched entry */
     dataPagesFetched = ceil(numDataPages * counts.exactEntries / numEntries);
 
     /*
-     * Estimate number of data pages read, using selectivity estimation and
-     * capacity of data page.
+     * If there is a lot of overlap among the entries, in particular if one
+     * of the entries is very frequent, the above calculation can grossly
+     * under-estimate. As a simple cross-check, calculate a lower bound
+     * based on the overall selectivity of the quals. At a minimum, we must
+     * read one item pointer for each matching entry.
+     *
+     * The width of each item pointer varies, based on the level of
+     * compression. We don't have statistics on that, but an average of
+     * around 3 bytes per item is fairly typical.
      */
     dataPagesFetchedBySel = ceil(*indexSelectivity *
-                                 (numTuples / (BLCKSZ / SizeOfIptrData)));
-
+                                 (numTuples / (BLCKSZ / 3)));
     if (dataPagesFetchedBySel > dataPagesFetched)
-    {
-        /*
-         * At least one of entries is very frequent and, unfortunately, we
-         * couldn't get statistic about entries (only tsvector has such
-         * statistics). So, we obviously have too small estimation of pages
-         * fetched from data tree. Re-estimate it from known capacity of data
-         * pages
-         */
         dataPagesFetched = dataPagesFetchedBySel;
-    }
 
     /* Account for cache effects, the same as above */
     if (outer_scans > 1 || counts.arrayScans > 1)