
Commit 9f3665f

Don't consider newly inserted tuples in nbtree VACUUM.
Remove the entire idea of "stale stats" within nbtree VACUUM (stop caring about stats involving the number of inserted tuples). Also remove the vacuum_cleanup_index_scale_factor GUC/param on the master branch (though just disable them on postgres 13).

The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM partially responsible for deciding when pg_class.reltuples stats needed to be updated. This seems contrary to the spirit of the index AM API, though -- it is not actually necessary for an index AM's bulk delete and cleanup callbacks to provide accurate stats when it happens to be inconvenient. The core code owns that. (Index AMs have the authority to perform or not perform certain kinds of deferred cleanup based on their own considerations, such as page deletion and recycling, but that has little to do with pg_class.reltuples/num_index_tuples.)

This issue was fairly harmless until the introduction of the autovacuum_vacuum_insert_threshold feature by commit b07642d, which had an undesirable interaction with the vacuum_cleanup_index_scale_factor mechanism: it made insert-driven autovacuums perform full index scans, even though there is no real benefit to doing so. This has been tied to a regression with an append-only insert benchmark [1].

Also have remaining cases that perform a full scan of an index during a cleanup-only nbtree VACUUM indicate that the final tuple count is only an estimate. This prevents vacuumlazy.c from setting the index's pg_class.reltuples in those cases (it will now only update pg_class when vacuumlazy.c had TIDs for nbtree to bulk delete). This arguably fixes an oversight in deduplication-related bugfix commit 48e1291.

[1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com
Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.
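The last paragraph above is the behavioral core of the change for the rest of VACUUM: an index AM may now hand back a tuple count that is explicitly marked as an estimate, and the core code declines to store an estimated count in pg_class.reltuples. The following standalone C sketch only illustrates that consumer-side rule; the estimated_count and num_index_tuples field names come from the IndexBulkDeleteResult fields touched by this commit, while everything else (struct name, function, values) is hypothetical and is not the actual vacuumlazy.c code.

/*
 * Standalone sketch of the reltuples update rule described in the commit
 * message: only trust an index AM's tuple count when the AM did not flag it
 * as an estimate.  NOT the real PostgreSQL code; only estimated_count and
 * num_index_tuples are borrowed names, the rest is illustrative.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct SketchBulkDeleteResult
{
    bool        estimated_count;    /* is num_index_tuples only an estimate? */
    double      num_index_tuples;   /* tuples remaining (or estimate thereof) */
} SketchBulkDeleteResult;

/* Return true when the count is trustworthy enough to store in pg_class */
static bool
should_update_reltuples(const SketchBulkDeleteResult *stats)
{
    /* cleanup-only nbtree scans now set estimated_count, so they are skipped */
    return !stats->estimated_count;
}

int
main(void)
{
    SketchBulkDeleteResult cleanup_only = {.estimated_count = true, .num_index_tuples = 1000.0};
    SketchBulkDeleteResult bulkdelete = {.estimated_count = false, .num_index_tuples = 950.0};

    printf("cleanup-only scan updates reltuples? %s\n",
           should_update_reltuples(&cleanup_only) ? "yes" : "no");
    printf("bulkdelete scan updates reltuples?   %s\n",
           should_update_reltuples(&bulkdelete) ? "yes" : "no");
    return 0;
}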
1 parent: 845ac7f

19 files changed: +42, -222 lines

doc/src/sgml/config.sgml

Lines changed: 0 additions & 40 deletions
@@ -8544,46 +8544,6 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
-     <varlistentry id="guc-vacuum-cleanup-index-scale-factor" xreflabel="vacuum_cleanup_index_scale_factor">
-      <term><varname>vacuum_cleanup_index_scale_factor</varname> (<type>floating point</type>)
-      <indexterm>
-       <primary><varname>vacuum_cleanup_index_scale_factor</varname></primary>
-       <secondary>configuration parameter</secondary>
-      </indexterm>
-      </term>
-      <listitem>
-       <para>
-        Specifies the fraction of the total number of heap tuples counted in
-        the previous statistics collection that can be inserted without
-        incurring an index scan at the <command>VACUUM</command> cleanup stage.
-        This setting currently applies to B-tree indexes only.
-       </para>
-
-       <para>
-        If no tuples were deleted from the heap, B-tree indexes are still
-        scanned at the <command>VACUUM</command> cleanup stage when the
-        index's statistics are stale. Index statistics are considered
-        stale if the number of newly inserted tuples exceeds the
-        <varname>vacuum_cleanup_index_scale_factor</varname>
-        fraction of the total number of heap tuples detected by the previous
-        statistics collection. The total number of heap tuples is stored in
-        the index meta-page. Note that the meta-page does not include this data
-        until <command>VACUUM</command> finds no dead tuples, so B-tree index
-        scan at the cleanup stage can only be skipped if the second and
-        subsequent <command>VACUUM</command> cycles detect no dead tuples.
-       </para>
-
-       <para>
-        The value can range from <literal>0</literal> to
-        <literal>10000000000</literal>.
-        When <varname>vacuum_cleanup_index_scale_factor</varname> is set to
-        <literal>0</literal>, index scans are never skipped during
-        <command>VACUUM</command> cleanup. The default value is <literal>0.1</literal>.
-       </para>
-
-     </listitem>
-    </varlistentry>
-
      <varlistentry id="guc-bytea-output" xreflabel="bytea_output">
      <term><varname>bytea_output</varname> (<type>enum</type>)
      <indexterm>

doc/src/sgml/ref/create_index.sgml

Lines changed: 0 additions & 14 deletions
@@ -456,20 +456,6 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
     </note>
    </listitem>
   </varlistentry>
-
-  <varlistentry id="index-reloption-vacuum-cleanup-index-scale-factor" xreflabel="vacuum_cleanup_index_scale_factor">
-   <term><literal>vacuum_cleanup_index_scale_factor</literal> (<type>floating point</type>)
-   <indexterm>
-    <primary><varname>vacuum_cleanup_index_scale_factor</varname></primary>
-    <secondary>storage parameter</secondary>
-   </indexterm>
-   </term>
-   <listitem>
-    <para>
-     Per-index value for <xref linkend="guc-vacuum-cleanup-index-scale-factor"/>.
-    </para>
-   </listitem>
-  </varlistentry>
  </variablelist>
 
  <para>

src/backend/access/common/reloptions.c

Lines changed: 0 additions & 9 deletions
@@ -461,15 +461,6 @@ static relopt_real realRelOpts[] =
        },
        0, -1.0, DBL_MAX
    },
-   {
-       {
-           "vacuum_cleanup_index_scale_factor",
-           "Number of tuple inserts prior to index cleanup as a fraction of reltuples.",
-           RELOPT_KIND_BTREE,
-           ShareUpdateExclusiveLock
-       },
-       -1, 0.0, 1e10
-   },
    /* list terminator */
    {{NULL}}
 };

src/backend/access/nbtree/nbtinsert.c

Lines changed: 0 additions & 3 deletions
@@ -1332,8 +1332,6 @@ _bt_insertonpg(Relation rel,
            xlmeta.fastroot = metad->btm_fastroot;
            xlmeta.fastlevel = metad->btm_fastlevel;
            xlmeta.last_cleanup_num_delpages = metad->btm_last_cleanup_num_delpages;
-           xlmeta.last_cleanup_num_heap_tuples =
-               metad->btm_last_cleanup_num_heap_tuples;
            xlmeta.allequalimage = metad->btm_allequalimage;
 
            XLogRegisterBuffer(2, metabuf,
@@ -2549,7 +2547,6 @@ _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf)
        md.fastroot = rootblknum;
        md.fastlevel = metad->btm_level;
        md.last_cleanup_num_delpages = metad->btm_last_cleanup_num_delpages;
-       md.last_cleanup_num_heap_tuples = metad->btm_last_cleanup_num_heap_tuples;
        md.allequalimage = metad->btm_allequalimage;
 
        XLogRegisterBufData(2, (char *) &md, sizeof(xl_btree_metadata));

src/backend/access/nbtree/nbtpage.c

Lines changed: 13 additions & 27 deletions
@@ -175,26 +175,15 @@ _bt_getmeta(Relation rel, Buffer metabuf)
  * _bt_vacuum_needs_cleanup() to decide whether or not a btvacuumscan()
  * call should go ahead for an entire VACUUM operation.
  *
- * See btvacuumcleanup() and _bt_vacuum_needs_cleanup() for details of
- * the two fields that we maintain here.
- *
- * The information that we maintain for btvacuumcleanup() describes the
- * state of the index (as well as the table it indexes) just _after_ the
- * ongoing VACUUM operation. The next _bt_vacuum_needs_cleanup() call
- * will consider the information we saved for it during the next VACUUM
- * operation (assuming that there will be no btbulkdelete() call during
- * the next VACUUM operation -- if there is then the question of skipping
- * btvacuumscan() doesn't even arise).
+ * See btvacuumcleanup() and _bt_vacuum_needs_cleanup() for the
+ * definition of num_delpages.
  */
 void
-_bt_set_cleanup_info(Relation rel, BlockNumber num_delpages,
-                     float8 num_heap_tuples)
+_bt_set_cleanup_info(Relation rel, BlockNumber num_delpages)
 {
     Buffer      metabuf;
     Page        metapg;
     BTMetaPageData *metad;
-    bool        rewrite = false;
-    XLogRecPtr  recptr;
 
     /*
      * On-disk compatibility note: The btm_last_cleanup_num_delpages metapage
@@ -209,21 +198,20 @@ _bt_set_cleanup_info(Relation rel, BlockNumber num_delpages,
      * in reality there are only one or two. The worst that can happen is
      * that there will be a call to btvacuumscan a little earlier, which will
      * set btm_last_cleanup_num_delpages to a sane value when we're called.
+     *
+     * Note also that the metapage's btm_last_cleanup_num_heap_tuples field is
+     * no longer used as of PostgreSQL 14. We set it to -1.0 on rewrite, just
+     * to be consistent.
      */
     metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
     metapg = BufferGetPage(metabuf);
     metad = BTPageGetMeta(metapg);
 
-    /* Always dynamically upgrade index/metapage when BTREE_MIN_VERSION */
-    if (metad->btm_version < BTREE_NOVAC_VERSION)
-        rewrite = true;
-    else if (metad->btm_last_cleanup_num_delpages != num_delpages)
-        rewrite = true;
-    else if (metad->btm_last_cleanup_num_heap_tuples != num_heap_tuples)
-        rewrite = true;
-
-    if (!rewrite)
+    /* Don't miss chance to upgrade index/metapage when BTREE_MIN_VERSION */
+    if (metad->btm_version >= BTREE_NOVAC_VERSION &&
+        metad->btm_last_cleanup_num_delpages == num_delpages)
     {
+        /* Usually means index continues to have num_delpages of 0 */
         _bt_relbuf(rel, metabuf);
         return;
     }
@@ -240,13 +228,14 @@ _bt_set_cleanup_info(Relation rel, BlockNumber num_delpages,
 
     /* update cleanup-related information */
     metad->btm_last_cleanup_num_delpages = num_delpages;
-    metad->btm_last_cleanup_num_heap_tuples = num_heap_tuples;
+    metad->btm_last_cleanup_num_heap_tuples = -1.0;
     MarkBufferDirty(metabuf);
 
     /* write wal record if needed */
     if (RelationNeedsWAL(rel))
     {
         xl_btree_metadata md;
+        XLogRecPtr  recptr;
 
         XLogBeginInsert();
         XLogRegisterBuffer(0, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
@@ -258,7 +247,6 @@ _bt_set_cleanup_info(Relation rel, BlockNumber num_delpages,
         md.fastroot = metad->btm_fastroot;
         md.fastlevel = metad->btm_fastlevel;
         md.last_cleanup_num_delpages = num_delpages;
-        md.last_cleanup_num_heap_tuples = num_heap_tuples;
         md.allequalimage = metad->btm_allequalimage;
 
         XLogRegisterBufData(0, (char *) &md, sizeof(xl_btree_metadata));
@@ -443,7 +431,6 @@ _bt_getroot(Relation rel, int access)
         md.fastroot = rootblkno;
         md.fastlevel = 0;
         md.last_cleanup_num_delpages = 0;
-        md.last_cleanup_num_heap_tuples = -1.0;
         md.allequalimage = metad->btm_allequalimage;
 
         XLogRegisterBufData(2, (char *) &md, sizeof(xl_btree_metadata));
@@ -2632,7 +2619,6 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
         xlmeta.fastroot = metad->btm_fastroot;
         xlmeta.fastlevel = metad->btm_fastlevel;
         xlmeta.last_cleanup_num_delpages = metad->btm_last_cleanup_num_delpages;
-        xlmeta.last_cleanup_num_heap_tuples = metad->btm_last_cleanup_num_heap_tuples;
         xlmeta.allequalimage = metad->btm_allequalimage;
 
         XLogRegisterBufData(4, (char *) &xlmeta, sizeof(xl_btree_metadata));
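
The new early-return in _bt_set_cleanup_info() above skips the metapage write (and WAL record) whenever the index is already on a current metapage version and the stored num_delpages is unchanged. The minimal standalone sketch below only restates that check; it does not use the real buffer or WAL machinery, and the BTREE_NOVAC_VERSION_SKETCH constant is an assumed stand-in rather than the real BTREE_NOVAC_VERSION value.

/*
 * Standalone sketch of the early-return added in _bt_set_cleanup_info():
 * the metapage is only dirtied (and WAL-logged) when it is on an old
 * version or its stored num_delpages no longer matches.  Illustrative
 * only; field names are borrowed from the hunk above.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BTREE_NOVAC_VERSION_SKETCH 3    /* assumed stand-in value */

typedef struct SketchMetaPage
{
    uint32_t    btm_version;
    uint32_t    btm_last_cleanup_num_delpages;
} SketchMetaPage;

/* true when the metapage must be rewritten with the new num_delpages */
static bool
cleanup_info_needs_rewrite(const SketchMetaPage *metad, uint32_t num_delpages)
{
    if (metad->btm_version < BTREE_NOVAC_VERSION_SKETCH)
        return true;            /* always take the chance to upgrade */
    return metad->btm_last_cleanup_num_delpages != num_delpages;
}

int
main(void)
{
    SketchMetaPage metad = {.btm_version = 4, .btm_last_cleanup_num_delpages = 0};

    /* unchanged num_delpages: no rewrite; changed num_delpages: rewrite */
    printf("num_delpages 0 -> %s\n", cleanup_info_needs_rewrite(&metad, 0) ? "rewrite" : "skip");
    printf("num_delpages 7 -> %s\n", cleanup_info_needs_rewrite(&metad, 7) ? "rewrite" : "skip");
    return 0;
}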

src/backend/access/nbtree/nbtree.c

Lines changed: 21 additions & 49 deletions
@@ -789,11 +789,8 @@ _bt_vacuum_needs_cleanup(IndexVacuumInfo *info)
     Buffer      metabuf;
     Page        metapg;
     BTMetaPageData *metad;
-    BTOptions  *relopts;
-    float8      cleanup_scale_factor;
     uint32      btm_version;
     BlockNumber prev_num_delpages;
-    float8      prev_num_heap_tuples;
 
     /*
      * Copy details from metapage to local variables quickly.
@@ -816,32 +813,8 @@ _bt_vacuum_needs_cleanup(IndexVacuumInfo *info)
     }
 
     prev_num_delpages = metad->btm_last_cleanup_num_delpages;
-    prev_num_heap_tuples = metad->btm_last_cleanup_num_heap_tuples;
     _bt_relbuf(info->index, metabuf);
 
-    /*
-     * If the underlying table has received a sufficiently high number of
-     * insertions since the last VACUUM operation that called btvacuumscan(),
-     * then have the current VACUUM operation call btvacuumscan() now. This
-     * happens when the statistics are deemed stale.
-     *
-     * XXX: We should have a more principled way of determining what
-     * "staleness" means. The vacuum_cleanup_index_scale_factor GUC (and the
-     * index-level storage param) seem hard to tune in a principled way.
-     */
-    relopts = (BTOptions *) info->index->rd_options;
-    cleanup_scale_factor = (relopts &&
-                            relopts->vacuum_cleanup_index_scale_factor >= 0)
-        ? relopts->vacuum_cleanup_index_scale_factor
-        : vacuum_cleanup_index_scale_factor;
-
-    if (cleanup_scale_factor <= 0 ||
-        info->num_heap_tuples < 0 ||
-        prev_num_heap_tuples <= 0 ||
-        (info->num_heap_tuples - prev_num_heap_tuples) /
-        prev_num_heap_tuples >= cleanup_scale_factor)
-        return true;
-
     /*
      * Trigger cleanup in rare cases where prev_num_delpages exceeds 5% of the
      * total size of the index. We can reasonably expect (though are not
@@ -925,48 +898,45 @@ btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 
         /*
          * Since we aren't going to actually delete any leaf items, there's no
-         * need to go through all the vacuum-cycle-ID pushups here
+         * need to go through all the vacuum-cycle-ID pushups here.
+         *
+         * Posting list tuples are a source of inaccuracy for cleanup-only
+         * scans. btvacuumscan() will assume that the number of index tuples
+         * from each page can be used as num_index_tuples, even though
+         * num_index_tuples is supposed to represent the number of TIDs in the
+         * index. This naive approach can underestimate the number of tuples
+         * in the index significantly.
+         *
+         * We handle the problem by making num_index_tuples an estimate in
+         * cleanup-only case.
          */
         stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
         btvacuumscan(info, stats, NULL, NULL, 0);
+        stats->estimated_count = true;
     }
 
     /*
      * By here, we know for sure that this VACUUM operation won't be skipping
-     * its btvacuumscan() call. Maintain the count of the current number of
-     * heap tuples in the metapage. Also maintain the num_delpages value.
+     * its btvacuumscan() call. Maintain num_delpages value in metapage.
      * This information will be used by _bt_vacuum_needs_cleanup() during
      * future VACUUM operations that don't need to call btbulkdelete().
      *
     * num_delpages is the number of deleted pages now in the index that were
     * not safe to place in the FSM to be recycled just yet. We expect that
     * it will almost certainly be possible to place all of these pages in the
-    * FSM during the next VACUUM operation. That factor alone might cause
-    * _bt_vacuum_needs_cleanup() to force the next VACUUM to proceed with a
-    * btvacuumscan() call.
-    *
-    * Note: We must delay the _bt_set_cleanup_info() call until this late
-    * stage of VACUUM (the btvacuumcleanup() phase), to keep num_heap_tuples
-    * accurate. The btbulkdelete()-time num_heap_tuples value is generally
-    * just pg_class.reltuples for the heap relation _before_ VACUUM began.
-    * In general cleanup info should describe the state of the index/table
-    * _after_ VACUUM finishes.
+    * FSM during the next VACUUM operation. _bt_vacuum_needs_cleanup() will
+    * force the next VACUUM to consider this before allowing btvacuumscan()
+    * to be skipped entirely.
      */
     Assert(stats->pages_deleted >= stats->pages_free);
     num_delpages = stats->pages_deleted - stats->pages_free;
-    _bt_set_cleanup_info(info->index, num_delpages, info->num_heap_tuples);
+    _bt_set_cleanup_info(info->index, num_delpages);
 
     /*
      * It's quite possible for us to be fooled by concurrent page splits into
      * double-counting some index tuples, so disbelieve any total that exceeds
      * the underlying heap's count ... if we know that accurately. Otherwise
      * this might just make matters worse.
-     *
-     * Posting list tuples are another source of inaccuracy. Cleanup-only
-     * btvacuumscan calls assume that the number of index tuples can be used
-     * as num_index_tuples, even though num_index_tuples is supposed to
-     * represent the number of TIDs in the index. This naive approach can
-     * underestimate the number of tuples in the index.
      */
     if (!info->estimated_count)
     {
@@ -1016,7 +986,6 @@ btvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
      * pages in the index at the end of the VACUUM command.)
      */
     stats->num_pages = 0;
-    stats->estimated_count = false;
     stats->num_index_tuples = 0;
     stats->pages_deleted = 0;
     stats->pages_free = 0;
@@ -1421,7 +1390,10 @@ btvacuumpage(BTVacState *vstate, BlockNumber scanblkno)
         * We don't count the number of live TIDs during cleanup-only calls to
         * btvacuumscan (i.e. when callback is not set). We count the number
         * of index tuples directly instead. This avoids the expense of
-        * directly examining all of the tuples on each page.
+        * directly examining all of the tuples on each page. VACUUM will
+        * treat num_index_tuples as an estimate in cleanup-only case, so it
+        * doesn't matter that this underestimates num_index_tuples
+        * significantly in some cases.
         */
        if (minoff > maxoff)
            attempt_pagedel = (blkno == scanblkno);
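
With the scale-factor branch removed above, the only remaining trigger for a cleanup-only btvacuumscan() mentioned in the surviving comment is the deleted-pages threshold ("prev_num_delpages exceeds 5% of the total size of the index"). The following standalone C sketch restates that decision under that assumption; the function name, parameter names, and the integer-division rounding are illustrative and not the actual _bt_vacuum_needs_cleanup() signature or arithmetic.

/*
 * Standalone sketch of the cleanup decision that remains after this commit:
 * a cleanup-only btvacuumscan() is only forced when the deleted-but-not-yet-
 * recyclable page count recorded in the metapage exceeds roughly 5% of the
 * index.  Illustrative only; not the real _bt_vacuum_needs_cleanup().
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool
cleanup_needed(uint32_t prev_num_delpages, uint32_t num_index_pages)
{
    /* force a full index scan only when deleted pages exceed ~5% of the index */
    return prev_num_delpages > 0 &&
           prev_num_delpages > num_index_pages / 20;
}

int
main(void)
{
    /* 10,000-page index: 100 deleted pages is 1%, 600 is 6% */
    printf("100 delpages -> %s\n", cleanup_needed(100, 10000) ? "scan" : "skip");
    printf("600 delpages -> %s\n", cleanup_needed(600, 10000) ? "scan" : "skip");
    return 0;
}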

src/backend/access/nbtree/nbtutils.c

Lines changed: 0 additions & 2 deletions
@@ -2105,8 +2105,6 @@ btoptions(Datum reloptions, bool validate)
 {
     static const relopt_parse_elt tab[] = {
         {"fillfactor", RELOPT_TYPE_INT, offsetof(BTOptions, fillfactor)},
-        {"vacuum_cleanup_index_scale_factor", RELOPT_TYPE_REAL,
-        offsetof(BTOptions, vacuum_cleanup_index_scale_factor)},
         {"deduplicate_items", RELOPT_TYPE_BOOL,
         offsetof(BTOptions, deduplicate_items)}
 
src/backend/access/nbtree/nbtxlog.c

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ _bt_restore_meta(XLogReaderState *record, uint8 block_id)
     /* Cannot log BTREE_MIN_VERSION index metapage without upgrade */
     Assert(md->btm_version >= BTREE_NOVAC_VERSION);
     md->btm_last_cleanup_num_delpages = xlrec->last_cleanup_num_delpages;
-    md->btm_last_cleanup_num_heap_tuples = xlrec->last_cleanup_num_heap_tuples;
+    md->btm_last_cleanup_num_heap_tuples = -1.0;
     md->btm_allequalimage = xlrec->allequalimage;
 
     pageop = (BTPageOpaque) PageGetSpecialPointer(metapg);

src/backend/access/rmgrdesc/nbtdesc.c

Lines changed: 2 additions & 3 deletions
@@ -113,9 +113,8 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 
                 xlrec = (xl_btree_metadata *) XLogRecGetBlockData(record, 0,
                                                                   NULL);
-                appendStringInfo(buf, "last_cleanup_num_delpages %u; last_cleanup_num_heap_tuples: %f",
-                                 xlrec->last_cleanup_num_delpages,
-                                 xlrec->last_cleanup_num_heap_tuples);
+                appendStringInfo(buf, "last_cleanup_num_delpages %u",
+                                 xlrec->last_cleanup_num_delpages);
                 break;
             }
     }

src/backend/utils/init/globals.c

Lines changed: 0 additions & 2 deletions
@@ -148,5 +148,3 @@ int64 VacuumPageDirty = 0;
 
 int         VacuumCostBalance = 0;  /* working state for vacuum */
 bool        VacuumCostActive = false;
-
-double      vacuum_cleanup_index_scale_factor;

src/backend/utils/misc/guc.c

Lines changed: 0 additions & 10 deletions
@@ -3703,16 +3703,6 @@ static struct config_real ConfigureNamesReal[] =
         NULL, NULL, NULL
     },
 
-    {
-        {"vacuum_cleanup_index_scale_factor", PGC_USERSET, CLIENT_CONN_STATEMENT,
-            gettext_noop("Number of tuple inserts prior to index cleanup as a fraction of reltuples."),
-            NULL
-        },
-        &vacuum_cleanup_index_scale_factor,
-        0.1, 0.0, 1e10,
-        NULL, NULL, NULL
-    },
-
     {
         {"log_statement_sample_rate", PGC_SUSET, LOGGING_WHEN,
             gettext_noop("Fraction of statements exceeding log_min_duration_sample to be logged."),

src/backend/utils/misc/postgresql.conf.sample

Lines changed: 0 additions & 3 deletions
@@ -672,9 +672,6 @@
 #vacuum_freeze_table_age = 150000000
 #vacuum_multixact_freeze_min_age = 5000000
 #vacuum_multixact_freeze_table_age = 150000000
-#vacuum_cleanup_index_scale_factor = 0.1    # fraction of total number of tuples
-                                            # before index cleanup, 0 always performs
-                                            # index cleanup
 #bytea_output = 'hex'                       # hex, escape
 #xmlbinary = 'base64'
 #xmloption = 'content'
