
Commit 61a86ed

Don't overlook indexes during parallel VACUUM.
Commit b4af70c, which simplified state managed by VACUUM, performed refactoring of parallel VACUUM in passing. Confusion about the exact details of the tasks that the leader process is responsible for led to code that made it possible for parallel VACUUM to miss a subset of the table's indexes entirely. Specifically, indexes that fell under the min_parallel_index_scan_size size cutoff were missed. These indexes are supposed to be vacuumed by the leader (alongside any parallel unsafe indexes), but weren't vacuumed at all.

Affected indexes could easily end up with duplicate heap TIDs, once heap TIDs were recycled for new heap tuples. This had generic symptoms that might be seen with almost any index corruption involving structural inconsistencies between an index and its table.

To fix, make sure that the parallel VACUUM leader process performs any required index vacuuming for indexes that happen to be below the size cutoff. Also document the design of parallel VACUUM with these below-size-cutoff indexes.

It's unclear how many users might be affected by this bug. There had to be at least three indexes on the table to hit the bug: a smaller index, plus at least two additional indexes that themselves exceed the size cutoff. Cases with just one additional index would not run into trouble, since the parallel VACUUM cost model requires two larger-than-cutoff indexes on the table to apply any parallel processing. Note also that autovacuum was not affected, since it never uses parallel processing.

Test case based on tests from a larger patch to test parallel VACUUM by Masahiko Sawada.

Many thanks to Kamigishi Rei for her invaluable help with tracking this problem down.

Author: Peter Geoghegan <pg@bowt.ie>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-By: Kamigishi Rei <iijima.yun@koumakan.jp>
Reported-By: Andrew Gierth <andrew@tao11.riddles.org.uk>
Diagnosed-By: Andres Freund <andres@anarazel.de>
Bug: #17245
Discussion: https://postgr.es/m/17245-ddf06aaf85735f36@postgresql.org
Discussion: https://postgr.es/m/20211030023740.qbnsl2xaoh2grq3d@alap3.anarazel.de
Backpatch: 14-, where the refactoring commit appears.
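For anyone who suspects that an existing B-Tree index was silently skipped by parallel VACUUM, a minimal verification sketch follows. It is not part of this commit: it assumes the contrib amcheck extension is installed, and some_suspect_index is a placeholder name. bt_index_check with its heapallindexed argument set to true may surface the kind of index/table inconsistency described above, though it is not guaranteed to catch every affected index.

CREATE EXTENSION IF NOT EXISTS amcheck;
-- Recheck the index structure, and also verify that every heap tuple that
-- should have an index entry actually has one (heapallindexed = true):
SELECT bt_index_check('some_suspect_index'::regclass, true);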
1 parent 16a5677 commit 61a86ed

File tree: 5 files changed, 132 additions (+) and 26 deletions (-)


src/backend/access/heap/vacuumlazy.c

Lines changed: 35 additions & 25 deletions
@@ -454,7 +454,7 @@ static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
                                     TransactionId *visibility_cutoff_xid, bool *all_frozen);
 static int compute_parallel_vacuum_workers(LVRelState *vacrel,
                                            int nrequested,
-                                           bool *can_parallel_vacuum);
+                                           bool *will_parallel_vacuum);
 static void update_index_statistics(LVRelState *vacrel);
 static LVParallelState *begin_parallel_vacuum(LVRelState *vacrel,
                                               BlockNumber nblocks,
@@ -2644,8 +2644,8 @@ do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel)
        vacrel->lps->lvshared->first_time = false;
 
    /*
-    * We can only provide an approximate value of num_heap_tuples in vacuum
-    * cases.
+    * We can only provide an approximate value of num_heap_tuples, at least
+    * for now.  Matches serial VACUUM case.
     */
    vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
    vacrel->lps->lvshared->estimated_count = true;
@@ -2833,7 +2833,7 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
        if (idx >= vacrel->nindexes)
            break;
 
-       /* Get the index statistics of this index from DSM */
+       /* Get the index statistics space from DSM, if any */
        shared_istat = parallel_stats_for_idx(lvshared, idx);
 
        /* Skip indexes not participating in parallelism */
@@ -2866,8 +2866,15 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
 }
 
 /*
- * Vacuum or cleanup indexes that can be processed by only the leader process
- * because these indexes don't support parallel operation at that phase.
+ * Perform parallel processing of indexes in leader process.
+ *
+ * Handles index vacuuming (or index cleanup) for indexes that are not
+ * parallel safe.  It's possible that this will vary for a given index, based
+ * on details like whether we're performing for_cleanup processing right now.
+ *
+ * Also performs processing of smaller indexes that fell under the size cutoff
+ * enforced by compute_parallel_vacuum_workers().  These indexes never get a
+ * slot for statistics in DSM.
  */
 static void
 do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
@@ -2887,17 +2894,15 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
        IndexBulkDeleteResult *istat;
 
        shared_istat = parallel_stats_for_idx(lvshared, idx);
-
-       /* Skip already-complete indexes */
-       if (shared_istat != NULL)
-           continue;
-
        indrel = vacrel->indrels[idx];
 
        /*
-        * We're only here for the unsafe indexes
+        * We're only here for the indexes that parallel workers won't
+        * process.  Note that the shared_istat test ensures that we process
+        * indexes that fell under initial size cutoff.
         */
-       if (parallel_processing_is_safe(indrel, lvshared))
+       if (shared_istat != NULL &&
+           parallel_processing_is_safe(indrel, lvshared))
            continue;
 
        /* Do vacuum or cleanup of the index */
@@ -3734,12 +3739,12 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
  * nrequested is the number of parallel workers that user requested.  If
  * nrequested is 0, we compute the parallel degree based on nindexes, that is
  * the number of indexes that support parallel vacuum.  This function also
- * sets can_parallel_vacuum to remember indexes that participate in parallel
+ * sets will_parallel_vacuum to remember indexes that participate in parallel
  * vacuum.
  */
 static int
 compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
-                                bool *can_parallel_vacuum)
+                                bool *will_parallel_vacuum)
 {
    int         nindexes_parallel = 0;
    int         nindexes_parallel_bulkdel = 0;
@@ -3765,7 +3770,7 @@ compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
            RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
            continue;
 
-       can_parallel_vacuum[idx] = true;
+       will_parallel_vacuum[idx] = true;
 
        if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
            nindexes_parallel_bulkdel++;
@@ -3843,7 +3848,7 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
    LVDeadTuples *dead_tuples;
    BufferUsage *buffer_usage;
    WalUsage   *wal_usage;
-   bool       *can_parallel_vacuum;
+   bool       *will_parallel_vacuum;
    long        maxtuples;
    Size        est_shared;
    Size        est_deadtuples;
@@ -3861,15 +3866,15 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
    /*
     * Compute the number of parallel vacuum workers to launch
     */
-   can_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
+   will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
    parallel_workers = compute_parallel_vacuum_workers(vacrel,
                                                       nrequested,
-                                                      can_parallel_vacuum);
+                                                      will_parallel_vacuum);
 
    /* Can't perform vacuum in parallel */
    if (parallel_workers <= 0)
    {
-       pfree(can_parallel_vacuum);
+       pfree(will_parallel_vacuum);
        return lps;
    }
 
@@ -3897,7 +3902,7 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
        Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
 
        /* Skip indexes that don't participate in parallel vacuum */
-       if (!can_parallel_vacuum[idx])
+       if (!will_parallel_vacuum[idx])
            continue;
 
        if (indrel->rd_indam->amusemaintenanceworkmem)
@@ -3974,7 +3979,7 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
    memset(shared->bitmap, 0x00, BITMAPLEN(nindexes));
    for (int idx = 0; idx < nindexes; idx++)
    {
-       if (!can_parallel_vacuum[idx])
+       if (!will_parallel_vacuum[idx])
            continue;
 
        /* Set NOT NULL as this index does support parallelism */
@@ -4017,7 +4022,7 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
                                 PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
    }
 
-   pfree(can_parallel_vacuum);
+   pfree(will_parallel_vacuum);
    return lps;
 }
 
@@ -4047,8 +4052,8 @@ end_parallel_vacuum(LVRelState *vacrel)
        shared_istat = parallel_stats_for_idx(lps->lvshared, idx);
 
        /*
-        * Skip unused slot.  The statistics of this index are already stored
-        * in local memory.
+        * Skip index -- it must have been processed by the leader, from
+        * inside do_serial_processing_for_unsafe_indexes()
         */
        if (shared_istat == NULL)
            continue;
@@ -4072,6 +4077,11 @@ end_parallel_vacuum(LVRelState *vacrel)
 
 /*
  * Return shared memory statistics for index at offset 'getidx', if any
+ *
+ * Returning NULL indicates that compute_parallel_vacuum_workers() determined
+ * that the index is a totally unsuitable target for all parallel processing
+ * up front.  For example, the index could be < min_parallel_index_scan_size
+ * cutoff.
  */
 static LVSharedIndStats *
 parallel_stats_for_idx(LVShared *lvshared, int getidx)

src/include/commands/vacuum.h

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@
 
 /*
  * bulkdelete can be performed in parallel.  This option can be used by
- * IndexAm's that need to scan the index to delete the tuples.
+ * index AMs that need to scan indexes to delete tuples.
  */
 #define VACUUM_OPTION_PARALLEL_BULKDEL  (1 << 0)
 

src/test/regress/expected/vacuum_parallel.out

Lines changed: 49 additions & 0 deletions

@@ -0,0 +1,49 @@
+SET max_parallel_maintenance_workers TO 4;
+SET min_parallel_index_scan_size TO '128kB';
+-- Bug #17245: Make sure that we don't totally fail to VACUUM individual indexes that
+-- happen to be below min_parallel_index_scan_size during parallel VACUUM:
+CREATE TABLE parallel_vacuum_table (a int) WITH (autovacuum_enabled = off);
+INSERT INTO parallel_vacuum_table SELECT i from generate_series(1, 10000) i;
+-- Parallel VACUUM will never be used unless there are at least two indexes
+-- that exceed min_parallel_index_scan_size.  Create two such indexes, and
+-- a third index that is smaller than min_parallel_index_scan_size.
+CREATE INDEX regular_sized_index ON parallel_vacuum_table(a);
+CREATE INDEX typically_sized_index ON parallel_vacuum_table(a);
+-- Note: vacuum_in_leader_small_index can apply deduplication, making it ~3x
+-- smaller than the other indexes
+CREATE INDEX vacuum_in_leader_small_index ON parallel_vacuum_table((1));
+-- Verify (as best we can) that the cost model for parallel VACUUM
+-- will make our VACUUM run in parallel, while always leaving it up to the
+-- parallel leader to handle the vacuum_in_leader_small_index index:
+SELECT EXISTS (
+SELECT 1
+FROM pg_class
+WHERE oid = 'vacuum_in_leader_small_index'::regclass AND
+      pg_relation_size(oid) <
+      pg_size_bytes(current_setting('min_parallel_index_scan_size'))
+) as leader_will_handle_small_index;
+ leader_will_handle_small_index
+--------------------------------
+ t
+(1 row)
+
+SELECT count(*) as trigger_parallel_vacuum_nindexes
+FROM pg_class
+WHERE oid in ('regular_sized_index'::regclass, 'typically_sized_index'::regclass) AND
+      pg_relation_size(oid) >=
+      pg_size_bytes(current_setting('min_parallel_index_scan_size'));
+ trigger_parallel_vacuum_nindexes
+----------------------------------
+                                2
+(1 row)
+
+-- Parallel VACUUM with B-Tree page deletions, ambulkdelete calls:
+DELETE FROM parallel_vacuum_table;
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table;
+-- Since vacuum_in_leader_small_index uses deduplication, we expect an
+-- assertion failure with bug #17245 (in the absence of bugfix):
+INSERT INTO parallel_vacuum_table SELECT i FROM generate_series(1, 10000) i;
+RESET max_parallel_maintenance_workers;
+RESET min_parallel_index_scan_size;
+-- Deliberately don't drop table, to get further coverage from tools like
+-- pg_amcheck in some testing scenarios
src/test/regress/parallel_schedule

Lines changed: 1 addition & 0 deletions
@@ -96,6 +96,7 @@ test: rules psql psql_crosstab amutils stats_ext collate.linux.utf8
 # run by itself so it can run parallel workers
 test: select_parallel
 test: write_parallel
+test: vacuum_parallel
 
 # no relation related tests can be put in this group
 test: publication subscription
src/test/regress/sql/vacuum_parallel.sql

Lines changed: 46 additions & 0 deletions

@@ -0,0 +1,46 @@
+SET max_parallel_maintenance_workers TO 4;
+SET min_parallel_index_scan_size TO '128kB';
+
+-- Bug #17245: Make sure that we don't totally fail to VACUUM individual indexes that
+-- happen to be below min_parallel_index_scan_size during parallel VACUUM:
+CREATE TABLE parallel_vacuum_table (a int) WITH (autovacuum_enabled = off);
+INSERT INTO parallel_vacuum_table SELECT i from generate_series(1, 10000) i;
+
+-- Parallel VACUUM will never be used unless there are at least two indexes
+-- that exceed min_parallel_index_scan_size.  Create two such indexes, and
+-- a third index that is smaller than min_parallel_index_scan_size.
+CREATE INDEX regular_sized_index ON parallel_vacuum_table(a);
+CREATE INDEX typically_sized_index ON parallel_vacuum_table(a);
+-- Note: vacuum_in_leader_small_index can apply deduplication, making it ~3x
+-- smaller than the other indexes
+CREATE INDEX vacuum_in_leader_small_index ON parallel_vacuum_table((1));
+
+-- Verify (as best we can) that the cost model for parallel VACUUM
+-- will make our VACUUM run in parallel, while always leaving it up to the
+-- parallel leader to handle the vacuum_in_leader_small_index index:
+SELECT EXISTS (
+SELECT 1
+FROM pg_class
+WHERE oid = 'vacuum_in_leader_small_index'::regclass AND
+      pg_relation_size(oid) <
+      pg_size_bytes(current_setting('min_parallel_index_scan_size'))
+) as leader_will_handle_small_index;
+SELECT count(*) as trigger_parallel_vacuum_nindexes
+FROM pg_class
+WHERE oid in ('regular_sized_index'::regclass, 'typically_sized_index'::regclass) AND
+      pg_relation_size(oid) >=
+      pg_size_bytes(current_setting('min_parallel_index_scan_size'));
+
+-- Parallel VACUUM with B-Tree page deletions, ambulkdelete calls:
+DELETE FROM parallel_vacuum_table;
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table;
+
+-- Since vacuum_in_leader_small_index uses deduplication, we expect an
+-- assertion failure with bug #17245 (in the absence of bugfix):
+INSERT INTO parallel_vacuum_table SELECT i FROM generate_series(1, 10000) i;
+
+RESET max_parallel_maintenance_workers;
+RESET min_parallel_index_scan_size;
+
+-- Deliberately don't drop table, to get further coverage from tools like
+-- pg_amcheck in some testing scenarios
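
The test's final comment points at pg_amcheck for extra coverage in some testing scenarios. As a rough SQL-level equivalent (a sketch, assuming the contrib amcheck extension and superuser privileges; not part of the committed test), every B-Tree index on the test table can be rechecked against the heap like this:

CREATE EXTENSION IF NOT EXISTS amcheck;
-- Run bt_index_check with heapallindexed = true over each index on the table,
-- roughly what pg_amcheck --heapallindexed does for B-Tree indexes:
SELECT indexrelid::regclass AS index_name,
       bt_index_check(indexrelid, true)
FROM pg_index
WHERE indrelid = 'parallel_vacuum_table'::regclass;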
