Skip to content

Commit 1d6a03e

Browse files
committed
Fix race conditions with drop of reused pgstats entries
This fixes a set of race conditions with cumulative statistics where a shared stats entry could be dropped while it should still be valid in the event when it is reused: an entry may refer to a different object but requires the same hash key. This can happen with various stats kinds, like: - Replication slots that compute internally an index number, for different slot names. - Stats kinds that use an OID in the object key, where a wraparound causes the same key to be used if an OID is used for the same object. - As of PostgreSQL 18, custom pgstats kinds could also be an issue, depending on their implementation. This issue is fixed by introducing a counter called "generation" in the shared entries via PgStatShared_HashEntry, initialized at 0 when an entry is created and incremented when the same entry is reused, to avoid concurrent issues on drop because of other backends still holding a reference to it. This "generation" is copied to the local copy that a backend holds when looking at an object, then cross-checked with the shared entry to make sure that the entry is not dropped even if its "refcount" justifies that if it has been reused. This problem could show up when a backend shuts down and needs to discard any entries it still holds, causing statistics to be removed when they should not, or even an assertion failure. Another report involved a failure in a standby after an OID wraparound, where the startup process would FATAL on a "can only drop stats once", stopping recovery abruptly. The buildfarm has been sporadically complaining about the problem, as well, but the window is hard to reach with the in-core tests. Note that the issue can be reproduced easily by adding a sleep before dshash_find() in pgstat_release_entry_ref() to enlarge the problematic window while repeating test_decoding's isolation test oldest_xmin a couple of times, for example, as pointed out by Alexander Lakhin. Reported-by: Alexander Lakhin, Peter Smith Author: Kyotaro Horiguchi, Michael Paquier Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/CAA4eK1KxuMVyAryz_Vk5yq3ejgKYcL6F45Hj9ZnMNBS-g+PuZg@mail.gmail.com Discussion: https://postgr.es/m/17947-b9554521ad963c9c@postgresql.org Backpatch-through: 15
1 parent 73731b2 commit 1d6a03e

File tree

2 files changed

+54
-5
lines changed

2 files changed

+54
-5
lines changed

src/backend/utils/activity/pgstat_shmem.c

Lines changed: 35 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -278,6 +278,11 @@ pgstat_init_entry(PgStat_Kind kind,
278278
* further if a longer lived reference is needed.
279279
*/
280280
pg_atomic_init_u32(&shhashent->refcount, 1);
281+
282+
/*
283+
* Initialize "generation" to 0, as freshly created.
284+
*/
285+
pg_atomic_init_u32(&shhashent->generation, 0);
281286
shhashent->dropped = false;
282287

283288
chunk = dsa_allocate0(pgStatLocal.dsa, pgstat_get_kind_info(kind)->shared_size);
@@ -301,6 +306,12 @@ pgstat_reinit_entry(PgStat_Kind kind, PgStatShared_HashEntry *shhashent)
301306

302307
/* mark as not dropped anymore */
303308
pg_atomic_fetch_add_u32(&shhashent->refcount, 1);
309+
310+
/*
311+
* Increment "generation", to let any backend with local references know
312+
* that what they point to is outdated.
313+
*/
314+
pg_atomic_fetch_add_u32(&shhashent->generation, 1);
304315
shhashent->dropped = false;
305316

306317
/* reinitialize content */
@@ -341,6 +352,7 @@ pgstat_acquire_entry_ref(PgStat_EntryRef *entry_ref,
341352

342353
entry_ref->shared_stats = shheader;
343354
entry_ref->shared_entry = shhashent;
355+
entry_ref->generation = pg_atomic_read_u32(&shhashent->generation);
344356
}
345357

346358
/*
@@ -506,7 +518,8 @@ pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid, bool create,
506518
* case are replication slot stats, where a new slot can be
507519
* created with the same index just after dropping. But oid
508520
* wraparound can lead to other cases as well. We just reset the
509-
* stats to their plain state.
521+
* stats to their plain state, while incrementing its "generation"
522+
* in the shared entry for any remaining local references.
510523
*/
511524
shheader = pgstat_reinit_entry(kind, shhashent);
512525
pgstat_acquire_entry_ref(entry_ref, shhashent, shheader);
@@ -573,10 +586,27 @@ pgstat_release_entry_ref(PgStat_HashKey key, PgStat_EntryRef *entry_ref,
573586
if (!shent)
574587
elog(ERROR, "could not find just referenced shared stats entry");
575588

576-
Assert(pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0);
577-
Assert(entry_ref->shared_entry == shent);
578-
579-
pgstat_free_entry(shent, NULL);
589+
/*
590+
* This entry may have been reinitialized while trying to release
591+
* it, so double-check that it has not been reused while holding a
592+
* lock on its shared entry.
593+
*/
594+
if (pg_atomic_read_u32(&entry_ref->shared_entry->generation) ==
595+
entry_ref->generation)
596+
{
597+
/* Same "generation", so we're OK with the removal */
598+
Assert(pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0);
599+
Assert(entry_ref->shared_entry == shent);
600+
pgstat_free_entry(shent, NULL);
601+
}
602+
else
603+
{
604+
/*
605+
* Shared stats entry has been reinitialized, so do not drop
606+
* its shared entry, only release its lock.
607+
*/
608+
dshash_release_lock(pgStatLocal.shared_hash, shent);
609+
}
580610
}
581611
}
582612

src/include/utils/pgstat_internal.h

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -93,6 +93,19 @@ typedef struct PgStatShared_HashEntry
9393
*/
9494
pg_atomic_uint32 refcount;
9595

96+
/*
97+
* Counter tracking the number of times the entry has been reused.
98+
*
99+
* Set to 0 when the entry is created, and incremented by one each time
100+
* the shared entry is reinitialized with pgstat_reinit_entry().
101+
*
102+
* May only be incremented / decremented while holding at least a shared
103+
* lock on the dshash partition containing the entry. Like refcount, it
104+
* needs to be an atomic variable because multiple backends can increment
105+
* the generation with just a shared lock.
106+
*/
107+
pg_atomic_uint32 generation;
108+
96109
/*
97110
* Pointer to shared stats. The stats entry always starts with
98111
* PgStatShared_Common, embedded in a larger struct containing the
@@ -132,6 +145,12 @@ typedef struct PgStat_EntryRef
132145
*/
133146
PgStatShared_Common *shared_stats;
134147

148+
/*
149+
* Copy of PgStatShared_HashEntry->generation, keeping locally track of
150+
* the shared stats entry "generation" retrieved (number of times reused).
151+
*/
152+
uint32 generation;
153+
135154
/*
136155
* Pending statistics data that will need to be flushed to shared memory
137156
* stats eventually. Each stats kind utilizing pending data defines what

0 commit comments

Comments
 (0)