Skip to content

Commit 3cb6462

Browse files
committed
Use a ResourceOwner to track buffer pins in all cases.
Historically, we've allowed auxiliary processes to take buffer pins without tracking them in a ResourceOwner. However, that creates problems for error recovery. In particular, we've seen multiple reports of assertion crashes in the startup process when it gets an error while holding a buffer pin, as for example if it gets ENOSPC during a write. In a non-assert build, the process would simply exit without releasing the pin at all. We've gotten away with that so far just because a failure exit of the startup process translates to a database crash anyhow; but any similar behavior in other aux processes could result in stuck pins and subsequent problems in vacuum. To improve this, institute a policy that we must *always* have a resowner backing any attempt to pin a buffer, which we can enforce just by removing the previous special-case code in resowner.c. Add infrastructure to make it easy to create a process-lifespan AuxProcessResourceOwner and clear out its contents at appropriate times. Replace existing ad-hoc resowner management in bgwriter.c and other aux processes with that. (Thus, while the startup process gains a resowner where it had none at all before, some other aux process types are replacing an ad-hoc resowner with this code.) Also use the AuxProcessResourceOwner to manage buffer pins taken during StartupXLOG and ShutdownXLOG, even when those are being run in a bootstrap process or a standalone backend rather than a true auxiliary process. In passing, remove some other ad-hoc resource owner creations that had gotten cargo-culted into various other places. As far as I can tell that was all unnecessary, and if it had been necessary it was incomplete, due to lacking any provision for clearing those resowners later. (Also worth noting in this connection is that a process that hasn't called InitBufferPoolBackend has no business accessing buffers; so there's more to do than just add the resowner if we want to touch buffers in processes not covered by this patch.) Although this fixes a very old bug, no back-patch, because there's no evidence of any significant problem in non-assert builds. Patch by me, pursuant to a report from Justin Pryzby. Thanks to Robert Haas and Kyotaro Horiguchi for reviews. Discussion: https://postgr.es/m/20180627233939.GA10276@telsasoft.com
1 parent 6b38717 commit 3cb6462

File tree

16 files changed

+142
-90
lines changed

16 files changed

+142
-90
lines changed

src/backend/access/transam/parallel.c

Lines changed: 9 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -37,7 +37,6 @@
3737
#include "utils/guc.h"
3838
#include "utils/inval.h"
3939
#include "utils/memutils.h"
40-
#include "utils/resowner.h"
4140
#include "utils/snapmgr.h"
4241
#include "utils/typcache.h"
4342

@@ -1220,16 +1219,19 @@ ParallelWorkerMain(Datum main_arg)
12201219
Assert(ParallelWorkerNumber == -1);
12211220
memcpy(&ParallelWorkerNumber, MyBgworkerEntry->bgw_extra, sizeof(int));
12221221

1223-
/* Set up a memory context and resource owner. */
1224-
Assert(CurrentResourceOwner == NULL);
1225-
CurrentResourceOwner = ResourceOwnerCreate(NULL, "parallel toplevel");
1222+
/* Set up a memory context to work in, just for cleanliness. */
12261223
CurrentMemoryContext = AllocSetContextCreate(TopMemoryContext,
12271224
"Parallel worker",
12281225
ALLOCSET_DEFAULT_SIZES);
12291226

12301227
/*
1231-
* Now that we have a resource owner, we can attach to the dynamic shared
1232-
* memory segment and read the table of contents.
1228+
* Attach to the dynamic shared memory segment for the parallel query, and
1229+
* find its table of contents.
1230+
*
1231+
* Note: at this point, we have not created any ResourceOwner in this
1232+
* process. This will result in our DSM mapping surviving until process
1233+
* exit, which is fine. If there were a ResourceOwner, it would acquire
1234+
* ownership of the mapping, but we have no need for that.
12331235
*/
12341236
seg = dsm_attach(DatumGetUInt32(main_arg));
12351237
if (seg == NULL)
@@ -1393,7 +1395,7 @@ ParallelWorkerMain(Datum main_arg)
13931395
/* Must exit parallel mode to pop active snapshot. */
13941396
ExitParallelMode();
13951397

1396-
/* Must pop active snapshot so resowner.c doesn't complain. */
1398+
/* Must pop active snapshot so snapmgr.c doesn't complain. */
13971399
PopActiveSnapshot();
13981400

13991401
/* Shut down the parallel-worker transaction. */

src/backend/access/transam/xlog.c

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6360,6 +6360,15 @@ StartupXLOG(void)
63606360
bool fast_promoted = false;
63616361
struct stat st;
63626362

6363+
/*
6364+
* We should have an aux process resource owner to use, and we should not
6365+
* be in a transaction that's installed some other resowner.
6366+
*/
6367+
Assert(AuxProcessResourceOwner != NULL);
6368+
Assert(CurrentResourceOwner == NULL ||
6369+
CurrentResourceOwner == AuxProcessResourceOwner);
6370+
CurrentResourceOwner = AuxProcessResourceOwner;
6371+
63636372
/*
63646373
* Verify XLOG status looks valid.
63656374
*/
@@ -8472,6 +8481,15 @@ GetNextXidAndEpoch(TransactionId *xid, uint32 *epoch)
84728481
void
84738482
ShutdownXLOG(int code, Datum arg)
84748483
{
8484+
/*
8485+
* We should have an aux process resource owner to use, and we should not
8486+
* be in a transaction that's installed some other resowner.
8487+
*/
8488+
Assert(AuxProcessResourceOwner != NULL);
8489+
Assert(CurrentResourceOwner == NULL ||
8490+
CurrentResourceOwner == AuxProcessResourceOwner);
8491+
CurrentResourceOwner = AuxProcessResourceOwner;
8492+
84758493
/* Don't be chatty in standalone mode */
84768494
ereport(IsPostmasterEnvironment ? LOG : NOTICE,
84778495
(errmsg("shutting down")));

src/backend/bootstrap/bootstrap.c

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -403,6 +403,13 @@ AuxiliaryProcessMain(int argc, char *argv[])
403403
/* finish setting up bufmgr.c */
404404
InitBufferPoolBackend();
405405

406+
/*
407+
* Auxiliary processes don't run transactions, but they may need a
408+
* resource owner anyway to manage buffer pins acquired outside
409+
* transactions (and, perhaps, other things in future).
410+
*/
411+
CreateAuxProcessResourceOwner();
412+
406413
/* Initialize backend status information */
407414
pgstat_initialize();
408415
pgstat_bestart();

src/backend/postmaster/autovacuum.c

Lines changed: 3 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -522,13 +522,9 @@ AutoVacLauncherMain(int argc, char *argv[])
522522
pgstat_report_wait_end();
523523
AbortBufferIO();
524524
UnlockBuffers();
525-
if (CurrentResourceOwner)
526-
{
527-
ResourceOwnerRelease(CurrentResourceOwner,
528-
RESOURCE_RELEASE_BEFORE_LOCKS,
529-
false, true);
530-
/* we needn't bother with the other ResourceOwnerRelease phases */
531-
}
525+
/* this is probably dead code, but let's be safe: */
526+
if (AuxProcessResourceOwner)
527+
ReleaseAuxProcessResources(false);
532528
AtEOXact_Buffers(false);
533529
AtEOXact_SMgr();
534530
AtEOXact_Files(false);

src/backend/postmaster/bgwriter.c

Lines changed: 1 addition & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -141,12 +141,6 @@ BackgroundWriterMain(void)
141141
/* We allow SIGQUIT (quickdie) at all times */
142142
sigdelset(&BlockSig, SIGQUIT);
143143

144-
/*
145-
* Create a resource owner to keep track of our resources (currently only
146-
* buffer pins).
147-
*/
148-
CurrentResourceOwner = ResourceOwnerCreate(NULL, "Background Writer");
149-
150144
/*
151145
* We just started, assume there has been either a shutdown or
152146
* end-of-recovery snapshot.
@@ -191,11 +185,7 @@ BackgroundWriterMain(void)
191185
ConditionVariableCancelSleep();
192186
AbortBufferIO();
193187
UnlockBuffers();
194-
/* buffer pins are released here: */
195-
ResourceOwnerRelease(CurrentResourceOwner,
196-
RESOURCE_RELEASE_BEFORE_LOCKS,
197-
false, true);
198-
/* we needn't bother with the other ResourceOwnerRelease phases */
188+
ReleaseAuxProcessResources(false);
199189
AtEOXact_Buffers(false);
200190
AtEOXact_SMgr();
201191
AtEOXact_Files(false);

src/backend/postmaster/checkpointer.c

Lines changed: 1 addition & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -231,12 +231,6 @@ CheckpointerMain(void)
231231
*/
232232
last_checkpoint_time = last_xlog_switch_time = (pg_time_t) time(NULL);
233233

234-
/*
235-
* Create a resource owner to keep track of our resources (currently only
236-
* buffer pins).
237-
*/
238-
CurrentResourceOwner = ResourceOwnerCreate(NULL, "Checkpointer");
239-
240234
/*
241235
* Create a memory context that we will do all our work in. We do this so
242236
* that we can reset the context during error recovery and thereby avoid
@@ -275,11 +269,7 @@ CheckpointerMain(void)
275269
pgstat_report_wait_end();
276270
AbortBufferIO();
277271
UnlockBuffers();
278-
/* buffer pins are released here: */
279-
ResourceOwnerRelease(CurrentResourceOwner,
280-
RESOURCE_RELEASE_BEFORE_LOCKS,
281-
false, true);
282-
/* we needn't bother with the other ResourceOwnerRelease phases */
272+
ReleaseAuxProcessResources(false);
283273
AtEOXact_Buffers(false);
284274
AtEOXact_SMgr();
285275
AtEOXact_Files(false);

src/backend/postmaster/walwriter.c

Lines changed: 1 addition & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -129,12 +129,6 @@ WalWriterMain(void)
129129
/* We allow SIGQUIT (quickdie) at all times */
130130
sigdelset(&BlockSig, SIGQUIT);
131131

132-
/*
133-
* Create a resource owner to keep track of our resources (not clear that
134-
* we need this, but may as well have one).
135-
*/
136-
CurrentResourceOwner = ResourceOwnerCreate(NULL, "Wal Writer");
137-
138132
/*
139133
* Create a memory context that we will do all our work in. We do this so
140134
* that we can reset the context during error recovery and thereby avoid
@@ -172,11 +166,7 @@ WalWriterMain(void)
172166
pgstat_report_wait_end();
173167
AbortBufferIO();
174168
UnlockBuffers();
175-
/* buffer pins are released here: */
176-
ResourceOwnerRelease(CurrentResourceOwner,
177-
RESOURCE_RELEASE_BEFORE_LOCKS,
178-
false, true);
179-
/* we needn't bother with the other ResourceOwnerRelease phases */
169+
ReleaseAuxProcessResources(false);
180170
AtEOXact_Buffers(false);
181171
AtEOXact_SMgr();
182172
AtEOXact_Files(false);

src/backend/replication/logical/logicalfuncs.c

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -279,8 +279,6 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
279279
*/
280280
startptr = MyReplicationSlot->data.restart_lsn;
281281

282-
CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner, "logical decoding");
283-
284282
/* invalidate non-timetravel entries */
285283
InvalidateSystemCaches();
286284

@@ -320,6 +318,11 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
320318

321319
tuplestore_donestoring(tupstore);
322320

321+
/*
322+
* Logical decoding could have clobbered CurrentResourceOwner during
323+
* transaction management, so restore the executor's value. (This is
324+
* a kluge, but it's not worth cleaning up right now.)
325+
*/
323326
CurrentResourceOwner = old_resowner;
324327

325328
/*

src/backend/replication/logical/worker.c

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1595,17 +1595,18 @@ ApplyWorkerMain(Datum main_arg)
15951595
pqsignal(SIGTERM, die);
15961596
BackgroundWorkerUnblockSignals();
15971597

1598+
/*
1599+
* We don't currently need any ResourceOwner in a walreceiver process, but
1600+
* if we did, we could call CreateAuxProcessResourceOwner here.
1601+
*/
1602+
15981603
/* Initialise stats to a sanish value */
15991604
MyLogicalRepWorker->last_send_time = MyLogicalRepWorker->last_recv_time =
16001605
MyLogicalRepWorker->reply_time = GetCurrentTimestamp();
16011606

16021607
/* Load the libpq-specific functions */
16031608
load_file("libpqwalreceiver", false);
16041609

1605-
Assert(CurrentResourceOwner == NULL);
1606-
CurrentResourceOwner = ResourceOwnerCreate(NULL,
1607-
"logical replication apply");
1608-
16091610
/* Run as replica session replication role. */
16101611
SetConfigOption("session_replication_role", "replica",
16111612
PGC_SUSET, PGC_S_OVERRIDE);

src/backend/replication/slotfuncs.c

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -365,9 +365,6 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
365365
logical_read_local_xlog_page,
366366
NULL, NULL, NULL);
367367

368-
CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner,
369-
"logical decoding");
370-
371368
/* invalidate non-timetravel entries */
372369
InvalidateSystemCaches();
373370

@@ -402,6 +399,11 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
402399
CHECK_FOR_INTERRUPTS();
403400
}
404401

402+
/*
403+
* Logical decoding could have clobbered CurrentResourceOwner during
404+
* transaction management, so restore the executor's value. (This is
405+
* a kluge, but it's not worth cleaning up right now.)
406+
*/
405407
CurrentResourceOwner = old_resowner;
406408

407409
if (ctx->reader->EndRecPtr != InvalidXLogRecPtr)

src/backend/replication/walreceiver.c

Lines changed: 0 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -292,12 +292,6 @@ WalReceiverMain(void)
292292
if (WalReceiverFunctions == NULL)
293293
elog(ERROR, "libpqwalreceiver didn't initialize correctly");
294294

295-
/*
296-
* Create a resource owner to keep track of our resources (not clear that
297-
* we need this, but may as well have one).
298-
*/
299-
CurrentResourceOwner = ResourceOwnerCreate(NULL, "Wal Receiver");
300-
301295
/* Unblock signals (they were blocked when the postmaster forked us) */
302296
PG_SETMASK(&UnBlockSig);
303297

src/backend/replication/walsender.c

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -90,7 +90,6 @@
9090
#include "utils/pg_lsn.h"
9191
#include "utils/portal.h"
9292
#include "utils/ps_status.h"
93-
#include "utils/resowner.h"
9493
#include "utils/timeout.h"
9594
#include "utils/timestamp.h"
9695

@@ -264,8 +263,10 @@ InitWalSender(void)
264263
/* Create a per-walsender data structure in shared memory */
265264
InitWalSenderSlot();
266265

267-
/* Set up resource owner */
268-
CurrentResourceOwner = ResourceOwnerCreate(NULL, "walsender top-level resource owner");
266+
/*
267+
* We don't currently need any ResourceOwner in a walsender process, but
268+
* if we did, we could call CreateAuxProcessResourceOwner here.
269+
*/
269270

270271
/*
271272
* Let postmaster know that we're a WAL sender. Once we've declared us as

src/backend/utils/init/postinit.c

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -632,8 +632,19 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
632632
* We are either a bootstrap process or a standalone backend. Either
633633
* way, start up the XLOG machinery, and register to have it closed
634634
* down at exit.
635+
*
636+
* We don't yet have an aux-process resource owner, but StartupXLOG
637+
* and ShutdownXLOG will need one. Hence, create said resource owner
638+
* (and register a callback to clean it up after ShutdownXLOG runs).
635639
*/
640+
CreateAuxProcessResourceOwner();
641+
636642
StartupXLOG();
643+
/* Release (and warn about) any buffer pins leaked in StartupXLOG */
644+
ReleaseAuxProcessResources(true);
645+
/* Reset CurrentResourceOwner to nothing for the moment */
646+
CurrentResourceOwner = NULL;
647+
637648
on_shmem_exit(ShutdownXLOG, 0);
638649
}
639650

0 commit comments

Comments
 (0)