Commit 26e0079

Fix race between DROP TABLESPACE and checkpointing.

Commands like ALTER TABLE SET TABLESPACE may leave files for the next checkpoint to clean up. If such files are not removed by the time DROP TABLESPACE is called, we request a checkpoint so that they are deleted. However, there is presently a window before checkpoint start where new unlink requests won't be scheduled until the following checkpoint. This means that the checkpoint forced by DROP TABLESPACE might not remove the files we expect it to remove, and the following ERROR will be emitted:

    ERROR: tablespace "mytblspc" is not empty

To fix, add a call to AbsorbSyncRequests() just before advancing the unlink cycle counter. This ensures that any unlink requests forwarded prior to checkpoint start (i.e., when ckpt_started is incremented) will be processed by the current checkpoint. Since AbsorbSyncRequests() performs memory allocations, it cannot be called within a critical section, so we also need to move SyncPreCheckpoint() to before CreateCheckPoint()'s critical section.

This is an old bug, so back-patch to all supported versions.

Author: Nathan Bossart <nathandbossart@gmail.com>
Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220215235845.GA2665318%40nathanxps13

1 parent dc5b3bd commit 26e0079

2 files changed: +21 -8 lines changed
src/backend/access/transam/xlog.c

Lines changed: 8 additions & 7 deletions
@@ -9020,6 +9020,14 @@ CreateCheckPoint(int flags)
 	MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
 	CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
 
+	/*
+	 * Let smgr prepare for checkpoint; this has to happen outside the
+	 * critical section and before we determine the REDO pointer. Note that
+	 * smgr must not do anything that'd have to be undone if we decide no
+	 * checkpoint is needed.
+	 */
+	SyncPreCheckpoint();
+
 	/*
 	 * Use a critical section to force system panic if we have trouble.
 	 */
@@ -9034,13 +9042,6 @@ CreateCheckPoint(int flags)
 		LWLockRelease(ControlFileLock);
 	}
 
-	/*
-	 * Let smgr prepare for checkpoint; this has to happen before we determine
-	 * the REDO pointer. Note that smgr must not do anything that'd have to
-	 * be undone if we decide no checkpoint is needed.
-	 */
-	SyncPreCheckpoint();
-
 	/* Begin filling in the checkpoint WAL record */
 	MemSet(&checkPoint, 0, sizeof(checkPoint));
 	checkPoint.time = (pg_time_t) time(NULL);
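
The reason for moving SyncPreCheckpoint() ahead of the critical section can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: crit_section_count, checked_alloc(), absorb_requests(), and the two checkpoint_*_order() functions are hypothetical names, not PostgreSQL's actual API. In PostgreSQL an allocation failure raises an ERROR, which is escalated to a PANIC inside a critical section; the model simply refuses the allocation there and reports failure instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a critical-section nesting counter. */
static int crit_section_count = 0;

void start_crit_section(void)
{
    crit_section_count++;
}

void end_crit_section(void)
{
    assert(crit_section_count > 0);
    crit_section_count--;
}

/* Toy allocator: refuses to allocate while a critical section is open. */
void *checked_alloc(size_t size)
{
    if (crit_section_count > 0)
        return NULL;
    return malloc(size);
}

/* Toy "absorb" step: it needs scratch memory, so it must allocate. */
bool absorb_requests(void)
{
    void *scratch = checked_alloc(64);

    if (scratch == NULL)
        return false;
    free(scratch);
    return true;
}

/* Old ordering: the pre-checkpoint step runs inside the critical section. */
bool checkpoint_old_order(void)
{
    bool ok;

    start_crit_section();
    ok = absorb_requests();
    end_crit_section();
    return ok;
}

/* New ordering: the pre-checkpoint step runs before the critical section. */
bool checkpoint_new_order(void)
{
    bool ok = absorb_requests();

    start_crit_section();
    /* ... the rest of the checkpoint would happen here ... */
    end_crit_section();
    return ok;
}
```

In this model, checkpoint_old_order() reports failure because the allocation is attempted while the critical section is open, while checkpoint_new_order() succeeds. That mirrors the ordering change in the hunks above.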

src/backend/storage/sync/sync.c

Lines changed: 13 additions & 1 deletion
@@ -173,14 +173,26 @@ InitSync(void)
  * counter is incremented here.
  *
  * This must be called *before* the checkpoint REDO point is determined.
- * That ensures that we won't delete files too soon.
+ * That ensures that we won't delete files too soon. Since this calls
+ * AbsorbSyncRequests(), which performs memory allocations, it cannot be
+ * called within a critical section.
  *
  * Note that we can't do anything here that depends on the assumption
  * that the checkpoint will be completed.
  */
 void
 SyncPreCheckpoint(void)
 {
+	/*
+	 * Operations such as DROP TABLESPACE assume that the next checkpoint will
+	 * process all recently forwarded unlink requests, but if they aren't
+	 * absorbed prior to advancing the cycle counter, they won't be processed
+	 * until a future checkpoint. The following absorb ensures that any
+	 * unlink requests forwarded before the checkpoint began will be processed
+	 * in the current checkpoint.
+	 */
+	AbsorbSyncRequests();
+
 	/*
 	 * Any unlink requests arriving after this point will be assigned the next
 	 * cycle counter, and won't be unlinked until next checkpoint.
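
The ordering problem addressed by the added AbsorbSyncRequests() call can be sketched with a toy model of the unlink cycle counter. Everything here is an assumption made for illustration: SyncModel, PendingUnlink, and the model_*() functions are hypothetical and far simpler than the real structures in sync.c. The model assumes that a request is stamped with the current cycle counter when it is absorbed, and that a checkpoint only unlinks requests stamped with an earlier cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQUESTS 16

/* Hypothetical pending unlink entry, stamped with a cycle when absorbed. */
typedef struct
{
    int      file_id;
    unsigned cycle;
} PendingUnlink;

/* Hypothetical model of the checkpointer's sync state. */
typedef struct
{
    int           forwarded[MAX_REQUESTS]; /* sent by backends, not yet absorbed */
    size_t        n_forwarded;
    PendingUnlink pending[MAX_REQUESTS];   /* absorbed, waiting to be unlinked */
    size_t        n_pending;
    unsigned      cycle_ctr;
} SyncModel;

void model_init(SyncModel *m)
{
    m->n_forwarded = 0;
    m->n_pending = 0;
    m->cycle_ctr = 0;
}

/* A backend forwards an unlink request; it is not stamped yet. */
void model_forward_unlink(SyncModel *m, int file_id)
{
    assert(m->n_forwarded < MAX_REQUESTS);
    m->forwarded[m->n_forwarded++] = file_id;
}

/* Absorb forwarded requests, stamping each with the current cycle. */
void model_absorb(SyncModel *m)
{
    for (size_t i = 0; i < m->n_forwarded; i++)
    {
        assert(m->n_pending < MAX_REQUESTS);
        m->pending[m->n_pending].file_id = m->forwarded[i];
        m->pending[m->n_pending].cycle = m->cycle_ctr;
        m->n_pending++;
    }
    m->n_forwarded = 0;
}

/* Pre-checkpoint step; absorb_first selects the patched behavior. */
void model_pre_checkpoint(SyncModel *m, bool absorb_first)
{
    if (absorb_first)
        model_absorb(m);
    m->cycle_ctr++;
}

/*
 * Post-checkpoint step: anything still unabsorbed is absorbed now (and so
 * gets the new cycle), then only entries from earlier cycles are unlinked.
 * Returns the number of entries unlinked.
 */
size_t model_post_checkpoint(SyncModel *m)
{
    size_t kept = 0;
    size_t unlinked = 0;

    model_absorb(m);
    for (size_t i = 0; i < m->n_pending; i++)
    {
        if (m->pending[i].cycle == m->cycle_ctr)
            m->pending[kept++] = m->pending[i];
        else
            unlinked++;
    }
    m->n_pending = kept;
    return unlinked;
}
```

With absorb_first set to false, a request forwarded before the checkpoint is stamped with the new cycle and survives the checkpoint, which corresponds to the "tablespace is not empty" failure described in the commit message. With absorb_first set to true, the same request is stamped with the old cycle and is unlinked by the current checkpoint.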
