Commit 9695835

Fix data loss when restarting the bulk_write facility
If a user started a bulk write operation on a fork with existing data to append data in bulk, the bulk_write machinery would zero out all previously written pages up to the last page written by the new bulk_write operation.

This is not an issue for PostgreSQL itself, because we never use the bulk_write facility on a non-empty fork. But there are use cases where it makes sense: the TimescaleDB extension is known to do that to merge partitions, for example.

Backpatch to v17, where the bulk_write machinery was introduced.

Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reported-By: Erik Nordström <erik@timescale.com>
Reviewed-by: Erik Nordström <erik@timescale.com>
Discussion: https://www.postgresql.org/message-id/CACAa4VJ%2BQY4pY7M0ECq29uGkrOygikYtao1UG9yCDFosxaps9g@mail.gmail.com
1 parent e6d6f2e commit 9695835
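
For context, a minimal sketch of the usage pattern this commit fixes: an extension-style caller appending pages to a fork that already contains data, roughly what TimescaleDB does when merging partitions. The function append_pages_in_bulk and its structure are hypothetical; the calls (smgr_bulk_start_rel, smgr_bulk_get_buf, smgr_bulk_write, smgr_bulk_finish) are the bulk_write interface declared in src/include/storage/bulk_write.h. Before this fix, flushing such an operation could overwrite the pre-existing blocks 0..first_new_block-1 with zero pages.

#include "postgres.h"

#include "storage/bufmgr.h"		/* RelationGetNumberOfBlocks() */
#include "storage/bufpage.h"
#include "storage/bulk_write.h"
#include "utils/rel.h"

/*
 * Hypothetical example: bulk-append nnewpages pages to a relation whose
 * main fork already holds live data.
 */
static void
append_pages_in_bulk(Relation rel, BlockNumber nnewpages)
{
	BulkWriteState *bulkstate;
	BlockNumber first_new_block;
	BlockNumber i;

	/* The fork already contains this many blocks of live data. */
	first_new_block = RelationGetNumberOfBlocks(rel);

	bulkstate = smgr_bulk_start_rel(rel, MAIN_FORKNUM);

	for (i = 0; i < nnewpages; i++)
	{
		BulkWriteBuffer buf = smgr_bulk_get_buf(bulkstate);
		Page		page = (Page) buf->data;

		PageInit(page, BLCKSZ, 0);
		/* ... fill the page with data ... */

		/* Only block numbers past the pre-existing data are written. */
		smgr_bulk_write(bulkstate, first_new_block + i, buf, true);
	}

	/* Flush pending writes and finish the bulk operation. */
	smgr_bulk_finish(bulkstate);
}

With the patched code, the blocks below first_new_block are left untouched; previously they were filled with zero pages by the flush path shown in the diff below.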

File tree: 1 file changed (+11, -8 lines)

src/backend/storage/smgr/bulk_write.c

Lines changed: 11 additions & 8 deletions
@@ -4,8 +4,10 @@
  * Efficiently and reliably populate a new relation
  *
  * The assumption is that no other backends access the relation while we are
- * loading it, so we can take some shortcuts. Do not mix operations through
- * the regular buffer manager and the bulk loading interface!
+ * loading it, so we can take some shortcuts. Pages already present in the
+ * indicated fork when the bulk write operation is started are not modified
+ * unless explicitly written to. Do not mix operations through the regular
+ * buffer manager and the bulk loading interface!
  *
  * We bypass the buffer manager to avoid the locking overhead, and call
  * smgrextend() directly. A downside is that the pages will need to be
@@ -69,7 +71,7 @@ struct BulkWriteState
 	PendingWrite pending_writes[MAX_PENDING_WRITES];
 
 	/* Current size of the relation */
-	BlockNumber pages_written;
+	BlockNumber relsize;
 
 	/* The RedoRecPtr at the time that the bulk operation started */
 	XLogRecPtr	start_RedoRecPtr;
@@ -106,7 +108,7 @@ smgr_bulk_start_smgr(SMgrRelation smgr, ForkNumber forknum, bool use_wal)
 	state->use_wal = use_wal;
 
 	state->npending = 0;
-	state->pages_written = 0;
+	state->relsize = smgrnblocks(smgr, forknum);
 
 	state->start_RedoRecPtr = GetRedoRecPtr();
 
@@ -280,7 +282,7 @@ smgr_bulk_flush(BulkWriteState *bulkstate)
 
 		PageSetChecksumInplace(page, blkno);
 
-		if (blkno >= bulkstate->pages_written)
+		if (blkno >= bulkstate->relsize)
 		{
 			/*
 			 * If we have to write pages nonsequentially, fill in the space
@@ -289,17 +291,18 @@ smgr_bulk_flush(BulkWriteState *bulkstate)
 			 * space will read as zeroes anyway), but it should help to avoid
 			 * fragmentation. The dummy pages aren't WAL-logged though.
 			 */
-			while (blkno > bulkstate->pages_written)
+			while (blkno > bulkstate->relsize)
 			{
 				/* don't set checksum for all-zero page */
 				smgrextend(bulkstate->smgr, bulkstate->forknum,
-						   bulkstate->pages_written++,
+						   bulkstate->relsize,
 						   &zero_buffer,
 						   true);
+				bulkstate->relsize++;
 			}
 
 			smgrextend(bulkstate->smgr, bulkstate->forknum, blkno, page, true);
-			bulkstate->pages_written = pending_writes[i].blkno + 1;
+			bulkstate->relsize++;
 		}
 		else
 			smgrwrite(bulkstate->smgr, bulkstate->forknum, blkno, page, true);
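
For readers skimming the hunks above, here is a condensed restatement of the patched flush-time decision in smgr_bulk_flush(), paraphrased from the diff rather than copied from the exact source: relsize is now initialized from smgrnblocks(), so blocks that existed before the bulk write began are left alone unless a pending write explicitly targets them.

	if (blkno >= bulkstate->relsize)
	{
		/*
		 * The target block lies past the current end of the fork: fill any
		 * gap with dummy zero pages, growing relsize as we go, then append
		 * the new page itself.
		 */
		while (blkno > bulkstate->relsize)
		{
			smgrextend(bulkstate->smgr, bulkstate->forknum,
					   bulkstate->relsize, &zero_buffer, true);
			bulkstate->relsize++;
		}

		smgrextend(bulkstate->smgr, bulkstate->forknum, blkno, page, true);
		bulkstate->relsize++;
	}
	else
	{
		/* The block already existed at start: overwrite it in place only. */
		smgrwrite(bulkstate->smgr, bulkstate->forknum, blkno, page, true);
	}

With the old pages_written counter starting at zero, the gap-filling loop above would have extended over blocks that already held data, which is exactly the data loss this commit fixes.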
