Commit b82573d

Fix WAL-logging of FSM and VM truncation.
When a relation is truncated, it is important that the FSM is truncated as well. Otherwise, after recovery, the FSM can return a page that has been truncated away, leading to errors like:

    ERROR: could not read block 28991 in file "base/16390/572026": read only 0 of 8192 bytes

We were using MarkBufferDirtyHint() to dirty the buffer holding the last remaining page of the FSM, but during recovery, that might in fact not dirty the page, and the FSM update might be lost.

To fix, use the stronger MarkBufferDirty() function. MarkBufferDirty() requires us to do WAL-logging ourselves, to protect from a torn page, if checksumming is enabled.

Also fix an oversight in visibilitymap_truncate: it also needs to WAL-log when checksumming is enabled.

Analysis by Pavan Deolasee.

Discussion: <CABOikdNr5vKucqyZH9s1Mh0XebLs_jRhKv6eJfNnD2wxTn=_9A@mail.gmail.com>

Backpatch to 9.3, where we got data checksums.
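The difference between the two dirtying calls can be sketched with a toy model. This is not PostgreSQL code: `ToyBuffer`, `ToyState`, `toy_mark_dirty_hint` and `toy_mark_dirty` are hypothetical names, and the sketch assumes the simplified rule that a hint-style dirtying is skipped whenever checksums are enabled and recovery is in progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a shared buffer; not PostgreSQL's buffer descriptor. */
typedef struct ToyBuffer
{
	bool		dirty;
} ToyBuffer;

/* Hypothetical flags standing in for recovery state and data checksums. */
typedef struct ToyState
{
	bool		in_recovery;
	bool		checksums_enabled;
} ToyState;

/*
 * Models a hint-style dirtying (assumption: it is skipped during recovery
 * when checksums are enabled, so the change may never reach disk).
 */
void
toy_mark_dirty_hint(const ToyState *st, ToyBuffer *buf)
{
	if (st->checksums_enabled && st->in_recovery)
		return;
	buf->dirty = true;
}

/* Models the unconditional dirtying the commit switches to. */
void
toy_mark_dirty(ToyBuffer *buf)
{
	buf->dirty = true;
}
```

In this model, a buffer passed to `toy_mark_dirty_hint` during recovery with checksums on stays clean, which corresponds to the lost FSM update described above, while `toy_mark_dirty` always marks it.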
1 parent 219d476 commit b82573d

2 files changed: 36 additions & 1 deletion

src/backend/access/heap/visibilitymap.c

Lines changed: 16 additions & 0 deletions
@@ -474,6 +474,9 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 
 		LockBuffer(mapBuffer, BUFFER_LOCK_EXCLUSIVE);
 
+		/* NO EREPORT(ERROR) from here till changes are logged */
+		START_CRIT_SECTION();
+
 		/* Clear out the unwanted bytes. */
 		MemSet(&map[truncByte + 1], 0, MAPSIZE - (truncByte + 1));
 
@@ -489,7 +492,20 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 		 */
 		map[truncByte] &= (1 << truncBit) - 1;
 
+		/*
+		 * Truncation of a relation is WAL-logged at a higher-level, and we
+		 * will be called at WAL replay. But if checksums are enabled, we need
+		 * to still write a WAL record to protect against a torn page, if the
+		 * page is flushed to disk before the truncation WAL record. We cannot
+		 * use MarkBufferDirtyHint here, because that will not dirty the page
+		 * during recovery.
+		 */
 		MarkBufferDirty(mapBuffer);
+		if (!InRecovery && RelationNeedsWAL(rel) && XLogHintBitIsNeeded())
+			log_newpage_buffer(mapBuffer, false);
+
+		END_CRIT_SECTION();
+
 		UnlockReleaseBuffer(mapBuffer);
 	}
 	else
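
The two context lines in this hunk that clear the map (the `MemSet` call and the `&=` mask) can be illustrated in isolation. The following is a minimal standalone sketch, not the PostgreSQL function: `TOY_MAPSIZE` and `toy_truncate_bits` are hypothetical names, and the map is a plain byte array rather than a page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAPSIZE 8			/* hypothetical map size in bytes */

/*
 * Clear every bit from (trunc_byte, trunc_bit) onward.
 * Assumes trunc_byte < TOY_MAPSIZE and trunc_bit < 8.
 */
void
toy_truncate_bits(unsigned char *map, size_t trunc_byte, unsigned trunc_bit)
{
	/* Zero every byte after the one holding the first removed bit. */
	memset(&map[trunc_byte + 1], 0, TOY_MAPSIZE - (trunc_byte + 1));

	/* Keep only the bits below trunc_bit in the boundary byte. */
	map[trunc_byte] &= (unsigned char) ((1u << trunc_bit) - 1);
}
```

With an all-ones map, `trunc_byte = 2` and `trunc_bit = 3`, bytes 0 and 1 are left untouched, byte 2 keeps its three low bits, and the remaining bytes become zero.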

src/backend/storage/freespace/freespace.c

Lines changed: 20 additions & 1 deletion
@@ -24,6 +24,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/xlog.h"
 #include "access/xlogutils.h"
 #include "miscadmin.h"
 #include "storage/freespace.h"
@@ -285,8 +286,26 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 		if (!BufferIsValid(buf))
 			return;				/* nothing to do; the FSM was already smaller */
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/* NO EREPORT(ERROR) from here till changes are logged */
+		START_CRIT_SECTION();
+
 		fsm_truncate_avail(BufferGetPage(buf), first_removed_slot);
-		MarkBufferDirtyHint(buf, false);
+
+		/*
+		 * Truncation of a relation is WAL-logged at a higher-level, and we
+		 * will be called at WAL replay. But if checksums are enabled, we need
+		 * to still write a WAL record to protect against a torn page, if the
+		 * page is flushed to disk before the truncation WAL record. We cannot
+		 * use MarkBufferDirtyHint here, because that will not dirty the page
+		 * during recovery.
+		 */
+		MarkBufferDirty(buf);
+		if (!InRecovery && RelationNeedsWAL(rel) && XLogHintBitIsNeeded())
+			log_newpage_buffer(buf, false);
+
+		END_CRIT_SECTION();
+
 		UnlockReleaseBuffer(buf);
 
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address) + 1;
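
The guard in front of `log_newpage_buffer()` in both files combines three conditions. One way to read it is as a small predicate; the sketch below is a standalone restatement with a hypothetical function name and plain booleans in place of `InRecovery`, `RelationNeedsWAL(rel)` and `XLogHintBitIsNeeded()`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when the truncated map page should get its own WAL record:
 * not while replaying WAL, only for relations that are WAL-logged at all,
 * and only when hint-style changes need WAL protection (for example
 * because data checksums are enabled).
 */
bool
toy_needs_page_image(bool in_recovery, bool rel_needs_wal, bool hint_wal_needed)
{
	return !in_recovery && rel_needs_wal && hint_wal_needed;
}
```

Note that the page is marked dirty unconditionally in the patch. Only the extra WAL record depends on this predicate.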
