
Commit fb28104

Prevent hard failures of standbys caused by recycled WAL segments
When a standby's WAL receiver stops reading WAL from a WAL stream, it writes data to the current WAL segment without first zeroing the page being written to. This can cause the WAL reader to read junk data left over from a past recycled segment and then attempt to decode a record from it. While the sanity checks in place provide most of the protection needed, in some rare circumstances, with chances increasing when a record header crosses a page boundary, the startup process could fail violently on an allocation failure, as follows:

    FATAL:  invalid memory alloc request size XXX

This is confusing for the user and also unhelpful, as in the worst case it requires a manual restart of the instance, potentially impacting the availability of the cluster, and it also makes the WAL data look like it is in a corrupted state. The chances of seeing such failures are higher if the connection between the standby and its root node is unstable, causing WAL pages to be written partially.

A couple of approaches have been discussed, like zeroing new WAL pages within the WAL receiver itself, but this has the disadvantage of impacting the performance of existing instances, as it breaks the sequential writes done by the WAL receiver.

This commit deals with the problem with a simpler approach, which has no performance impact and does not reduce detection of the problem: if a record is found with a length higher than 1GB for backends, do not attempt any allocation and report a soft failure, which forces the standby to retry reading WAL. It is possible that the allocation call passes and that an unnecessary amount of memory gets allocated, but follow-up checks on the record would then fail, making this allocation short-lived anyway.

This patch owes a great deal to Tsunakawa Takayuki for reporting the failure first, and then discussing a couple of potential approaches to the problem.

Backpatch down to 9.5, which is where palloc_extended has been introduced.
Reported-by: Tsunakawa Takayuki
Reviewed-by: Tsunakawa Takayuki
Author: Michael Paquier
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8B57AD@G01JPEXMBYT05
1 parent 416e3e3 commit fb28104

File tree

1 file changed: +23 −0 lines changed


src/backend/access/transam/xlogreader.c

Lines changed: 23 additions & 0 deletions
```diff
@@ -25,6 +25,10 @@
 #include "common/pg_lzcompress.h"
 #include "replication/origin.h"
 
+#ifndef FRONTEND
+#include "utils/memutils.h"
+#endif
+
 static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
 
 static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
@@ -158,6 +162,25 @@ allocate_recordbuf(XLogReaderState *state, uint32 reclength)
 	newSize += XLOG_BLCKSZ - (newSize % XLOG_BLCKSZ);
 	newSize = Max(newSize, 5 * Max(BLCKSZ, XLOG_BLCKSZ));
 
+#ifndef FRONTEND
+
+	/*
+	 * Note that in much unlucky circumstances, the random data read from a
+	 * recycled segment can cause this routine to be called with a size
+	 * causing a hard failure at allocation.  For a standby, this would cause
+	 * the instance to stop suddenly with a hard failure, preventing it to
+	 * retry fetching WAL from one of its sources which could allow it to move
+	 * on with replay without a manual restart.  If the data comes from a past
+	 * recycled segment and is still valid, then the allocation may succeed
+	 * but record checks are going to fail so this would be short-lived.  If
+	 * the allocation fails because of a memory shortage, then this is not a
+	 * hard failure either per the guarantee given by MCXT_ALLOC_NO_OOM.
+	 */
+	if (!AllocSizeIsValid(newSize))
+		return false;
+
+#endif
+
 	if (state->readRecordBuf)
 		pfree(state->readRecordBuf);
 	state->readRecordBuf =
```
