Commit bab9599

Fix race condition that led to WALInsertLock deadlock with commit_delay.

If a call to WaitXLogInsertionsToFinish() returned a value in the middle of a page, and another backend then started to insert a record to the same page, and then you called WaitXLogInsertionsToFinish() again, the second call might return a smaller value than the first call. The problem was in GetXLogBuffer(), which always updated the insertingAt value to the beginning of the requested page, not the actual requested location. Because of that, the second call might return an xlog pointer to the beginning of the page, while the first one returned a later position on the same page.

XLogFlush() performs two calls to WaitXLogInsertionsToFinish() in succession, and holds WALWriteLock on the second call, which can deadlock if the second call to WaitXLogInsertionsToFinish() blocks.

Reported by Spiros Ioannou. Backpatch to 9.4, where the more scalable WALInsertLock mechanism, and this bug, were introduced.
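The backwards movement described above can be reproduced in a toy model. This is only a sketch, not PostgreSQL code: `old_advertised()` and `moves_backwards()` are hypothetical helper names, the 8192-byte block size is an assumed default, and the way `expectedEndPtr` is derived from `ptr` is an assumption about the surrounding function. The only part taken from the diff is that the pre-fix code advertised `expectedEndPtr - XLOG_BLCKSZ`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

#define XLOG_BLCKSZ 8192u       /* assumed default WAL block size */

/*
 * Position the pre-fix code advertised for an insertion at 'ptr':
 * the start of the page containing 'ptr'. The derivation of
 * expectedEndPtr (end of that page) is an assumption for this sketch.
 */
static XLogRecPtr
old_advertised(XLogRecPtr ptr)
{
    XLogRecPtr  expectedEndPtr = ptr + XLOG_BLCKSZ - ptr % XLOG_BLCKSZ;
    XLogRecPtr  advertised = expectedEndPtr - XLOG_BLCKSZ;

    /* Rounding down to the page start can never exceed ptr itself. */
    assert(advertised <= ptr);
    return advertised;
}

/*
 * Hypothetical check: would advertising 'advertised' fall behind a
 * position that an earlier wait already reported as finished?
 */
static int
moves_backwards(XLogRecPtr already_reported, XLogRecPtr advertised)
{
    return advertised < already_reported;
}
```

If an earlier wait reported a mid-page position such as `3 * 8192 + 4000` and a new insertion starts at that same position, `old_advertised()` yields `3 * 8192`, so `moves_backwards()` is true. Advertising the requested position itself does not go backwards.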
1 parent e39a3b2 commit bab9599

1 file changed: +24 -3 lines changed

src/backend/access/transam/xlog.c

Lines changed: 24 additions & 3 deletions
@@ -1839,11 +1839,32 @@ GetXLogBuffer(XLogRecPtr ptr)
     endptr = XLogCtl->xlblocks[idx];
     if (expectedEndPtr != endptr)
     {
+        XLogRecPtr  initializedUpto;
+
         /*
-         * Let others know that we're finished inserting the record up to the
-         * page boundary.
+         * Before calling AdvanceXLInsertBuffer(), which can block, let others
+         * know how far we're finished with inserting the record.
+         *
+         * NB: If 'ptr' points to just after the page header, advertise a
+         * position at the beginning of the page rather than 'ptr' itself. If
+         * there are no other insertions running, someone might try to flush
+         * up to our advertised location. If we advertised a position after
+         * the page header, someone might try to flush the page header, even
+         * though page might actually not be initialized yet. As the first
+         * inserter on the page, we are effectively responsible for making
+         * sure that it's initialized, before we let insertingAt to move past
+         * the page header.
          */
-        WALInsertLockUpdateInsertingAt(expectedEndPtr - XLOG_BLCKSZ);
+        if (ptr % XLOG_BLCKSZ == SizeOfXLogShortPHD &&
+            ptr % XLOG_SEG_SIZE > XLOG_BLCKSZ)
+            initializedUpto = ptr - SizeOfXLogShortPHD;
+        else if (ptr % XLOG_BLCKSZ == SizeOfXLogLongPHD &&
+                 ptr % XLOG_SEG_SIZE < XLOG_BLCKSZ)
+            initializedUpto = ptr - SizeOfXLogLongPHD;
+        else
+            initializedUpto = ptr;
+
+        WALInsertLockUpdateInsertingAt(initializedUpto);
 
         AdvanceXLInsertBuffer(ptr, false);
         endptr = XLogCtl->xlblocks[idx];
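
The selection of initializedUpto in the hunk above can be tried in isolation. The sketch below copies the three-way condition from the diff into a standalone function. Everything else is an assumption made for illustration: the function name `initialized_upto()` is hypothetical, and the constant values (8192-byte blocks, 16 MB segments, 24-byte short and 40-byte long page headers) are typical defaults rather than values stated in this commit.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Assumed values; the real ones come from PostgreSQL's headers and build. */
#define XLOG_BLCKSZ         8192u
#define XLOG_SEG_SIZE       (16u * 1024u * 1024u)
#define SizeOfXLogShortPHD  24u
#define SizeOfXLogLongPHD   40u

/*
 * Position to advertise before a potentially blocking buffer advance.
 * If 'ptr' sits immediately after a page header, step back to the page
 * start so that nobody is invited to flush a header that may not be
 * initialized yet; otherwise advertise 'ptr' itself.
 */
static XLogRecPtr
initialized_upto(XLogRecPtr ptr)
{
    XLogRecPtr  result;

    if (ptr % XLOG_BLCKSZ == SizeOfXLogShortPHD &&
        ptr % XLOG_SEG_SIZE > XLOG_BLCKSZ)
        result = ptr - SizeOfXLogShortPHD;  /* just past a short header */
    else if (ptr % XLOG_BLCKSZ == SizeOfXLogLongPHD &&
             ptr % XLOG_SEG_SIZE < XLOG_BLCKSZ)
        result = ptr - SizeOfXLogLongPHD;   /* just past a segment's long header */
    else
        result = ptr;                       /* mid-page: advertise ptr itself */

    /* The advertised position never runs ahead of the insertion point. */
    assert(result <= ptr);
    return result;
}
```

Under these assumed constants, `initialized_upto(2 * 8192 + 24)` steps back to `2 * 8192`, `initialized_upto(40)` steps back to `0`, and a mid-page position such as `2 * 8192 + 4000` is returned unchanged. The last case is the one the old code rounded down to the page start.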
