Skip to content

Commit fd6a3f3

Browse files
committed
Reconsider when to wait for WAL flushes/syncrep during commit.
Up to now RecordTransactionCommit() waited for WAL to be flushed (if synchronous_commit != off) and to be synchronously replicated (if enabled), even if a transaction did not have a xid assigned. The primary reason for that is that sequence's nextval() did not assign a xid, but are worthwhile to wait for on commit. This can be problematic because sometimes read only transactions do write WAL, e.g. HOT page prune records. That then could lead to read only transactions having to wait during commit. Not something people expect in a read only transaction. This lead to such strange symptoms as backends being seemingly stuck during connection establishment when all synchronous replicas are down. Especially annoying when said stuck connection is the standby trying to reconnect to allow syncrep again... This behavior also is involved in a rather complicated <= 9.4 bug where the transaction started by catchup interrupt processing waited for syncrep using latches, but didn't get the wakeup because it was already running inside the same overloaded signal handler. Fix the issue here doesn't properly solve that issue, merely papers over the problems. In 9.5 catchup interrupts aren't processed out of signal handlers anymore. To fix all this, make nextval() acquire a top level xid, and only wait for transaction commit if a transaction both acquired a xid and emitted WAL records. If only a xid has been assigned we don't uselessly want to wait just because of writes to temporary/unlogged tables; if only WAL has been written we don't want to wait just because of HOT prunes. The xid assignment in nextval() is unlikely to cause overhead in real-world workloads. For one it only happens SEQ_LOG_VALS/32 values anyway, for another only usage of nextval() without using the result in an insert or similar is affected. Discussion: 20150223165359.GF30784@awork2.anarazel.de, 369698E947874884A77849D8FE3680C2@maumau, 5CF4ABBA67674088B3941894E22A0D25@maumau Per complaint from maumau and Thom Brown Backpatch all the way back; 9.0 doesn't have syncrep, but it seems better to be consistent behavior across all maintained branches.
1 parent a7920b8 commit fd6a3f3

File tree

2 files changed

+40
-12
lines changed

2 files changed

+40
-12
lines changed

src/backend/access/transam/xact.c

+17-12
Original file line numberDiff line numberDiff line change
@@ -1031,10 +1031,9 @@ RecordTransactionCommit(void)
10311031

10321032
/*
10331033
* If we didn't create XLOG entries, we're done here; otherwise we
1034-
* should flush those entries the same as a commit record. (An
1035-
* example of a possible record that wouldn't cause an XID to be
1036-
* assigned is a sequence advance record due to nextval() --- we want
1037-
* to flush that to disk before reporting commit.)
1034+
* should trigger flushing those entries the same as a commit record
1035+
* would. This will primarily happen for HOT pruning and the like; we
1036+
* want these to be flushed to disk in due time.
10381037
*/
10391038
if (!wrote_xlog)
10401039
goto cleanup;
@@ -1153,11 +1152,13 @@ RecordTransactionCommit(void)
11531152
/*
11541153
* Check if we want to commit asynchronously. We can allow the XLOG flush
11551154
* to happen asynchronously if synchronous_commit=off, or if the current
1156-
* transaction has not performed any WAL-logged operation. The latter
1157-
* case can arise if the current transaction wrote only to temporary
1158-
* and/or unlogged tables. In case of a crash, the loss of such a
1159-
* transaction will be irrelevant since temp tables will be lost anyway,
1160-
* and unlogged tables will be truncated. (Given the foregoing, you might
1155+
* transaction has not performed any WAL-logged operation or didn't assign
1156+
* a xid. The transaction can end up not writing any WAL, even if it has
1157+
* a xid, if it only wrote to temporary and/or unlogged tables. It can
1158+
* end up having written WAL without an xid if it did HOT pruning. In
1159+
* case of a crash, the loss of such a transaction will be irrelevant;
1160+
* temp tables will be lost anyway, unlogged tables will be truncated and
1161+
* HOT pruning will be done again later. (Given the foregoing, you might
11611162
* think that it would be unnecessary to emit the XLOG record at all in
11621163
* this case, but we don't currently try to do that. It would certainly
11631164
* cause problems at least in Hot Standby mode, where the
@@ -1173,7 +1174,8 @@ RecordTransactionCommit(void)
11731174
* if all to-be-deleted tables are temporary though, since they are lost
11741175
* anyway if we crash.)
11751176
*/
1176-
if ((wrote_xlog && synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
1177+
if ((wrote_xlog && markXidCommitted &&
1178+
synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
11771179
forceSyncCommit || nrels > 0)
11781180
{
11791181
XLogFlush(XactLastRecEnd);
@@ -1222,12 +1224,15 @@ RecordTransactionCommit(void)
12221224
latestXid = TransactionIdLatest(xid, nchildren, children);
12231225

12241226
/*
1225-
* Wait for synchronous replication, if required.
1227+
* Wait for synchronous replication, if required. Similar to the decision
1228+
* above about using committing asynchronously we only want to wait if
1229+
* this backend assigned a xid and wrote WAL. No need to wait if a xid
1230+
* was assigned due to temporary/unlogged tables or due to HOT pruning.
12261231
*
12271232
* Note that at this stage we have marked clog, but still show as running
12281233
* in the procarray and continue to hold locks.
12291234
*/
1230-
if (wrote_xlog)
1235+
if (wrote_xlog && markXidCommitted)
12311236
SyncRepWaitForLSN(XactLastRecEnd);
12321237

12331238
/* Reset XactLastRecEnd until the next transaction writes something */

src/backend/commands/sequence.c

+23
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@
1717
#include "access/htup_details.h"
1818
#include "access/multixact.h"
1919
#include "access/transam.h"
20+
#include "access/xact.h"
2021
#include "access/xlog.h"
2122
#include "access/xloginsert.h"
2223
#include "access/xlogutils.h"
@@ -358,6 +359,10 @@ fill_seq_with_data(Relation rel, HeapTuple tuple)
358359
tuple->t_data->t_infomask |= HEAP_XMAX_INVALID;
359360
ItemPointerSet(&tuple->t_data->t_ctid, 0, FirstOffsetNumber);
360361

362+
/* check the comment above nextval_internal()'s equivalent call. */
363+
if (RelationNeedsWAL(rel))
364+
GetTopTransactionId();
365+
361366
START_CRIT_SECTION();
362367

363368
MarkBufferDirty(buf);
@@ -438,6 +443,10 @@ AlterSequence(AlterSeqStmt *stmt)
438443
/* Note that we do not change the currval() state */
439444
elm->cached = elm->last;
440445

446+
/* check the comment above nextval_internal()'s equivalent call. */
447+
if (RelationNeedsWAL(seqrel))
448+
GetTopTransactionId();
449+
441450
/* Now okay to update the on-disk tuple */
442451
START_CRIT_SECTION();
443452

@@ -679,6 +688,16 @@ nextval_internal(Oid relid)
679688

680689
last_used_seq = elm;
681690

691+
/*
692+
* If something needs to be WAL logged, acquire an xid, so this
693+
* transaction's commit will trigger a WAL flush and wait for
694+
* syncrep. It's sufficient to ensure the toplevel transaction has a xid,
695+
* no need to assign xids subxacts, that'll already trigger a appropriate
696+
* wait. (Have to do that here, so we're outside the critical section)
697+
*/
698+
if (logit && RelationNeedsWAL(seqrel))
699+
GetTopTransactionId();
700+
682701
/* ready to change the on-disk (or really, in-buffer) tuple */
683702
START_CRIT_SECTION();
684703

@@ -867,6 +886,10 @@ do_setval(Oid relid, int64 next, bool iscalled)
867886
/* In any case, forget any future cached numbers */
868887
elm->cached = elm->last;
869888

889+
/* check the comment above nextval_internal()'s equivalent call. */
890+
if (RelationNeedsWAL(seqrel))
891+
GetTopTransactionId();
892+
870893
/* ready to change the on-disk (or really, in-buffer) tuple */
871894
START_CRIT_SECTION();
872895

0 commit comments

Comments
 (0)