Commit f69c959

Do not decode TOAST data for table rewrites
During table rewrites (VACUUM FULL and CLUSTER), the main heap is logged using XLOG / FPI records, and thus (correctly) ignored in decoding. But the associated TOAST table is WAL-logged as plain INSERT records, and so was logically decoded and passed to the reorder buffer.

That has severe consequences with TOAST tables of non-trivial size. Firstly, the reorder buffer has to keep all those changes, possibly spilling them to a file, which costs I/O and disk space. Secondly, ReorderBufferCommit() was stashing all those TOAST chunks into a hash table, which got discarded only after processing the row from the main heap. But as the main heap is not decoded for rewrites, this never happened, so all the TOAST data accumulated in memory, resulting in either excessive memory consumption or OOM.

The fix is simple, as commit e9edc1b already introduced infrastructure (namely the HEAP_INSERT_NO_LOGICAL flag) to skip logical decoding of TOAST tables, but it applied it only to system tables. So simply use it for all TOAST data in raw_heap_insert().

That alone, however, would solve only the memory consumption issue: the TOAST changes would still be decoded, added to the reorder buffer, and spilled to disk (although without TOAST tuple data, so much smaller). We can solve that as well by tweaking DecodeInsert() to ignore such INSERT records altogether, using the XLH_INSERT_CONTAINS_NEW_TUPLE flag, instead of skipping them later in ReorderBufferCommit().

Review: Masahiko Sawada
Discussion: https://www.postgresql.org/message-id/flat/1a17c643-e9af-3dba-486b-fbe31bc1823a%402ndquadrant.com
Backpatch: 9.4-, where logical decoding was introduced
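To make the memory problem concrete, here is a toy model in C of the replay-side pattern the message describes: TOAST chunks are held until the main-heap row that references them is processed, and only then freed. Everything in it (ToyChange, ToyToastStash, toy_replay, the 2000-byte chunk sizes) is hypothetical and greatly simplified; it is a sketch of the accumulation pattern, not the reorder buffer's real data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical change kinds; not PostgreSQL's ReorderBufferChangeType. */
typedef enum
{
    TOY_TOAST_INSERT,     /* INSERT into the TOAST table (one chunk) */
    TOY_MAIN_HEAP_INSERT  /* INSERT into the main heap (the row itself) */
} ToyChangeKind;

typedef struct
{
    ToyChangeKind kind;
    size_t chunk_bytes;   /* payload size; meaningful for TOAST inserts only */
} ToyChange;

typedef struct
{
    size_t stashed_chunks; /* chunks waiting for their main-heap row */
    size_t stashed_bytes;
} ToyToastStash;

/*
 * Replay changes in commit order. TOAST chunks are stashed; a main-heap
 * change consumes the stash and clears it afterwards, so the memory is
 * released only when such a change arrives.
 */
static void
toy_replay(const ToyChange *changes, size_t n, ToyToastStash *stash)
{
    assert(stash != NULL && (changes != NULL || n == 0));

    stash->stashed_chunks = 0;
    stash->stashed_bytes = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (changes[i].kind == TOY_TOAST_INSERT)
        {
            stash->stashed_chunks++;
            stash->stashed_bytes += changes[i].chunk_bytes;
        }
        else
        {
            /* the row is reassembled from the stashed chunks, then freed */
            stash->stashed_chunks = 0;
            stash->stashed_bytes = 0;
        }
    }
}
```

With a normal sequence (TOAST chunks followed by their main-heap row) the stash ends empty. With a rewrite-like sequence, where the main heap goes out as FPI records and only TOAST inserts are decoded, every chunk stays stashed, which is the unbounded growth this commit removes.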
1 parent d1ce4ed commit f69c959

3 files changed, 24 additions, 24 deletions

src/backend/access/heap/rewriteheap.c

Lines changed: 4 additions & 5 deletions
@@ -659,12 +659,11 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
 		options |= HEAP_INSERT_SKIP_WAL;

 	/*
-	 * The new relfilenode's relcache entrye doesn't have the necessary
-	 * information to determine whether a relation should emit data for
-	 * logical decoding. Force it to off if necessary.
+	 * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
+	 * for the TOAST table are not logically decoded. The main heap is
+	 * WAL-logged as XLOG FPI records, which are not logically decoded.
 	 */
-	if (!RelationIsLogicallyLogged(state->rs_old_rel))
-		options |= HEAP_INSERT_NO_LOGICAL;
+	options |= HEAP_INSERT_NO_LOGICAL;

 	heaptup = toast_insert_or_update(state->rs_new_rel, tup, NULL,
 									 options);
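The behavioral change in raw_heap_insert() can be modeled as two functions computing the insert options before and after this commit, plus an assumed mapping from those options to the WAL record's flags. The TOY_* constants and all function names are hypothetical. The mapping (HEAP_INSERT_NO_LOGICAL means no tuple data, hence no XLH_INSERT_CONTAINS_NEW_TUPLE) follows the commit message's statement that such records carry no TOAST tuple data, and assumes wal_level = logical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values standing in for HEAP_INSERT_NO_LOGICAL and
 * XLH_INSERT_CONTAINS_NEW_TUPLE; the real constants may differ. */
#define TOY_HEAP_INSERT_NO_LOGICAL        0x01
#define TOY_XLH_INSERT_CONTAINS_NEW_TUPLE 0x01

/* Before the fix: suppress decoding only when the old relation is not
 * logically logged (per the commit message, system tables). */
static int
toy_options_before(bool old_rel_logically_logged)
{
    int options = 0;

    if (!old_rel_logically_logged)
        options |= TOY_HEAP_INSERT_NO_LOGICAL;
    return options;
}

/* After the fix: always suppress decoding of TOAST data during rewrites. */
static int
toy_options_after(void)
{
    return TOY_HEAP_INSERT_NO_LOGICAL;
}

/* Assumed mapping (wal_level = logical): the insert record carries tuple
 * data only if logical decoding was not suppressed for it. */
static int
toy_xlog_flags(int options)
{
    int flags = 0;

    /* only the NO_LOGICAL bit is modeled here */
    assert((options & ~TOY_HEAP_INSERT_NO_LOGICAL) == 0);

    if (!(options & TOY_HEAP_INSERT_NO_LOGICAL))
        flags |= TOY_XLH_INSERT_CONTAINS_NEW_TUPLE;
    return flags;
}
```

For a user table's TOAST data, toy_options_before(true) leaves tuple data in the record, while toy_options_after() strips it regardless of whether the table is a user table or a system catalog.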

src/backend/replication/logical/decode.c

Lines changed: 15 additions & 9 deletions
@@ -665,13 +665,23 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 static void
 DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 {
+	Size datalen;
+	char *tupledata;
+	Size tuplelen;
 	XLogReaderState *r = buf->record;
 	xl_heap_insert *xlrec;
 	ReorderBufferChange *change;
 	RelFileNode target_node;

 	xlrec = (xl_heap_insert *) XLogRecGetData(r);

+	/*
+	 * Ignore insert records without new tuples (this does happen when
+	 * raw_heap_insert marks the TOAST record as HEAP_INSERT_NO_LOGICAL).
+	 */
+	if (!(xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE))
+		return;
+
 	/* only interested in our database */
 	XLogRecGetBlockTag(r, 0, &target_node, NULL, NULL);
 	if (target_node.dbNode != ctx->slot->data.database)
@@ -690,17 +700,13 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)

 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));

-	if (xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE)
-	{
-		Size datalen;
-		char *tupledata = XLogRecGetBlockData(r, 0, &datalen);
-		Size tuplelen = datalen - SizeOfHeapHeader;
+	tupledata = XLogRecGetBlockData(r, 0, &datalen);
+	tuplelen = datalen - SizeOfHeapHeader;

-		change->data.tp.newtuple =
-			ReorderBufferGetTupleBuf(ctx->reorder, tuplelen);
+	change->data.tp.newtuple =
+		ReorderBufferGetTupleBuf(ctx->reorder, tuplelen);

-		DecodeXLogTuple(tupledata, datalen, change->data.tp.newtuple);
-	}
+	DecodeXLogTuple(tupledata, datalen, change->data.tp.newtuple);

 	change->data.tp.clear_toast_afterwards = true;

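The DecodeInsert() change moves the XLH_INSERT_CONTAINS_NEW_TUPLE test from "decode the tuple if present" to "return early if absent". The sketch below counts how many changes each variant would queue for a stream of insert records. ToyInsertRecord, TOY_CONTAINS_NEW_TUPLE and both toy_decode_* functions are hypothetical stand-ins; the per-database filter and the actual tuple decoding are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag standing in for XLH_INSERT_CONTAINS_NEW_TUPLE. */
#define TOY_CONTAINS_NEW_TUPLE 0x01

typedef struct
{
    int flags; /* xl_heap_insert-style flags */
} ToyInsertRecord;

/* Old behavior: queue a change for every insert record; records without
 * tuple data still produce a change whose newtuple stays NULL. */
static size_t
toy_decode_old(const ToyInsertRecord *recs, size_t n, size_t *null_tuple_changes)
{
    size_t queued = 0;

    assert(null_tuple_changes != NULL && (recs != NULL || n == 0));
    *null_tuple_changes = 0;

    for (size_t i = 0; i < n; i++)
    {
        queued++;
        if (!(recs[i].flags & TOY_CONTAINS_NEW_TUPLE))
            (*null_tuple_changes)++; /* queued, possibly spilled, no tuple */
    }
    return queued;
}

/* New behavior: skip records without tuple data before queuing anything. */
static size_t
toy_decode_new(const ToyInsertRecord *recs, size_t n)
{
    size_t queued = 0;

    assert(recs != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (!(recs[i].flags & TOY_CONTAINS_NEW_TUPLE))
            continue; /* mirrors the early return in DecodeInsert() */
        queued++;
    }
    return queued;
}
```

For four records of which only one carries tuple data, the old variant queues four changes (three with no tuple, which could still be spilled to disk), while the new one queues only the change that has a tuple. That early return is what lets the reorderbuffer.c change turn its NULL check into an Assert.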
src/backend/replication/logical/reorderbuffer.c

Lines changed: 5 additions & 10 deletions
@@ -1598,17 +1598,12 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 			 * transaction's changes. Otherwise it will get
 			 * freed/reused while restoring spooled data from
 			 * disk.
-			 *
-			 * But skip doing so if there's no tuple-data. That
-			 * happens if a non-mapped system catalog with a toast
-			 * table is rewritten.
 			 */
-			if (change->data.tp.newtuple != NULL)
-			{
-				dlist_delete(&change->node);
-				ReorderBufferToastAppendChunk(rb, txn, relation,
-											  change);
-			}
+			Assert(change->data.tp.newtuple != NULL);
+
+			dlist_delete(&change->node);
+			ReorderBufferToastAppendChunk(rb, txn, relation,
+										  change);
 		}

 change_done: