
Commit 57aa5b2

Add GUC to enable compression of full page images stored in WAL.
When the newly added GUC parameter, wal_compression, is on, the PostgreSQL server compresses a full page image written to WAL when full_page_writes is on or during a base backup. A compressed page image will be decompressed during WAL replay. Turning this parameter on can reduce the WAL volume without increasing the risk of unrecoverable data corruption, but at the cost of some extra CPU spent on compression during WAL logging and on decompression during WAL replay.

This commit changes the WAL format (hence bumping the WAL version number) so that a one-byte flag indicating whether a full page image is compressed is included in its header. This means the commit increases the WAL volume by one byte per full page image even if WAL compression is not used at all. We could save that byte by, for example, borrowing one bit from an existing header field such as hole_offset and using it as the flag. But doing so would reduce code readability and the extensibility of the feature. Per discussion, it's not worth paying those prices to save only one byte, so we decided to add the one-byte flag to the header.

This commit doesn't introduce any new compression algorithm such as lz4. Currently a full page image is compressed using the existing PGLZ algorithm. Per discussion, we decided to use it at least in the first version of the feature because there were no performance reports showing that its compression ratio is unacceptably lower than that of other algorithms. In the future, it's worth considering support for other compression algorithms to achieve better compression.

Rahila Syed and Michael Paquier, reviewed in various versions by myself, Andres Freund, Robert Haas, Abhijit Menon-Sen and many others.
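The one-byte trade-off described above can be illustrated with a small sketch. It contrasts a dedicated flag byte after two 16-bit fields (the shape this commit gives the image header) with the rejected alternative of borrowing the top bit of hole_offset. It is simplified: the real header carries more than one flag, byte order is assumed little-endian, and BLCKSZ is assumed to be at most 32768 so a hole offset always fits in 15 bits. Every name in it is hypothetical, not PostgreSQL's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value; stands in for BKPIMAGE_IS_COMPRESSED. */
#define SKETCH_IS_COMPRESSED 0x01
/* Top bit of a 16-bit field, used only by the rejected encoding. */
#define SKETCH_OFFSET_FLAG_BIT 0x8000u

static void put_u16(uint8_t *dst, uint16_t v)
{
    dst[0] = (uint8_t) (v & 0xFF);   /* little-endian: an assumption */
    dst[1] = (uint8_t) (v >> 8);
}

/* Chosen shape: length, hole_offset, then a dedicated flag byte (5 bytes). */
static size_t encode_with_flag_byte(uint8_t *dst, uint16_t length,
                                    uint16_t hole_offset, int compressed)
{
    put_u16(dst, length);
    put_u16(dst + 2, hole_offset);
    dst[4] = compressed ? SKETCH_IS_COMPRESSED : 0;
    return 5;
}

/* Rejected shape: fold the flag into hole_offset's top bit (4 bytes). */
static size_t encode_with_stolen_bit(uint8_t *dst, uint16_t length,
                                     uint16_t hole_offset, int compressed)
{
    assert(hole_offset < SKETCH_OFFSET_FLAG_BIT); /* needs BLCKSZ <= 32768 */
    put_u16(dst, length);
    put_u16(dst + 2, (uint16_t) (hole_offset |
                                 (compressed ? SKETCH_OFFSET_FLAG_BIT : 0)));
    return 4;
}

/* Every reader of the rejected layout must remember to mask the bit off. */
static uint16_t decode_stolen_bit_offset(const uint8_t *src, int *compressed)
{
    uint16_t raw = (uint16_t) (src[2] | (src[3] << 8));

    *compressed = (raw & SKETCH_OFFSET_FLAG_BIT) != 0;
    return (uint16_t) (raw & ~SKETCH_OFFSET_FLAG_BIT);
}
```

The stolen-bit variant saves a byte per image, but every reader of hole_offset has to mask the flag out first, which is the readability and extensibility cost the message weighs against that byte.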
1 parent 2fbb286 commit 57aa5b2


11 files changed: 320 lines added, 39 lines removed


contrib/pg_xlogdump/pg_xlogdump.c

Lines changed: 20 additions & 8 deletions
@@ -359,18 +359,17 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
     rec_len = XLogRecGetDataLen(record) + SizeOfXLogRecord;
 
     /*
-     * Calculate the amount of FPI data in the record. Each backup block
-     * takes up BLCKSZ bytes, minus the "hole" length.
+     * Calculate the amount of FPI data in the record.
      *
      * XXX: We peek into xlogreader's private decoded backup blocks for the
-     * hole_length. It doesn't seem worth it to add an accessor macro for
-     * this.
+     * bimg_len indicating the length of FPI data. It doesn't seem worth it to
+     * add an accessor macro for this.
      */
     fpi_len = 0;
     for (block_id = 0; block_id <= record->max_block_id; block_id++)
     {
         if (XLogRecHasBlockImage(record, block_id))
-            fpi_len += BLCKSZ - record->blocks[block_id].hole_length;
+            fpi_len += record->blocks[block_id].bimg_len;
     }
 
     /* Update per-rmgr statistics */
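The hunk above replaces an FPI size estimate with the stored image length. The old estimate, BLCKSZ minus the hole, is exact only for uncompressed images; for a compressed one it overstates the bytes actually written to WAL. A self-contained sketch of both formulas follows, using a hypothetical SketchBlock struct in place of xlogreader's private decoded-block fields and assuming the default BLCKSZ of 8192.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLCKSZ 8192   /* assumed default block size */

/* Hypothetical stand-in for the decoded block fields the hunk reads. */
typedef struct
{
    bool     has_image;   /* what XLogRecHasBlockImage() reports */
    uint16_t hole_length; /* bytes of the page's free-space hole */
    uint16_t bimg_len;    /* bytes actually stored in WAL for the image */
} SketchBlock;

/* Pre-commit formula: each image is assumed to be BLCKSZ minus its hole. */
static uint32_t fpi_len_old(const SketchBlock *blocks, size_t n)
{
    uint32_t total = 0;

    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (blocks[i].has_image)
            total += BLCKSZ - blocks[i].hole_length;
    return total;
}

/* Post-commit formula: use the stored length, correct for compressed images. */
static uint32_t fpi_len_new(const SketchBlock *blocks, size_t n)
{
    uint32_t total = 0;

    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (blocks[i].has_image)
            total += blocks[i].bimg_len;
    return total;
}
```

For one uncompressed image with a 4000-byte hole (4192 bytes stored) and one compressed image of 900 bytes, the old formula reports 8384 bytes while the new one reports the 5092 bytes actually present.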
@@ -465,9 +464,22 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
                    blk);
             if (XLogRecHasBlockImage(record, block_id))
             {
-                printf(" (FPW); hole: offset: %u, length: %u\n",
-                       record->blocks[block_id].hole_offset,
-                       record->blocks[block_id].hole_length);
+                if (record->blocks[block_id].bimg_info &
+                    BKPIMAGE_IS_COMPRESSED)
+                {
+                    printf(" (FPW); hole: offset: %u, length: %u, compression saved: %u\n",
+                           record->blocks[block_id].hole_offset,
+                           record->blocks[block_id].hole_length,
+                           BLCKSZ -
+                           record->blocks[block_id].hole_length -
+                           record->blocks[block_id].bimg_len);
+                }
+                else
+                {
+                    printf(" (FPW); hole: offset: %u, length: %u\n",
+                           record->blocks[block_id].hole_offset,
+                           record->blocks[block_id].hole_length);
+                }
             }
             putchar('\n');
         }
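In the display hunk, "compression saved" is BLCKSZ - hole_length - bimg_len: the size the image would have had after removing the hole, minus the bytes actually stored. The sketch below reproduces both output lines with snprintf; format_fpw is a hypothetical helper, and BLCKSZ of 8192 is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BLCKSZ 8192 /* assumed default block size */

/* Format the FPW annotation the way the patched pg_xlogdump prints it. */
static int format_fpw(char *buf, size_t size, bool compressed,
                      unsigned hole_offset, unsigned hole_length,
                      unsigned bimg_len)
{
    assert(hole_length <= (unsigned) BLCKSZ);
    if (compressed)
        return snprintf(buf, size,
                        " (FPW); hole: offset: %u, length: %u, compression saved: %u\n",
                        hole_offset, hole_length,
                        BLCKSZ - hole_length - bimg_len);
    return snprintf(buf, size, " (FPW); hole: offset: %u, length: %u\n",
                    hole_offset, hole_length);
}
```

A compressed 900-byte image of a page with a 4000-byte hole is reported as saving 3292 bytes (8192 - 4000 - 900).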

doc/src/sgml/config.sgml

Lines changed: 24 additions & 0 deletions
@@ -2282,6 +2282,30 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-wal-compression" xreflabel="wal_compression">
+      <term><varname>wal_compression</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>wal_compression</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        When this parameter is <literal>on</>, the <productname>PostgreSQL</>
+        server compresses a full page image written to WAL when
+        <xref linkend="guc-full-page-writes"> is on or during a base backup.
+        A compressed page image will be decompressed during WAL replay.
+        The default value is <literal>off</>.
+       </para>
+
+       <para>
+        Turning this parameter on can reduce the WAL volume without
+        increasing the risk of unrecoverable data corruption,
+        but at the cost of some extra CPU spent on the compression during
+        WAL logging and on the decompression during WAL replay.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-wal-buffers" xreflabel="wal_buffers">
       <term><varname>wal_buffers</varname> (<type>integer</type>)
       <indexterm>
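The documentation states that a compressed page image is decompressed during WAL replay. The replay code is not among the hunks shown here, so the sketch below illustrates only the step that follows decompression, under stated assumptions: rebuilding a full BLCKSZ page from a hole-free image by copying the two pieces back and zero-filling the hole between them. restore_page_with_hole is a hypothetical name, and BLCKSZ of 8192 is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192 /* assumed default block size */

/*
 * Rebuild a full page from 'image', which holds BLCKSZ - hole_length bytes
 * with the hole already cut out (e.g. the output of decompression).
 */
static void restore_page_with_hole(const char *image, uint16_t hole_offset,
                                   uint16_t hole_length, char *page)
{
    assert((uint32_t) hole_offset + hole_length <= BLCKSZ);
    if (hole_length == 0)
    {
        memcpy(page, image, BLCKSZ);
        return;
    }
    /* bytes before the hole */
    memcpy(page, image, hole_offset);
    /* the hole itself was never stored; fill it with zeroes */
    memset(page + hole_offset, 0, hole_length);
    /* bytes after the hole */
    memcpy(page + hole_offset + hole_length, image + hole_offset,
           BLCKSZ - (hole_offset + hole_length));
}
```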

src/backend/access/transam/xlog.c

Lines changed: 1 addition & 0 deletions
@@ -89,6 +89,7 @@ char *XLogArchiveCommand = NULL;
 bool EnableHotStandby = false;
 bool fullPageWrites = true;
 bool wal_log_hints = false;
+bool wal_compression = false;
 bool log_checkpoints = false;
 int sync_method = DEFAULT_SYNC_METHOD;
 int wal_level = WAL_LEVEL_MINIMAL;

src/backend/access/transam/xloginsert.c

Lines changed: 122 additions & 19 deletions
@@ -24,12 +24,16 @@
 #include "access/xlog_internal.h"
 #include "access/xloginsert.h"
 #include "catalog/pg_control.h"
+#include "common/pg_lzcompress.h"
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/proc.h"
 #include "utils/memutils.h"
 #include "pg_trace.h"
 
+/* Buffer size required to store a compressed version of backup block image */
+#define PGLZ_MAX_BLCKSZ PGLZ_MAX_OUTPUT(BLCKSZ)
+
 /*
  * For each block reference registered with XLogRegisterBuffer, we fill in
  * a registered_buffer struct.
@@ -50,6 +54,9 @@ typedef struct
 
     XLogRecData bkp_rdatas[2];  /* temporary rdatas used to hold references to
                                  * backup block data in XLogRecordAssemble() */
+
+    /* buffer to store a compressed version of backup block image */
+    char compressed_page[PGLZ_MAX_BLCKSZ];
 } registered_buffer;
 
 static registered_buffer *registered_buffers;
@@ -96,6 +103,8 @@ static MemoryContext xloginsert_cxt;
 static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info,
                    XLogRecPtr RedoRecPtr, bool doPageWrites,
                    XLogRecPtr *fpw_lsn);
+static bool XLogCompressBackupBlock(char *page, uint16 hole_offset,
+                        uint16 hole_length, char *dest, uint16 *dlen);
 
 /*
  * Begin constructing a WAL record. This must be called before the
@@ -482,7 +491,11 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
         bool needs_data;
         XLogRecordBlockHeader bkpb;
         XLogRecordBlockImageHeader bimg;
+        XLogRecordBlockCompressHeader cbimg;
         bool samerel;
+        bool is_compressed = false;
+        uint16 hole_length;
+        uint16 hole_offset;
 
         if (!regbuf->in_use)
             continue;
@@ -529,9 +542,11 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
         if (needs_backup)
         {
             Page page = regbuf->page;
+            uint16 compressed_len;
 
             /*
-             * The page needs to be backed up, so set up *bimg
+             * The page needs to be backed up, so calculate its hole length
+             * and offset.
              */
             if (regbuf->flags & REGBUF_STANDARD)
             {
@@ -543,50 +558,81 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
                     upper > lower &&
                     upper <= BLCKSZ)
                 {
-                    bimg.hole_offset = lower;
-                    bimg.hole_length = upper - lower;
+                    hole_offset = lower;
+                    hole_length = upper - lower;
                 }
                 else
                 {
                     /* No "hole" to compress out */
-                    bimg.hole_offset = 0;
-                    bimg.hole_length = 0;
+                    hole_offset = 0;
+                    hole_length = 0;
                 }
             }
             else
             {
                 /* Not a standard page header, don't try to eliminate "hole" */
-                bimg.hole_offset = 0;
-                bimg.hole_length = 0;
+                hole_offset = 0;
+                hole_length = 0;
+            }
+
+            /*
+             * Try to compress a block image if wal_compression is enabled
+             */
+            if (wal_compression)
+            {
+                is_compressed =
+                    XLogCompressBackupBlock(page, hole_offset, hole_length,
+                                            regbuf->compressed_page,
+                                            &compressed_len);
             }
 
             /* Fill in the remaining fields in the XLogRecordBlockHeader struct */
             bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
 
-            total_len += BLCKSZ - bimg.hole_length;
-
             /*
              * Construct XLogRecData entries for the page content.
              */
             rdt_datas_last->next = &regbuf->bkp_rdatas[0];
             rdt_datas_last = rdt_datas_last->next;
-            if (bimg.hole_length == 0)
+
+            bimg.bimg_info = (hole_length == 0) ? 0 : BKPIMAGE_HAS_HOLE;
+
+            if (is_compressed)
             {
-                rdt_datas_last->data = page;
-                rdt_datas_last->len = BLCKSZ;
+                bimg.length = compressed_len;
+                bimg.hole_offset = hole_offset;
+                bimg.bimg_info |= BKPIMAGE_IS_COMPRESSED;
+                if (hole_length != 0)
+                    cbimg.hole_length = hole_length;
+
+                rdt_datas_last->data = regbuf->compressed_page;
+                rdt_datas_last->len = compressed_len;
             }
             else
             {
-                /* must skip the hole */
-                rdt_datas_last->data = page;
-                rdt_datas_last->len = bimg.hole_offset;
+                bimg.length = BLCKSZ - hole_length;
+                bimg.hole_offset = hole_offset;
 
-                rdt_datas_last->next = &regbuf->bkp_rdatas[1];
-                rdt_datas_last = rdt_datas_last->next;
+                if (hole_length == 0)
+                {
+                    rdt_datas_last->data = page;
+                    rdt_datas_last->len = BLCKSZ;
+                }
+                else
+                {
+                    /* must skip the hole */
+                    rdt_datas_last->data = page;
+                    rdt_datas_last->len = hole_offset;
 
-                rdt_datas_last->data = page + (bimg.hole_offset + bimg.hole_length);
-                rdt_datas_last->len = BLCKSZ - (bimg.hole_offset + bimg.hole_length);
+                    rdt_datas_last->next = &regbuf->bkp_rdatas[1];
+                    rdt_datas_last = rdt_datas_last->next;
+
+                    rdt_datas_last->data = page + (hole_offset + hole_length);
+                    rdt_datas_last->len = BLCKSZ - (hole_offset + hole_length);
+                }
             }
+
+            total_len += bimg.length;
         }
 
         if (needs_data)
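The hunk above first derives the hole from pd_lower and pd_upper, then picks one of three layouts for the image data: a single chunk of compressed bytes, a single BLCKSZ chunk when there is no hole, or two chunks around the hole. A sketch of that decision follows. The lower-bound test against the page header size belongs to the part of the condition outside the hunk context, so the 24-byte SKETCH_PAGE_HEADER_SIZE is an assumption, as are BLCKSZ of 8192 and the SKETCH_* flag values; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ 8192                  /* assumed default block size */
#define SKETCH_PAGE_HEADER_SIZE 24   /* assumed SizeOfPageHeaderData */
#define SKETCH_HAS_HOLE 0x01         /* stand-ins for the BKPIMAGE_* flags */
#define SKETCH_IS_COMPRESSED 0x02

typedef struct
{
    uint16_t hole_offset;
    uint16_t hole_length;
    uint16_t length;      /* bytes of image data written to WAL */
    uint8_t  bimg_info;
    int      nchunks;     /* XLogRecData entries used for the image */
} SketchImagePlan;

/* Mirror the hole detection: only trust pd_lower/pd_upper if they look sane. */
static void compute_hole(bool standard, uint16_t lower, uint16_t upper,
                         uint16_t *hole_offset, uint16_t *hole_length)
{
    if (standard && lower >= SKETCH_PAGE_HEADER_SIZE &&
        upper > lower && upper <= BLCKSZ)
    {
        *hole_offset = lower;
        *hole_length = (uint16_t) (upper - lower);
    }
    else
    {
        /* no "hole" to compress out */
        *hole_offset = 0;
        *hole_length = 0;
    }
}

/* Mirror the chunk selection made after the optional compression attempt. */
static SketchImagePlan plan_image(uint16_t hole_offset, uint16_t hole_length,
                                  bool is_compressed, uint16_t compressed_len)
{
    SketchImagePlan p;

    p.hole_offset = hole_offset;
    p.hole_length = hole_length;
    p.bimg_info = (hole_length == 0) ? 0 : SKETCH_HAS_HOLE;
    if (is_compressed)
    {
        p.length = compressed_len;
        p.bimg_info |= SKETCH_IS_COMPRESSED;
        p.nchunks = 1;                          /* the compressed buffer */
    }
    else
    {
        p.length = (uint16_t) (BLCKSZ - hole_length);
        p.nchunks = (hole_length == 0) ? 1 : 2; /* whole page, or two pieces */
    }
    assert(p.length <= BLCKSZ);
    return p;
}
```

Note that total_len now grows by the chosen length in every case, which is why the hunk moves that addition after the branch instead of computing BLCKSZ minus the hole up front.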
@@ -619,6 +665,12 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
         {
             memcpy(scratch, &bimg, SizeOfXLogRecordBlockImageHeader);
             scratch += SizeOfXLogRecordBlockImageHeader;
+            if (hole_length != 0 && is_compressed)
+            {
+                memcpy(scratch, &cbimg,
+                       SizeOfXLogRecordBlockCompressHeader);
+                scratch += SizeOfXLogRecordBlockCompressHeader;
+            }
         }
         if (!samerel)
         {
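The header hunk writes XLogRecordBlockCompressHeader, which carries only hole_length, just when the image is both compressed and has a hole. In the other cases a reader does not need it: an uncompressed image with a hole has a hole length of BLCKSZ minus its stored length, and an image without a hole has none. The sketch below models that write and the matching read, assuming packed little-endian fields (2 + 2 + 1 bytes for the image header, 2 for the compress header) instead of the memcpy of C structs the real code performs; names and flag values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLCKSZ 8192                 /* assumed default block size */
#define SKETCH_HAS_HOLE 0x01        /* stand-ins for the BKPIMAGE_* flags */
#define SKETCH_IS_COMPRESSED 0x02

static uint8_t *put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t) (v & 0xFF);
    p[1] = (uint8_t) (v >> 8);
    return p + 2;
}

static uint16_t get_u16(const uint8_t *p)
{
    return (uint16_t) (p[0] | (p[1] << 8));
}

/* Write the image header, plus the compress header only when it is needed. */
static size_t write_image_headers(uint8_t *scratch, uint16_t length,
                                  uint16_t hole_offset, uint8_t bimg_info,
                                  uint16_t hole_length)
{
    uint8_t *p = scratch;

    p = put_u16(p, length);
    p = put_u16(p, hole_offset);
    *p++ = bimg_info;
    if (hole_length != 0 && (bimg_info & SKETCH_IS_COMPRESSED))
        p = put_u16(p, hole_length);
    return (size_t) (p - scratch);
}

/* Reader side: recover hole_length from wherever it is implied or stored. */
static uint16_t read_hole_length(const uint8_t *scratch)
{
    uint16_t length = get_u16(scratch);
    uint8_t  info = scratch[4];

    if (!(info & SKETCH_HAS_HOLE))
        return 0;
    if (info & SKETCH_IS_COMPRESSED)
        return get_u16(scratch + 5);      /* from the compress header */
    assert(length <= BLCKSZ);
    return (uint16_t) (BLCKSZ - length);  /* implied by the image length */
}
```

Under these assumptions the per-image header is 5 bytes, growing to 7 only for compressed images that had a hole.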
@@ -680,6 +732,57 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
     return &hdr_rdt;
 }
 
+/*
+ * Create a compressed version of a backup block image.
+ *
+ * Returns FALSE if compression fails (i.e., compressed result is actually
+ * bigger than original). Otherwise, returns TRUE and sets 'dlen' to
+ * the length of compressed block image.
+ */
+static bool
+XLogCompressBackupBlock(char * page, uint16 hole_offset, uint16 hole_length,
+                        char *dest, uint16 *dlen)
+{
+    int32 orig_len = BLCKSZ - hole_length;
+    int32 len;
+    int32 extra_bytes = 0;
+    char *source;
+    char tmp[BLCKSZ];
+
+    if (hole_length != 0)
+    {
+        /* must skip the hole */
+        source = tmp;
+        memcpy(source, page, hole_offset);
+        memcpy(source + hole_offset,
+               page + (hole_offset + hole_length),
+               BLCKSZ - (hole_length + hole_offset));
+
+        /*
+         * Extra data needs to be stored in WAL record for the compressed
+         * version of block image if the hole exists.
+         */
+        extra_bytes = SizeOfXLogRecordBlockCompressHeader;
+    }
+    else
+        source = page;
+
+    /*
+     * We recheck the actual size even if pglz_compress() reports success
+     * and see if the number of bytes saved by compression is larger than
+     * the length of extra data needed for the compressed version of block
+     * image.
+     */
+    len = pglz_compress(source, orig_len, dest, PGLZ_strategy_default);
+    if (len >= 0 &&
+        len + extra_bytes < orig_len)
+    {
+        *dlen = (uint16) len;   /* successful compression */
+        return true;
+    }
+    return false;
+}
+
 /*
  * Determine whether the buffer referenced has to be backed up.
  *
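XLogCompressBackupBlock keeps the compressed image only when it beats the original by more than the compress header it forces, i.e. when len + extra_bytes < orig_len. pglz_compress lives in the PostgreSQL source tree, so this self-contained sketch swaps in toy_compress, a hypothetical run-length encoder that is far weaker than PGLZ but, like pglz_compress in the check above, signals failure with a negative return. BLCKSZ of 8192 and a 2-byte compress header are assumed, and compress_backup_block is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192                 /* assumed default block size */
#define SKETCH_COMPRESS_HEADER 2    /* assumed SizeOfXLogRecordBlockCompressHeader */

/*
 * Toy run-length encoder standing in for pglz_compress(): emits
 * (count, byte) pairs and returns -1 once the output would not be
 * smaller than the input. 'dest' must hold at least 'slen' bytes.
 */
static int32_t toy_compress(const char *src, int32_t slen, char *dest)
{
    int32_t out = 0;

    for (int32_t i = 0; i < slen;)
    {
        int32_t run = 1;

        while (i + run < slen && run < 255 && src[i + run] == src[i])
            run++;
        if (out + 2 >= slen)
            return -1;                  /* not compressible enough */
        dest[out++] = (char) run;
        dest[out++] = src[i];
        i += run;
    }
    return out;
}

/* Same control flow as XLogCompressBackupBlock, with toy_compress swapped in. */
static bool compress_backup_block(const char *page, uint16_t hole_offset,
                                  uint16_t hole_length, char *dest,
                                  uint16_t *dlen)
{
    int32_t     orig_len = BLCKSZ - hole_length;
    int32_t     extra_bytes = 0;
    const char *source = page;
    char        tmp[BLCKSZ];
    int32_t     len;

    assert((uint32_t) hole_offset + hole_length <= BLCKSZ);
    if (hole_length != 0)
    {
        /* must skip the hole */
        memcpy(tmp, page, hole_offset);
        memcpy(tmp + hole_offset, page + hole_offset + hole_length,
               BLCKSZ - (hole_offset + hole_length));
        source = tmp;
        /* a compressed image with a hole also needs the compress header */
        extra_bytes = SKETCH_COMPRESS_HEADER;
    }

    len = toy_compress(source, orig_len, dest);
    if (len >= 0 && len + extra_bytes < orig_len)
    {
        *dlen = (uint16_t) len;         /* successful compression */
        return true;
    }
    return false;
}
```

A mostly empty page clears the bar easily, while an 8-byte hole-free remainder that compresses to 6 bytes is still rejected, because the 2-byte compress header would consume the entire gain.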
