
Commit a3fcbcd

Fix backup manifests to generate correct WAL-Ranges across timelines
In a backup manifest, WAL-Ranges stores the range of WAL that is required for the backup to be valid. pg_verifybackup then uses pg_waldump internally to check the WAL based on this data. When the backup started on a timeline greater than 1, so that a history file was consulted to generate the manifest data, the WAL range computed for the timeline on which the backup started was incorrect. The previous logic used the LSN at which that timeline began as the start LSN, but it needs to use the start LSN of the backup. This caused failures in pg_verifybackup and in any other tools that rely on backup manifests.

This commit adds a test that uses a self-promoted node, which keeps it cheap.

Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20210818.143031.1867083699202617521.horikyota.ntt@gmail.com
Backpatch-through: 13
1 parent c818c25 commit a3fcbcd


2 files changed, +29 -5 lines


src/backend/replication/backup_manifest.c

+11 -4
```diff
@@ -251,19 +251,26 @@ AddWALInfoToBackupManifest(backup_manifest_info *manifest, XLogRecPtr startptr,
                     errmsg("expected end timeline %u but found timeline %u",
                            starttli, entry->tli));
 
-        if (!XLogRecPtrIsInvalid(entry->begin))
-            tl_beginptr = entry->begin;
+        /*
+         * If this timeline entry matches with the timeline on which the
+         * backup started, WAL needs to be checked from the start LSN of the
+         * backup. If this entry refers to a newer timeline, WAL needs to be
+         * checked since the beginning of this timeline, so use the LSN where
+         * the timeline began.
+         */
+        if (starttli == entry->tli)
+            tl_beginptr = startptr;
         else
         {
-            tl_beginptr = startptr;
+            tl_beginptr = entry->begin;
 
             /*
              * If we reach a TLI that has no valid beginning LSN, there can't
              * be any more timelines in the history after this point, so we'd
              * better have arrived at the expected starting TLI. If not,
              * something's gone horribly wrong.
              */
-            if (starttli != entry->tli)
+            if (XLogRecPtrIsInvalid(entry->begin))
                 ereport(ERROR,
                         errmsg("expected start timeline %u but found timeline %u",
                                starttli, entry->tli));
```

src/bin/pg_verifybackup/t/007_wal.pl

+18 -1
```diff
@@ -10,7 +10,7 @@
 use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 9;
 
 # Start up the server and take a backup.
 my $primary = PostgresNode->new('primary');
@@ -59,3 +59,20 @@
 	[ 'pg_verifybackup', $backup_path ],
 	qr/WAL parsing failed for timeline 1/,
 	'corrupt WAL file causes failure');
+
+# Check that WAL-Ranges has correct values with a history file and
+# a timeline > 1. Rather than plugging in a new standby, do a
+# self-promotion of this node.
+$primary->stop;
+$primary->append_conf('standby.signal');
+$primary->start;
+$primary->promote;
+$primary->safe_psql('postgres', 'SELECT pg_switch_wal()');
+my $backup_path2 = $primary->backup_dir . '/test_tli';
+# The base backup run below does a checkpoint, that removes the first segment
+# of the current timeline.
+$primary->command_ok([ 'pg_basebackup', '-D', $backup_path2, '--no-sync' ],
+	"base backup 2 ok");
+command_ok(
+	[ 'pg_verifybackup', $backup_path2 ],
+	'valid base backup with timeline > 1');
```
