Commit 8bcf90c

Fix race in TAP test 002_archiving.pl when restoring history file
This test, introduced in df86e52, uses a second standby to check that it is able to correctly remove RECOVERYHISTORY and RECOVERYXLOG at the end of recovery. This standby uses the archives of the primary to restore its contents, with some of the archive's contents coming from the first standby previously promoted. In slow environments, it was possible that the test did not check what it should, as the history file generated by the promotion of the first standby may not be stored yet on the archives the second standby feeds on. So, it could be possible that the second standby selects an incorrect timeline, without restoring a history file at all.

This commit adds a wait phase to make sure that the history file required by the second standby is archived before this cluster is created. This relies on poll_query_until() with pg_stat_file() and an absolute path, something not supported in REL_10_STABLE.

While on it, this adds a new test to check that the history file has been restored by looking at the logs of the second standby. This ensures that a RECOVERYHISTORY, whose removal needs to be checked, is created in the first place. This should make the test more robust.

This test has been introduced by df86e52, but it came to light as an effect of the bug fixed by acf1dd4, where the extra restore_command calls made the test much slower.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/YlT23IvsXkGuLzFi@paquier.xyz
Backpatch-through: 11
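The wait phase described above can be illustrated outside of PostgreSQL with a small polling loop. This is only a sketch under stated assumptions: `wait_for_file` is a hypothetical helper, not part of PostgresNode or TestLib, and it checks the filesystem directly, whereas the commit polls through `poll_query_until()` and `pg_stat_file()`. The forked child stands in for the archiver and writes made-up history file contents after a short delay.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Time::HiRes qw(sleep time);

# Hypothetical helper (not a PostgresNode method): poll until $path exists
# as a regular file, or until $timeout seconds have elapsed.
# Returns 1 on success and 0 on timeout.
sub wait_for_file
{
	my ($path, $timeout, $interval) = @_;
	my $deadline = time() + $timeout;
	while (1)
	{
		return 1 if -f $path;
		return 0 if time() >= $deadline;
		sleep($interval);
	}
}

# Temporary directory standing in for the archive directory of the primary.
my $archive_dir = tempdir(CLEANUP => 1);
my $history = "$archive_dir/00000002.history";

# Simulate a slow archiver: a child process creates the file after a delay.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0)
{
	sleep(0.3);
	open(my $fh, '>', $history) or die "could not create $history: $!";
	print $fh "illustrative history file contents\n";
	close($fh);
	exit 0;
}

# Block until the file shows up, as the test does before creating standby2.
wait_for_file($history, 10, 0.1)
  or die "Timed out while waiting for archiving of 00000002.history";
waitpid($pid, 0);
print "history file archived, safe to create the second standby\n";
```

Without the wait, anything started right after the fork could look at the archive directory before the file exists, which is the race the commit closes.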
1 parent acd0eb6 commit 8bcf90c

1 file changed: +25 -3 lines changed

src/test/recovery/t/002_archiving.pl

Lines changed: 25 additions & 3 deletions
@@ -6,7 +6,7 @@
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 3;
+use Test::More tests => 4;
 use File::Copy;
 
 # Initialize primary node, doing archives
@@ -24,6 +24,8 @@
 
 # Initialize standby node from backup, fetching WAL from archives
 my $node_standby = get_new_node('standby');
+# Note that this makes the standby store its contents on the archives
+# of the primary.
 $node_standby->init_from_backup($node_primary, $backup_name,
 	has_restoring => 1);
 $node_standby->append_conf('postgresql.conf',
@@ -58,19 +60,39 @@
 # file, switch to a timeline large enough to allow a standby to recover
 # a history file from an archive. As this requires at least two timeline
 # switches, promote the existing standby first. Then create a second
-# standby based on the promoted one. Finally, the second standby is
-# promoted.
+# standby based on the primary, using its archives. Finally, the second
+# standby is promoted.
 $node_standby->promote;
 
+# Wait until the history file has been stored on the archives of the
+# primary once the promotion of the standby completes. This ensures that
+# the second standby created below will be able to restore this file,
+# creating a RECOVERYHISTORY.
+my $primary_archive = $node_primary->archive_dir;
+$caughtup_query =
+	"SELECT size IS NOT NULL FROM pg_stat_file('$primary_archive/00000002.history')";
+$node_primary->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for archiving of 00000002.history";
+
 my $node_standby2 = get_new_node('standby2');
 $node_standby2->init_from_backup($node_primary, $backup_name,
 	has_restoring => 1);
 $node_standby2->start;
 
+my $log_location = -s $node_standby2->logfile;
+
 # Now promote standby2, and check that temporary files specifically
 # generated during archive recovery are removed by the end of recovery.
 $node_standby2->promote;
+
+# Check the logs of the standby to see that the commands have failed.
+my $log_contents = slurp_file($node_standby2->logfile, $log_location);
 my $node_standby2_data = $node_standby2->data_dir;
+
+like(
+	$log_contents,
+	qr/restored log file "00000002.history" from archive/s,
+	"00000002.history retrieved from the archives");
 ok( !-f "$node_standby2_data/pg_wal/RECOVERYHISTORY",
 	"RECOVERYHISTORY removed after promotion");
 ok( !-f "$node_standby2_data/pg_wal/RECOVERYXLOG",
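The log check in the last hunk records the size of the log file, then reads only what was appended after that offset. The following standalone sketch shows that pattern with core Perl modules only. It assumes a temporary file in place of the standby's log, and `read_from_offset` is a hypothetical stand-in for the `slurp_file()` call used in the test, not the TestLib implementation. The first log line is simulated.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Handle;

# Hypothetical stand-in for slurp_file(): return the contents of $path
# starting at byte $offset.
sub read_from_offset
{
	my ($path, $offset) = @_;
	open(my $fh, '<', $path) or die "could not open $path: $!";
	seek($fh, $offset, 0) or die "could not seek in $path: $!";
	local $/;    # slurp mode: read everything up to end of file
	my $contents = <$fh>;
	close($fh);
	return defined $contents ? $contents : '';
}

# Temporary file standing in for the log file of the second standby.
my ($log_fh, $logfile) = tempfile(UNLINK => 1);
$log_fh->autoflush(1);

# Output written before the point of interest (simulated).
print $log_fh "LOG:  starting archive recovery\n";

# Remember where the log currently ends.
my $log_location = -s $logfile;

# Output produced afterwards; only this part should be inspected.
print $log_fh qq{LOG:  restored log file "00000002.history" from archive\n};

my $log_contents = read_from_offset($logfile, $log_location);
if ($log_contents =~ qr/restored log file "00000002.history" from archive/)
{
	print "00000002.history retrieved from the archives\n";
}
else
{
	print "history file restore not found in new log output\n";
}
```

Reading from the saved offset keeps earlier log output from satisfying the pattern by accident.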
