Commit 9de9294

Stop archive recovery if WAL generated with wal_level=minimal is found.
Previously, if hot standby was enabled, archive recovery exited with an error when it found WAL generated with wal_level=minimal. But if hot standby was disabled, it just reported a warning and continued, which could lead to data loss or errors during normal operation. A warning was emitted, but users could easily miss it and not notice this serious situation until they encountered the actual errors.

To improve this situation, this commit changes archive recovery so that it exits with a FATAL error when it finds WAL generated with wal_level=minimal, whatever the setting of hot standby. This enables users to notice the serious situation soon.

The FATAL error is thrown if archive recovery starts from a base backup taken before wal_level is changed to minimal. When archive recovery exits with the error, if users have a base backup taken after setting wal_level to higher than minimal, they can recover the database by starting archive recovery from that newer backup. But note that if such a backup doesn't exist, there is no easy way to complete archive recovery, which may make the database server unstartable, and users may lose the whole database. The commit adds a note about this risk to the documentation.

Even in the case of an unstartable database server, previously users could avoid the error during archive recovery just by disabling hot standby, then forcibly start up the server and salvage data from it. But note that this commit makes this procedure unavailable.

Author: Takamichi Osumi
Reviewed-by: Laurenz Albe, Kyotaro Horiguchi, David Steele, Fujii Masao
Discussion: https://postgr.es/m/OSBPR01MB4888CBE1DA08818FD2D90ED8EDF90@OSBPR01MB4888.jpnprd01.prod.outlook.com
1 parent c4c393b commit 9de9294

4 files changed, +106 -9 lines changed
doc/src/sgml/config.sgml

+4
@@ -2720,6 +2720,10 @@ include_dir 'conf.d'
 data from a base backup and the WAL logs, so <literal>replica</literal> or
 higher must be used to enable WAL archiving
 (<xref linkend="guc-archive-mode"/>) and streaming replication.
+Note that changing <varname>wal_level</varname> to
+<literal>minimal</literal> makes any base backups taken before
+unavailable for archive recovery and standby server, which may
+lead to database loss.
 </para>
 <para>
 In <literal>logical</literal> level, the same information is logged as

doc/src/sgml/perform.sgml

+3-1
@@ -1745,7 +1745,9 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
 <xref linkend="guc-wal-level"/> to <literal>minimal</literal>,
 <xref linkend="guc-archive-mode"/> to <literal>off</literal>, and
 <xref linkend="guc-max-wal-senders"/> to zero.
-But note that changing these settings requires a server restart.
+But note that changing these settings requires a server restart,
+and makes any base backups taken before unavailable for archive
+recovery and standby server, which may lead to database loss.
 </para>

 <para>

src/backend/access/transam/xlog.c

+4-8
@@ -6403,9 +6403,10 @@ CheckRequiredParameterValues(void)
      */
     if (ArchiveRecoveryRequested && ControlFile->wal_level == WAL_LEVEL_MINIMAL)
     {
-        ereport(WARNING,
-                (errmsg("WAL was generated with wal_level=minimal, data may be missing"),
-                 errhint("This happens if you temporarily set wal_level=minimal without taking a new base backup.")));
+        ereport(FATAL,
+                (errmsg("WAL was generated with wal_level=minimal, cannot continue recovering"),
+                 errdetail("This happens if you temporarily set wal_level=minimal on the server."),
+                 errhint("Use a backup taken after setting wal_level to higher than minimal.")));
     }

     /*
@@ -6414,11 +6415,6 @@ CheckRequiredParameterValues(void)
      */
     if (ArchiveRecoveryRequested && EnableHotStandby)
     {
-        if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
-            ereport(ERROR,
-                    (errmsg("hot standby is not possible because wal_level was not set to \"replica\" or higher on the primary server"),
-                     errhint("Either set wal_level to \"replica\" on the primary, or turn off hot_standby here.")));
-
         /* We ignore autovacuum_max_workers when we make this test. */
         RecoveryRequiresIntParameter("max_connections",
                                      MaxConnections,

@@ -0,0 +1,95 @@
+# Test for archive recovery of WAL generated with wal_level=minimal
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+
+# Initialize and start node with wal_level = replica and WAL archiving
+# enabled.
+my $node = get_new_node('orig');
+$node->init(has_archiving => 1);
+my $replica_config = q[
+wal_level = replica
+archive_mode = on
+max_wal_senders = 10
+hot_standby = off
+];
+$node->append_conf('postgresql.conf', $replica_config);
+$node->start;
+
+# Take backup
+my $backup_name = 'my_backup';
+$node->backup($backup_name);
+
+# Restart node with wal_level = minimal and WAL archiving disabled
+# to generate WAL with that setting. Note that such WAL has not been
+# archived yet at this moment because WAL archiving is not enabled.
+$node->append_conf(
+    'postgresql.conf', q[
+wal_level = minimal
+archive_mode = off
+max_wal_senders = 0
+]);
+$node->restart;
+
+# Restart node with wal_level = replica and WAL archiving enabled
+# to archive WAL previously generated with wal_level = minimal.
+# We ensure the WAL file containing the record indicating the change
+# of wal_level to minimal is archived by checking pg_stat_archiver.
+$node->append_conf('postgresql.conf', $replica_config);
+$node->restart;
+
+# Find next WAL segment to be archived
+my $walfile_to_be_archived = $node->safe_psql('postgres',
+    "SELECT pg_walfile_name(pg_current_wal_lsn());");
+
+# Make WAL segment eligible for archival
+$node->safe_psql('postgres', 'SELECT pg_switch_wal()');
+my $archive_wait_query
+  = "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";
+
+# Wait until the WAL segment has been archived.
+$node->poll_query_until('postgres', $archive_wait_query)
+  or die "Timed out while waiting for WAL segment to be archived";
+
+$node->stop;
+
+# Initialize new node from backup, and start archive recovery. Check that
+# archive recovery fails with an error when it detects the WAL record
+# indicating the change of wal_level to minimal and node stops.
+sub test_recovery_wal_level_minimal
+{
+    my ($node_name, $node_text, $standby_setting) = @_;
+
+    my $recovery_node = get_new_node($node_name);
+    $recovery_node->init_from_backup(
+        $node, $backup_name,
+        has_restoring => 1, standby => $standby_setting);
+
+    # Use run_log instead of recovery_node->start because this test expects
+    # that the server ends with an error during recovery.
+    run_log(
+        ['pg_ctl','-D', $recovery_node->data_dir, '-l',
+         $recovery_node->logfile, 'start']);
+
+    # Wait up to 180s for postgres to terminate
+    foreach my $i (0 .. 1800)
+    {
+        last if !-f $recovery_node->data_dir . '/postmaster.pid';
+        usleep(100_000);
+    }
+
+    # Confirm that the archive recovery fails with an expected error
+    my $logfile = slurp_file($recovery_node->logfile());
+    ok( $logfile =~
+          qr/FATAL:  WAL was generated with wal_level=minimal, cannot continue recovering/,
+        "$node_text ends with an error because it finds WAL generated with wal_level=minimal");
+}
+
+# Test for archive recovery
+test_recovery_wal_level_minimal('archive_recovery', 'archive recovery', 0);
+
+# Test for standby server
+test_recovery_wal_level_minimal('standby', 'standby', 1);
