Commit 5e550ac

Document Warm Standby for High Availability
Includes sample standby script. Simon Riggs
1 parent 075c0ca commit 5e550ac


1 file changed: +307 -1 lines changed


doc/src/sgml/backup.sgml

Lines changed: 307 additions & 1 deletion
@@ -1,4 +1,4 @@
- <!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.84 2006/09/15 21:55:07 momjian Exp $ -->
+ <!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.85 2006/09/15 22:02:21 momjian Exp $ -->

<chapter id="backup">
<title>Backup and Restore</title>
@@ -1203,6 +1203,312 @@ restore_command = 'copy /mnt/server/archivedir/%f "%p"' # Windows
</sect2>
</sect1>

<sect1 id="warm-standby">
<title>Warm Standby Servers for High Availability</title>

<indexterm zone="backup">
<primary>Warm Standby</primary>
</indexterm>

<indexterm zone="backup">
<primary>PITR Standby</primary>
</indexterm>

<indexterm zone="backup">
<primary>Standby Server</primary>
</indexterm>

<indexterm zone="backup">
<primary>Log Shipping</primary>
</indexterm>

<indexterm zone="backup">
<primary>Witness Server</primary>
</indexterm>

<indexterm zone="backup">
<primary>STONITH</primary>
</indexterm>

<indexterm zone="backup">
<primary>High Availability</primary>
</indexterm>

<para>
Continuous Archiving can be used to create a High Availability (HA)
cluster configuration with one or more Standby Servers ready to take
over operations if the Primary Server fails. This
capability is more widely known as Warm Standby Log Shipping.
</para>

<para>
The Primary and Standby Server work together to provide this capability,
though the servers are only loosely coupled. The Primary Server operates
in Continuous Archiving mode, while the Standby Server operates in a
continuous Recovery mode, reading the WAL files from the Primary. No
changes to the database tables are required to enable this capability,
so it offers a low administration overhead in comparison with other
replication approaches. This configuration also has a very low
performance impact on the Primary server.
</para>

<para>
Directly moving WAL or "log" records from one database server to another
is typically described as Log Shipping. PostgreSQL implements file-based
Log Shipping, meaning WAL records are batched one file at a time. WAL
files can be shipped easily and cheaply over any distance, whether it be
to an adjacent system, another system on the same site or another system
on the far side of the globe. The bandwidth required for this technique
varies according to the transaction rate of the Primary Server.
Record-based Log Shipping is also possible with custom-developed
procedures, discussed in a later section. Future developments are likely
to include options for synchronous and/or integrated record-based log
shipping.
</para>
1268+
1269+
<para>
1270+
It should be noted that the log shipping is asynchronous, i.e. the WAL
1271+
records are shipped after transaction commit. As a result there can be a
1272+
small window of data loss, should the Primary Server suffer a
1273+
catastrophic failure. The window of data loss is minimised by the use of
1274+
the archive_timeout parameter, which can be set as low as a few seconds
1275+
if required. A very low setting can increase the bandwidth requirements
1276+
for file shipping.
1277+
</para>
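
<para>
As an illustration only, a Primary feeding a Warm Standby might use
settings such as the following; the archive location and the values shown
here are assumptions, not recommendations:
<programlisting>
# postgresql.conf on the Primary (illustrative sketch)
archive_command = 'cp %p /mnt/standby/archivedir/%f'   # copy completed WAL files to the shared archive
archive_timeout = 60                                    # force a WAL file switch at least once a minute
</programlisting>
</para>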

<para>
The Standby server is not available for access, since it is continually
performing recovery processing. Recovery performance is sufficiently
good that the Standby will typically be only minutes away from full
availability once it has been activated. As a result, we refer to this
capability as a Warm Standby configuration that offers High
Availability. Restoring a server from an archived base backup and
rollforward can take considerably longer and so that technique only
really offers a solution for Disaster Recovery, not HA.
</para>

<para>
Other mechanisms for High Availability replication are available, both
commercially and as open-source software.
</para>

<para>
In general, log shipping between servers running different release
levels will not be possible. It is the policy of the PostgreSQL Global
Development Group not to make changes to disk formats during minor release
upgrades, so it is likely that running different minor release levels
on Primary and Standby servers will work successfully. However, no
formal support for that is offered and you are advised not to allow this
to occur over long periods.
</para>

<sect2 id="warm-standby-planning">
<title>Planning</title>

<para>
On the Standby server all tablespaces and paths will refer to similarly
named mount points, so it is important to create the Primary and Standby
servers so that they are as similar as possible, at least from the
perspective of the database server. Furthermore, any CREATE TABLESPACE
commands will be passed across as-is, so any new mount points must be
created on both servers before they are used on the Primary. Hardware
need not be the same, but experience shows that maintaining two
identical systems is easier than maintaining two dissimilar ones over
the whole lifetime of the application and system.
</para>
1319+
1320+
<para>
1321+
There is no special mode required to enable a Standby server. The
1322+
operations that occur on both Primary and Standby servers are entirely
1323+
normal continuous archiving and recovery tasks. The primary point of
1324+
contact between the two database servers is the archive of WAL files
1325+
that both share: Primary writing to the archive, Standby reading from
1326+
the archive. Care must be taken to ensure that WAL archives for separate
1327+
servers do not become mixed together or confused.
1328+
</para>

<para>
The magic that makes the two loosely coupled servers work together is
simply a restore_command that waits for the next WAL file to be archived
from the Primary. The restore_command is specified in the recovery.conf
file on the Standby Server. Normal recovery processing would request a
file from the WAL archive, causing an error if the file was unavailable.
For Standby processing it is normal for the next file to be unavailable,
so we must be patient and wait for it to appear. A waiting
restore_command can be written as a custom script that loops after
polling for the existence of the next WAL file. There must also be some
way to trigger failover, which should interrupt the restore_command,
break the loop and return a file not found error to the Standby Server.
This ends recovery and the Standby will then come up as a normal
server.
</para>

<para>
Sample code for the C version of the restore_command would be:
<programlisting>
triggered = false;
while (!NextWALFileReady() && !triggered)
{
    usleep(100000L);    // wait for ~0.1 sec
    if (CheckForExternalTrigger())
        triggered = true;
}
if (!triggered)
    CopyWALFileForRecovery();
</programlisting>
</para>
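
<para>
The same waiting logic can also be sketched as a shell script for use as
the restore_command; the paths and the trigger file name below are
assumptions chosen for illustration only:
<programlisting>
#!/bin/sh
# wait_for_wal.sh %f %p  --  illustrative waiting restore_command
ARCHIVE=/mnt/standby/archivedir       # assumed location of the shared WAL archive
TRIGGER=/tmp/pgsql.trigger            # assumed failover trigger file

while true
do
    # Failover has been requested: return nonzero ("file not found"), which ends recovery
    if test -f "$TRIGGER" ; then
        exit 1
    fi

    # The next WAL file has arrived: copy it to where the server expects it
    if test -f "$ARCHIVE/$1" ; then
        cp "$ARCHIVE/$1" "$2"
        exit $?
    fi

    sleep 1                           # wait briefly, then poll again
done
</programlisting>
Such a script could then be referenced from recovery.conf as, for example,
restore_command = 'wait_for_wal.sh %f %p'.
</para>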

<para>
PostgreSQL does not provide the system software required to identify a
failure on the Primary and notify the Standby system and then the
Standby database server. Many such tools exist and are well integrated
with other aspects of a system failover, such as IP address migration.
</para>

<para>
Triggering failover is an important part of planning and design. The
restore_command is executed in full once for each WAL file. The process
running the restore_command is therefore created and dies for each file,
so there is no daemon or server process and so we cannot use signals and
a signal handler. A more permanent notification is required to trigger
the failover. It is possible to use a simple timeout facility,
especially if used in conjunction with a known archive_timeout setting
on the Primary. This is somewhat error prone since a network problem or
a busy Primary server might be sufficient to initiate failover. A
notification mechanism such as the explicit creation of a trigger file
is less error prone, if this can be arranged.
</para>
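
<para>
With the trigger-file approach sketched above, failover might be requested
simply by creating that file on the Standby system; the file name is the
same assumption used in the earlier script:
<programlisting>
# Run on the Standby to request failover (illustrative trigger file name)
touch /tmp/pgsql.trigger
</programlisting>
</para>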
</sect2>

<sect2 id="warm-standby-config">
<title>Implementation</title>

<para>
The short procedure for configuring a Standby Server is as follows. For
full details of each step, refer to previous sections as noted.
<orderedlist>
<listitem>
<para>
Set up Primary and Standby systems as near identically as possible,
including two identical copies of PostgreSQL at the same release level.
</para>
</listitem>
<listitem>
<para>
Set up Continuous Archiving from the Primary to a WAL archive located
in a directory on the Standby Server. Ensure that both <xref
linkend="guc-archive-command"> and <xref linkend="guc-archive-timeout">
are set. (See <xref linkend="backup-archiving-wal">)
</para>
</listitem>
<listitem>
<para>
Make a Base Backup of the Primary Server. (See <xref
linkend="backup-base-backup">)
</para>
</listitem>
<listitem>
<para>
Begin recovery on the Standby Server from the local WAL archive,
using a recovery.conf that specifies a restore_command that waits as
described previously and as illustrated after this list. (See <xref
linkend="backup-pitr-recovery">)
</para>
</listitem>
</orderedlist>
</para>
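
<para>
As a minimal sketch of the final step, the recovery.conf on the Standby
might contain nothing more than a waiting restore_command; the script name
here is an assumption, standing in for whatever waiting command is used:
<programlisting>
# recovery.conf on the Standby (illustrative sketch)
restore_command = '/usr/local/bin/wait_for_wal.sh %f %p'
</programlisting>
</para>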

<para>
Recovery treats the WAL Archive as read-only, so once a WAL file has
been copied to the Standby system it can be copied to tape at the same
time as it is being used by the Standby database server to recover.
Thus, running a Standby Server for High Availability can be performed at
the same time as files are stored for longer term Disaster Recovery
purposes.
</para>
1428+
1429+
<para>
1430+
For testing purposes, it is possible to run both Primary and Standby
1431+
servers on the same system. This does not provide any worthwhile
1432+
improvement on server robustness, nor would it be described as HA.
1433+
</para>
</sect2>

<sect2 id="warm-standby-failover">
<title>Failover</title>

<para>
If the Primary Server fails then the Standby Server should begin
failover procedures.
</para>

<para>
If the Standby Server fails then no failover need take place. If the
Standby Server can be restarted, then the recovery process can also be
immediately restarted, taking advantage of Restartable Recovery.
</para>

<para>
If the Primary Server fails and then immediately restarts, you must have
a mechanism for informing it that it is no longer the Primary. This is
sometimes known as STONITH (Shoot The Other Node In The Head), which is
necessary to avoid situations where both systems think they are the
Primary, which can lead to confusion and ultimately data loss.
</para>

<para>
Many failover systems use just two systems, the Primary and the Standby,
connected by some kind of heartbeat mechanism to continually verify the
connectivity between the two and the viability of the Primary. It is
also possible to use a third system, known as a Witness Server, to avoid
some problems of inappropriate failover, but the additional complexity
may not be worthwhile unless it is set up with sufficient care and
rigorous testing.
</para>

<para>
At the instant that failover takes place to the Standby, we have only a
single server in operation. This is known as a degenerate state.
The former Standby is now the Primary, but the former Primary is down
and may stay down. We must now fully re-create a Standby server,
either on the former Primary system when it comes up, or on a third,
possibly new, system. Once complete, the Primary and Standby can be
considered to have switched roles. Some people choose to use a third
server to provide additional protection across the failover interval,
though clearly this complicates the system configuration and
operational processes (and this can also act as a Witness Server).
</para>

<para>
So, switching from Primary to Standby Server can be fast, but requires
some time to re-prepare the failover cluster. Regular switching from
Primary to Standby is encouraged, since it allows the regular downtime
on each system required to maintain HA. This also acts as a test of the
failover so that it definitely works when you really need it. Written
administration procedures are advised.
</para>
</sect2>

<sect2 id="warm-standby-record">
<title>Implementing Record-based Log Shipping</title>

<para>
The main features for Log Shipping in this release are based around the
file-based Log Shipping described above. It is also possible to
implement record-based Log Shipping using the pg_xlogfile_name_offset()
function, though this requires custom development.
</para>

<para>
An external program can call pg_xlogfile_name_offset() to find out the
filename and the exact byte offset within it of the latest WAL pointer.
If the external program regularly polls the server it can find out how
far forward the pointer has moved. It can then access the WAL file
directly and copy those bytes across to a less up-to-date copy on a
Standby Server.
</para>
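
<para>
As a rough sketch only, such a polling client could be driven from a shell
loop like the one below; the connection details are assumptions, and the
exact argument list of pg_xlogfile_name_offset() should be checked against
the server documentation for the release in use:
<programlisting>
#!/bin/sh
# Poll the Primary for the latest WAL file name and byte offset (illustrative)
while true
do
    psql -A -t -c \
      "SELECT * FROM pg_xlogfile_name_offset(pg_current_xlog_location())"
    sleep 5      # poll interval in seconds
done
</programlisting>
</para>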
</sect2>
</sect1>

<sect1 id="migration">
<title>Migration Between Releases</title>
