Commit 1c850fa

Make smart shutdown work in combination with Hot Standby/Streaming Replication.

At present, killing the startup process does not release any locks it holds, so we must wait to stop the startup and walreceiver processes until all read-only backends have exited. Without this patch, the startup and walreceiver processes never exit, so the server gets permanently stuck in a half-shutdown state.

Fujii Masao, with review, docs, and comment adjustments by me.

1 parent 2c0870f commit 1c850fa

3 files changed: 42 additions, 5 deletions

doc/src/sgml/ref/pg_ctl-ref.sgml

Lines changed: 3 additions & 1 deletion
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/pg_ctl-ref.sgml,v 1.49 2010/04/03 07:23:01 petere Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/pg_ctl-ref.sgml,v 1.50 2010/04/08 01:39:37 rhaas Exp $
 PostgreSQL documentation
 -->

@@ -152,6 +152,8 @@ PostgreSQL documentation
 shutdown methods can be selected with the <option>-m</option>
 option: <quote>Smart</quote> mode waits for online backup mode
 to finish and all the clients to disconnect. This is the default.
+If the server is in recovery, recovery and streaming replication
+will be terminated once all clients have disconnected.
 <quote>Fast</quote> mode does not wait for clients to disconnect and
 will terminate an online backup in progress. All active transactions are
 rolled back and clients are forcibly disconnected, then the

doc/src/sgml/runtime.sgml

Lines changed: 4 additions & 2 deletions
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/runtime.sgml,v 1.433 2010/03/21 00:43:40 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/runtime.sgml,v 1.434 2010/04/08 01:39:37 rhaas Exp $ -->
 
 <chapter Id="runtime">
 <title>Server Setup and Operation</title>
@@ -1338,7 +1338,9 @@ echo -17 > /proc/self/oom_adj
 until online backup mode is no longer active. While backup mode is
 active, new connections will still be allowed, but only to superusers
 (this exception allows a superuser to connect to terminate
-online backup mode).
+online backup mode). If the server is in recovery when a smart
+shutdown is requested, recovery and streaming replication will be
+stopped only after all regular sessions have terminated.
 </para>
 </listitem>
 </varlistentry>

src/backend/postmaster/postmaster.c

Lines changed: 35 additions & 2 deletions
@@ -37,7 +37,7 @@
  *
  *
  * IDENTIFICATION
- *    $PostgreSQL: pgsql/src/backend/postmaster/postmaster.c,v 1.604 2010/03/25 20:40:17 sriggs Exp $
+ *    $PostgreSQL: pgsql/src/backend/postmaster/postmaster.c,v 1.605 2010/04/08 01:39:37 rhaas Exp $
  *
  * NOTES
  *
@@ -278,6 +278,7 @@ typedef enum
     PM_RECOVERY_CONSISTENT, /* consistent recovery mode */
     PM_RUN, /* normal "database is alive" state */
     PM_WAIT_BACKUP, /* waiting for online backup mode to end */
+    PM_WAIT_READONLY, /* waiting for read only backends to exit */
     PM_WAIT_BACKENDS, /* waiting for live backends to exit */
     PM_SHUTDOWN, /* waiting for bgwriter to do shutdown ckpt */
     PM_SHUTDOWN_2, /* waiting for archiver and walsenders to
@@ -2173,7 +2174,17 @@ pmdie(SIGNAL_ARGS)
         /* and the walwriter too */
         if (WalWriterPID != 0)
             signal_child(WalWriterPID, SIGTERM);
-        pmState = PM_WAIT_BACKUP;
+        /*
+         * If we're in recovery, we can't kill the startup process
+         * right away, because at present doing so does not release
+         * its locks. We might want to change this in a future
+         * release. For the time being, the PM_WAIT_READONLY state
+         * indicates that we're waiting for the regular (read only)
+         * backends to die off; once they do, we'll kill the startup
+         * and walreceiver processes.
+         */
+        pmState = (pmState == PM_RUN) ?
+            PM_WAIT_BACKUP : PM_WAIT_READONLY;
     }
 
     /*
@@ -2209,6 +2220,7 @@ pmdie(SIGNAL_ARGS)
     }
     if (pmState == PM_RUN ||
         pmState == PM_WAIT_BACKUP ||
+        pmState == PM_WAIT_READONLY ||
         pmState == PM_WAIT_BACKENDS ||
         pmState == PM_RECOVERY_CONSISTENT)
     {
@@ -2771,6 +2783,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
         pmState == PM_RECOVERY_CONSISTENT ||
         pmState == PM_RUN ||
         pmState == PM_WAIT_BACKUP ||
+        pmState == PM_WAIT_READONLY ||
         pmState == PM_SHUTDOWN)
         pmState = PM_WAIT_BACKENDS;
 }
@@ -2846,6 +2859,26 @@ PostmasterStateMachine(void)
         pmState = PM_WAIT_BACKENDS;
     }
 
+    if (pmState == PM_WAIT_READONLY)
+    {
+        /*
+         * PM_WAIT_READONLY state ends when we have no regular backends that
+         * have been started during recovery. We kill the startup and
+         * walreceiver processes and transition to PM_WAIT_BACKENDS. Ideally,
+         * we might like to kill these processes first and then wait for
+         * backends to die off, but that doesn't work at present because
+         * killing the startup process doesn't release its locks.
+         */
+        if (CountChildren(BACKEND_TYPE_NORMAL) == 0)
+        {
+            if (StartupPID != 0)
+                signal_child(StartupPID, SIGTERM);
+            if (WalReceiverPID != 0)
+                signal_child(WalReceiverPID, SIGTERM);
+            pmState = PM_WAIT_BACKENDS;
+        }
+    }
+
     /*
      * If we are in a state-machine state that implies waiting for backends to
      * exit, see if they're all gone, and change state if so.
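
The shutdown ordering this commit introduces can be sketched as a small standalone C model. It is an illustration under simplifying assumptions, not the postmaster's actual code: the `Sketch*` types and `sketch_*` functions are hypothetical names, signalling is reduced to clearing flags (real child processes exit asynchronously after SIGTERM), and only the states touched by the diff are modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the postmaster states in the diff. */
typedef enum
{
    SK_RECOVERY_CONSISTENT,
    SK_RUN,
    SK_WAIT_BACKUP,
    SK_WAIT_READONLY,
    SK_WAIT_BACKENDS
} SketchState;

/* Hypothetical model of the bookkeeping the postmaster consults. */
typedef struct
{
    SketchState state;
    int         normal_backends;     /* stands in for CountChildren(BACKEND_TYPE_NORMAL) */
    bool        startup_running;     /* stands in for StartupPID != 0 */
    bool        walreceiver_running; /* stands in for WalReceiverPID != 0 */
} SketchPostmaster;

/*
 * Smart shutdown request. Mirrors only the ternary added to pmdie():
 * a running server waits for backup mode to end, anything else (that is,
 * a server in recovery) waits for read-only backends to exit.
 */
static void
sketch_request_smart_shutdown(SketchPostmaster *pm)
{
    pm->state = (pm->state == SK_RUN) ? SK_WAIT_BACKUP : SK_WAIT_READONLY;
}

/*
 * One pass of the state machine for the new wait state. Returns true if
 * the startup and walreceiver stand-ins were stopped on this pass.
 */
static bool
sketch_state_machine_step(SketchPostmaster *pm)
{
    assert(pm->normal_backends >= 0);

    if (pm->state == SK_WAIT_READONLY && pm->normal_backends == 0)
    {
        /* The real code sends SIGTERM here; the sketch just clears flags. */
        pm->startup_running = false;
        pm->walreceiver_running = false;
        pm->state = SK_WAIT_BACKENDS;
        return true;
    }
    return false;
}
```

For example, starting from `SK_RECOVERY_CONSISTENT` with two normal backends, a smart shutdown request moves the model to `SK_WAIT_READONLY`. `sketch_state_machine_step` then does nothing until `normal_backends` drops to zero, at which point it stops the startup and walreceiver stand-ins and moves to `SK_WAIT_BACKENDS`. The point of the ordering is that the startup process keeps holding its locks until no read-only backend can still depend on them.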
