Skip to content

Commit 8b53c08

Browse files
committed
Fix incorrect order of lock file removal and failure to close() sockets.
Commit c9b0cbe accidentally broke the order of operations during postmaster shutdown: it resulted in removing the per-socket lockfiles after, not before, postmaster.pid. This creates a race-condition hazard for a new postmaster that's started immediately after observing that postmaster.pid has disappeared; if it sees the socket lockfile still present, it will quite properly refuse to start. This error appears to be the explanation for at least some of the intermittent buildfarm failures we've seen in the pg_upgrade test. Another problem, which has been there all along, is that the postmaster has never bothered to close() its listen sockets, but has just allowed them to close at process death. This creates a different race condition for an incoming postmaster: it might be unable to bind to the desired listen address because the old postmaster is still incumbent. This might explain some odd failures we've seen in the past, too. (Note: this is not related to the fact that individual backends don't close their client communication sockets. That behavior is intentional and is not changed by this patch.) Fix by adding an on_proc_exit function that closes the postmaster's ports explicitly, and (in 9.3 and up) reshuffling the responsibility for where to unlink the Unix socket files. Lock file unlinking can stay where it is, but teach it to unlink the lock files in reverse order of creation.
1 parent 4467996 commit 8b53c08

File tree

1 file changed

+39
-0
lines changed

1 file changed

+39
-0
lines changed

src/backend/postmaster/postmaster.c

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -335,6 +335,7 @@ static DNSServiceRef bonjour_sdref = NULL;
335335
/*
336336
* postmaster.c - function prototypes
337337
*/
338+
static void CloseServerPorts(int status, Datum arg);
338339
static void getInstallationPaths(const char *argv0);
339340
static void checkDataDir(void);
340341
static void pmdaemonize(void);
@@ -977,6 +978,14 @@ PostmasterMain(int argc, char *argv[])
977978
ereport(FATAL,
978979
(errmsg("no socket created for listening")));
979980

981+
/*
982+
* Set up an on_proc_exit function that's charged with closing the sockets
983+
* again at postmaster shutdown. You might think we should have done this
984+
* earlier, but we want it to run before not after the proc_exit callback
985+
* that will remove the Unix socket file.
986+
*/
987+
on_proc_exit(CloseServerPorts, 0);
988+
980989
/*
981990
* Set up shared memory and semaphores.
982991
*/
@@ -1137,6 +1146,36 @@ PostmasterMain(int argc, char *argv[])
11371146
}
11381147

11391148

1149+
/*
1150+
* on_proc_exit callback to close server's listen sockets
1151+
*/
1152+
static void
1153+
CloseServerPorts(int status, Datum arg)
1154+
{
1155+
int i;
1156+
1157+
/*
1158+
* First, explicitly close all the socket FDs. We used to just let this
1159+
* happen implicitly at postmaster exit, but it's better to close them
1160+
* before we remove the postmaster.pid lockfile; otherwise there's a race
1161+
* condition if a new postmaster wants to re-use the TCP port number.
1162+
*/
1163+
for (i = 0; i < MAXLISTEN; i++)
1164+
{
1165+
if (ListenSocket[i] != PGINVALID_SOCKET)
1166+
{
1167+
StreamClose(ListenSocket[i]);
1168+
ListenSocket[i] = PGINVALID_SOCKET;
1169+
}
1170+
}
1171+
1172+
/*
1173+
* Removal of the Unix socket file and socket lockfile will happen in
1174+
* later on_proc_exit callbacks.
1175+
*/
1176+
}
1177+
1178+
11401179
/*
11411180
* Compute and check the directory paths to files that are part of the
11421181
* installation (as deduced from the postgres executable's own location)

0 commit comments

Comments
 (0)