Skip to content

Commit 2332020

Browse files
committed
Don't run atexit callbacks in quickdie signal handlers.
exit() is not async-signal safe. Even if the libc implementation is, 3rd party libraries might have installed unsafe atexit() callbacks. After receiving SIGQUIT, we really just want to exit as quickly as possible, so we don't really want to run the atexit() callbacks anyway. The original report by Jimmy Yih was a self-deadlock in startup_die(). However, this patch doesn't address that scenario; the signal handling while waiting for the startup packet is more complicated. But at least this alleviates similar problems in the SIGQUIT handlers, like that reported by Asim R P later in the same thread. Backpatch to 9.3 (all supported versions). Discussion: https://www.postgresql.org/message-id/CAOMx_OAuRUHiAuCg2YgicZLzPVv5d9_H4KrL_OFsFP%3DVPekigA%40mail.gmail.com
1 parent 9446d71 commit 2332020

File tree

7 files changed

+77
-108
lines changed

7 files changed

+77
-108
lines changed

src/backend/postmaster/bgworker.c

Lines changed: 9 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -636,28 +636,21 @@ SanityCheckBackgroundWorker(BackgroundWorker *worker, int elevel)
636636
static void
637637
bgworker_quickdie(SIGNAL_ARGS)
638638
{
639-
sigaddset(&BlockSig, SIGQUIT); /* prevent nested calls */
640-
PG_SETMASK(&BlockSig);
641-
642-
/*
643-
* We DO NOT want to run proc_exit() callbacks -- we're here because
644-
* shared memory may be corrupted, so we don't want to try to clean up our
645-
* transaction. Just nail the windows shut and get out of town. Now that
646-
* there's an atexit callback to prevent third-party code from breaking
647-
* things by calling exit() directly, we have to reset the callbacks
648-
* explicitly to make this work as intended.
649-
*/
650-
on_exit_reset();
651-
652639
/*
653-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
654-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
640+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
641+
* because shared memory may be corrupted, so we don't want to try to
642+
* clean up our transaction. Just nail the windows shut and get out of
643+
* town. The callbacks wouldn't be safe to run from a signal handler,
644+
* anyway.
645+
*
646+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
647+
* a system reset cycle if someone sends a manual SIGQUIT to a random
655648
* backend. This is necessary precisely because we don't clean up our
656649
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
657650
* should ensure the postmaster sees this as a crash, too, but no harm in
658651
* being doubly sure.)
659652
*/
660-
exit(2);
653+
_exit(2);
661654
}
662655

663656
/*

src/backend/postmaster/bgwriter.c

Lines changed: 9 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -409,27 +409,21 @@ BackgroundWriterMain(void)
409409
static void
410410
bg_quickdie(SIGNAL_ARGS)
411411
{
412-
PG_SETMASK(&BlockSig);
413-
414412
/*
415-
* We DO NOT want to run proc_exit() callbacks -- we're here because
416-
* shared memory may be corrupted, so we don't want to try to clean up our
417-
* transaction. Just nail the windows shut and get out of town. Now that
418-
* there's an atexit callback to prevent third-party code from breaking
419-
* things by calling exit() directly, we have to reset the callbacks
420-
* explicitly to make this work as intended.
421-
*/
422-
on_exit_reset();
423-
424-
/*
425-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
426-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
413+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
414+
* because shared memory may be corrupted, so we don't want to try to
415+
* clean up our transaction. Just nail the windows shut and get out of
416+
* town. The callbacks wouldn't be safe to run from a signal handler,
417+
* anyway.
418+
*
419+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
420+
* a system reset cycle if someone sends a manual SIGQUIT to a random
427421
* backend. This is necessary precisely because we don't clean up our
428422
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
429423
* should ensure the postmaster sees this as a crash, too, but no harm in
430424
* being doubly sure.)
431425
*/
432-
exit(2);
426+
_exit(2);
433427
}
434428

435429
/* SIGHUP: set flag to re-read config file at next convenient time */

src/backend/postmaster/checkpointer.c

Lines changed: 9 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -822,27 +822,21 @@ IsCheckpointOnSchedule(double progress)
822822
static void
823823
chkpt_quickdie(SIGNAL_ARGS)
824824
{
825-
PG_SETMASK(&BlockSig);
826-
827825
/*
828-
* We DO NOT want to run proc_exit() callbacks -- we're here because
829-
* shared memory may be corrupted, so we don't want to try to clean up our
830-
* transaction. Just nail the windows shut and get out of town. Now that
831-
* there's an atexit callback to prevent third-party code from breaking
832-
* things by calling exit() directly, we have to reset the callbacks
833-
* explicitly to make this work as intended.
834-
*/
835-
on_exit_reset();
836-
837-
/*
838-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
839-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
826+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
827+
* because shared memory may be corrupted, so we don't want to try to
828+
* clean up our transaction. Just nail the windows shut and get out of
829+
* town. The callbacks wouldn't be safe to run from a signal handler,
830+
* anyway.
831+
*
832+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
833+
* a system reset cycle if someone sends a manual SIGQUIT to a random
840834
* backend. This is necessary precisely because we don't clean up our
841835
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
842836
* should ensure the postmaster sees this as a crash, too, but no harm in
843837
* being doubly sure.)
844838
*/
845-
exit(2);
839+
_exit(2);
846840
}
847841

848842
/* SIGHUP: set flag to re-read config file at next convenient time */

src/backend/postmaster/startup.c

Lines changed: 9 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -69,27 +69,21 @@ static void StartupProcSigHupHandler(SIGNAL_ARGS);
6969
static void
7070
startupproc_quickdie(SIGNAL_ARGS)
7171
{
72-
PG_SETMASK(&BlockSig);
73-
74-
/*
75-
* We DO NOT want to run proc_exit() callbacks -- we're here because
76-
* shared memory may be corrupted, so we don't want to try to clean up our
77-
* transaction. Just nail the windows shut and get out of town. Now that
78-
* there's an atexit callback to prevent third-party code from breaking
79-
* things by calling exit() directly, we have to reset the callbacks
80-
* explicitly to make this work as intended.
81-
*/
82-
on_exit_reset();
83-
8472
/*
85-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
86-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
73+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
74+
* because shared memory may be corrupted, so we don't want to try to
75+
* clean up our transaction. Just nail the windows shut and get out of
76+
* town. The callbacks wouldn't be safe to run from a signal handler,
77+
* anyway.
78+
*
79+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
80+
* a system reset cycle if someone sends a manual SIGQUIT to a random
8781
* backend. This is necessary precisely because we don't clean up our
8882
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
8983
* should ensure the postmaster sees this as a crash, too, but no harm in
9084
* being doubly sure.)
9185
*/
92-
exit(2);
86+
_exit(2);
9387
}
9488

9589

src/backend/postmaster/walwriter.c

Lines changed: 9 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -319,27 +319,21 @@ WalWriterMain(void)
319319
static void
320320
wal_quickdie(SIGNAL_ARGS)
321321
{
322-
PG_SETMASK(&BlockSig);
323-
324322
/*
325-
* We DO NOT want to run proc_exit() callbacks -- we're here because
326-
* shared memory may be corrupted, so we don't want to try to clean up our
327-
* transaction. Just nail the windows shut and get out of town. Now that
328-
* there's an atexit callback to prevent third-party code from breaking
329-
* things by calling exit() directly, we have to reset the callbacks
330-
* explicitly to make this work as intended.
331-
*/
332-
on_exit_reset();
333-
334-
/*
335-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
336-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
323+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
324+
* because shared memory may be corrupted, so we don't want to try to
325+
* clean up our transaction. Just nail the windows shut and get out of
326+
* town. The callbacks wouldn't be safe to run from a signal handler,
327+
* anyway.
328+
*
329+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
330+
* a system reset cycle if someone sends a manual SIGQUIT to a random
337331
* backend. This is necessary precisely because we don't clean up our
338332
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
339333
* should ensure the postmaster sees this as a crash, too, but no harm in
340334
* being doubly sure.)
341335
*/
342-
exit(2);
336+
_exit(2);
343337
}
344338

345339
/* SIGHUP: set flag to re-read config file at next convenient time */

src/backend/replication/walreceiver.c

Lines changed: 13 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -846,27 +846,21 @@ WalRcvShutdownHandler(SIGNAL_ARGS)
846846
static void
847847
WalRcvQuickDieHandler(SIGNAL_ARGS)
848848
{
849-
PG_SETMASK(&BlockSig);
850-
851849
/*
852-
* We DO NOT want to run proc_exit() callbacks -- we're here because
853-
* shared memory may be corrupted, so we don't want to try to clean up our
854-
* transaction. Just nail the windows shut and get out of town. Now that
855-
* there's an atexit callback to prevent third-party code from breaking
856-
* things by calling exit() directly, we have to reset the callbacks
857-
* explicitly to make this work as intended.
858-
*/
859-
on_exit_reset();
860-
861-
/*
862-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
863-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
864-
* backend. This is necessary precisely because we don't clean up our
865-
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
866-
* should ensure the postmaster sees this as a crash, too, but no harm in
867-
* being doubly sure.)
850+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
851+
* because shared memory may be corrupted, so we don't want to try to
852+
* clean up our transaction. Just nail the windows shut and get out of
853+
* town. The callbacks wouldn't be safe to run from a signal handler,
854+
* anyway.
855+
*
856+
* Note we use _exit(2) not _exit(0). This is to force the postmaster
857+
* into a system reset cycle if someone sends a manual SIGQUIT to a
858+
* random backend. This is necessary precisely because we don't clean up
859+
* our shared memory state. (The "dead man switch" mechanism in
860+
* pmsignal.c should ensure the postmaster sees this as a crash, too, but
861+
* no harm in being doubly sure.)
868862
*/
869-
exit(2);
863+
_exit(2);
870864
}
871865

872866
/*

src/backend/tcop/postgres.c

Lines changed: 19 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -2575,6 +2575,16 @@ quickdie(SIGNAL_ARGS)
25752575
whereToSendOutput = DestNone;
25762576

25772577
/*
2578+
* Notify the client before exiting, to give a clue on what happened.
2579+
*
2580+
* It's dubious to call ereport() from a signal handler. It is certainly
2581+
* not async-signal safe. But it seems better to try, than to disconnect
2582+
* abruptly and leave the client wondering what happened. It's remotely
2583+
* possible that we crash or hang while trying to send the message, but
2584+
* receiving a SIGQUIT is a sign that something has already gone badly
2585+
* wrong, so there's not much to lose. Assuming the postmaster is still
2586+
* running, it will SIGKILL us soon if we get stuck for some reason.
2587+
*
25782588
* Ideally this should be ereport(FATAL), but then we'd not get control
25792589
* back...
25802590
*/
@@ -2589,24 +2599,20 @@ quickdie(SIGNAL_ARGS)
25892599
" database and repeat your command.")));
25902600

25912601
/*
2592-
* We DO NOT want to run proc_exit() callbacks -- we're here because
2593-
* shared memory may be corrupted, so we don't want to try to clean up our
2594-
* transaction. Just nail the windows shut and get out of town. Now that
2595-
* there's an atexit callback to prevent third-party code from breaking
2596-
* things by calling exit() directly, we have to reset the callbacks
2597-
* explicitly to make this work as intended.
2598-
*/
2599-
on_exit_reset();
2600-
2601-
/*
2602-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
2603-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
2602+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
2603+
* because shared memory may be corrupted, so we don't want to try to
2604+
* clean up our transaction. Just nail the windows shut and get out of
2605+
* town. The callbacks wouldn't be safe to run from a signal handler,
2606+
* anyway.
2607+
*
2608+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
2609+
* a system reset cycle if someone sends a manual SIGQUIT to a random
26042610
* backend. This is necessary precisely because we don't clean up our
26052611
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
26062612
* should ensure the postmaster sees this as a crash, too, but no harm in
26072613
* being doubly sure.)
26082614
*/
2609-
exit(2);
2615+
_exit(2);
26102616
}
26112617

26122618
/*

0 commit comments

Comments
 (0)