Skip to content

Commit 8e4e783

Browse files
committed
Don't run atexit callbacks in quickdie signal handlers.
exit() is not async-signal safe. Even if the libc implementation is, 3rd party libraries might have installed unsafe atexit() callbacks. After receiving SIGQUIT, we really just want to exit as quickly as possible, so we don't really want to run the atexit() callbacks anyway. The original report by Jimmy Yih was a self-deadlock in startup_die(). However, this patch doesn't address that scenario; the signal handling while waiting for the startup packet is more complicated. But at least this alleviates similar problems in the SIGQUIT handlers, like that reported by Asim R P later in the same thread. Backpatch to 9.3 (all supported versions). Discussion: https://www.postgresql.org/message-id/CAOMx_OAuRUHiAuCg2YgicZLzPVv5d9_H4KrL_OFsFP%3DVPekigA%40mail.gmail.com
1 parent f3ed536 commit 8e4e783

File tree

7 files changed

+77
-108
lines changed

7 files changed

+77
-108
lines changed

src/backend/postmaster/bgworker.c

Lines changed: 9 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -536,28 +536,21 @@ SanityCheckBackgroundWorker(BackgroundWorker *worker, int elevel)
536536
static void
537537
bgworker_quickdie(SIGNAL_ARGS)
538538
{
539-
sigaddset(&BlockSig, SIGQUIT); /* prevent nested calls */
540-
PG_SETMASK(&BlockSig);
541-
542-
/*
543-
* We DO NOT want to run proc_exit() callbacks -- we're here because
544-
* shared memory may be corrupted, so we don't want to try to clean up our
545-
* transaction. Just nail the windows shut and get out of town. Now that
546-
* there's an atexit callback to prevent third-party code from breaking
547-
* things by calling exit() directly, we have to reset the callbacks
548-
* explicitly to make this work as intended.
549-
*/
550-
on_exit_reset();
551-
552539
/*
553-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
554-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
540+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
541+
* because shared memory may be corrupted, so we don't want to try to
542+
* clean up our transaction. Just nail the windows shut and get out of
543+
* town. The callbacks wouldn't be safe to run from a signal handler,
544+
* anyway.
545+
*
546+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
547+
* a system reset cycle if someone sends a manual SIGQUIT to a random
555548
* backend. This is necessary precisely because we don't clean up our
556549
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
557550
* should ensure the postmaster sees this as a crash, too, but no harm in
558551
* being doubly sure.)
559552
*/
560-
exit(2);
553+
_exit(2);
561554
}
562555

563556
/*

src/backend/postmaster/bgwriter.c

Lines changed: 9 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -403,27 +403,21 @@ BackgroundWriterMain(void)
403403
static void
404404
bg_quickdie(SIGNAL_ARGS)
405405
{
406-
PG_SETMASK(&BlockSig);
407-
408406
/*
409-
* We DO NOT want to run proc_exit() callbacks -- we're here because
410-
* shared memory may be corrupted, so we don't want to try to clean up our
411-
* transaction. Just nail the windows shut and get out of town. Now that
412-
* there's an atexit callback to prevent third-party code from breaking
413-
* things by calling exit() directly, we have to reset the callbacks
414-
* explicitly to make this work as intended.
415-
*/
416-
on_exit_reset();
417-
418-
/*
419-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
420-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
407+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
408+
* because shared memory may be corrupted, so we don't want to try to
409+
* clean up our transaction. Just nail the windows shut and get out of
410+
* town. The callbacks wouldn't be safe to run from a signal handler,
411+
* anyway.
412+
*
413+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
414+
* a system reset cycle if someone sends a manual SIGQUIT to a random
421415
* backend. This is necessary precisely because we don't clean up our
422416
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
423417
* should ensure the postmaster sees this as a crash, too, but no harm in
424418
* being doubly sure.)
425419
*/
426-
exit(2);
420+
_exit(2);
427421
}
428422

429423
/* SIGHUP: set flag to re-read config file at next convenient time */

src/backend/postmaster/checkpointer.c

Lines changed: 9 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -807,27 +807,21 @@ IsCheckpointOnSchedule(double progress)
807807
static void
808808
chkpt_quickdie(SIGNAL_ARGS)
809809
{
810-
PG_SETMASK(&BlockSig);
811-
812810
/*
813-
* We DO NOT want to run proc_exit() callbacks -- we're here because
814-
* shared memory may be corrupted, so we don't want to try to clean up our
815-
* transaction. Just nail the windows shut and get out of town. Now that
816-
* there's an atexit callback to prevent third-party code from breaking
817-
* things by calling exit() directly, we have to reset the callbacks
818-
* explicitly to make this work as intended.
819-
*/
820-
on_exit_reset();
821-
822-
/*
823-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
824-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
811+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
812+
* because shared memory may be corrupted, so we don't want to try to
813+
* clean up our transaction. Just nail the windows shut and get out of
814+
* town. The callbacks wouldn't be safe to run from a signal handler,
815+
* anyway.
816+
*
817+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
818+
* a system reset cycle if someone sends a manual SIGQUIT to a random
825819
* backend. This is necessary precisely because we don't clean up our
826820
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
827821
* should ensure the postmaster sees this as a crash, too, but no harm in
828822
* being doubly sure.)
829823
*/
830-
exit(2);
824+
_exit(2);
831825
}
832826

833827
/* SIGHUP: set flag to re-read config file at next convenient time */

src/backend/postmaster/startup.c

Lines changed: 9 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -68,27 +68,21 @@ static void StartupProcSigHupHandler(SIGNAL_ARGS);
6868
static void
6969
startupproc_quickdie(SIGNAL_ARGS)
7070
{
71-
PG_SETMASK(&BlockSig);
72-
73-
/*
74-
* We DO NOT want to run proc_exit() callbacks -- we're here because
75-
* shared memory may be corrupted, so we don't want to try to clean up our
76-
* transaction. Just nail the windows shut and get out of town. Now that
77-
* there's an atexit callback to prevent third-party code from breaking
78-
* things by calling exit() directly, we have to reset the callbacks
79-
* explicitly to make this work as intended.
80-
*/
81-
on_exit_reset();
82-
8371
/*
84-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
85-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
72+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
73+
* because shared memory may be corrupted, so we don't want to try to
74+
* clean up our transaction. Just nail the windows shut and get out of
75+
* town. The callbacks wouldn't be safe to run from a signal handler,
76+
* anyway.
77+
*
78+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
79+
* a system reset cycle if someone sends a manual SIGQUIT to a random
8680
* backend. This is necessary precisely because we don't clean up our
8781
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
8882
* should ensure the postmaster sees this as a crash, too, but no harm in
8983
* being doubly sure.)
9084
*/
91-
exit(2);
85+
_exit(2);
9286
}
9387

9488

src/backend/postmaster/walwriter.c

Lines changed: 9 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -316,27 +316,21 @@ WalWriterMain(void)
316316
static void
317317
wal_quickdie(SIGNAL_ARGS)
318318
{
319-
PG_SETMASK(&BlockSig);
320-
321319
/*
322-
* We DO NOT want to run proc_exit() callbacks -- we're here because
323-
* shared memory may be corrupted, so we don't want to try to clean up our
324-
* transaction. Just nail the windows shut and get out of town. Now that
325-
* there's an atexit callback to prevent third-party code from breaking
326-
* things by calling exit() directly, we have to reset the callbacks
327-
* explicitly to make this work as intended.
328-
*/
329-
on_exit_reset();
330-
331-
/*
332-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
333-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
320+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
321+
* because shared memory may be corrupted, so we don't want to try to
322+
* clean up our transaction. Just nail the windows shut and get out of
323+
* town. The callbacks wouldn't be safe to run from a signal handler,
324+
* anyway.
325+
*
326+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
327+
* a system reset cycle if someone sends a manual SIGQUIT to a random
334328
* backend. This is necessary precisely because we don't clean up our
335329
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
336330
* should ensure the postmaster sees this as a crash, too, but no harm in
337331
* being doubly sure.)
338332
*/
339-
exit(2);
333+
_exit(2);
340334
}
341335

342336
/* SIGHUP: set flag to re-read config file at next convenient time */

src/backend/replication/walreceiver.c

Lines changed: 13 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -832,27 +832,21 @@ WalRcvShutdownHandler(SIGNAL_ARGS)
832832
static void
833833
WalRcvQuickDieHandler(SIGNAL_ARGS)
834834
{
835-
PG_SETMASK(&BlockSig);
836-
837835
/*
838-
* We DO NOT want to run proc_exit() callbacks -- we're here because
839-
* shared memory may be corrupted, so we don't want to try to clean up our
840-
* transaction. Just nail the windows shut and get out of town. Now that
841-
* there's an atexit callback to prevent third-party code from breaking
842-
* things by calling exit() directly, we have to reset the callbacks
843-
* explicitly to make this work as intended.
844-
*/
845-
on_exit_reset();
846-
847-
/*
848-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
849-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
850-
* backend. This is necessary precisely because we don't clean up our
851-
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
852-
* should ensure the postmaster sees this as a crash, too, but no harm in
853-
* being doubly sure.)
836+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
837+
* because shared memory may be corrupted, so we don't want to try to
838+
* clean up our transaction. Just nail the windows shut and get out of
839+
* town. The callbacks wouldn't be safe to run from a signal handler,
840+
* anyway.
841+
*
842+
* Note we use _exit(2) not _exit(0). This is to force the postmaster
843+
* into a system reset cycle if someone sends a manual SIGQUIT to a
844+
* random backend. This is necessary precisely because we don't clean up
845+
* our shared memory state. (The "dead man switch" mechanism in
846+
* pmsignal.c should ensure the postmaster sees this as a crash, too, but
847+
* no harm in being doubly sure.)
854848
*/
855-
exit(2);
849+
_exit(2);
856850
}
857851

858852
/*

src/backend/tcop/postgres.c

Lines changed: 19 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -2579,6 +2579,16 @@ quickdie(SIGNAL_ARGS)
25792579
whereToSendOutput = DestNone;
25802580

25812581
/*
2582+
* Notify the client before exiting, to give a clue on what happened.
2583+
*
2584+
* It's dubious to call ereport() from a signal handler. It is certainly
2585+
* not async-signal safe. But it seems better to try, than to disconnect
2586+
* abruptly and leave the client wondering what happened. It's remotely
2587+
* possible that we crash or hang while trying to send the message, but
2588+
* receiving a SIGQUIT is a sign that something has already gone badly
2589+
* wrong, so there's not much to lose. Assuming the postmaster is still
2590+
* running, it will SIGKILL us soon if we get stuck for some reason.
2591+
*
25822592
* Ideally this should be ereport(FATAL), but then we'd not get control
25832593
* back...
25842594
*/
@@ -2593,24 +2603,20 @@ quickdie(SIGNAL_ARGS)
25932603
" database and repeat your command.")));
25942604

25952605
/*
2596-
* We DO NOT want to run proc_exit() callbacks -- we're here because
2597-
* shared memory may be corrupted, so we don't want to try to clean up our
2598-
* transaction. Just nail the windows shut and get out of town. Now that
2599-
* there's an atexit callback to prevent third-party code from breaking
2600-
* things by calling exit() directly, we have to reset the callbacks
2601-
* explicitly to make this work as intended.
2602-
*/
2603-
on_exit_reset();
2604-
2605-
/*
2606-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
2607-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
2606+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
2607+
* because shared memory may be corrupted, so we don't want to try to
2608+
* clean up our transaction. Just nail the windows shut and get out of
2609+
* town. The callbacks wouldn't be safe to run from a signal handler,
2610+
* anyway.
2611+
*
2612+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
2613+
* a system reset cycle if someone sends a manual SIGQUIT to a random
26082614
* backend. This is necessary precisely because we don't clean up our
26092615
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
26102616
* should ensure the postmaster sees this as a crash, too, but no harm in
26112617
* being doubly sure.)
26122618
*/
2613-
exit(2);
2619+
_exit(2);
26142620
}
26152621

26162622
/*

0 commit comments

Comments
 (0)