Skip to content

Commit d5a9b70

Browse files
committed
Don't run atexit callbacks in quickdie signal handlers.
exit() is not async-signal safe. Even if the libc implementation is, 3rd party libraries might have installed unsafe atexit() callbacks. After receiving SIGQUIT, we really just want to exit as quickly as possible, so we don't really want to run the atexit() callbacks anyway. The original report by Jimmy Yih was a self-deadlock in startup_die(). However, this patch doesn't address that scenario; the signal handling while waiting for the startup packet is more complicated. But at least this alleviates similar problems in the SIGQUIT handlers, like that reported by Asim R P later in the same thread. Backpatch to 9.3 (all supported versions). Discussion: https://www.postgresql.org/message-id/CAOMx_OAuRUHiAuCg2YgicZLzPVv5d9_H4KrL_OFsFP%3DVPekigA%40mail.gmail.com
1 parent 33c5d3b commit d5a9b70

File tree

7 files changed

+77
-108
lines changed

7 files changed

+77
-108
lines changed

src/backend/postmaster/bgworker.c

Lines changed: 9 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -515,28 +515,21 @@ SanityCheckBackgroundWorker(BackgroundWorker *worker, int elevel)
515515
static void
516516
bgworker_quickdie(SIGNAL_ARGS)
517517
{
518-
sigaddset(&BlockSig, SIGQUIT); /* prevent nested calls */
519-
PG_SETMASK(&BlockSig);
520-
521-
/*
522-
* We DO NOT want to run proc_exit() callbacks -- we're here because
523-
* shared memory may be corrupted, so we don't want to try to clean up our
524-
* transaction. Just nail the windows shut and get out of town. Now that
525-
* there's an atexit callback to prevent third-party code from breaking
526-
* things by calling exit() directly, we have to reset the callbacks
527-
* explicitly to make this work as intended.
528-
*/
529-
on_exit_reset();
530-
531518
/*
532-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
533-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
519+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
520+
* because shared memory may be corrupted, so we don't want to try to
521+
* clean up our transaction. Just nail the windows shut and get out of
522+
* town. The callbacks wouldn't be safe to run from a signal handler,
523+
* anyway.
524+
*
525+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
526+
* a system reset cycle if someone sends a manual SIGQUIT to a random
534527
* backend. This is necessary precisely because we don't clean up our
535528
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
536529
* should ensure the postmaster sees this as a crash, too, but no harm in
537530
* being doubly sure.)
538531
*/
539-
exit(2);
532+
_exit(2);
540533
}
541534

542535
/*

src/backend/postmaster/bgwriter.c

Lines changed: 9 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -408,27 +408,21 @@ BackgroundWriterMain(void)
408408
static void
409409
bg_quickdie(SIGNAL_ARGS)
410410
{
411-
PG_SETMASK(&BlockSig);
412-
413411
/*
414-
* We DO NOT want to run proc_exit() callbacks -- we're here because
415-
* shared memory may be corrupted, so we don't want to try to clean up our
416-
* transaction. Just nail the windows shut and get out of town. Now that
417-
* there's an atexit callback to prevent third-party code from breaking
418-
* things by calling exit() directly, we have to reset the callbacks
419-
* explicitly to make this work as intended.
420-
*/
421-
on_exit_reset();
422-
423-
/*
424-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
425-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
412+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
413+
* because shared memory may be corrupted, so we don't want to try to
414+
* clean up our transaction. Just nail the windows shut and get out of
415+
* town. The callbacks wouldn't be safe to run from a signal handler,
416+
* anyway.
417+
*
418+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
419+
* a system reset cycle if someone sends a manual SIGQUIT to a random
426420
* backend. This is necessary precisely because we don't clean up our
427421
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
428422
* should ensure the postmaster sees this as a crash, too, but no harm in
429423
* being doubly sure.)
430424
*/
431-
exit(2);
425+
_exit(2);
432426
}
433427

434428
/* SIGHUP: set flag to re-read config file at next convenient time */

src/backend/postmaster/checkpointer.c

Lines changed: 9 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -813,27 +813,21 @@ IsCheckpointOnSchedule(double progress)
813813
static void
814814
chkpt_quickdie(SIGNAL_ARGS)
815815
{
816-
PG_SETMASK(&BlockSig);
817-
818816
/*
819-
* We DO NOT want to run proc_exit() callbacks -- we're here because
820-
* shared memory may be corrupted, so we don't want to try to clean up our
821-
* transaction. Just nail the windows shut and get out of town. Now that
822-
* there's an atexit callback to prevent third-party code from breaking
823-
* things by calling exit() directly, we have to reset the callbacks
824-
* explicitly to make this work as intended.
825-
*/
826-
on_exit_reset();
827-
828-
/*
829-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
830-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
817+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
818+
* because shared memory may be corrupted, so we don't want to try to
819+
* clean up our transaction. Just nail the windows shut and get out of
820+
* town. The callbacks wouldn't be safe to run from a signal handler,
821+
* anyway.
822+
*
823+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
824+
* a system reset cycle if someone sends a manual SIGQUIT to a random
831825
* backend. This is necessary precisely because we don't clean up our
832826
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
833827
* should ensure the postmaster sees this as a crash, too, but no harm in
834828
* being doubly sure.)
835829
*/
836-
exit(2);
830+
_exit(2);
837831
}
838832

839833
/* SIGHUP: set flag to re-read config file at next convenient time */

src/backend/postmaster/startup.c

Lines changed: 9 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -68,27 +68,21 @@ static void StartupProcSigHupHandler(SIGNAL_ARGS);
6868
static void
6969
startupproc_quickdie(SIGNAL_ARGS)
7070
{
71-
PG_SETMASK(&BlockSig);
72-
73-
/*
74-
* We DO NOT want to run proc_exit() callbacks -- we're here because
75-
* shared memory may be corrupted, so we don't want to try to clean up our
76-
* transaction. Just nail the windows shut and get out of town. Now that
77-
* there's an atexit callback to prevent third-party code from breaking
78-
* things by calling exit() directly, we have to reset the callbacks
79-
* explicitly to make this work as intended.
80-
*/
81-
on_exit_reset();
82-
8371
/*
84-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
85-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
72+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
73+
* because shared memory may be corrupted, so we don't want to try to
74+
* clean up our transaction. Just nail the windows shut and get out of
75+
* town. The callbacks wouldn't be safe to run from a signal handler,
76+
* anyway.
77+
*
78+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
79+
* a system reset cycle if someone sends a manual SIGQUIT to a random
8680
* backend. This is necessary precisely because we don't clean up our
8781
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
8882
* should ensure the postmaster sees this as a crash, too, but no harm in
8983
* being doubly sure.)
9084
*/
91-
exit(2);
85+
_exit(2);
9286
}
9387

9488

src/backend/postmaster/walwriter.c

Lines changed: 9 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -328,27 +328,21 @@ WalWriterMain(void)
328328
static void
329329
wal_quickdie(SIGNAL_ARGS)
330330
{
331-
PG_SETMASK(&BlockSig);
332-
333331
/*
334-
* We DO NOT want to run proc_exit() callbacks -- we're here because
335-
* shared memory may be corrupted, so we don't want to try to clean up our
336-
* transaction. Just nail the windows shut and get out of town. Now that
337-
* there's an atexit callback to prevent third-party code from breaking
338-
* things by calling exit() directly, we have to reset the callbacks
339-
* explicitly to make this work as intended.
340-
*/
341-
on_exit_reset();
342-
343-
/*
344-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
345-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
332+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
333+
* because shared memory may be corrupted, so we don't want to try to
334+
* clean up our transaction. Just nail the windows shut and get out of
335+
* town. The callbacks wouldn't be safe to run from a signal handler,
336+
* anyway.
337+
*
338+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
339+
* a system reset cycle if someone sends a manual SIGQUIT to a random
346340
* backend. This is necessary precisely because we don't clean up our
347341
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
348342
* should ensure the postmaster sees this as a crash, too, but no harm in
349343
* being doubly sure.)
350344
*/
351-
exit(2);
345+
_exit(2);
352346
}
353347

354348
/* SIGHUP: set flag to re-read config file at next convenient time */

src/backend/replication/walreceiver.c

Lines changed: 13 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -778,27 +778,21 @@ WalRcvShutdownHandler(SIGNAL_ARGS)
778778
static void
779779
WalRcvQuickDieHandler(SIGNAL_ARGS)
780780
{
781-
PG_SETMASK(&BlockSig);
782-
783781
/*
784-
* We DO NOT want to run proc_exit() callbacks -- we're here because
785-
* shared memory may be corrupted, so we don't want to try to clean up our
786-
* transaction. Just nail the windows shut and get out of town. Now that
787-
* there's an atexit callback to prevent third-party code from breaking
788-
* things by calling exit() directly, we have to reset the callbacks
789-
* explicitly to make this work as intended.
790-
*/
791-
on_exit_reset();
792-
793-
/*
794-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
795-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
796-
* backend. This is necessary precisely because we don't clean up our
797-
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
798-
* should ensure the postmaster sees this as a crash, too, but no harm in
799-
* being doubly sure.)
782+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
783+
* because shared memory may be corrupted, so we don't want to try to
784+
* clean up our transaction. Just nail the windows shut and get out of
785+
* town. The callbacks wouldn't be safe to run from a signal handler,
786+
* anyway.
787+
*
788+
* Note we use _exit(2) not _exit(0). This is to force the postmaster
789+
* into a system reset cycle if someone sends a manual SIGQUIT to a
790+
* random backend. This is necessary precisely because we don't clean up
791+
* our shared memory state. (The "dead man switch" mechanism in
792+
* pmsignal.c should ensure the postmaster sees this as a crash, too, but
793+
* no harm in being doubly sure.)
800794
*/
801-
exit(2);
795+
_exit(2);
802796
}
803797

804798
/*

src/backend/tcop/postgres.c

Lines changed: 19 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -2559,6 +2559,16 @@ quickdie(SIGNAL_ARGS)
25592559
whereToSendOutput = DestNone;
25602560

25612561
/*
2562+
* Notify the client before exiting, to give a clue on what happened.
2563+
*
2564+
* It's dubious to call ereport() from a signal handler. It is certainly
2565+
* not async-signal safe. But it seems better to try, than to disconnect
2566+
* abruptly and leave the client wondering what happened. It's remotely
2567+
* possible that we crash or hang while trying to send the message, but
2568+
* receiving a SIGQUIT is a sign that something has already gone badly
2569+
* wrong, so there's not much to lose. Assuming the postmaster is still
2570+
* running, it will SIGKILL us soon if we get stuck for some reason.
2571+
*
25622572
* Ideally this should be ereport(FATAL), but then we'd not get control
25632573
* back...
25642574
*/
@@ -2573,24 +2583,20 @@ quickdie(SIGNAL_ARGS)
25732583
" database and repeat your command.")));
25742584

25752585
/*
2576-
* We DO NOT want to run proc_exit() callbacks -- we're here because
2577-
* shared memory may be corrupted, so we don't want to try to clean up our
2578-
* transaction. Just nail the windows shut and get out of town. Now that
2579-
* there's an atexit callback to prevent third-party code from breaking
2580-
* things by calling exit() directly, we have to reset the callbacks
2581-
* explicitly to make this work as intended.
2582-
*/
2583-
on_exit_reset();
2584-
2585-
/*
2586-
* Note we do exit(2) not exit(0). This is to force the postmaster into a
2587-
* system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
2586+
* We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
2587+
* because shared memory may be corrupted, so we don't want to try to
2588+
* clean up our transaction. Just nail the windows shut and get out of
2589+
* town. The callbacks wouldn't be safe to run from a signal handler,
2590+
* anyway.
2591+
*
2592+
* Note we do _exit(2) not _exit(0). This is to force the postmaster into
2593+
* a system reset cycle if someone sends a manual SIGQUIT to a random
25882594
* backend. This is necessary precisely because we don't clean up our
25892595
* shared memory state. (The "dead man switch" mechanism in pmsignal.c
25902596
* should ensure the postmaster sees this as a crash, too, but no harm in
25912597
* being doubly sure.)
25922598
*/
2593-
exit(2);
2599+
_exit(2);
25942600
}
25952601

25962602
/*

0 commit comments

Comments
 (0)