Make timeout.c more robust against missed timer interrupts.

tglsfdc · tglsfdc · commit d37776e451e3 · 2022-02-10T11:52:28.000-05:00
Commit 09cf1d5 taught schedule_alarm() to not do anything if the next requested event is after when we expect the next interrupt to fire. However, if somehow an interrupt gets lost, we'll continue to not do anything indefinitely, even after the "next interrupt" time is obviously in the past. Thus, one missed interrupt can break timeout scheduling for the life of the session. Michael Harris reported a scenario where a bug in a user-defined function caused this to happen, so you don't even need to assume kernel bugs exist to think this is worth fixing. We can make things more robust at little cost by detecting the case where signal_due_at is before "now" and forcing a new setitimer call to occur. This isn't a completely bulletproof fix of course; but in our typical usage pattern where we frequently set timeouts and clear them before they are reached, the interrupt will get re-enabled after at most one timeout interval, which with a little luck will be before we really need it. While here, let's mark signal_due_at as volatile, since the signal handler can both examine and set it. I'm not sure there's any actual risk given that signal_pending is already volatile, but it's surely questionable. Backpatch to v14 where this logic came in. Michael Harris and Tom Lane Discussion: https://postgr.es/m/CADofcAWbMrvgwSMqO4iG_iD3E2v8ZUrC-_crB41my=VMM02-CA@mail.gmail.com
diff --git a/src/backend/utils/misc/timeout.c b/src/backend/utils/misc/timeout.c
@@ -71,11 +71,12 @@ static volatile sig_atomic_t alarm_enabled = false;
 
 /*
  * State recording if and when we next expect the interrupt to fire.
+ * (signal_due_at is valid only when signal_pending is true.)
  * Note that the signal handler will unconditionally reset signal_pending to
  * false, so that can change asynchronously even when alarm_enabled is false.
  */
 static volatile sig_atomic_t signal_pending = false;
-static TimestampTz signal_due_at = 0;	/* valid only when signal_pending */
+static volatile TimestampTz signal_due_at = 0;
 
 
 /*****************************************************************************
@@ -217,10 +218,22 @@ schedule_alarm(TimestampTz now)
 
 		MemSet(&timeval, 0, sizeof(struct itimerval));
 
+		/*
+		 * If we think there's a signal pending, but current time is more than
+		 * 10ms past when the signal was due, then assume that the timeout
+		 * request got lost somehow; clear signal_pending so that we'll reset
+		 * the interrupt request below.  (10ms corresponds to the worst-case
+		 * timeout granularity on modern systems.)  It won't hurt us if the
+		 * interrupt does manage to fire between now and when we reach the
+		 * setitimer() call.
+		 */
+		if (signal_pending && now > signal_due_at + 10 * 1000)
+			signal_pending = false;
+
 		/*
 		 * Get the time remaining till the nearest pending timeout.  If it is
-		 * negative, assume that we somehow missed an interrupt, and force
-		 * signal_pending off.  This gives us a chance to recover if the
+		 * negative, assume that we somehow missed an interrupt, and clear
+		 * signal_pending.  This gives us another chance to recover if the
 		 * kernel drops a timeout request for some reason.
 		 */
 		nearest_timeout = active_timeouts[0]->fin_time;