Commit 8ff5186

Fix confusion of max_parallel_workers mechanism following crash.
Commit b460f5d failed to contemplate the possibility that a parallel worker registered before a crash would be unregistered only after the crash; if that happened, we'd end up with parallel_terminate_count > parallel_register_count and the system would refuse to launch any more parallel workers.

The easiest way to fix that seems to be to forget BGW_NEVER_RESTART workers in ResetBackgroundWorkerCrashTimes() rather than leaving them around to be cleaned up after the conclusion of the restart, so that they go away before rather than after shared memory is reset.

To make sure that this fix is water-tight, don't allow parallel workers to be anything other than BGW_NEVER_RESTART, so that after recovering from a crash, 0 is guaranteed to be the correct starting value for parallel_register_count. The core code wouldn't do this anyway, but somebody might try to do it in extension code.

Report by Thomas Vondra. Patch by me, reviewed by Kuntal Ghosh.

Discussion: http://postgr.es/m/CAGz5QC+AVEVS+3rBKRq83AxkJLMZ1peMt4nnrQwczxOrmo3CNw@mail.gmail.com
1 parent 1e298b8 commit 8ff5186
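
The failure mode described in the commit message can be sketched in isolation. The following is a minimal model, not the PostgreSQL code: the `ParallelCounters` struct, the helper names, and the limit of 8 are hypothetical, and it assumes the counters are unsigned 32-bit values and that the launch check compares `register - terminate` against the configured maximum. Under those assumptions, forgetting a pre-crash worker after the counters are zeroed makes the difference wrap around, so every later launch is refused; forgetting it before the reset does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared-memory counters (not the real struct). */
typedef struct
{
	uint32_t	parallel_register_count;
	uint32_t	parallel_terminate_count;
} ParallelCounters;

/* Assumed shape of the launch check: active = registered - terminated. */
static bool
can_launch_parallel_worker(const ParallelCounters *c, uint32_t max_workers)
{
	uint32_t	active = c->parallel_register_count - c->parallel_terminate_count;

	return active < max_workers;
}

/* Crash recovery reinitializes shared memory, so both counters restart at 0. */
static void
reset_shared_memory(ParallelCounters *c)
{
	c->parallel_register_count = 0;
	c->parallel_terminate_count = 0;
}

/* Forgetting a parallel worker bumps the terminate counter. */
static void
forget_parallel_worker(ParallelCounters *c)
{
	c->parallel_terminate_count++;
}

/* Old ordering: a worker registered before the crash is forgotten after the reset. */
static bool
launch_ok_old_ordering(void)
{
	ParallelCounters c = {0, 0};

	c.parallel_register_count++;	/* registered before the crash */
	reset_shared_memory(&c);
	forget_parallel_worker(&c);		/* 0 - 1 wraps to UINT32_MAX */
	return can_launch_parallel_worker(&c, 8);
}

/* New ordering: the worker is forgotten first, then shared memory is reset. */
static bool
launch_ok_new_ordering(void)
{
	ParallelCounters c = {0, 0};

	c.parallel_register_count++;	/* registered before the crash */
	forget_parallel_worker(&c);
	reset_shared_memory(&c);		/* both counters are 0 again */
	return can_launch_parallel_worker(&c, 8);
}
```

In this model `launch_ok_old_ordering()` returns false and `launch_ok_new_ordering()` returns true, which matches the motivation given for forgetting BGW_NEVER_RESTART workers inside ResetBackgroundWorkerCrashTimes().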

1 file changed, 42 insertions(+), 6 deletions(-)


src/backend/postmaster/bgworker.c

@@ -515,13 +515,34 @@ ResetBackgroundWorkerCrashTimes(void)
 
 		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
 
-		/*
-		 * For workers that should not be restarted, we don't want to lose the
-		 * information that they have crashed; otherwise, they would be
-		 * restarted, which is wrong.
-		 */
-		if (rw->rw_worker.bgw_restart_time != BGW_NEVER_RESTART)
+		if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
+		{
+			/*
+			 * Workers marked BGW_NEVER_RESTART shouldn't get relaunched after
+			 * the crash, so forget about them.  (If we wait until after the
+			 * crash to forget about them, and they are parallel workers,
+			 * parallel_terminate_count will get incremented after we've
+			 * already zeroed parallel_register_count, which would be bad.)
+			 */
+			ForgetBackgroundWorker(&iter);
+		}
+		else
+		{
+			/*
+			 * The accounting which we do via parallel_register_count and
+			 * parallel_terminate_count would get messed up if a worker marked
+			 * parallel could survive a crash and restart cycle. All such
+			 * workers should be marked BGW_NEVER_RESTART, and thus control
+			 * should never reach this branch.
+			 */
+			Assert((rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) == 0);
+
+			/*
+			 * Allow this worker to be restarted immediately after we finish
+			 * resetting.
+			 */
 			rw->rw_crashed_at = 0;
+		}
 	}
 }
 
@@ -589,6 +610,21 @@ SanityCheckBackgroundWorker(BackgroundWorker *worker, int elevel)
 		return false;
 	}
 
+	/*
+	 * Parallel workers may not be configured for restart, because the
+	 * parallel_register_count/parallel_terminate_count accounting can't
+	 * handle parallel workers lasting through a crash-and-restart cycle.
+	 */
+	if (worker->bgw_restart_time != BGW_NEVER_RESTART &&
+		(worker->bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
+	{
+		ereport(elevel,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("background worker \"%s\": parallel workers may not be configured for restart",
+						worker->bgw_name)));
+		return false;
+	}
+
 	return true;
 }
 
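
The rule added to SanityCheckBackgroundWorker() can be illustrated with a standalone sketch. This is not the PostgreSQL implementation: `MockWorker` and `restart_policy_is_valid()` are hypothetical, the two macros are local stand-ins whose values are assumed for illustration rather than taken from the server headers, and error reporting via ereport() is omitted. It only shows which combinations of worker class and restart time the new check would reject.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins; values are assumptions for this sketch only. */
#define BGW_NEVER_RESTART		(-1)
#define BGWORKER_CLASS_PARALLEL	0x0010

/* Hypothetical, trimmed-down view of a background worker registration. */
typedef struct
{
	const char *bgw_name;
	int			bgw_flags;
	int			bgw_restart_time;	/* seconds, or BGW_NEVER_RESTART */
} MockWorker;

/*
 * Mirrors the shape of the new check: a worker in the parallel class is
 * valid only when it is configured never to restart.
 */
static bool
restart_policy_is_valid(const MockWorker *worker)
{
	bool		is_parallel = (worker->bgw_flags & BGWORKER_CLASS_PARALLEL) != 0;

	if (is_parallel && worker->bgw_restart_time != BGW_NEVER_RESTART)
		return false;			/* the real code also reports an error here */

	return true;
}
```

With this sketch, a parallel-class worker with a restart time of 10 seconds is rejected, while a parallel-class worker with BGW_NEVER_RESTART and a non-parallel worker with any restart time are both accepted. That is the situation the commit message anticipates for extension code.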