Skip to content

Commit e652273

Browse files
committed
Redesign handling of SIGTERM/control-C in parallel pg_dump/pg_restore.
Formerly, Unix builds of pg_dump/pg_restore would trap SIGINT and similar signals and set a flag that was tested in various data-transfer loops. This was prone to errors of omission (cf commit 3c8aa66); and even if the client-side response was prompt, we did nothing that would cause long-running SQL commands (e.g. CREATE INDEX) to terminate early. Also, the master process would effectively do nothing at all upon receipt of SIGINT; the only reason it seemed to work was that in typical scenarios the signal would also be delivered to the child processes. We should support termination when a signal is delivered only to the master process, though. Windows builds had no console interrupt handler, so they would just fall over immediately at control-C, again leaving long-running SQL commands to finish unmolested. To fix, remove the flag-checking approach altogether. Instead, allow the Unix signal handler to send a cancel request directly and then exit(1). In the master process, also have it forward the signal to the children. On Windows, add a console interrupt handler that behaves approximately the same. The main difference is that a single execution of the Windows handler can send all the cancel requests since all the info is available in one process, whereas on Unix each process sends a cancel only for its own database connection. In passing, fix an old problem that DisconnectDatabase tends to send a cancel request before exiting a parallel worker, even if nothing went wrong. This is at least a waste of cycles, and could lead to unexpected log messages, or maybe even data loss if it happened in pg_restore (though in the current code the problem seems to affect only pg_dump). The cause was that after a COPY step, pg_dump was leaving libpq in PGASYNC_BUSY state, causing PQtransactionStatus() to report PQTRANS_ACTIVE. That's normally harmless because the next PQexec() will silently clear the PGASYNC_BUSY state; but in a parallel worker we might exit without any additional SQL commands after a COPY step. So add an extra PQgetResult() call after a COPY to allow libpq to return to PGASYNC_IDLE state. This is a bug fix, IMO, so back-patch to 9.3 where parallel dump/restore were introduced. Thanks to Kyotaro Horiguchi for Windows testing and code suggestions. Original-Patch: <7005.1464657274@sss.pgh.pa.us> Discussion: <20160602.174941.256342236.horiguchi.kyotaro@lab.ntt.co.jp>
1 parent 7392eed commit e652273

8 files changed

+505
-169
lines changed

src/bin/pg_dump/compress_io.c

Lines changed: 0 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -54,7 +54,6 @@
5454
#include "postgres_fe.h"
5555

5656
#include "compress_io.h"
57-
#include "parallel.h"
5857
#include "pg_backup_utils.h"
5958

6059
/*----------------------
@@ -184,9 +183,6 @@ void
184183
WriteDataToArchive(ArchiveHandle *AH, CompressorState *cs,
185184
const void *data, size_t dLen)
186185
{
187-
/* Are we aborting? */
188-
checkAborting(AH);
189-
190186
switch (cs->comprAlg)
191187
{
192188
case COMPR_ALG_LIBZ:
@@ -351,9 +347,6 @@ ReadDataFromArchiveZlib(ArchiveHandle *AH, ReadFunc readF)
351347
/* no minimal chunk size for zlib */
352348
while ((cnt = readF(AH, &buf, &buflen)))
353349
{
354-
/* Are we aborting? */
355-
checkAborting(AH);
356-
357350
zp->next_in = (void *) buf;
358351
zp->avail_in = cnt;
359352

@@ -414,9 +407,6 @@ ReadDataFromArchiveNone(ArchiveHandle *AH, ReadFunc readF)
414407

415408
while ((cnt = readF(AH, &buf, &buflen)))
416409
{
417-
/* Are we aborting? */
418-
checkAborting(AH);
419-
420410
ahwrite(buf, 1, cnt, AH);
421411
}
422412

0 commit comments

Comments
 (0)