Commit 2edf14f

Try to read data from the socket in pqSendSome's write_failed paths.

Even when we've concluded that we have a hard write failure on the socket, we should continue to try to read data. This gives us an opportunity to collect any final error message that the backend might have sent before closing the connection; moreover it is the job of pqReadData not pqSendSome to close the socket once EOF is detected.

Due to an oversight in 1f39a1c, pqSendSome failed to try to collect data in the case where we'd already set write_failed. The problem was masked for ordinary query operations (which really only make one write attempt anyway), but COPY to the server would continue to send data indefinitely after a mid-COPY connection loss.

Hence, add pqReadData calls into the paths where pqSendSome drops data because of write_failed. If we've lost the connection, this will eventually result in closing the socket and setting CONNECTION_BAD, which will cause PQputline and siblings to report failure, allowing the application to terminate the COPY sooner. (Basically this restores what happened before 1f39a1c.)

There are related issues that this does not solve; for example, if the backend sends an error but doesn't drop the connection, we did and still will keep pumping COPY data as long as the application sends it. Fixing that will require application-visible behavior changes though, and anyway it's an ancient behavior that we've had few complaints about. For now I'm just trying to fix the regression from 1f39a1c.

Per a complaint from Andres Freund. Back-patch into v12 where 1f39a1c came in.

Discussion: https://postgr.es/m/20200603201242.ofvm4jztpqytwfye@alap3.anarazel.de
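
The effect described in the commit message can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not libpq code: `ToyConn`, `toy_read_data`, `toy_send_some`, `toy_copy_rows`, and the `read_on_failure` switch are all hypothetical names invented here. The switch selects between the behavior after 1f39a1c (no read attempt once `write_failed` is set) and the behavior this commit restores (a read attempt that lets socket closure be detected).

```c
#include <stdbool.h>

/* Toy stand-in for PGINVALID_SOCKET (hypothetical, not the libpq macro). */
#define TOY_INVALID_SOCKET (-1)

/* Hypothetical, heavily simplified stand-in for PGconn. */
typedef struct ToyConn
{
    int  sock;            /* TOY_INVALID_SOCKET once the socket is closed */
    bool write_failed;    /* a hard write failure was seen earlier */
    bool peer_closed;     /* toy flag: the backend has closed its end */
    bool connection_bad;  /* plays the role of CONNECTION_BAD */
    int  out_count;       /* bytes queued for sending */
} ToyConn;

/* Toy counterpart of pqReadData: on EOF, close the socket and mark the
 * connection bad, then report failure. */
static int
toy_read_data(ToyConn *conn)
{
    if (conn->peer_closed)
    {
        conn->sock = TOY_INVALID_SOCKET;
        conn->connection_bad = true;
        return -1;
    }
    return 0;
}

/* Toy counterpart of the write_failed path in pqSendSome.
 * read_on_failure = false models the regression, true models the fix. */
static int
toy_send_some(ToyConn *conn, bool read_on_failure)
{
    if (conn->write_failed)
    {
        conn->out_count = 0;    /* discard data that can never be sent */
        if (read_on_failure && conn->sock != TOY_INVALID_SOCKET)
        {
            if (toy_read_data(conn) < 0)
                return -1;
        }
        return 0;
    }
    conn->out_count = 0;        /* pretend the write succeeded */
    return 0;
}

/* Toy COPY loop: queue rows until the connection is known to be bad or a
 * send reports failure. Returns how many rows the caller tried to send. */
static int
toy_copy_rows(ToyConn *conn, int rows, bool read_on_failure)
{
    int attempted = 0;

    for (int i = 0; i < rows; i++)
    {
        if (conn->connection_bad)
            break;
        conn->out_count = 64;
        attempted++;
        if (toy_send_some(conn, read_on_failure) < 0)
            break;
    }
    return attempted;
}
```

In this model, a connection that starts with `write_failed` and `peer_closed` both set keeps accepting every row when `read_on_failure` is false, because nothing ever notices the closed socket. With `read_on_failure` set to true, the first send attempt reads, sees EOF, marks the connection bad, and the loop stops after one row.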
1 parent a00222f commit 2edf14f

1 file changed

+20
-1
lines changed

src/interfaces/libpq/fe-misc.c

Lines changed: 20 additions & 1 deletion
@@ -825,6 +825,10 @@ pqReadData(PGconn *conn)
  * Return 0 on success, -1 on failure and 1 when not all data could be sent
  * because the socket would block and the connection is non-blocking.
  *
+ * Note that this is also responsible for consuming data from the socket
+ * (putting it in conn->inBuffer) in any situation where we can't send
+ * all the specified data immediately.
+ *
  * Upon write failure, conn->write_failed is set and the error message is
  * saved in conn->write_err_msg, but we clear the output buffer and return
  * zero anyway; this is because callers should soldier on until it's possible
@@ -844,12 +848,20 @@ pqSendSome(PGconn *conn, int len)
      * on that connection. Even if the kernel would let us, we've probably
      * lost message boundary sync with the server. conn->write_failed
      * therefore persists until the connection is reset, and we just discard
-     * all data presented to be written.
+     * all data presented to be written. However, as long as we still have a
+     * valid socket, we should continue to absorb data from the backend, so
+     * that we can collect any final error messages.
      */
     if (conn->write_failed)
     {
         /* conn->write_err_msg should be set up already */
         conn->outCount = 0;
+        /* Absorb input data if any, and detect socket closure */
+        if (conn->sock != PGINVALID_SOCKET)
+        {
+            if (pqReadData(conn) < 0)
+                return -1;
+        }
         return 0;
     }

@@ -919,6 +931,13 @@ pqSendSome(PGconn *conn, int len)

                 /* Discard queued data; no chance it'll ever be sent */
                 conn->outCount = 0;
+
+                /* Absorb input data if any, and detect socket closure */
+                if (conn->sock != PGINVALID_SOCKET)
+                {
+                    if (pqReadData(conn) < 0)
+                        return -1;
+                }
                 return 0;
             }
         }
