Commit e0db288

Fix libpq's implementation of per-host connection timeouts.
Commit 5f374fe attempted to turn the connect_timeout from an overall maximum time limit into a per-host limit, but it didn't do a great job of that. The timer would only get restarted if we actually detected timeout within connectDBComplete(), not if we changed our attention to a new host for some other reason. In that case the old timeout continued to run, possibly causing a premature timeout failure for the new host.

Fix that, and also tweak the logic so that if we do get a timeout, we advance to the next available IP address, not to the next host name. There doesn't seem to be a good reason to assume that all the IP addresses supplied for a given host name will necessarily fail the same way as the current one. Moreover, this conforms better to the admittedly-vague documentation statement that the timeout is "per connection attempt". I changed that to "per host name or IP address" to be clearer. (Note that reconnections to the same server, such as for switching protocol version or SSL status, don't get their own separate timeout; that was true before and remains so.)

Also clarify documentation about the interpretation of connect_timeout values less than 2.

This seems like a bug, so back-patch to v10 where this logic came in.

Tom Lane, reviewed by Fabien Coelho

Discussion: https://postgr.es/m/5735.1533828184@sss.pgh.pa.us
1 parent 32b16d4 commit e0db288
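
The timer rule described in the commit message can be sketched in isolation. This is a minimal illustration, not libpq code: `ConnTarget`, `ConnectTimer`, `timer_init` and `timer_maybe_restart` are hypothetical names, the current time is passed in as a parameter so the behaviour is easy to check, and the "already succeeded" test that the real loop also applies is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the PGconn fields the timer logic compares. */
typedef struct
{
    int         whichhost;      /* index of the host being tried */
    const void *addr_cur;       /* address being tried for that host */
} ConnTarget;

/* Hypothetical bookkeeping for the connect_timeout timer. */
typedef struct
{
    int         timeout;        /* seconds; <= 0 means no time limit */
    time_t      finish_time;    /* current deadline, if any */
    int         last_whichhost; /* -2: certainly different from any host */
    const void *last_addr_cur;
} ConnectTimer;

static void
timer_init(ConnectTimer *t, int timeout)
{
    t->timeout = timeout;
    t->finish_time = (time_t) -1;
    t->last_whichhost = -2;
    t->last_addr_cur = NULL;
}

/*
 * Restart the deadline whenever the host or address differs from the one
 * seen on the previous call, whatever the reason for the change.
 * Returns true if the deadline was (re)started.
 */
static bool
timer_maybe_restart(ConnectTimer *t, const ConnTarget *target, time_t now)
{
    if (t->timeout > 0 &&
        (target->whichhost != t->last_whichhost ||
         target->addr_cur != t->last_addr_cur))
    {
        t->finish_time = now + t->timeout;
        t->last_whichhost = target->whichhost;
        t->last_addr_cur = target->addr_cur;
        return true;
    }
    return false;
}
```

Because the check compares the target against the last one seen, rather than reacting only to a detected timeout, a switch to a new host after (say) a refused connection also gets a fresh deadline.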

2 files changed: 28 additions, 18 deletions
doc/src/sgml/libpq.sgml

Lines changed: 6 additions & 5 deletions
@@ -1110,11 +1110,12 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
       <term><literal>connect_timeout</literal></term>
       <listitem>
        <para>
-        Maximum wait for connection, in seconds (write as a decimal integer
-        string). Zero or not specified means wait indefinitely. It is not
-        recommended to use a timeout of less than 2 seconds.
-        This timeout applies separately to each connection attempt.
-        For example, if you specify two hosts and <literal>connect_timeout</>
+        Maximum wait for connection, in seconds (write as a decimal integer,
+        e.g. <literal>10</literal>). Zero, negative, or not specified means
+        wait indefinitely. The minimum allowed timeout is 2 seconds, therefore
+        a value of <literal>1</literal> is interpreted as <literal>2</literal>.
+        This timeout applies separately to each host name or IP address.
+        For example, if you specify two hosts and <literal>connect_timeout</literal>
         is 5, each host will time out if no connection is made within 5
         seconds, so the total time spent waiting for a connection might be
         up to 10 seconds.
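
The documented interpretation of `connect_timeout` values can be illustrated with a small sketch. The helper names `effective_connect_timeout` and `worst_case_wait` are hypothetical, and parsing the string with `atoi` is an assumption made for the illustration; only the mapping of values (zero, negative or unset means no limit, 1 becomes 2) comes from the text above.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: map a connect_timeout string to the per-target
 * limit in seconds. Returns 0 for "wait indefinitely".
 */
static int
effective_connect_timeout(const char *value)
{
    int         timeout;

    if (value == NULL)
        return 0;               /* not specified */
    timeout = atoi(value);      /* assumption: simple decimal parsing */
    if (timeout <= 0)
        return 0;               /* zero or negative: no limit */
    if (timeout < 2)
        timeout = 2;            /* minimum allowed timeout is 2 seconds */
    return timeout;
}

/*
 * Hypothetical helper: upper bound on the total wait when ntargets host
 * names or IP addresses are tried in turn. Returns 0 if there is no limit.
 */
static long
worst_case_wait(const char *value, int ntargets)
{
    return (long) effective_connect_timeout(value) * ntargets;
}
```

With two hosts that each resolve to one address and `connect_timeout` set to 5, `worst_case_wait("5", 2)` gives 10, matching the example in the documentation.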

src/interfaces/libpq/fe-connect.c

Lines changed: 22 additions & 13 deletions
@@ -1904,6 +1904,8 @@ connectDBComplete(PGconn *conn)
     PostgresPollingStatusType flag = PGRES_POLLING_WRITING;
     time_t      finish_time = ((time_t) -1);
     int         timeout = 0;
+    int         last_whichhost = -2;    /* certainly different from whichhost */
+    struct addrinfo *last_addr_cur = NULL;
 
     if (conn == NULL || conn->status == CONNECTION_BAD)
         return 0;
@@ -1917,19 +1919,34 @@ connectDBComplete(PGconn *conn)
         if (timeout > 0)
         {
             /*
-             * Rounding could cause connection to fail; need at least 2 secs
+             * Rounding could cause connection to fail unexpectedly quickly;
+             * to prevent possibly waiting hardly-at-all, insist on at least
+             * two seconds.
              */
             if (timeout < 2)
                 timeout = 2;
-            /* calculate the finish time based on start + timeout */
-            finish_time = time(NULL) + timeout;
         }
     }
 
     for (;;)
     {
         int         ret = 0;
 
+        /*
+         * (Re)start the connect_timeout timer if it's active and we are
+         * considering a different host than we were last time through. If
+         * we've already succeeded, though, needn't recalculate.
+         */
+        if (flag != PGRES_POLLING_OK &&
+            timeout > 0 &&
+            (conn->whichhost != last_whichhost ||
+             conn->addr_cur != last_addr_cur))
+        {
+            finish_time = time(NULL) + timeout;
+            last_whichhost = conn->whichhost;
+            last_addr_cur = conn->addr_cur;
+        }
+
         /*
          * Wait, if necessary. Note that the initial state (just after
          * PQconnectStart) is to wait for the socket to select for writing.
@@ -1974,18 +1991,10 @@ connectDBComplete(PGconn *conn)
         if (ret == 1)           /* connect_timeout elapsed */
         {
             /*
-             * Attempt connection to the next host, ignoring any remaining
-             * addresses for the current host.
+             * Give up on current server/address, try the next one.
              */
-            conn->try_next_addr = false;
-            conn->try_next_host = true;
+            conn->try_next_addr = true;
             conn->status = CONNECTION_NEEDED;
-
-            /*
-             * Restart the connect_timeout timer for the new host.
-             */
-            if (timeout > 0)
-                finish_time = time(NULL) + timeout;
         }
 
         /*
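
The difference between advancing to the next address and advancing to the next host on timeout can be sketched with a toy model. Everything here is hypothetical and only loosely mirrors the `try_next_addr` / `try_next_host` flags in the hunk above: `Position`, `advance_next_addr` and `advance_next_host` are illustrative names, and each host is reduced to a count of resolved addresses.

```c
#include <stdbool.h>

/* Hypothetical cursor: which host, and which of its addresses, is current. */
typedef struct
{
    int         host;
    int         addr;
} Position;

/*
 * New behaviour on timeout: try the next address, falling through to the
 * next host only when the current host has no addresses left.
 * Returns false when nothing remains to try.
 */
static bool
advance_next_addr(Position *pos, const int *naddrs, int nhosts)
{
    pos->addr++;
    while (pos->host < nhosts && pos->addr >= naddrs[pos->host])
    {
        pos->host++;
        pos->addr = 0;
    }
    return pos->host < nhosts;
}

/*
 * Old behaviour on timeout: skip any remaining addresses of the current
 * host and go straight to the next host that has at least one address.
 */
static bool
advance_next_host(Position *pos, const int *naddrs, int nhosts)
{
    pos->host++;
    pos->addr = 0;
    while (pos->host < nhosts && naddrs[pos->host] == 0)
        pos->host++;
    return pos->host < nhosts;
}
```

For a first host with two addresses, a timeout on its first address leads `advance_next_addr` to its second address, whereas `advance_next_host` abandons that host entirely.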