Skip to content

Commit d2c2551

Browse files
committed
Add file.
1 parent dd47964 commit d2c2551

File tree

1 file changed

+189
-0
lines changed

1 file changed

+189
-0
lines changed

doc/TODO.detail/async

Lines changed: 189 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,189 @@
1+
From owner-pgsql-hackers@hub.org Sun Jan 23 13:31:03 2000
2+
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
3+
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA28482
4+
for <pgman@candle.pha.pa.us>; Sun, 23 Jan 2000 13:31:01 -0500 (EST)
5+
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id NAA08409 for <pgman@candle.pha.pa.us>; Sun, 23 Jan 2000 13:04:34 -0500 (EST)
6+
Received: from localhost (majordom@localhost)
7+
by hub.org (8.9.3/8.9.3) with SMTP id MAA65651;
8+
Sun, 23 Jan 2000 12:57:33 -0500 (EST)
9+
(envelope-from owner-pgsql-hackers)
10+
Received: by hub.org (bulk_mailer v1.5); Sun, 23 Jan 2000 12:57:20 -0500
11+
Received: (from majordom@localhost)
12+
by hub.org (8.9.3/8.9.3) id MAA65548
13+
for pgsql-hackers-outgoing; Sun, 23 Jan 2000 12:56:20 -0500 (EST)
14+
(envelope-from owner-pgsql-hackers@postgreSQL.org)
15+
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
16+
by hub.org (8.9.3/8.9.3) with ESMTP id MAA65492
17+
for <pgsql-hackers@postgreSQL.org>; Sun, 23 Jan 2000 12:55:41 -0500 (EST)
18+
(envelope-from tgl@sss.pgh.pa.us)
19+
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
20+
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA06211;
21+
Sun, 23 Jan 2000 12:55:36 -0500 (EST)
22+
To: Alfred Perlstein <bright@wintelcom.net>
23+
cc: pgsql-hackers@postgreSQL.org
24+
Subject: Re: pg_dump possible fix, need testers. (was: Re: [HACKERS] pg_dump disaster)
25+
In-reply-to: <20000123022341.J26520@fw.wintelcom.net>
26+
References: <20000122211427.C26520@fw.wintelcom.net> <200001230525.AAA08020@candle.pha.pa.us> <20000122220256.H26520@fw.wintelcom.net> <5120.948606837@sss.pgh.pa.us> <20000123022341.J26520@fw.wintelcom.net>
27+
Comments: In-reply-to Alfred Perlstein <bright@wintelcom.net>
28+
message dated "Sun, 23 Jan 2000 02:23:41 -0800"
29+
Date: Sun, 23 Jan 2000 12:55:36 -0500
30+
Message-ID: <6208.948650136@sss.pgh.pa.us>
31+
From: Tom Lane <tgl@sss.pgh.pa.us>
32+
Sender: owner-pgsql-hackers@postgreSQL.org
33+
Status: ORr
34+
35+
>> Um, I didn't have any trouble at all reproducing Patrick's complaint.
36+
>> pg_dump any moderately large table (I used tenk1 from the regress
37+
>> database) and try to load the script with psql. Kaboom.
38+
39+
> This is after or before my latest patch?
40+
41+
Before. I haven't updated since yesterday...
42+
43+
> I can't seem to reproduce this problem,
44+
45+
Odd. Maybe there is something different about the kernel's timing of
46+
message sending on your platform. I see it very easily on HPUX 10.20,
47+
and Patrick sees it very easily on whatever he's using (netbsd I think).
48+
You might try varying the situation a little, say
49+
psql mydb <dumpfile
50+
psql -f dumpfile mydb
51+
psql mydb
52+
\i dumpfile
53+
and the same with -h localhost (to get a TCP/IP connection instead of
54+
Unix domain). At the moment (pre-patch) I see failures with the
55+
first two of these, but not with the \i method. -h doesn't seem to
56+
matter for me, but it might for you.
57+
58+
> Telling me something is wrong without giving suggestions on how
59+
> to fix it, nor direct pointers to where it fails doesn't help me
60+
> one bit. You're not offering constructive critism, you're not
61+
> even offering valid critism, you're just waving your finger at
62+
> "problems" that you say exist but don't pin down to anything specific.
63+
64+
I have been explaining it as clearly as I could. Let's try it
65+
one more time.
66+
67+
> I spent hours looking over what I did to pqFlush and pqPutnBytes
68+
> because of what you said earlier when all the bug seems to have
69+
> come down to is that I missed that the socket is set to non-blocking
70+
> in all cases now.
71+
72+
Letting the socket mode default to blocking will hide the problems from
73+
existing clients that don't care about non-block mode. But people who
74+
try to actually use the nonblock mode are going to see the same kinds of
75+
problems that psql is exhibiting.
76+
77+
> The old sequence of events that happened was as follows:
78+
79+
> user sends data almost filling the output buffer...
80+
> user sends another line of text overflowing the buffer...
81+
> pqFlush is invoked blocking the user until the output pipe clears...
82+
> and repeat.
83+
84+
Right.
85+
86+
> The nonblocking code allows sends to fail so the user can abort
87+
> sending stuff to the backend in order to process other work:
88+
89+
> user sends data almost filling the output buffer...
90+
> user sends another line of text that may overflow the buffer...
91+
> pqFlush is invoked,
92+
> if the pipe can't be cleared an error is returned allowing the user to
93+
> retry the send later.
94+
> if the flush succeeds then more data is queued and success is returned
95+
96+
But you haven't thought through the mechanics of the "error is returned
97+
allowing the user to retry" code path clearly enough. Let's take
98+
pqPutBytes for an example. If it returns EOF, is that a hard error or
99+
does it just mean that the application needs to wait a while? The
100+
application *must* distinguish these cases, or it will do the wrong
101+
thing: for example, if it mistakes a hard error for "wait a while",
102+
then it will wait forever without making any progress or producing
103+
an error report.
104+
105+
You need to provide a different return convention that indicates
106+
what happened, say
107+
EOF (-1) => hard error (same as old code)
108+
0 => OK
109+
1 => no data was queued due to risk of blocking
110+
And you need to guarantee that the application knows what the state is
111+
when the can't-do-it-yet return is made; note that I specified "no data
112+
was queued" above. If pqPutBytes might queue some of the data before
113+
returning 1, the application is in trouble again. While you apparently
114+
foresaw that in recoding pqPutBytes, your code doesn't actually work.
115+
There is the minor code bug that you fail to update "avail" after the
116+
first pqFlush call, and the much more fundamental problem that you
117+
cannot guarantee to have queued all or none of the data. Think about
118+
what happens if the passed nbytes is larger than the output buffer size.
119+
You may pass the first pqFlush successfully, then get into the loop and
120+
get a won't-block return from pqFlush in the loop. What then?
121+
You can't simply refuse to support the case nbytes > bufsize at all,
122+
because that will cause application failures as well (too long query
123+
sends it into an infinite loop trying to queue data, most likely).
124+
125+
A possible answer is to specify that a return of +N means "N bytes
126+
remain unqueued due to risk of blocking" (after having queued as much
127+
as you could). This would put the onus on the caller to update his
128+
pointers/counts properly; propagating that into all the internal uses
129+
of pqPutBytes would be no fun. (Of course, so far you haven't updated
130+
*any* of the internal callers to behave reasonably in case of a
131+
won't-block return; PQfn is just one example.)
132+
133+
Another possible answer is to preserve pqPutBytes' old API, "queue or
134+
bust", by the expedient of enlarging the output buffer to hold whatever
135+
we can't send immediately. This is probably more attractive, even
136+
though a long query might suck up a lot of space that won't get
137+
reclaimed as long as the connection lives. If you don't do this then
138+
you are going to have to make a lot of ugly changes in the internal
139+
callers to deal with won't-block returns. Actually, a bulk COPY IN
140+
would probably be the worst case --- the app could easily load data into
141+
the buffer far faster than it could be sent. It might be best to extend
142+
PQputline to have a three-way return and add code there to limit the
143+
growth of the output buffer, while allowing all internal callers to
144+
assume that the buffer is expanded when they need it.
145+
146+
pqFlush has the same kind of interface design problem: the same EOF code
147+
is returned for either a hard error or can't-flush-yet, but it would be
148+
disastrous to treat those cases alike. You must provide a 3-way return
149+
code.
150+
151+
Furthermore, the same sort of 3-way return code convention will have to
152+
propagate out through anything that calls pqFlush (with corresponding
153+
documentation updates). pqPutBytes can be made to hide a pqFlush won't-
154+
block return by trying to enlarge the output buffer, but in most other
155+
places you won't have a choice except to punt it back to the caller.
156+
157+
PQendcopy has the same interface design problem. It used to be that
158+
(unless you passed a null pointer) PQendcopy would *guarantee* that
159+
the connection was no longer in COPY state on return --- by resetting
160+
it, if necessary. So the return code was mainly informative; the
161+
application didn't have to do anything different if PQendcopy reported
162+
failure. But now, a nonblocking application does need to pay attention
163+
to whether PQendcopy completed or not --- and you haven't provided a way
164+
for it to tell. If 1 is returned, the connection might still be in
165+
COPY state, or it might not (PQendcopy might have reset it). If the
166+
application doesn't distinguish these cases then it will fail.
167+
168+
I also think that you want to take a hard look at the automatic "reset"
169+
behavior upon COPY failure, since a PQreset call will block the
170+
application until it finishes. Really, what is needed to close down a
171+
COPY safely in nonblock mode is a pair of entry points along the line of
172+
"PQendcopyStart" and "PQendcopyPoll", with API conventions similar to
173+
PQresetStart/PQresetPoll. This gives you the ability to do the reset
174+
(if one is necessary) without blocking the application. PQendcopy
175+
itself will only be useful to blocking applications.
176+
177+
> I'm sorry if they don't work for some situations other than COPY IN,
178+
> but it's functionality that I needed and I expect to be expanded on
179+
> by myself and others that take interest in nonblocking operation.
180+
181+
I don't think that the nonblock code is anywhere near production quality
182+
at this point. It may work for you, if you don't stress it too hard and
183+
never have a communications failure; but I don't want to see us ship it
184+
as part of Postgres unless these issues get addressed.
185+
186+
regards, tom lane
187+
188+
************
189+

0 commit comments

Comments
 (0)