From owner-pgsql-hackers@hub.org Sun Jan 23 13:31:03 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA28482
    for <pgman@candle.pha.pa.us>; Sun, 23 Jan 2000 13:31:01 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id NAA08409 for <pgman@candle.pha.pa.us>; Sun, 23 Jan 2000 13:04:34 -0500 (EST)
Received: from localhost (majordom@localhost)
    by hub.org (8.9.3/8.9.3) with SMTP id MAA65651;
    Sun, 23 Jan 2000 12:57:33 -0500 (EST)
    (envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Sun, 23 Jan 2000 12:57:20 -0500
Received: (from majordom@localhost)
    by hub.org (8.9.3/8.9.3) id MAA65548
    for pgsql-hackers-outgoing; Sun, 23 Jan 2000 12:56:20 -0500 (EST)
    (envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
    by hub.org (8.9.3/8.9.3) with ESMTP id MAA65492
    for <pgsql-hackers@postgreSQL.org>; Sun, 23 Jan 2000 12:55:41 -0500 (EST)
    (envelope-from tgl@sss.pgh.pa.us)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
    by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA06211;
    Sun, 23 Jan 2000 12:55:36 -0500 (EST)
To: Alfred Perlstein <bright@wintelcom.net>
cc: pgsql-hackers@postgreSQL.org
Subject: Re: pg_dump possible fix, need testers. (was: Re: [HACKERS] pg_dump disaster)
In-reply-to: <20000123022341.J26520@fw.wintelcom.net>
References: <20000122211427.C26520@fw.wintelcom.net> <200001230525.AAA08020@candle.pha.pa.us> <20000122220256.H26520@fw.wintelcom.net> <5120.948606837@sss.pgh.pa.us> <20000123022341.J26520@fw.wintelcom.net>
Comments: In-reply-to Alfred Perlstein <bright@wintelcom.net>
    message dated "Sun, 23 Jan 2000 02:23:41 -0800"
Date: Sun, 23 Jan 2000 12:55:36 -0500
Message-ID: <6208.948650136@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Sender: owner-pgsql-hackers@postgreSQL.org
Status: ORr

>> Um, I didn't have any trouble at all reproducing Patrick's complaint.
>> pg_dump any moderately large table (I used tenk1 from the regress
>> database) and try to load the script with psql. Kaboom.

> This is after or before my latest patch?

Before. I haven't updated since yesterday...

> I can't seem to reproduce this problem,

Odd. Maybe there is something different about the kernel's timing of
message sending on your platform. I see it very easily on HPUX 10.20,
and Patrick sees it very easily on whatever he's using (NetBSD I think).
You might try varying the situation a little, say
    psql mydb <dumpfile
    psql -f dumpfile mydb
    psql mydb
    \i dumpfile
and the same with -h localhost (to get a TCP/IP connection instead of
Unix domain). At the moment (pre-patch) I see failures with the
first two of these, but not with the \i method. -h doesn't seem to
matter for me, but it might for you.

> Telling me something is wrong without giving suggestions on how
> to fix it, or direct pointers to where it fails, doesn't help me
> one bit. You're not offering constructive criticism, you're not
> even offering valid criticism, you're just waving your finger at
> "problems" that you say exist but don't pin down to anything specific.

I have been explaining it as clearly as I could. Let's try it
one more time.

> I spent hours looking over what I did to pqFlush and pqPutnBytes
> because of what you said earlier when all the bug seems to have
> come down to is that I missed that the socket is set to non-blocking
> in all cases now.

Letting the socket mode default to blocking will hide the problems from
existing clients that don't care about non-block mode. But people who
try to actually use the nonblock mode are going to see the same kinds of
problems that psql is exhibiting.

> The old sequence of events that happened was as follows:

> user sends data almost filling the output buffer...
> user sends another line of text overflowing the buffer...
> pqFlush is invoked blocking the user until the output pipe clears...
> and repeat.

Right.

> The nonblocking code allows sends to fail so the user can abort
> sending stuff to the backend in order to process other work:

> user sends data almost filling the output buffer...
> user sends another line of text that may overflow the buffer...
> pqFlush is invoked,
> if the pipe can't be cleared an error is returned allowing the user to
> retry the send later.
> if the flush succeeds then more data is queued and success is returned

But you haven't thought through the mechanics of the "error is returned
allowing the user to retry" code path clearly enough. Let's take
pqPutBytes for an example. If it returns EOF, is that a hard error or
does it just mean that the application needs to wait a while? The
application *must* distinguish these cases, or it will do the wrong
thing: for example, if it mistakes a hard error for "wait a while",
then it will wait forever without making any progress or producing
an error report.

You need to provide a different return convention that indicates
what happened, say
    EOF (-1) => hard error (same as old code)
    0        => OK
    1        => no data was queued due to risk of blocking
And you need to guarantee that the application knows what the state is
when the can't-do-it-yet return is made; note that I specified "no data
was queued" above. If pqPutBytes might queue some of the data before
returning 1, the application is in trouble again. While you apparently
foresaw that in recoding pqPutBytes, your code doesn't actually work.
There is the minor code bug that you fail to update "avail" after the
first pqFlush call, and the much more fundamental problem that you
cannot guarantee to have queued all or none of the data. Think about
what happens if the passed nbytes is larger than the output buffer size.
You may pass the first pqFlush successfully, then get into the loop and
get a won't-block return from pqFlush in the loop. What then?
You can't simply refuse to support the case nbytes > bufsize at all,
because that will cause application failures as well (a too-long query
most likely sends the application into an infinite loop trying to queue
the data).

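To make that concrete, here is roughly the shape of routine I have in
mind. The struct and the names are invented for illustration only; this
is not the existing libpq code, and the real bookkeeping would of course
live in the PGconn fields:

#include <string.h>             /* memcpy */

/* Stand-in for the output-buffer state that really lives in PGconn. */
typedef struct
{
    int     sock;               /* nonblocking socket fd */
    char   *buf;                /* queued-but-unsent data */
    size_t  bufSize;            /* allocated size of buf */
    size_t  count;              /* number of bytes currently queued */
} OutBuf;

static int flushNB(OutBuf *out);    /* three-way flush, sketched further down */

/*
 * Queue nbytes of data, all or nothing.
 *  -1 => hard error
 *   0 => everything queued
 *   1 => NOTHING queued (would have blocked); caller may retry the
 *        identical call later
 */
static int
putBytesNB(OutBuf *out, const char *s, size_t nbytes)
{
    if (nbytes > out->bufSize)
        return -1;              /* won't fit even when empty; the real fix
                                 * is to grow the buffer (see below) */

    if (nbytes > out->bufSize - out->count)
    {
        int     r = flushNB(out);

        if (r == -1)
            return -1;          /* hard error */
        if (nbytes > out->bufSize - out->count)
            return 1;           /* still no room: queue nothing, report "retry" */
    }

    memcpy(out->buf + out->count, s, nbytes);
    out->count += nbytes;
    return 0;
}

The essential property is that a 1 return leaves the buffer exactly as
it was, so the caller can simply repeat the identical call later.
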
A possible answer is to specify that a return of +N means "N bytes
remain unqueued due to risk of blocking" (after having queued as much
as you could). This would put the onus on the caller to update his
pointers/counts properly; propagating that into all the internal uses
of pqPutBytes would be no fun. (Of course, so far you haven't updated
*any* of the internal callers to behave reasonably in case of a
won't-block return; PQfn is just one example.)

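To illustrate what the +N convention costs, every internal caller ends
up carrying a loop like this (invented names again, not real libpq
code), and a caller that can't hand a try-again status further up has
little choice but to block anyway:

/* Hypothetical primitive: returns -1 on hard error, else the number
 * of bytes it could NOT queue (0 means everything was accepted). */
int putBytesPartial(OutBuf *out, const char *s, size_t nbytes);

/* Hypothetical helper: block in select()/poll() until the socket is writable. */
void waitForWritable(OutBuf *out);

static int
sendAll(OutBuf *out, const char *s, size_t nbytes)
{
    while (nbytes > 0)
    {
        int     left = putBytesPartial(out, s, nbytes);

        if (left < 0)
            return -1;              /* hard error */
        s += nbytes - left;         /* skip over what did get queued */
        nbytes = left;
        if (left > 0)
            waitForWritable(out);   /* nothing else to do but wait */
    }
    return 0;
}
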
Another possible answer is to preserve pqPutBytes' old API, "queue or
bust", by the expedient of enlarging the output buffer to hold whatever
we can't send immediately. This is probably more attractive, even
though a long query might suck up a lot of space that won't get
reclaimed as long as the connection lives. If you don't do this then
you are going to have to make a lot of ugly changes in the internal
callers to deal with won't-block returns. Actually, a bulk COPY IN
would probably be the worst case --- the app could easily load data into
the buffer far faster than it could be sent. It might be best to extend
PQputline to have a three-way return and add code there to limit the
growth of the output buffer, while allowing all internal callers to
assume that the buffer is expanded when they need it.

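The enlargement answer could be as simple as something like this, still
on the invented struct above; the real thing would work on PGconn's
output buffer and, as I said, probably wants PQputline to police the
growth:

#include <stdlib.h>             /* realloc */

/*
 * "Queue or bust": grow the buffer rather than refuse the data.
 *  -1 => out of memory or connection error
 *   0 => data queued (though possibly not yet sent)
 * Assumes bufSize starts out nonzero.
 */
static int
putBytesGrow(OutBuf *out, const char *s, size_t nbytes)
{
    /* Drain whatever we can first, but don't insist on it. */
    if (flushNB(out) == -1)
        return -1;

    if (nbytes > out->bufSize - out->count)
    {
        size_t  newSize = out->bufSize;
        char   *newBuf;

        while (nbytes > newSize - out->count)
            newSize *= 2;

        newBuf = realloc(out->buf, newSize);
        if (newBuf == NULL)
            return -1;
        out->buf = newBuf;
        out->bufSize = newSize;
    }

    memcpy(out->buf + out->count, s, nbytes);
    out->count += nbytes;
    return 0;
}
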
pqFlush has the same kind of interface design problem: the same EOF code
is returned for either a hard error or can't-flush-yet, but it would be
disastrous to treat those cases alike. You must provide a 3-way return
code.

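Concretely, the flush needs to look something like this (same invented
struct as above; EINTR, SIGPIPE and such omitted for brevity), so that
a caller can tell "try again later" apart from a dead connection:

#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

/*
 * Push queued data out the nonblocking socket.
 *  -1 => hard error, connection is broken
 *   0 => buffer completely flushed
 *   1 => data still queued; the socket would block, try again later
 */
static int
flushNB(OutBuf *out)
{
    size_t  sent = 0;

    while (sent < out->count)
    {
        ssize_t n = send(out->sock, out->buf + sent, out->count - sent, 0);

        if (n < 0)
        {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;              /* can't send any more right now */
            return -1;              /* real trouble */
        }
        sent += n;
    }

    /* slide any leftover bytes down to the front of the buffer */
    memmove(out->buf, out->buf + sent, out->count - sent);
    out->count -= sent;

    return (out->count > 0) ? 1 : 0;
}
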
Furthermore, the same sort of 3-way return code convention will have to
propagate out through anything that calls pqFlush (with corresponding
documentation updates). pqPutBytes can be made to hide a pqFlush won't-
block return by trying to enlarge the output buffer, but in most other
places you won't have a choice except to punt it back to the caller.

PQendcopy has the same interface design problem. It used to be that
(unless you passed a null pointer) PQendcopy would *guarantee* that
the connection was no longer in COPY state on return --- by resetting
it, if necessary. So the return code was mainly informative; the
application didn't have to do anything different if PQendcopy reported
failure. But now, a nonblocking application does need to pay attention
to whether PQendcopy completed or not --- and you haven't provided a way
for it to tell. If 1 is returned, the connection might still be in
COPY state, or it might not (PQendcopy might have reset it). If the
application doesn't distinguish these cases then it will fail.

I also think that you want to take a hard look at the automatic "reset"
behavior upon COPY failure, since a PQreset call will block the
application until it finishes. Really, what is needed to close down a
COPY safely in nonblock mode is a pair of entry points along the line of
"PQendcopyStart" and "PQendcopyPoll", with API conventions similar to
PQresetStart/PQresetPoll. This gives you the ability to do the reset
(if one is necessary) without blocking the application. PQendcopy
itself will only be useful to blocking applications.

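The application's side of that would be driven just like the existing
PQresetStart/PQresetPoll loop. A sketch follows; PQendcopyStart and
PQendcopyPoll are hypothetical, not functions that exist today, and the
wait_until_* helpers stand for whatever select()/poll() machinery the
application already has:

#include "libpq-fe.h"

/* Hypothetical additions to libpq, analogous to PQresetStart/PQresetPoll. */
extern int PQendcopyStart(PGconn *conn);
extern PostgresPollingStatusType PQendcopyPoll(PGconn *conn);

/* Application's own waits, e.g. built on select() or poll(). */
extern void wait_until_readable(int sock);
extern void wait_until_writable(int sock);

/* Returns 0 once the connection is safely out of COPY state, -1 if it's hosed. */
static int
finish_copy_nonblocking(PGconn *conn)
{
    if (!PQendcopyStart(conn))          /* couldn't even start */
        return -1;

    for (;;)
    {
        switch (PQendcopyPoll(conn))
        {
            case PGRES_POLLING_OK:
                return 0;               /* COPY ended (or was reset); connection usable */
            case PGRES_POLLING_FAILED:
                return -1;              /* connection is beyond saving */
            case PGRES_POLLING_READING:
                wait_until_readable(PQsocket(conn));
                break;
            case PGRES_POLLING_WRITING:
                wait_until_writable(PQsocket(conn));
                break;
            default:
                break;
        }
    }
}
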
> I'm sorry if they don't work for some situations other than COPY IN,
> but it's functionality that I needed and I expect to be expanded on
> by myself and others that take interest in nonblocking operation.

I don't think that the nonblock code is anywhere near production quality
at this point. It may work for you, if you don't stress it too hard and
never have a communications failure; but I don't want to see us ship it
as part of Postgres unless these issues get addressed.

            regards, tom lane

************
