@@ -7,7 +7,7 @@ actual output plan, the /path code generates all possible ways to join the
7
7
tables, and /prep handles special cases like inheritance. /util is utility
8
8
stuff. /geqo is the separate "genetic optimization" planner --- it does
9
9
a semi-random search through the join tree space, rather than exhaustively
10
- considering all possible join trees. (But each join considered by geqo
10
+ considering all possible join trees. (But each join considered by / geqo
11
11
is given to /path to create paths for, so we consider all possible
12
12
implementation paths for each specific join even in GEQO mode.)
13
13
@@ -40,7 +40,7 @@ the WHERE clause "tab1.col1 = tab2.col1" generates a JoinInfo for tab1
40
40
listing tab2 as an unjoined relation, and also one for tab2 showing tab1
41
41
as an unjoined relation.
42
42
43
- If we have only a single base relation in the query, we are done here .
43
+ If we have only a single base relation in the query, we are done now .
44
44
Otherwise we have to figure out how to join the base relations into a
45
45
single join relation.
46
46
@@ -225,5 +225,185 @@ way, the next level up will have the maximum freedom to build mergejoins
225
225
without sorting, since it can pick from any of the paths retained for its
226
226
inputs.
227
227
228
- See path/pathkeys.c for an explanation of the PathKeys data structure that
229
- represents what is known about the sort order of a particular Path.
228
+
229
+ PathKeys
230
+ --------
231
+
232
+ The PathKeys data structure represents what is known about the sort order
233
+ of a particular Path.
234
+
235
+ Path.pathkeys is a List of Lists of PathKeyItem nodes that represent
236
+ the sort order of the result generated by the Path. The n'th sublist
237
+ represents the n'th sort key of the result.
238
+
239
+ In single/base relation RelOptInfo's, the Paths represent various ways
240
+ of scanning the relation and the resulting ordering of the tuples.
241
+ Sequential scan Paths have NIL pathkeys, indicating no known ordering.
242
+ Index scans have Path.pathkeys that represent the chosen index's ordering,
243
+ if any. A single-key index would create a pathkey with a single sublist,
244
+ e.g. ( (tab1.indexkey1/sortop1) ). A multi-key index generates a sublist
245
+ per key, e.g. ( (tab1.indexkey1/sortop1) (tab1.indexkey2/sortop2) ) which
246
+ shows major sort by indexkey1 (ordering by sortop1) and minor sort by
247
+ indexkey2 with sortop2.
248
+
249
+ Note that a multi-pass indexscan (OR clause scan) has NIL pathkeys since
250
+ we can say nothing about the overall order of its result. Also, an
251
+ indexscan on an unordered type of index generates NIL pathkeys. However,
252
+ we can always create a pathkey by doing an explicit sort. The pathkeys
253
+ for a Sort plan's output just represent the sort key fields and the
254
+ ordering operators used.
255
+
256
+ Things get more interesting when we consider joins. Suppose we do a
257
+ mergejoin between A and B using the mergeclause A.X = B.Y. The output
258
+ of the mergejoin is sorted by X --- but it is also sorted by Y. We
259
+ represent this fact by listing both keys in a single pathkey sublist:
260
+ ( (A.X/xsortop B.Y/ysortop) ). This pathkey asserts that the major
261
+ sort order of the Path can be taken to be *either* A.X or B.Y.
262
+ They are equal, so they are both primary sort keys. By doing this,
263
+ we allow future joins to use either var as a pre-sorted key, so upper
264
+ Mergejoins may be able to avoid having to re-sort the Path. This is
265
+ why pathkeys is a List of Lists.
266
+
267
+ We keep a sortop associated with each PathKeyItem because cross-data-type
268
+ mergejoins are possible; for example int4 = int8 is mergejoinable.
269
+ In this case we need to remember that the left var is ordered by int4lt
270
+ while the right var is ordered by int8lt. So the different members of
271
+ each sublist could have different sortops.
272
+
273
+ Note that while the order of the top list is meaningful (primary vs.
274
+ secondary sort key), the order of each sublist is arbitrary. Each sublist
275
+ should be regarded as a set of equivalent keys, with no significance
276
+ to the list order.
277
+
278
+ With a little further thought, it becomes apparent that pathkeys for
279
+ joins need not only come from mergejoins. For example, if we do a
280
+ nestloop join between outer relation A and inner relation B, then any
281
+ pathkeys relevant to A are still valid for the join result: we have
282
+ not altered the order of the tuples from A. Even more interesting,
283
+ if there was a mergeclause (more formally, an "equijoin clause") A.X=B.Y,
284
+ and A.X was a pathkey for the outer relation A, then we can assert that
285
+ B.Y is a pathkey for the join result; X was ordered before and still is,
286
+ and the joined values of Y are equal to the joined values of X, so Y
287
+ must now be ordered too. This is true even though we used neither an
288
+ explicit sort nor a mergejoin on Y.
289
+
290
+ More generally, whenever we have an equijoin clause A.X = B.Y and a
291
+ pathkey A.X, we can add B.Y to that pathkey if B is part of the joined
292
+ relation the pathkey is for, *no matter how we formed the join*. It works
293
+ as long as the clause has been applied at some point while forming the
294
+ join relation. (In the current implementation, we always apply qual
295
+ clauses as soon as possible, ie, as far down in the plan tree as possible.
296
+ So we can always make this deduction. If we postponed filtering by qual
297
+ clauses then we'd not be able to assume pathkey equivalence until after
298
+ the equality check(s) had been applied.)
299
+
300
+ In short, then: when producing the pathkeys for a merge or nestloop join,
301
+ we can keep all of the keys of the outer path, since the ordering of the
302
+ outer path will be preserved in the result. Furthermore, we can add to
303
+ each pathkey sublist any inner vars that are equijoined to any of the
304
+ outer vars in the sublist; this works regardless of whether we are
305
+ implementing the join using that equijoin clause as a mergeclause,
306
+ or merely enforcing the clause after-the-fact as a qpqual filter.
307
+
308
+ Although Hashjoins also work only with equijoin operators, it is *not*
309
+ safe to consider the output of a Hashjoin to be sorted in any particular
310
+ order --- not even the outer path's order. This is true because the
311
+ executor might have to split the join into multiple batches. Therefore
312
+ a Hashjoin is always given NIL pathkeys. (Also, we need to use only
313
+ mergejoinable operators when deducing which inner vars are now sorted,
314
+ because a mergejoin operator tells us which left- and right-datatype
315
+ sortops can be considered equivalent, whereas a hashjoin operator
316
+ doesn't imply anything about sort order.)
317
+
318
+ Pathkeys are also useful to represent an ordering that we wish to achieve,
319
+ since they are easily compared to the pathkeys of a potential candidate
320
+ path. So, SortClause lists are turned into pathkeys lists for use inside
321
+ the optimizer.
322
+
323
+ OK, now for how it *really* works:
324
+
325
+ We did implement pathkeys just as described above, and found that the
326
+ planner spent a huge amount of time comparing pathkeys, because the
327
+ representation of pathkeys as unordered lists made it expensive to decide
328
+ whether two were equal or not. So, we've modified the representation
329
+ as described next.
330
+
331
+ If we scan the WHERE clause for equijoin clauses (mergejoinable clauses)
332
+ during planner startup, we can construct lists of equivalent pathkey items
333
+ for the query. There could be more than two items per equivalence set;
334
+ for example, WHERE A.X = B.Y AND B.Y = C.Z AND D.R = E.S creates the
335
+ equivalence sets { A.X B.Y C.Z } and { D.R E.S } (plus associated sortops).
336
+ Any pathkey item that belongs to an equivalence set implies that all the
337
+ other items in its set apply to the relation too, or at least all the ones
338
+ that are for fields present in the relation. (Some of the items in the
339
+ set might be for as-yet-unjoined relations.) Furthermore, any multi-item
340
+ pathkey sublist that appears at any stage of planning the query *must* be
341
+ a subset of one or another of these equivalence sets; there's no way we'd
342
+ have put two items in the same pathkey sublist unless they were equijoined
343
+ in WHERE.
344
+
345
+ Now suppose that we allow a pathkey sublist to contain pathkey items for
346
+ vars that are not yet part of the pathkey's relation. This introduces
347
+ no logical difficulty, because such items can easily be seen to be
348
+ irrelevant; we just mandate that they be ignored. But having allowed
349
+ this, we can declare (by fiat) that any multiple-item pathkey sublist
350
+ must be "equal()" to the appropriate equivalence set. In effect,
351
+ whenever we make a pathkey sublist that mentions any var appearing in an
352
+ equivalence set, we instantly add all the other vars equivalenced to it,
353
+ whether they appear yet in the pathkey's relation or not. And we also
354
+ mandate that the pathkey sublist appear in the same order as the
355
+ equivalence set it comes from. (In practice, we simply return a pointer
356
+ to the relevant equivalence set without building any new sublist at all.
357
+ Each equivalence set becomes a "canonical pathkey" for all its members.)
358
+ This makes comparing pathkeys very simple and fast, and saves a lot of
359
+ work and memory space for pathkey construction as well.
360
+
361
+ Note that pathkey sublists having just one item still exist, and are
362
+ not expected to be equal() to any equivalence set. This occurs when
363
+ we describe a sort order that involves a var that's not mentioned in
364
+ any equijoin clause of the WHERE. We could add singleton sets containing
365
+ such vars to the query's list of equivalence sets, but there's little
366
+ point in doing so.
367
+
368
+ By the way, it's OK and even useful for us to build equivalence sets
369
+ that mention multiple vars from the same relation. For example, if
370
+ we have WHERE A.X = A.Y and we are scanning A using an index on X,
371
+ we can legitimately conclude that the path is sorted by Y as well;
372
+ and this could be handy if Y is the variable used in other join clauses
373
+ or ORDER BY. So, any WHERE clause with a mergejoinable operator can
374
+ contribute to an equivalence set, even if it's not a join clause.
375
+
376
+ As sketched so far, equijoin operators allow us to conclude that
377
+ A.X = B.Y and B.Y = C.Z together imply A.X = C.Z, even when different
378
+ datatypes are involved. What is not immediately obvious is that to use
379
+ the "canonical pathkey" representation, we *must* make this deduction.
380
+ An example (from a real bug in Postgres 7.0) is a mergejoin for a query
381
+ like
382
+ SELECT * FROM t1, t2 WHERE t1.f2 = t2.f3 AND t1.f1 = t2.f3;
383
+ The canonical-pathkey mechanism is able to deduce that t1.f1 = t1.f2
384
+ (ie, both appear in the same canonical pathkey set). If we sort t1
385
+ and then apply a mergejoin, we *must* filter the t1 tuples using the
386
+ implied qualification f1 = f2, because otherwise the output of the sort
387
+ will be ordered by f1 or f2 (whichever we sort on) but not both. The
388
+ merge will then fail since (depending on which qual clause it applies
389
+ first) it's expecting either ORDER BY f1,f2 or ORDER BY f2,f1, but the
390
+ actual output of the sort has neither of these orderings. The best fix
391
+ for this is to generate all the implied equality constraints for each
392
+ equijoin set and add these clauses to the query's qualification list.
393
+ In other words, we *explicitly* deduce f1 = f2 and add this to the WHERE
394
+ clause. The constraint will be applied as a qpqual to the output of the
395
+ scan on t1, resulting in sort output that is indeed ordered by both vars.
396
+ This approach provides more information to the selectivity estimation
397
+ code than it would otherwise have, and reduces the number of tuples
398
+ processed in join stages, so it's a win to make these deductions even
399
+ if we weren't forced to.
400
+
401
+ Yet another implication of all this is that mergejoinable operators
402
+ must form closed equivalence sets. For example, if "int2 = int4"
403
+ and "int4 = int8" are both marked mergejoinable, then there had better
404
+ be a mergejoinable "int2 = int8" operator as well. Otherwise, when
405
+ we're given WHERE int2var = int4var AND int4var = int8var, we'll fail
406
+ while trying to create a representation of the implied clause
407
+ int2var = int8var.
408
+
409
+ -- bjm & tgl
0 commit comments