Skip to content

Commit 7ef507a

Browse files
committed
Further fixes for degenerate outer join clauses.
Further testing revealed that commit f69b4b9 was still a few bricks shy of a load: minor tweaking of the previous test cases resulted in the same wrong-outer-join-order problem coming back. After study I concluded that my previous changes in make_outerjoininfo() were just accidentally masking the problem, and should be reverted in favor of forcing syntactic join order whenever an upper outer join's predicate doesn't mention a lower outer join's LHS. This still allows the chained-outer-joins style that is the normally optimizable case. I also tightened things up some more in join_is_legal(). It seems to me on review that what's really happening in the exception case where we ignore a mismatched special join is that we're allowing the proposed join to associate into the RHS of the outer join we're comparing it to. As such, we should *always* insist that the proposed join be a left join, which eliminates a bunch of rather dubious argumentation. The case where we weren't enforcing that was the one that was already known buggy anyway (it had a violatable Assert before the aforesaid commit) so it hardly deserves a lot of deference. Back-patch to all active branches, like the previous patch. The added regression test case failed in all branches back to 9.1, and I think it's only an unrelated change in costing calculations that kept 9.0 from choosing a broken plan.
1 parent e72f211 commit 7ef507a

File tree

5 files changed

+153
-68
lines changed

5 files changed

+153
-68
lines changed

src/backend/optimizer/README

Lines changed: 17 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -241,12 +241,23 @@ non-FULL joins can be freely associated into the lefthand side of an
241241
OJ, but in some cases they can't be associated into the righthand side.
242242
So the restriction enforced by join_is_legal is that a proposed join
243243
can't join a rel within or partly within an RHS boundary to one outside
244-
the boundary, unless the join validly implements some outer join.
245-
(To support use of identity 3, we have to allow cases where an apparent
246-
violation of a lower OJ's RHS is committed while forming an upper OJ.
247-
If this wouldn't in fact be legal, the upper OJ's minimum LHS or RHS
248-
set must be expanded to include the whole of the lower OJ, thereby
249-
preventing it from being formed before the lower OJ is.)
244+
the boundary, unless the proposed join is a LEFT join that can associate
245+
into the SpecialJoinInfo's RHS using identity 3.
246+
247+
The use of minimum Relid sets has some pitfalls; consider a query like
248+
A leftjoin (B leftjoin (C innerjoin D) on (Pbcd)) on Pa
249+
where Pa doesn't mention B/C/D at all. In this case a naive computation
250+
would give the upper leftjoin's min LHS as {A} and min RHS as {C,D} (since
251+
we know that the innerjoin can't associate out of the leftjoin's RHS, and
252+
enforce that by including its relids in the leftjoin's min RHS). And the
253+
lower leftjoin has min LHS of {B} and min RHS of {C,D}. Given such
254+
information, join_is_legal would think it's okay to associate the upper
255+
join into the lower join's RHS, transforming the query to
256+
B leftjoin (A leftjoin (C innerjoin D) on Pa) on (Pbcd)
257+
which yields totally wrong answers. We prevent that by forcing the min LHS
258+
for the upper join to include B. This is perhaps overly restrictive, but
259+
such cases don't arise often so it's not clear that it's worth developing a
260+
more complicated system.
250261

251262

252263
Pulling Up Subqueries

src/backend/optimizer/path/joinrels.c

Lines changed: 27 additions & 46 deletions
Original file line numberDiff line numberDiff line change
@@ -331,7 +331,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
331331
SpecialJoinInfo *match_sjinfo;
332332
bool reversed;
333333
bool unique_ified;
334-
bool is_valid_inner;
334+
bool must_be_leftjoin;
335335
bool lateral_fwd;
336336
bool lateral_rev;
337337
ListCell *l;
@@ -346,12 +346,12 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
346346
/*
347347
* If we have any special joins, the proposed join might be illegal; and
348348
* in any case we have to determine its join type. Scan the join info
349-
* list for conflicts.
349+
* list for matches and conflicts.
350350
*/
351351
match_sjinfo = NULL;
352352
reversed = false;
353353
unique_ified = false;
354-
is_valid_inner = true;
354+
must_be_leftjoin = false;
355355

356356
foreach(l, root->join_info_list)
357357
{
@@ -402,7 +402,8 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
402402
* If one input contains min_lefthand and the other contains
403403
* min_righthand, then we can perform the SJ at this join.
404404
*
405-
* Barf if we get matches to more than one SJ (is that possible?)
405+
* Reject if we get matches to more than one SJ; that implies we're
406+
* considering something that's not really valid.
406407
*/
407408
if (bms_is_subset(sjinfo->min_lefthand, rel1->relids) &&
408409
bms_is_subset(sjinfo->min_righthand, rel2->relids))
@@ -470,58 +471,38 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
470471
/*
471472
* Otherwise, the proposed join overlaps the RHS but isn't a valid
472473
* implementation of this SJ. It might still be a legal join,
473-
* however, if it does not overlap the LHS. But we never allow
474-
* violations of the RHS of SEMI or ANTI joins. (In practice,
475-
* therefore, only LEFT joins ever allow RHS violation.)
474+
* however, if we're allowed to associate it into the RHS of this
475+
* SJ. That means this SJ must be a LEFT join (not SEMI or ANTI,
476+
* and certainly not FULL) and the proposed join must not overlap
477+
* the LHS.
476478
*/
477-
if (sjinfo->jointype == JOIN_SEMI ||
478-
sjinfo->jointype == JOIN_ANTI ||
479+
if (sjinfo->jointype != JOIN_LEFT ||
479480
bms_overlap(joinrelids, sjinfo->min_lefthand))
480481
return false; /* invalid join path */
481482

482-
/*----------
483-
* If both inputs overlap the RHS, assume that it's OK. Since the
484-
* inputs presumably got past this function's checks previously,
485-
* their violations of the RHS boundary must represent SJs that
486-
* have been determined to commute with this one.
487-
* We have to allow this to work correctly in cases like
488-
* (a LEFT JOIN (b JOIN (c LEFT JOIN d)))
489-
* when the c/d join has been determined to commute with the join
490-
* to a, and hence d is not part of min_righthand for the upper
491-
* join. It should be legal to join b to c/d but this will appear
492-
* as a violation of the upper join's RHS.
493-
*
494-
* Furthermore, if one input overlaps the RHS and the other does
495-
* not, we should still allow the join if it is a valid
496-
* implementation of some other SJ. We have to allow this to
497-
* support the associative identity
498-
* (a LJ b on Pab) LJ c ON Pbc = a LJ (b LJ c ON Pbc) on Pab
499-
* since joining B directly to C violates the lower SJ's RHS.
500-
* We assume that make_outerjoininfo() set things up correctly
501-
* so that we'll only match to some SJ if the join is valid.
502-
* Set flag here to check at bottom of loop.
503-
*----------
483+
/*
484+
* To be valid, the proposed join must be a LEFT join; otherwise
485+
* it can't associate into this SJ's RHS. But we may not yet have
486+
* found the SpecialJoinInfo matching the proposed join, so we
487+
* can't test that yet. Remember the requirement for later.
504488
*/
505-
if (bms_overlap(rel1->relids, sjinfo->min_righthand) &&
506-
bms_overlap(rel2->relids, sjinfo->min_righthand))
507-
{
508-
/* both overlap; assume OK */
509-
}
510-
else
511-
{
512-
/* one overlaps, the other doesn't */
513-
is_valid_inner = false;
514-
}
489+
must_be_leftjoin = true;
515490
}
516491
}
517492

518493
/*
519-
* Fail if violated some SJ's RHS and didn't match to another SJ. However,
520-
* "matching" to a semijoin we are implementing by unique-ification
521-
* doesn't count (think: it's really an inner join).
494+
* Fail if violated any SJ's RHS and didn't match to a LEFT SJ: the
495+
* proposed join can't associate into an SJ's RHS.
496+
*
497+
* Also, fail if the proposed join's predicate isn't strict; we're
498+
* essentially checking to see if we can apply outer-join identity 3, and
499+
* that's a requirement. (This check may be redundant with checks in
500+
* make_outerjoininfo, but I'm not quite sure, and it's cheap to test.)
522501
*/
523-
if (!is_valid_inner &&
524-
(match_sjinfo == NULL || unique_ified))
502+
if (must_be_leftjoin &&
503+
(match_sjinfo == NULL ||
504+
match_sjinfo->jointype != JOIN_LEFT ||
505+
!match_sjinfo->lhs_strict))
525506
return false; /* invalid join path */
526507

527508
/*

src/backend/optimizer/plan/initsplan.c

Lines changed: 13 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -1123,17 +1123,6 @@ make_outerjoininfo(PlannerInfo *root,
11231123
min_righthand = bms_int_members(bms_union(clause_relids, inner_join_rels),
11241124
right_rels);
11251125

1126-
/*
1127-
* If we have a degenerate join clause that doesn't mention any RHS rels,
1128-
* force the min RHS to be the syntactic RHS; otherwise we can end up
1129-
* making serious errors, like putting the LHS on the wrong side of an
1130-
* outer join. It seems to be safe to not do this when we have a
1131-
* contribution from inner_join_rels, though; that's enough to pin the SJ
1132-
* to occur at a reasonable place in the tree.
1133-
*/
1134-
if (bms_is_empty(min_righthand))
1135-
min_righthand = bms_copy(right_rels);
1136-
11371126
/*
11381127
* Now check previous outer joins for ordering restrictions.
11391128
*/
@@ -1176,9 +1165,15 @@ make_outerjoininfo(PlannerInfo *root,
11761165
* For a lower OJ in our RHS, if our join condition does not use the
11771166
* lower join's RHS and the lower OJ's join condition is strict, we
11781167
* can interchange the ordering of the two OJs; otherwise we must add
1179-
* the lower OJ's full syntactic relset to min_righthand. Also, we
1180-
* must preserve ordering anyway if either the current join or the
1181-
* lower OJ is either a semijoin or an antijoin.
1168+
* the lower OJ's full syntactic relset to min_righthand.
1169+
*
1170+
* Also, if our join condition does not use the lower join's LHS
1171+
* either, force the ordering to be preserved. Otherwise we can end
1172+
* up with SpecialJoinInfos with identical min_righthands, which can
1173+
* confuse join_is_legal (see discussion in backend/optimizer/README).
1174+
*
1175+
* Also, we must preserve ordering anyway if either the current join
1176+
* or the lower OJ is either a semijoin or an antijoin.
11821177
*
11831178
* Here, we have to consider that "our join condition" includes any
11841179
* clauses that syntactically appeared above the lower OJ and below
@@ -1194,6 +1189,7 @@ make_outerjoininfo(PlannerInfo *root,
11941189
if (bms_overlap(right_rels, otherinfo->syn_righthand))
11951190
{
11961191
if (bms_overlap(clause_relids, otherinfo->syn_righthand) ||
1192+
!bms_overlap(clause_relids, otherinfo->min_lefthand) ||
11971193
jointype == JOIN_SEMI ||
11981194
jointype == JOIN_ANTI ||
11991195
otherinfo->jointype == JOIN_SEMI ||
@@ -1233,10 +1229,12 @@ make_outerjoininfo(PlannerInfo *root,
12331229
* If we found nothing to put in min_lefthand, punt and make it the full
12341230
* LHS, to avoid having an empty min_lefthand which will confuse later
12351231
* processing. (We don't try to be smart about such cases, just correct.)
1236-
* We already forced min_righthand nonempty, so nothing to do for that.
1232+
* Likewise for min_righthand.
12371233
*/
12381234
if (bms_is_empty(min_lefthand))
12391235
min_lefthand = bms_copy(left_rels);
1236+
if (bms_is_empty(min_righthand))
1237+
min_righthand = bms_copy(right_rels);
12401238

12411239
/* Now they'd better be nonempty */
12421240
Assert(!bms_is_empty(min_lefthand));

src/test/regress/expected/join.out

Lines changed: 71 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3385,7 +3385,7 @@ select t1.* from
33853385
Output: t1.f1
33863386
Hash Cond: (i8.q2 = i4.f1)
33873387
-> Nested Loop Left Join
3388-
Output: i8.q2, t1.f1
3388+
Output: t1.f1, i8.q2
33893389
Join Filter: (t1.f1 = '***'::text)
33903390
-> Seq Scan on public.text_tbl t1
33913391
Output: t1.f1
@@ -3433,6 +3433,76 @@ select t1.* from
34333433
hi de ho neighbor
34343434
(2 rows)
34353435

3436+
explain (verbose, costs off)
3437+
select t1.* from
3438+
text_tbl t1
3439+
left join (select *, '***'::text as d1 from int8_tbl i8b1) b1
3440+
left join int8_tbl i8
3441+
left join (select *, null::int as d2 from int8_tbl i8b2, int4_tbl i4b2
3442+
where q1 = f1) b2
3443+
on (i8.q1 = b2.q1)
3444+
on (b2.d2 = b1.q2)
3445+
on (t1.f1 = b1.d1)
3446+
left join int4_tbl i4
3447+
on (i8.q2 = i4.f1);
3448+
QUERY PLAN
3449+
----------------------------------------------------------------------------
3450+
Hash Left Join
3451+
Output: t1.f1
3452+
Hash Cond: (i8.q2 = i4.f1)
3453+
-> Nested Loop Left Join
3454+
Output: t1.f1, i8.q2
3455+
Join Filter: (t1.f1 = '***'::text)
3456+
-> Seq Scan on public.text_tbl t1
3457+
Output: t1.f1
3458+
-> Materialize
3459+
Output: i8.q2
3460+
-> Hash Right Join
3461+
Output: i8.q2
3462+
Hash Cond: ((NULL::integer) = i8b1.q2)
3463+
-> Hash Right Join
3464+
Output: i8.q2, (NULL::integer)
3465+
Hash Cond: (i8b2.q1 = i8.q1)
3466+
-> Hash Join
3467+
Output: i8b2.q1, NULL::integer
3468+
Hash Cond: (i8b2.q1 = i4b2.f1)
3469+
-> Seq Scan on public.int8_tbl i8b2
3470+
Output: i8b2.q1, i8b2.q2
3471+
-> Hash
3472+
Output: i4b2.f1
3473+
-> Seq Scan on public.int4_tbl i4b2
3474+
Output: i4b2.f1
3475+
-> Hash
3476+
Output: i8.q1, i8.q2
3477+
-> Seq Scan on public.int8_tbl i8
3478+
Output: i8.q1, i8.q2
3479+
-> Hash
3480+
Output: i8b1.q2
3481+
-> Seq Scan on public.int8_tbl i8b1
3482+
Output: i8b1.q2
3483+
-> Hash
3484+
Output: i4.f1
3485+
-> Seq Scan on public.int4_tbl i4
3486+
Output: i4.f1
3487+
(37 rows)
3488+
3489+
select t1.* from
3490+
text_tbl t1
3491+
left join (select *, '***'::text as d1 from int8_tbl i8b1) b1
3492+
left join int8_tbl i8
3493+
left join (select *, null::int as d2 from int8_tbl i8b2, int4_tbl i4b2
3494+
where q1 = f1) b2
3495+
on (i8.q1 = b2.q1)
3496+
on (b2.d2 = b1.q2)
3497+
on (t1.f1 = b1.d1)
3498+
left join int4_tbl i4
3499+
on (i8.q2 = i4.f1);
3500+
f1
3501+
-------------------
3502+
doh!
3503+
hi de ho neighbor
3504+
(2 rows)
3505+
34363506
--
34373507
-- test ability to push constants through outer join clauses
34383508
--

src/test/regress/sql/join.sql

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1049,6 +1049,31 @@ select t1.* from
10491049
left join int4_tbl i4
10501050
on (i8.q2 = i4.f1);
10511051

1052+
explain (verbose, costs off)
1053+
select t1.* from
1054+
text_tbl t1
1055+
left join (select *, '***'::text as d1 from int8_tbl i8b1) b1
1056+
left join int8_tbl i8
1057+
left join (select *, null::int as d2 from int8_tbl i8b2, int4_tbl i4b2
1058+
where q1 = f1) b2
1059+
on (i8.q1 = b2.q1)
1060+
on (b2.d2 = b1.q2)
1061+
on (t1.f1 = b1.d1)
1062+
left join int4_tbl i4
1063+
on (i8.q2 = i4.f1);
1064+
1065+
select t1.* from
1066+
text_tbl t1
1067+
left join (select *, '***'::text as d1 from int8_tbl i8b1) b1
1068+
left join int8_tbl i8
1069+
left join (select *, null::int as d2 from int8_tbl i8b2, int4_tbl i4b2
1070+
where q1 = f1) b2
1071+
on (i8.q1 = b2.q1)
1072+
on (b2.d2 = b1.q2)
1073+
on (t1.f1 = b1.d1)
1074+
left join int4_tbl i4
1075+
on (i8.q2 = i4.f1);
1076+
10521077
--
10531078
-- test ability to push constants through outer join clauses
10541079
--

0 commit comments

Comments
 (0)