Commit 159c47d

Be more wary of missing statistics in eqjoinsel_semi().
In particular, if we don't have real ndistinct estimates for both sides, fall back to assuming that half of the left-hand rows have join partners. This is what was done in 8.2 and 8.3 (cf nulltestsel() in those versions). It's pretty stupid but it won't lead us to think that an antijoin produces no rows out, as seen in recent example from Uwe Schroeder.
1 parent 0e754ab commit 159c47d
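
To make the antijoin concern in the commit message concrete, here is a minimal sketch. It assumes the planner derives an antijoin's row estimate as the outer row count times one minus the semijoin selectivity. The helper name `antijoin_rows` is hypothetical and is not taken from the PostgreSQL source.

```c
/*
 * Hypothetical helper (not a PostgreSQL function): estimate antijoin output
 * rows, assuming the estimate is outer_rows * (1 - semijoin selectivity).
 */
static double
antijoin_rows(double outer_rows, double semi_selec)
{
    return outer_rows * (1.0 - semi_selec);
}
```

Under that assumption, a semijoin selectivity of 1.0 (every left-hand row has a partner) turns 10000 outer rows into an estimate of 0 antijoin rows, while the 0.5 fallback gives 5000.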

1 file changed: +32 -17 lines changed

src/backend/utils/adt/selfuncs.c

Lines changed: 32 additions & 17 deletions
@@ -2329,7 +2329,9 @@ eqjoinsel_semi(Oid operator,
         bool *hasmatch1;
         bool *hasmatch2;
         double nullfrac1 = stats1->stanullfrac;
-        double matchfreq1;
+        double matchfreq1,
+               uncertainfrac,
+               uncertain;
         int i,
             nmatches;
 
@@ -2382,18 +2384,26 @@ eqjoinsel_semi(Oid operator,
          * the uncertain rows that a fraction nd2/nd1 have join partners. We
          * can discount the known-matched MCVs from the distinct-values counts
          * before doing the division.
+         *
+         * Crude as the above is, it's completely useless if we don't have
+         * reliable ndistinct values for both sides. Hence, if either nd1
+         * or nd2 is default, punt and assume half of the uncertain rows
+         * have join partners.
          */
-        nd1 -= nmatches;
-        nd2 -= nmatches;
-        if (nd1 <= nd2 || nd2 <= 0)
-            selec = Max(matchfreq1, 1.0 - nullfrac1);
-        else
+        if (nd1 != DEFAULT_NUM_DISTINCT && nd2 != DEFAULT_NUM_DISTINCT)
         {
-            double uncertain = 1.0 - matchfreq1 - nullfrac1;
-
-            CLAMP_PROBABILITY(uncertain);
-            selec = matchfreq1 + (nd2 / nd1) * uncertain;
+            nd1 -= nmatches;
+            nd2 -= nmatches;
+            if (nd1 <= nd2 || nd2 <= 0)
+                uncertainfrac = 1.0;
+            else
+                uncertainfrac = nd2 / nd1;
         }
+        else
+            uncertainfrac = 0.5;
+        uncertain = 1.0 - matchfreq1 - nullfrac1;
+        CLAMP_PROBABILITY(uncertain);
+        selec = matchfreq1 + uncertainfrac * uncertain;
     }
     else
     {
@@ -2403,15 +2413,20 @@ eqjoinsel_semi(Oid operator,
          */
         double nullfrac1 = stats1 ? stats1->stanullfrac : 0.0;
 
-        if (vardata1->rel)
-            nd1 = Min(nd1, vardata1->rel->rows);
-        if (vardata2->rel)
-            nd2 = Min(nd2, vardata2->rel->rows);
+        if (nd1 != DEFAULT_NUM_DISTINCT && nd2 != DEFAULT_NUM_DISTINCT)
+        {
+            if (vardata1->rel)
+                nd1 = Min(nd1, vardata1->rel->rows);
+            if (vardata2->rel)
+                nd2 = Min(nd2, vardata2->rel->rows);
 
-        if (nd1 <= nd2 || nd2 <= 0)
-            selec = 1.0 - nullfrac1;
+            if (nd1 <= nd2 || nd2 <= 0)
+                selec = 1.0 - nullfrac1;
+            else
+                selec = (nd2 / nd1) * (1.0 - nullfrac1);
+        }
         else
-            selec = 0.5 * (1.0 - nullfrac1);
+            selec = 0.5 * (1.0 - nullfrac1);
     }
 
     if (have_mcvs1)
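
The patched arithmetic can be tried outside the planner with a standalone sketch like the one below. It mirrors the two branches of the diff but is not the PostgreSQL code itself. The function names, the `SKETCH_DEFAULT_NUM_DISTINCT` constant (assumed here to be 200), and the plain `rows1`/`rows2` parameters standing in for `vardata->rel->rows` are all illustrative assumptions.

```c
/* Assumed stand-in for DEFAULT_NUM_DISTINCT; the value 200 is an assumption. */
#define SKETCH_DEFAULT_NUM_DISTINCT 200.0

/* Clamp a probability into [0, 1], as CLAMP_PROBABILITY does in the diff. */
static double
sketch_clamp_probability(double p)
{
    if (p < 0.0)
        return 0.0;
    if (p > 1.0)
        return 1.0;
    return p;
}

/* Branch with MCV lists: matchfreq1 is the frequency of matched MCVs. */
static double
sketch_semi_selec_with_mcvs(double nd1, double nd2, int nmatches,
                            double matchfreq1, double nullfrac1)
{
    double uncertainfrac;
    double uncertain;

    if (nd1 != SKETCH_DEFAULT_NUM_DISTINCT &&
        nd2 != SKETCH_DEFAULT_NUM_DISTINCT)
    {
        /* Discount the known-matched MCVs before dividing. */
        nd1 -= nmatches;
        nd2 -= nmatches;
        if (nd1 <= nd2 || nd2 <= 0)
            uncertainfrac = 1.0;
        else
            uncertainfrac = nd2 / nd1;
    }
    else
        uncertainfrac = 0.5;    /* no reliable ndistinct: assume half match */

    uncertain = sketch_clamp_probability(1.0 - matchfreq1 - nullfrac1);
    return matchfreq1 + uncertainfrac * uncertain;
}

/* Branch without MCV lists; rows1/rows2 replace the optional rel->rows. */
static double
sketch_semi_selec_without_mcvs(double nd1, double nd2,
                               double rows1, double rows2, double nullfrac1)
{
    if (nd1 != SKETCH_DEFAULT_NUM_DISTINCT &&
        nd2 != SKETCH_DEFAULT_NUM_DISTINCT)
    {
        /* ndistinct cannot exceed the relation's row estimate. */
        if (nd1 > rows1)
            nd1 = rows1;
        if (nd2 > rows2)
            nd2 = rows2;

        if (nd1 <= nd2 || nd2 <= 0)
            return 1.0 - nullfrac1;
        return (nd2 / nd1) * (1.0 - nullfrac1);
    }
    return 0.5 * (1.0 - nullfrac1);
}
```

With a default `nd1`, the no-MCV sketch returns `0.5 * (1.0 - nullfrac1)` regardless of `nd2`. In the MCV branch, the non-default case with `nd1 <= nd2` works out to `1.0 - nullfrac1` whenever `matchfreq1 + nullfrac1` does not exceed 1, which matches the value the removed `Max(matchfreq1, 1.0 - nullfrac1)` expression produced in that situation.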
