Skip to content

Commit f3281f9

Browse files
committed
Improve comments for estimate_multivariate_ndistinct()
estimate_multivariate_ndistinct() is coded to assume the caller handles passing it a list of GroupVarInfos with unique 'var' fields over the entire list. 6bb6a62 added code which didn't ensure this and that could result in estimate_multivariate_ndistinct() erroring out with: ERROR: corrupt MVNDistinct entry This occurred because estimate_multivariate_ndistinct() first searches for a set of stats that match to at least two of the given GroupVarInfos and then later assumes that the MVNDistinctItem.items array of the best matching stats will have an entry for those two columns. If the GroupVarInfos List contained a duplicate entry then the same column could be matched to twice and that could trick the code into thinking we have >= 2 columns matched in cases where only a single distinct column has been matched. This could result in a failure to find the correct MVNDistinctItem in the stats as the array containing those never contains an item for single columns. Here we make it more clear that the function needs a distinct set of GroupVarInfos and also tidy up a few other comments to make things a bit easier to follow. Author: David Rowley <drowleyml@gmail.com> Discussion: https://postgr.es/m/CAApHDvocZCUhM9W9mJ39d6oQz7ePKoqFnao_347mvC-A7QatcQ@mail.gmail.com
1 parent ab3d8af commit f3281f9

File tree

1 file changed

+21
-10
lines changed

1 file changed

+21
-10
lines changed

src/backend/utils/adt/selfuncs.c

Lines changed: 21 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -4146,14 +4146,18 @@ estimate_hashagg_tablesize(PlannerInfo *root, Path *path,
41464146
*/
41474147

41484148
/*
4149-
* Find applicable ndistinct statistics for the given list of VarInfos (which
4150-
* must all belong to the given rel), and update *ndistinct to the estimate of
4151-
* the MVNDistinctItem that best matches. If a match it found, *varinfos is
4152-
* updated to remove the list of matched varinfos.
4149+
* Find the best matching ndistinct extended statistics for the given list of
4150+
* GroupVarInfos.
41534151
*
4154-
* Varinfos that aren't for simple Vars are ignored.
4152+
* Callers must ensure that the given GroupVarInfos all belong to 'rel' and
4153+
* the GroupVarInfos list does not contain any duplicate Vars or expressions.
41554154
*
4156-
* Return true if we're able to find a match, false otherwise.
4155+
* When statistics are found that match > 1 of the given GroupVarInfo, the
4156+
* *ndistinct parameter is set according to the ndistinct estimate and a new
4157+
* list is built with the matching GroupVarInfos removed, which is output via
4158+
* the *varinfos parameter before returning true. When no matching stats are
4159+
* found, false is returned and the *varinfos and *ndistinct parameters are
4160+
* left untouched.
41574161
*/
41584162
static bool
41594163
estimate_multivariate_ndistinct(PlannerInfo *root, RelOptInfo *rel,
@@ -4234,15 +4238,22 @@ estimate_multivariate_ndistinct(PlannerInfo *root, RelOptInfo *rel,
42344238
}
42354239
}
42364240

4241+
/*
4242+
* The ndistinct extended statistics contain estimates for a minimum
4243+
* of pairs of columns which the statistics are defined on and
4244+
* certainly not single columns. Here we skip unless we managed to
4245+
* match to at least two columns.
4246+
*/
42374247
if (nshared_vars + nshared_exprs < 2)
42384248
continue;
42394249

42404250
/*
4241-
* Does this statistics object match more columns than the currently
4242-
* best object? If so, use this one instead.
4251+
* Check if these statistics are a better match than the previous best
4252+
* match and if so, take note of the StatisticExtInfo.
42434253
*
4244-
* XXX This should break ties using name of the object, or something
4245-
* like that, to make the outcome stable.
4254+
* The statslist is sorted by statOid, so the StatisticExtInfo we
4255+
* select as the best match is deterministic even when multiple sets
4256+
* of statistics match equally as well.
42464257
*/
42474258
if ((nshared_exprs > nmatches_exprs) ||
42484259
(((nshared_exprs == nmatches_exprs)) && (nshared_vars > nmatches_vars)))

0 commit comments

Comments
 (0)