Skip to content

Commit 334fb8c

Browse files
committed
Avoid believing incomplete MCV-only stats in get_variable_range().
get_variable_range() would incautiously believe that statistics containing only an MCV list are sufficient to derive a range estimate. That's okay for an enum-like column that contains only MCVs, but otherwise the estimate could be pretty bad. Make it report that the range is indeterminate unless the MCVs plus nullfrac account for the whole table. I don't think this needs a dedicated test case, since a quick code coverage check verifies that the existing regression tests traverse all the alternatives. There is room to doubt that a future-proof test case could be built anyway, given that the submitted example accidentally doesn't fail before v11. Per bug #17207 from Simon Perepelitsa. Back-patch to v10. In principle this has been broken all along, but I'm hesitant to make such changes in 9.6, since if anyone is unhappy with 9.6.24's behavior there will be no second chance to fix it. Discussion: https://postgr.es/m/17207-5265aefa79e333b4@postgresql.org
1 parent cded2c4 commit 334fb8c

File tree

1 file changed

+56
-30
lines changed

1 file changed

+56
-30
lines changed

src/backend/utils/adt/selfuncs.c

Lines changed: 56 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -5264,46 +5264,72 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata,
52645264
/*
52655265
* If we have most-common-values info, look for extreme MCVs. This is
52665266
* needed even if we also have a histogram, since the histogram excludes
5267-
* the MCVs. However, usually the MCVs will not be the extreme values, so
5268-
* avoid unnecessary data copying.
5267+
* the MCVs. However, if we *only* have MCVs and no histogram, we should
5268+
* be pretty wary of deciding that that is a full representation of the
5269+
* data. Proceed only if the MCVs represent the whole table (to within
5270+
* roundoff error).
52695271
*/
52705272
if (get_attstatsslot(&sslot, vardata->statsTuple,
52715273
STATISTIC_KIND_MCV, InvalidOid,
5272-
ATTSTATSSLOT_VALUES))
5274+
have_data ? ATTSTATSSLOT_VALUES :
5275+
(ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS)))
52735276
{
5274-
bool tmin_is_mcv = false;
5275-
bool tmax_is_mcv = false;
5276-
FmgrInfo opproc;
5277+
bool use_mcvs = have_data;
52775278

5278-
fmgr_info(opfuncoid, &opproc);
5279+
if (!have_data)
5280+
{
5281+
double sumcommon = 0.0;
5282+
double nullfrac;
5283+
int i;
52795284

5280-
for (i = 0; i < sslot.nvalues; i++)
5285+
for (i = 0; i < sslot.nnumbers; i++)
5286+
sumcommon += sslot.numbers[i];
5287+
nullfrac = ((Form_pg_statistic) GETSTRUCT(vardata->statsTuple))->stanullfrac;
5288+
if (sumcommon + nullfrac > 0.99999)
5289+
use_mcvs = true;
5290+
}
5291+
5292+
if (use_mcvs)
52815293
{
5282-
if (!have_data)
5283-
{
5284-
tmin = tmax = sslot.values[i];
5285-
tmin_is_mcv = tmax_is_mcv = have_data = true;
5286-
continue;
5287-
}
5288-
if (DatumGetBool(FunctionCall2Coll(&opproc,
5289-
collation,
5290-
sslot.values[i], tmin)))
5291-
{
5292-
tmin = sslot.values[i];
5293-
tmin_is_mcv = true;
5294-
}
5295-
if (DatumGetBool(FunctionCall2Coll(&opproc,
5296-
collation,
5297-
tmax, sslot.values[i])))
5294+
/*
5295+
* Usually the MCVs will not be the extreme values, so avoid
5296+
* unnecessary data copying.
5297+
*/
5298+
bool tmin_is_mcv = false;
5299+
bool tmax_is_mcv = false;
5300+
FmgrInfo opproc;
5301+
5302+
fmgr_info(opfuncoid, &opproc);
5303+
5304+
for (i = 0; i < sslot.nvalues; i++)
52985305
{
5299-
tmax = sslot.values[i];
5300-
tmax_is_mcv = true;
5306+
if (!have_data)
5307+
{
5308+
tmin = tmax = sslot.values[i];
5309+
tmin_is_mcv = tmax_is_mcv = have_data = true;
5310+
continue;
5311+
}
5312+
if (DatumGetBool(FunctionCall2Coll(&opproc,
5313+
collation,
5314+
sslot.values[i], tmin)))
5315+
{
5316+
tmin = sslot.values[i];
5317+
tmin_is_mcv = true;
5318+
}
5319+
if (DatumGetBool(FunctionCall2Coll(&opproc,
5320+
collation,
5321+
tmax, sslot.values[i])))
5322+
{
5323+
tmax = sslot.values[i];
5324+
tmax_is_mcv = true;
5325+
}
53015326
}
5327+
if (tmin_is_mcv)
5328+
tmin = datumCopy(tmin, typByVal, typLen);
5329+
if (tmax_is_mcv)
5330+
tmax = datumCopy(tmax, typByVal, typLen);
53025331
}
5303-
if (tmin_is_mcv)
5304-
tmin = datumCopy(tmin, typByVal, typLen);
5305-
if (tmax_is_mcv)
5306-
tmax = datumCopy(tmax, typByVal, typLen);
5332+
53075333
free_attstatsslot(&sslot);
53085334
}
53095335

0 commit comments

Comments
 (0)