Commit df3a66e

Improve planner's handling of set-returning functions in grouping columns.
Improve query_is_distinct_for() to accept SRFs in the targetlist when we can prove distinctness from a DISTINCT clause. In that case the de-duplication will surely happen after SRF expansion, so the proof still works. Continue to punt in the case where we'd try to prove distinctness from GROUP BY (or, in the future, source relations). To do that, we'd have to determine whether the SRFs were in the grouping columns or elsewhere in the tlist, and it still doesn't seem worth the trouble. But this trivial change allows us to recognize that "SELECT DISTINCT unnest(foo) FROM ..." produces unique-ified output, which seems worth having.

Also, fix estimate_num_groups() to consider the possibility of SRFs in the grouping columns. Its failure to do so was masked before v10 because grouping_planner() scaled up plan rowcount estimates by the estimated SRF multiplier after performing grouping. That doesn't happen anymore, which is more correct, but it means we need an adjustment in the estimate for the number of groups. Failure to do this leads to an underestimate for the number of output rows of subqueries like "SELECT DISTINCT unnest(foo)" compared to what 9.6 and earlier estimated, thus breaking plan choices in some cases.

Per report from Dmitry Shalashov. Back-patch to v10 to avoid degraded plan choices compared to previous releases.

Discussion: https://postgr.es/m/CAKPeCUGAeHgoh5O=SvcQxREVkoX7UdeJUMj1F5=aBNvoTa+O8w@mail.gmail.com
1 parent b10967e commit df3a66e

2 files changed: +41 -14 lines changed


src/backend/optimizer/plan/analyzejoins.c

+14 -14
@@ -744,8 +744,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
 bool
 query_supports_distinctness(Query *query)
 {
-    /* we don't cope with SRFs, see comment below */
-    if (query->hasTargetSRFs)
+    /* SRFs break distinctness except with DISTINCT, see below */
+    if (query->hasTargetSRFs && query->distinctClause == NIL)
         return false;

     /* check for features we can prove distinctness with */
@@ -786,21 +786,11 @@ query_is_distinct_for(Query *query, List *colnos, List *opids)

     Assert(list_length(colnos) == list_length(opids));

-    /*
-     * A set-returning function in the query's targetlist can result in
-     * returning duplicate rows, if the SRF is evaluated after the
-     * de-duplication step; so we play it safe and say "no" if there are any
-     * SRFs. (We could be certain that it's okay if SRFs appear only in the
-     * specified columns, since those must be evaluated before de-duplication;
-     * but it doesn't presently seem worth the complication to check that.)
-     */
-    if (query->hasTargetSRFs)
-        return false;
-
     /*
      * DISTINCT (including DISTINCT ON) guarantees uniqueness if all the
      * columns in the DISTINCT clause appear in colnos and operator semantics
-     * match.
+     * match. This is true even if there are SRFs in the DISTINCT columns or
+     * elsewhere in the tlist.
      */
     if (query->distinctClause)
     {
@@ -819,6 +809,16 @@ query_is_distinct_for(Query *query, List *colnos, List *opids)
             return true;
     }

+    /*
+     * Otherwise, a set-returning function in the query's targetlist can
+     * result in returning duplicate rows, despite any grouping that might
+     * occur before tlist evaluation. (If all tlist SRFs are within GROUP BY
+     * columns, it would be safe because they'd be expanded before grouping.
+     * But it doesn't currently seem worth the effort to check for that.)
+     */
+    if (query->hasTargetSRFs)
+        return false;
+
     /*
      * Similarly, GROUP BY without GROUPING SETS guarantees uniqueness if all
      * the grouped columns appear in colnos and operator semantics match.
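
The order of the checks in this file's change can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `SketchQuery` and the `sketch_*` functions are hypothetical stand-ins, not PostgreSQL APIs, and the booleans collapse the real column and operator matching (and the other features the real functions examine) into single flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the planner's Query node. */
typedef struct SketchQuery
{
    bool has_target_srfs;     /* models query->hasTargetSRFs */
    bool has_distinct_clause; /* models query->distinctClause != NIL */
    bool has_group_clause;    /* models a GROUP BY without GROUPING SETS */
    bool distinct_cols_match; /* assumption: DISTINCT columns all appear in colnos */
    bool group_cols_match;    /* assumption: GROUP BY columns all appear in colnos */
} SketchQuery;

/* Cheap pre-check: SRFs are tolerated only when a DISTINCT clause exists. */
static bool
sketch_supports_distinctness(const SketchQuery *q)
{
    assert(q != NULL);
    if (q->has_target_srfs && !q->has_distinct_clause)
        return false;
    return q->has_distinct_clause || q->has_group_clause;
}

/* Full proof: DISTINCT is tested before the SRF bail-out, GROUP BY after it. */
static bool
sketch_is_distinct_for(const SketchQuery *q)
{
    assert(q != NULL);

    /* De-duplication by DISTINCT happens after SRF expansion, so SRFs are harmless here. */
    if (q->has_distinct_clause && q->distinct_cols_match)
        return true;

    /* Any other proof could be invalidated by SRFs expanded after grouping. */
    if (q->has_target_srfs)
        return false;

    if (q->has_group_clause && q->group_cols_match)
        return true;

    return false;
}
```

In this model a query shaped like "SELECT DISTINCT unnest(foo) FROM ..." (SRF plus matching DISTINCT) is reported as distinct, while a query with an SRF and only GROUP BY is still rejected.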

src/backend/utils/adt/selfuncs.c

+27
@@ -3361,6 +3361,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
                     List **pgset)
 {
     List *varinfos = NIL;
+    double srf_multiplier = 1.0;
     double numdistinct;
     ListCell *l;
     int i;
@@ -3394,6 +3395,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
     foreach(l, groupExprs)
     {
         Node *groupexpr = (Node *) lfirst(l);
+        double this_srf_multiplier;
         VariableStatData vardata;
         List *varshere;
         ListCell *l2;
@@ -3402,6 +3404,21 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
         if (pgset && !list_member_int(*pgset, i++))
             continue;

+        /*
+         * Set-returning functions in grouping columns are a bit problematic.
+         * The code below will effectively ignore their SRF nature and come up
+         * with a numdistinct estimate as though they were scalar functions.
+         * We compensate by scaling up the end result by the largest SRF
+         * rowcount estimate. (This will be an overestimate if the SRF
+         * produces multiple copies of any output value, but it seems best to
+         * assume the SRF's outputs are distinct. In any case, it's probably
+         * pointless to worry too much about this without much better
+         * estimates for SRF output rowcounts than we have today.)
+         */
+        this_srf_multiplier = expression_returns_set_rows(groupexpr);
+        if (srf_multiplier < this_srf_multiplier)
+            srf_multiplier = this_srf_multiplier;
+
         /* Short-circuit for expressions returning boolean */
         if (exprType(groupexpr) == BOOLOID)
         {
@@ -3467,9 +3484,15 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
      */
     if (varinfos == NIL)
     {
+        /* Apply SRF multiplier as we would do in the long path */
+        numdistinct *= srf_multiplier;
+        /* Round off */
+        numdistinct = ceil(numdistinct);
         /* Guard against out-of-range answers */
         if (numdistinct > input_rows)
             numdistinct = input_rows;
+        if (numdistinct < 1.0)
+            numdistinct = 1.0;
         return numdistinct;
     }

@@ -3638,6 +3661,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
         varinfos = newvarinfos;
     } while (varinfos != NIL);

+    /* Now we can account for the effects of any SRFs */
+    numdistinct *= srf_multiplier;
+
+    /* Round off */
     numdistinct = ceil(numdistinct);

     /* Guard against out-of-range answers */
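
The adjustment made across these hunks (take the largest SRF rowcount estimate, multiply the group estimate by it, round up, then clamp) can be sketched in isolation. This is a rough illustration, not the planner's code: `SketchGroupExpr`, `sketch_ceil` and `sketch_num_groups` are hypothetical names, the per-expression numbers are assumed inputs rather than values derived from statistics, and multiplying the per-expression estimates together is a deliberate oversimplification of what estimate_num_groups() really does.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical simplification: each grouping expression is reduced to a
 * scalar-style distinct-value estimate plus an estimated number of rows
 * per call (1.0 for an expression that is not set-returning).
 */
typedef struct SketchGroupExpr
{
    double ndistinct; /* estimate that ignores the SRF nature */
    double srf_rows;  /* models expression_returns_set_rows() */
} SketchGroupExpr;

/* Stand-in for ceil() from <math.h>; valid only for values within long long range. */
static double
sketch_ceil(double x)
{
    long long t = (long long) x;

    return (x > (double) t) ? (double) (t + 1) : (double) t;
}

static double
sketch_num_groups(const SketchGroupExpr *exprs, size_t nexprs, double input_rows)
{
    double numdistinct = 1.0;
    double srf_multiplier = 1.0;
    size_t i;

    assert(exprs != NULL || nexprs == 0);

    for (i = 0; i < nexprs; i++)
    {
        /* Naive product of estimates; the real code is far more careful. */
        numdistinct *= exprs[i].ndistinct;

        /* Track the largest SRF rowcount estimate seen so far. */
        if (srf_multiplier < exprs[i].srf_rows)
            srf_multiplier = exprs[i].srf_rows;
    }

    /* Account for SRF expansion, then round off. */
    numdistinct *= srf_multiplier;
    numdistinct = sketch_ceil(numdistinct);

    /* Guard against out-of-range answers. */
    if (numdistinct > input_rows)
        numdistinct = input_rows;
    if (numdistinct < 1.0)
        numdistinct = 1.0;

    return numdistinct;
}
```

With a scalar-style estimate of 10 distinct values, an SRF estimated at 100 rows per call, and 5000 input rows, the sketch yields 1000 groups; without the multiplier it would yield 10, which is the kind of underestimate the commit message describes.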
