Skip to content

Commit 0ce627d

Browse files
committed
Document evaluation-order considerations for aggregate functions.
The SELECT reference page didn't really address the question of when aggregate function evaluation occurs, nor did the "expression evaluation rules" documentation mention that CASE can't be used to control whether an aggregate gets evaluated or not. Improve that. Per discussion of bug #11661. Original text by Marti Raudsepp and Michael Paquier, rewritten significantly by me.
1 parent 80eacaa commit 0ce627d

File tree

2 files changed

+48
-10
lines changed

2 files changed

+48
-10
lines changed

doc/src/sgml/ref/select.sgml

Lines changed: 26 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -109,9 +109,11 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
109109

110110
<listitem>
111111
<para>
112-
If the <literal>GROUP BY</literal> clause is specified, the
112+
If the <literal>GROUP BY</literal> clause is specified,
113+
or if there are aggregate function calls, the
113114
output is combined into groups of rows that match on one or more
114-
values. If the <literal>HAVING</literal> clause is present, it
115+
values, and the results of aggregate functions are computed.
116+
If the <literal>HAVING</literal> clause is present, it
115117
eliminates groups that do not satisfy the given condition. (See
116118
<xref linkend="sql-groupby" endterm="sql-groupby-title"> and
117119
<xref linkend="sql-having" endterm="sql-having-title"> below.)
@@ -637,24 +639,37 @@ GROUP BY <replaceable class="parameter">expression</replaceable> [, ...]
637639

638640
<para>
639641
Aggregate functions, if any are used, are computed across all rows
640-
making up each group, producing a separate value for each group
641-
(whereas without <literal>GROUP BY</literal>, an aggregate
642-
produces a single value computed across all the selected rows).
643-
The set of rows fed to the aggregate function can be further filtered by
642+
making up each group, producing a separate value for each group.
643+
(If there are aggregate functions but no <literal>GROUP BY</literal>
644+
clause, the query is treated as having a single group comprising all
645+
the selected rows.)
646+
The set of rows fed to each aggregate function can be further filtered by
644647
attaching a <literal>FILTER</literal> clause to the aggregate function
645648
call; see <xref linkend="syntax-aggregates"> for more information. When
646649
a <literal>FILTER</literal> clause is present, only those rows matching it
647-
are included.
648-
When <literal>GROUP BY</literal> is present, it is not valid for
650+
are included in the input to that aggregate function.
651+
</para>
652+
653+
<para>
654+
When <literal>GROUP BY</literal> is present,
655+
or any aggregate functions are present, it is not valid for
649656
the <command>SELECT</command> list expressions to refer to
650-
ungrouped columns except within aggregate functions or if the
657+
ungrouped columns except within aggregate functions or when the
651658
ungrouped column is functionally dependent on the grouped columns,
652659
since there would otherwise be more than one possible value to
653660
return for an ungrouped column. A functional dependency exists if
654661
the grouped columns (or a subset thereof) are the primary key of
655662
the table containing the ungrouped column.
656663
</para>
657664

665+
<para>
666+
Keep in mind that all aggregate functions are evaluated before
667+
evaluating any <quote>scalar</> expressions in the <literal>HAVING</>
668+
clause or <literal>SELECT</> list. This means that, for example,
669+
a <literal>CASE</> expression cannot be used to skip evaluation of
670+
an aggregate function; see <xref linkend="syntax-express-eval">.
671+
</para>
672+
658673
<para>
659674
Currently, <literal>FOR NO KEY UPDATE</>, <literal>FOR UPDATE</>,
660675
<literal>FOR SHARE</> and <literal>FOR KEY SHARE</> cannot be
@@ -683,7 +698,8 @@ HAVING <replaceable class="parameter">condition</replaceable>
683698
created by <literal>GROUP BY</literal>. Each column referenced in
684699
<replaceable class="parameter">condition</replaceable> must
685700
unambiguously reference a grouping column, unless the reference
686-
appears within an aggregate function.
701+
appears within an aggregate function or the ungrouped column is
702+
functionally dependent on the grouping columns.
687703
</para>
688704

689705
<para>

doc/src/sgml/syntax.sgml

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2426,6 +2426,28 @@ SELECT ... WHERE CASE WHEN x &gt; 0 THEN y/x &gt; 1.5 ELSE false END;
24262426
example, it would be better to sidestep the problem by writing
24272427
<literal>y &gt; 1.5*x</> instead.)
24282428
</para>
2429+
2430+
<para>
2431+
A limitation of this technique is that a <literal>CASE</> cannot
2432+
prevent evaluation of an aggregate expression contained within it,
2433+
because aggregate expressions are computed before <quote>scalar</>
2434+
expressions in a <literal>SELECT</> list or <literal>HAVING</> clause
2435+
are considered. For example, the following query can cause a
2436+
division-by-zero error despite seemingly having protected against it:
2437+
<programlisting>
2438+
SELECT CASE WHEN min(employees) > 0
2439+
THEN avg(expenses / employees)
2440+
END
2441+
FROM departments;
2442+
</programlisting>
2443+
The <function>min()</> and <function>avg()</> aggregates are computed
2444+
concurrently over all the input rows, so if any row
2445+
has <structfield>employees</> equal to zero, the division-by-zero error
2446+
will occur before there is any opportunity to test the result of
2447+
<function>min()</>. Instead, use a <literal>WHERE</>
2448+
or <literal>FILTER</> clause to prevent problematic input rows from
2449+
reaching an aggregate function in the first place.
2450+
</para>
24292451
</sect2>
24302452
</sect1>
24312453

0 commit comments

Comments
 (0)