Skip to content

Commit a4d75c8

Browse files
committed
Extended statistics on expressions
Allow defining extended statistics on expressions, not just just on simple column references. With this commit, expressions are supported by all existing extended statistics kinds, improving the same types of estimates. A simple example may look like this: CREATE TABLE t (a int); CREATE STATISTICS s ON mod(a,10), mod(a,20) FROM t; ANALYZE t; The collected statistics are useful e.g. to estimate queries with those expressions in WHERE or GROUP BY clauses: SELECT * FROM t WHERE mod(a,10) = 0 AND mod(a,20) = 0; SELECT 1 FROM t GROUP BY mod(a,10), mod(a,20); This introduces new internal statistics kind 'e' (expressions) which is built automatically when the statistics object definition includes any expressions. This represents single-expression statistics, as if there was an expression index (but without the index maintenance overhead). The statistics is stored in pg_statistics_ext_data as an array of composite types, which is possible thanks to 79f6a94. CREATE STATISTICS allows building statistics on a single expression, in which case in which case it's not possible to specify statistics kinds. A new system view pg_stats_ext_exprs can be used to display expression statistics, similarly to pg_stats and pg_stats_ext views. ALTER TABLE ... ALTER COLUMN ... TYPE now treats indexes the same way it treats indexes, i.e. it drops and recreates the statistics. This means all statistics are reset, and we no longer try to preserve at least the functional dependencies. This should not be a major issue in practice, as the functional dependencies actually rely on per-column statistics, which were always reset anyway. Author: Tomas Vondra Reviewed-by: Justin Pryzby, Dean Rasheed, Zhihong Yu Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
1 parent 98376c1 commit a4d75c8

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

43 files changed

+5982
-963
lines changed

doc/src/sgml/catalogs.sgml

+287-8
Original file line numberDiff line numberDiff line change
@@ -7385,8 +7385,22 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
73857385
<literal>d</literal> for n-distinct statistics,
73867386
<literal>f</literal> for functional dependency statistics, and
73877387
<literal>m</literal> for most common values (MCV) list statistics
7388+
<literal>e</literal> for expression statistics
73887389
</para></entry>
73897390
</row>
7391+
7392+
<row>
7393+
<entry role="catalog_table_entry"><para role="column_definition">
7394+
<structfield>stxexprs</structfield> <type>pg_node_tree</type>
7395+
</para>
7396+
<para>
7397+
Expression trees (in <function>nodeToString()</function>
7398+
representation) for statistics object attributes that are not simple
7399+
column references. This is a list with one element per expression.
7400+
Null if all statistics object attributes are simple references.
7401+
</para></entry>
7402+
</row>
7403+
73907404
</tbody>
73917405
</tgroup>
73927406
</table>
@@ -7452,7 +7466,7 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
74527466
(references <link linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link>.<structfield>oid</structfield>)
74537467
</para>
74547468
<para>
7455-
Extended statistic object containing the definition for this data
7469+
Extended statistics object containing the definition for this data
74567470
</para></entry>
74577471
</row>
74587472

@@ -7484,6 +7498,15 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
74847498
<structname>pg_mcv_list</structname> type
74857499
</para></entry>
74867500
</row>
7501+
7502+
<row>
7503+
<entry role="catalog_table_entry"><para role="column_definition">
7504+
<structfield>stxexprs</structfield> <type>pg_node_tree</type>
7505+
</para>
7506+
<para>
7507+
A list of any expressions covered by this statistics object.
7508+
</para></entry>
7509+
</row>
74877510
</tbody>
74887511
</tgroup>
74897512
</table>
@@ -7637,6 +7660,16 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
76377660
see <xref linkend="logical-replication-publication"/>.
76387661
</para></entry>
76397662
</row>
7663+
7664+
<row>
7665+
<entry role="catalog_table_entry"><para role="column_definition">
7666+
<structfield>stxdexpr</structfield> <type>pg_statistic[]</type>
7667+
</para>
7668+
<para>
7669+
Per-expression statistics, serialized as an array of
7670+
<structname>pg_statistic</structname> type
7671+
</para></entry>
7672+
</row>
76407673
</tbody>
76417674
</tgroup>
76427675
</table>
@@ -9444,6 +9477,11 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
94449477
<entry>extended planner statistics</entry>
94459478
</row>
94469479

9480+
<row>
9481+
<entry><link linkend="view-pg-stats-ext-exprs"><structname>pg_stats_ext_exprs</structname></link></entry>
9482+
<entry>extended planner statistics for expressions</entry>
9483+
</row>
9484+
94479485
<row>
94489486
<entry><link linkend="view-pg-tables"><structname>pg_tables</structname></link></entry>
94499487
<entry>tables</entry>
@@ -12696,10 +12734,19 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
1269612734
(references <link linkend="catalog-pg-attribute"><structname>pg_attribute</structname></link>.<structfield>attname</structfield>)
1269712735
</para>
1269812736
<para>
12699-
Name of the column described by this row
12737+
Names of the columns included in the extended statistics object
1270012738
</para></entry>
1270112739
</row>
1270212740

12741+
<row>
12742+
<entry role="catalog_table_entry"><para role="column_definition">
12743+
<structfield>exprs</structfield> <type>text[]</type>
12744+
</para>
12745+
<para>
12746+
Expressions included in the extended statistics object
12747+
</para></entry>
12748+
</row>
12749+
1270312750
<row>
1270412751
<entry role="catalog_table_entry"><para role="column_definition">
1270512752
<structfield>inherited</structfield> <type>bool</type>
@@ -12851,7 +12898,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
1285112898

1285212899
<para>
1285312900
The view <structname>pg_stats_ext</structname> provides access to
12854-
the information stored in the <link
12901+
information about each extended statistics object in the database,
12902+
combining information stored in the <link
1285512903
linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link>
1285612904
and <link linkend="catalog-pg-statistic-ext-data"><structname>pg_statistic_ext_data</structname></link>
1285712905
catalogs. This view allows access only to rows of
@@ -12908,7 +12956,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
1290812956
(references <link linkend="catalog-pg-namespace"><structname>pg_namespace</structname></link>.<structfield>nspname</structfield>)
1290912957
</para>
1291012958
<para>
12911-
Name of schema containing extended statistic
12959+
Name of schema containing extended statistics object
1291212960
</para></entry>
1291312961
</row>
1291412962

@@ -12918,7 +12966,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
1291812966
(references <link linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link>.<structfield>stxname</structfield>)
1291912967
</para>
1292012968
<para>
12921-
Name of extended statistics
12969+
Name of extended statistics object
1292212970
</para></entry>
1292312971
</row>
1292412972

@@ -12928,7 +12976,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
1292812976
(references <link linkend="catalog-pg-authid"><structname>pg_authid</structname></link>.<structfield>rolname</structfield>)
1292912977
</para>
1293012978
<para>
12931-
Owner of the extended statistics
12979+
Owner of the extended statistics object
1293212980
</para></entry>
1293312981
</row>
1293412982

@@ -12938,7 +12986,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
1293812986
(references <link linkend="catalog-pg-attribute"><structname>pg_attribute</structname></link>.<structfield>attname</structfield>)
1293912987
</para>
1294012988
<para>
12941-
Names of the columns the extended statistics is defined on
12989+
Names of the columns the extended statistics object is defined on
1294212990
</para></entry>
1294312991
</row>
1294412992

@@ -12947,7 +12995,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
1294712995
<structfield>kinds</structfield> <type>char[]</type>
1294812996
</para>
1294912997
<para>
12950-
Types of extended statistics enabled for this record
12998+
Types of extended statistics object enabled for this record
1295112999
</para></entry>
1295213000
</row>
1295313001

@@ -13032,6 +13080,237 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
1303213080

1303313081
</sect1>
1303413082

13083+
<sect1 id="view-pg-stats-ext-exprs">
13084+
<title><structname>pg_stats_ext_exprs</structname></title>
13085+
13086+
<indexterm zone="view-pg-stats-ext-exprs">
13087+
<primary>pg_stats_ext_exprs</primary>
13088+
</indexterm>
13089+
13090+
<para>
13091+
The view <structname>pg_stats_ext_exprs</structname> provides access to
13092+
information about all expressions included in extended statistics objects,
13093+
combining information stored in the <link
13094+
linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link>
13095+
and <link linkend="catalog-pg-statistic-ext-data"><structname>pg_statistic_ext_data</structname></link>
13096+
catalogs. This view allows access only to rows of
13097+
<link linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link> and <link linkend="catalog-pg-statistic-ext-data"><structname>pg_statistic_ext_data</structname></link>
13098+
that correspond to tables the user has permission to read, and therefore
13099+
it is safe to allow public read access to this view.
13100+
</para>
13101+
13102+
<para>
13103+
<structname>pg_stats_ext_exprs</structname> is also designed to present
13104+
the information in a more readable format than the underlying catalogs
13105+
&mdash; at the cost that its schema must be extended whenever the structure
13106+
of statistics in <link linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link> changes.
13107+
</para>
13108+
13109+
<table>
13110+
<title><structname>pg_stats_ext_exprs</structname> Columns</title>
13111+
<tgroup cols="1">
13112+
<thead>
13113+
<row>
13114+
<entry role="catalog_table_entry"><para role="column_definition">
13115+
Column Type
13116+
</para>
13117+
<para>
13118+
Description
13119+
</para></entry>
13120+
</row>
13121+
</thead>
13122+
13123+
<tbody>
13124+
<row>
13125+
<entry role="catalog_table_entry"><para role="column_definition">
13126+
<structfield>schemaname</structfield> <type>name</type>
13127+
(references <link linkend="catalog-pg-namespace"><structname>pg_namespace</structname></link>.<structfield>nspname</structfield>)
13128+
</para>
13129+
<para>
13130+
Name of schema containing table
13131+
</para></entry>
13132+
</row>
13133+
13134+
<row>
13135+
<entry role="catalog_table_entry"><para role="column_definition">
13136+
<structfield>tablename</structfield> <type>name</type>
13137+
(references <link linkend="catalog-pg-class"><structname>pg_class</structname></link>.<structfield>relname</structfield>)
13138+
</para>
13139+
<para>
13140+
Name of table the statistics object is defined on
13141+
</para></entry>
13142+
</row>
13143+
13144+
<row>
13145+
<entry role="catalog_table_entry"><para role="column_definition">
13146+
<structfield>statistics_schemaname</structfield> <type>name</type>
13147+
(references <link linkend="catalog-pg-namespace"><structname>pg_namespace</structname></link>.<structfield>nspname</structfield>)
13148+
</para>
13149+
<para>
13150+
Name of schema containing extended statistics object
13151+
</para></entry>
13152+
</row>
13153+
13154+
<row>
13155+
<entry role="catalog_table_entry"><para role="column_definition">
13156+
<structfield>statistics_name</structfield> <type>name</type>
13157+
(references <link linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link>.<structfield>stxname</structfield>)
13158+
</para>
13159+
<para>
13160+
Name of extended statistics object
13161+
</para></entry>
13162+
</row>
13163+
13164+
<row>
13165+
<entry role="catalog_table_entry"><para role="column_definition">
13166+
<structfield>statistics_owner</structfield> <type>name</type>
13167+
(references <link linkend="catalog-pg-authid"><structname>pg_authid</structname></link>.<structfield>rolname</structfield>)
13168+
</para>
13169+
<para>
13170+
Owner of the extended statistics object
13171+
</para></entry>
13172+
</row>
13173+
13174+
<row>
13175+
<entry role="catalog_table_entry"><para role="column_definition">
13176+
<structfield>expr</structfield> <type>text</type>
13177+
</para>
13178+
<para>
13179+
Expression included in the extended statistics object
13180+
</para></entry>
13181+
</row>
13182+
13183+
<row>
13184+
<entry role="catalog_table_entry"><para role="column_definition">
13185+
<structfield>null_frac</structfield> <type>float4</type>
13186+
</para>
13187+
<para>
13188+
Fraction of expression entries that are null
13189+
</para></entry>
13190+
</row>
13191+
13192+
<row>
13193+
<entry role="catalog_table_entry"><para role="column_definition">
13194+
<structfield>avg_width</structfield> <type>int4</type>
13195+
</para>
13196+
<para>
13197+
Average width in bytes of expression's entries
13198+
</para></entry>
13199+
</row>
13200+
13201+
<row>
13202+
<entry role="catalog_table_entry"><para role="column_definition">
13203+
<structfield>n_distinct</structfield> <type>float4</type>
13204+
</para>
13205+
<para>
13206+
If greater than zero, the estimated number of distinct values in the
13207+
expression. If less than zero, the negative of the number of distinct
13208+
values divided by the number of rows. (The negated form is used when
13209+
<command>ANALYZE</command> believes that the number of distinct values is
13210+
likely to increase as the table grows; the positive form is used when
13211+
the expression seems to have a fixed number of possible values.) For
13212+
example, -1 indicates a unique expression in which the number of distinct
13213+
values is the same as the number of rows.
13214+
</para></entry>
13215+
</row>
13216+
13217+
<row>
13218+
<entry role="catalog_table_entry"><para role="column_definition">
13219+
<structfield>most_common_vals</structfield> <type>anyarray</type>
13220+
</para>
13221+
<para>
13222+
A list of the most common values in the expression. (Null if
13223+
no values seem to be more common than any others.)
13224+
</para></entry>
13225+
</row>
13226+
13227+
<row>
13228+
<entry role="catalog_table_entry"><para role="column_definition">
13229+
<structfield>most_common_freqs</structfield> <type>float4[]</type>
13230+
</para>
13231+
<para>
13232+
A list of the frequencies of the most common values,
13233+
i.e., number of occurrences of each divided by total number of rows.
13234+
(Null when <structfield>most_common_vals</structfield> is.)
13235+
</para></entry>
13236+
</row>
13237+
13238+
<row>
13239+
<entry role="catalog_table_entry"><para role="column_definition">
13240+
<structfield>histogram_bounds</structfield> <type>anyarray</type>
13241+
</para>
13242+
<para>
13243+
A list of values that divide the expression's values into groups of
13244+
approximately equal population. The values in
13245+
<structfield>most_common_vals</structfield>, if present, are omitted from this
13246+
histogram calculation. (This expression is null if the expression data type
13247+
does not have a <literal>&lt;</literal> operator or if the
13248+
<structfield>most_common_vals</structfield> list accounts for the entire
13249+
population.)
13250+
</para></entry>
13251+
</row>
13252+
13253+
<row>
13254+
<entry role="catalog_table_entry"><para role="column_definition">
13255+
<structfield>correlation</structfield> <type>float4</type>
13256+
</para>
13257+
<para>
13258+
Statistical correlation between physical row ordering and
13259+
logical ordering of the expression values. This ranges from -1 to +1.
13260+
When the value is near -1 or +1, an index scan on the expression will
13261+
be estimated to be cheaper than when it is near zero, due to reduction
13262+
of random access to the disk. (This expression is null if the expression's
13263+
data type does not have a <literal>&lt;</literal> operator.)
13264+
</para></entry>
13265+
</row>
13266+
13267+
<row>
13268+
<entry role="catalog_table_entry"><para role="column_definition">
13269+
<structfield>most_common_elems</structfield> <type>anyarray</type>
13270+
</para>
13271+
<para>
13272+
A list of non-null element values most often appearing within values of
13273+
the expression. (Null for scalar types.)
13274+
</para></entry>
13275+
</row>
13276+
13277+
<row>
13278+
<entry role="catalog_table_entry"><para role="column_definition">
13279+
<structfield>most_common_elem_freqs</structfield> <type>float4[]</type>
13280+
</para>
13281+
<para>
13282+
A list of the frequencies of the most common element values, i.e., the
13283+
fraction of rows containing at least one instance of the given value.
13284+
Two or three additional values follow the per-element frequencies;
13285+
these are the minimum and maximum of the preceding per-element
13286+
frequencies, and optionally the frequency of null elements.
13287+
(Null when <structfield>most_common_elems</structfield> is.)
13288+
</para></entry>
13289+
</row>
13290+
13291+
<row>
13292+
<entry role="catalog_table_entry"><para role="column_definition">
13293+
<structfield>elem_count_histogram</structfield> <type>float4[]</type>
13294+
</para>
13295+
<para>
13296+
A histogram of the counts of distinct non-null element values within the
13297+
values of the expression, followed by the average number of distinct
13298+
non-null elements. (Null for scalar types.)
13299+
</para></entry>
13300+
</row>
13301+
</tbody>
13302+
</tgroup>
13303+
</table>
13304+
13305+
<para>
13306+
The maximum number of entries in the array fields can be controlled on a
13307+
column-by-column basis using the <link linkend="sql-altertable"><command>ALTER
13308+
TABLE SET STATISTICS</command></link> command, or globally by setting the
13309+
<xref linkend="guc-default-statistics-target"/> run-time parameter.
13310+
</para>
13311+
13312+
</sect1>
13313+
1303513314
<sect1 id="view-pg-tables">
1303613315
<title><structname>pg_tables</structname></title>
1303713316

0 commit comments

Comments
 (0)