Skip to content

Commit a9c70b4

Browse files
committed
Add pg_stat_io view, providing more detailed IO statistics
Builds on 28e626b and f30d62c. See the former for motivation. Rows of the view show IO operations for a particular backend type, IO target object, IO context combination (e.g. a client backend's operations on permanent relations in shared buffers) and each column in the view is the total number of IO Operations done (e.g. writes). So a cell in the view would be, for example, the number of blocks of relation data written from shared buffers by client backends since the last stats reset. In anticipation of tracking WAL IO and non-block-oriented IO (such as temporary file IO), the "op_bytes" column specifies the unit of the "reads", "writes", and "extends" columns for a given row. Rows for combinations of IO operation, backend type, target object and context that never occur, are ommitted entirely. For example, checkpointer will never operate on temporary relations. Similarly, if an IO operation never occurs for such a combination, the IO operation's cell will be null, to distinguish from 0 observed IO operations. For example, bgwriter should not perform reads. Note that some of the cells in the view are redundant with fields in pg_stat_bgwriter (e.g. buffers_backend). For now, these have been kept for backwards compatibility. Bumps catversion. Author: Melanie Plageman <melanieplageman@gmail.com> Author: Samay Sharma <smilingsamay@gmail.com> Reviewed-by: Maciek Sakrejda <m.sakrejda@gmail.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20200124195226.lth52iydq2n2uilq@alap3.anarazel.de
1 parent 44e56ba commit a9c70b4

File tree

7 files changed

+486
-15
lines changed

7 files changed

+486
-15
lines changed

doc/src/sgml/monitoring.sgml

+307-14
Original file line numberDiff line numberDiff line change
@@ -469,6 +469,16 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
469469
</entry>
470470
</row>
471471

472+
<row>
473+
<entry><structname>pg_stat_io</structname><indexterm><primary>pg_stat_io</primary></indexterm></entry>
474+
<entry>
475+
One row for each combination of backend type, context, and target object
476+
containing cluster-wide I/O statistics.
477+
See <link linkend="monitoring-pg-stat-io-view">
478+
<structname>pg_stat_io</structname></link> for details.
479+
</entry>
480+
</row>
481+
472482
<row>
473483
<entry><structname>pg_stat_replication_slots</structname><indexterm><primary>pg_stat_replication_slots</primary></indexterm></entry>
474484
<entry>One row per replication slot, showing statistics about the
@@ -665,20 +675,16 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
665675
</para>
666676

667677
<para>
668-
The <structname>pg_statio_</structname> views are primarily useful to
669-
determine the effectiveness of the buffer cache. When the number
670-
of actual disk reads is much smaller than the number of buffer
671-
hits, then the cache is satisfying most read requests without
672-
invoking a kernel call. However, these statistics do not give the
673-
entire story: due to the way in which <productname>PostgreSQL</productname>
674-
handles disk I/O, data that is not in the
675-
<productname>PostgreSQL</productname> buffer cache might still reside in the
676-
kernel's I/O cache, and might therefore still be fetched without
677-
requiring a physical read. Users interested in obtaining more
678-
detailed information on <productname>PostgreSQL</productname> I/O behavior are
679-
advised to use the <productname>PostgreSQL</productname> statistics views
680-
in combination with operating system utilities that allow insight
681-
into the kernel's handling of I/O.
678+
The <structname>pg_stat_io</structname> and
679+
<structname>pg_statio_</structname> set of views are useful for determining
680+
the effectiveness of the buffer cache. They can be used to calculate a cache
681+
hit ratio. Note that while <productname>PostgreSQL</productname>'s I/O
682+
statistics capture most instances in which the kernel was invoked in order
683+
to perform I/O, they do not differentiate between data which had to be
684+
fetched from disk and that which already resided in the kernel page cache.
685+
Users are advised to use the <productname>PostgreSQL</productname>
686+
statistics views in combination with operating system utilities for a more
687+
complete picture of their database's I/O performance.
682688
</para>
683689

684690
</sect2>
@@ -3669,6 +3675,293 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
36693675
<structfield>last_archived_wal</structfield> have also been successfully
36703676
archived.
36713677
</para>
3678+
</sect2>
3679+
3680+
<sect2 id="monitoring-pg-stat-io-view">
3681+
<title><structname>pg_stat_io</structname></title>
3682+
3683+
<indexterm>
3684+
<primary>pg_stat_io</primary>
3685+
</indexterm>
3686+
3687+
<para>
3688+
The <structname>pg_stat_io</structname> view will contain one row for each
3689+
combination of backend type, target I/O object, and I/O context, showing
3690+
cluster-wide I/O statistics. Combinations which do not make sense are
3691+
omitted.
3692+
</para>
3693+
3694+
<para>
3695+
Currently, I/O on relations (e.g. tables, indexes) is tracked. However,
3696+
relation I/O which bypasses shared buffers (e.g. when moving a table from one
3697+
tablespace to another) is currently not tracked.
3698+
</para>
3699+
3700+
<table id="pg-stat-io-view" xreflabel="pg_stat_io">
3701+
<title><structname>pg_stat_io</structname> View</title>
3702+
<tgroup cols="1">
3703+
<thead>
3704+
<row>
3705+
<entry role="catalog_table_entry">
3706+
<para role="column_definition">
3707+
Column Type
3708+
</para>
3709+
<para>
3710+
Description
3711+
</para>
3712+
</entry>
3713+
</row>
3714+
</thead>
3715+
<tbody>
3716+
<row>
3717+
<entry role="catalog_table_entry">
3718+
<para role="column_definition">
3719+
<structfield>backend_type</structfield> <type>text</type>
3720+
</para>
3721+
<para>
3722+
Type of backend (e.g. background worker, autovacuum worker). See <link
3723+
linkend="monitoring-pg-stat-activity-view">
3724+
<structname>pg_stat_activity</structname></link> for more information
3725+
on <varname>backend_type</varname>s. Some
3726+
<varname>backend_type</varname>s do not accumulate I/O operation
3727+
statistics and will not be included in the view.
3728+
</para>
3729+
</entry>
3730+
</row>
3731+
3732+
<row>
3733+
<entry role="catalog_table_entry">
3734+
<para role="column_definition">
3735+
<structfield>io_object</structfield> <type>text</type>
3736+
</para>
3737+
<para>
3738+
Target object of an I/O operation. Possible values are:
3739+
<itemizedlist>
3740+
<listitem>
3741+
<para>
3742+
<literal>relation</literal>: Permanent relations.
3743+
</para>
3744+
</listitem>
3745+
<listitem>
3746+
<para>
3747+
<literal>temp relation</literal>: Temporary relations.
3748+
</para>
3749+
</listitem>
3750+
</itemizedlist>
3751+
</para>
3752+
</entry>
3753+
</row>
3754+
3755+
<row>
3756+
<entry role="catalog_table_entry">
3757+
<para role="column_definition">
3758+
<structfield>io_context</structfield> <type>text</type>
3759+
</para>
3760+
<para>
3761+
The context of an I/O operation. Possible values are:
3762+
</para>
3763+
<itemizedlist>
3764+
<listitem>
3765+
<para>
3766+
<literal>normal</literal>: The default or standard
3767+
<varname>io_context</varname> for a type of I/O operation. For
3768+
example, by default, relation data is read into and written out from
3769+
shared buffers. Thus, reads and writes of relation data to and from
3770+
shared buffers are tracked in <varname>io_context</varname>
3771+
<literal>normal</literal>.
3772+
</para>
3773+
</listitem>
3774+
<listitem>
3775+
<para>
3776+
<literal>vacuum</literal>: I/O operations performed outside of shared
3777+
buffers while vacuuming and analyzing permanent relations. Temporary
3778+
table vacuums use the same local buffer pool as other temporary table
3779+
IO operations and are tracked in <varname>io_context</varname>
3780+
<literal>normal</literal>.
3781+
</para>
3782+
</listitem>
3783+
<listitem>
3784+
<para>
3785+
<literal>bulkread</literal>: Certain large read I/O operations
3786+
done outside of shared buffers, for example, a sequential scan of a
3787+
large table.
3788+
</para>
3789+
</listitem>
3790+
<listitem>
3791+
<para>
3792+
<literal>bulkwrite</literal>: Certain large write I/O operations
3793+
done outside of shared buffers, such as <command>COPY</command>.
3794+
</para>
3795+
</listitem>
3796+
</itemizedlist>
3797+
</entry>
3798+
</row>
3799+
3800+
<row>
3801+
<entry role="catalog_table_entry">
3802+
<para role="column_definition">
3803+
<structfield>reads</structfield> <type>bigint</type>
3804+
</para>
3805+
<para>
3806+
Number of read operations, each of the size specified in
3807+
<varname>op_bytes</varname>.
3808+
</para>
3809+
</entry>
3810+
</row>
3811+
3812+
<row>
3813+
<entry role="catalog_table_entry">
3814+
<para role="column_definition">
3815+
<structfield>writes</structfield> <type>bigint</type>
3816+
</para>
3817+
<para>
3818+
Number of write operations, each of the size specified in
3819+
<varname>op_bytes</varname>.
3820+
</para>
3821+
</entry>
3822+
</row>
3823+
3824+
<row>
3825+
<entry role="catalog_table_entry">
3826+
<para role="column_definition">
3827+
<structfield>extends</structfield> <type>bigint</type>
3828+
</para>
3829+
<para>
3830+
Number of relation extend operations, each of the size specified in
3831+
<varname>op_bytes</varname>.
3832+
</para>
3833+
</entry>
3834+
</row>
3835+
3836+
<row>
3837+
<entry role="catalog_table_entry">
3838+
<para role="column_definition">
3839+
<structfield>op_bytes</structfield> <type>bigint</type>
3840+
</para>
3841+
<para>
3842+
The number of bytes per unit of I/O read, written, or extended.
3843+
</para>
3844+
<para>
3845+
Relation data reads, writes, and extends are done in
3846+
<varname>block_size</varname> units, derived from the build-time
3847+
parameter <symbol>BLCKSZ</symbol>, which is <literal>8192</literal> by
3848+
default.
3849+
</para>
3850+
</entry>
3851+
</row>
3852+
3853+
<row>
3854+
<entry role="catalog_table_entry">
3855+
<para role="column_definition">
3856+
<structfield>evictions</structfield> <type>bigint</type>
3857+
</para>
3858+
<para>
3859+
Number of times a block has been written out from a shared or local
3860+
buffer in order to make it available for another use.
3861+
</para>
3862+
<para>
3863+
In <varname>io_context</varname> <literal>normal</literal>, this counts
3864+
the number of times a block was evicted from a buffer and replaced with
3865+
another block. In <varname>io_context</varname>s
3866+
<literal>bulkwrite</literal>, <literal>bulkread</literal>, and
3867+
<literal>vacuum</literal>, this counts the number of times a block was
3868+
evicted from shared buffers in order to add the shared buffer to a
3869+
separate, size-limited ring buffer for use in a bulk I/O operation.
3870+
</para>
3871+
</entry>
3872+
</row>
3873+
3874+
<row>
3875+
<entry role="catalog_table_entry">
3876+
<para role="column_definition">
3877+
<structfield>reuses</structfield> <type>bigint</type>
3878+
</para>
3879+
<para>
3880+
The number of times an existing buffer in a size-limited ring buffer
3881+
outside of shared buffers was reused as part of an I/O operation in the
3882+
<literal>bulkread</literal>, <literal>bulkwrite</literal>, or
3883+
<literal>vacuum</literal> <varname>io_context</varname>s.
3884+
</para>
3885+
</entry>
3886+
</row>
3887+
3888+
<row>
3889+
<entry role="catalog_table_entry">
3890+
<para role="column_definition">
3891+
<structfield>fsyncs</structfield> <type>bigint</type>
3892+
</para>
3893+
<para>
3894+
Number of <literal>fsync</literal> calls. These are only tracked in
3895+
<varname>io_context</varname> <literal>normal</literal>.
3896+
</para>
3897+
</entry>
3898+
</row>
3899+
3900+
<row>
3901+
<entry role="catalog_table_entry">
3902+
<para role="column_definition">
3903+
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
3904+
</para>
3905+
<para>
3906+
Time at which these statistics were last reset.
3907+
</para>
3908+
</entry>
3909+
</row>
3910+
</tbody>
3911+
</tgroup>
3912+
</table>
3913+
3914+
<para>
3915+
Some backend types never perform I/O operations on some I/O objects and/or
3916+
in some I/O contexts. These rows are omitted from the view. For example, the
3917+
checkpointer does not checkpoint temporary tables, so there will be no rows
3918+
for <varname>backend_type</varname> <literal>checkpointer</literal> and
3919+
<varname>io_object</varname> <literal>temp relation</literal>.
3920+
</para>
3921+
3922+
<para>
3923+
In addition, some I/O operations will never be performed either by certain
3924+
backend types or on certain I/O objects and/or in certain I/O contexts.
3925+
These cells will be NULL. For example, temporary tables are not
3926+
<literal>fsync</literal>ed, so <varname>fsyncs</varname> will be NULL for
3927+
<varname>io_object</varname> <literal>temp relation</literal>. Also, the
3928+
background writer does not perform reads, so <varname>reads</varname> will
3929+
be NULL in rows for <varname>backend_type</varname> <literal>background
3930+
writer</literal>.
3931+
</para>
3932+
3933+
<para>
3934+
<structname>pg_stat_io</structname> can be used to inform database tuning.
3935+
For example:
3936+
<itemizedlist>
3937+
<listitem>
3938+
<para>
3939+
A high <varname>evictions</varname> count can indicate that shared
3940+
buffers should be increased.
3941+
</para>
3942+
</listitem>
3943+
<listitem>
3944+
<para>
3945+
Client backends rely on the checkpointer to ensure data is persisted to
3946+
permanent storage. Large numbers of <varname>fsyncs</varname> by
3947+
<literal>client backend</literal>s could indicate a misconfiguration of
3948+
shared buffers or of the checkpointer. More information on configuring
3949+
the checkpointer can be found in <xref linkend="wal-configuration"/>.
3950+
</para>
3951+
</listitem>
3952+
<listitem>
3953+
<para>
3954+
Normally, client backends should be able to rely on auxiliary processes
3955+
like the checkpointer and the background writer to write out dirty data
3956+
as much as possible. Large numbers of writes by client backends could
3957+
indicate a misconfiguration of shared buffers or of the checkpointer.
3958+
More information on configuring the checkpointer can be found in <xref
3959+
linkend="wal-configuration"/>.
3960+
</para>
3961+
</listitem>
3962+
</itemizedlist>
3963+
</para>
3964+
36723965

36733966
</sect2>
36743967

src/backend/catalog/system_views.sql

+15
Original file line numberDiff line numberDiff line change
@@ -1117,6 +1117,21 @@ CREATE VIEW pg_stat_bgwriter AS
11171117
pg_stat_get_buf_alloc() AS buffers_alloc,
11181118
pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
11191119

1120+
CREATE VIEW pg_stat_io AS
1121+
SELECT
1122+
b.backend_type,
1123+
b.io_object,
1124+
b.io_context,
1125+
b.reads,
1126+
b.writes,
1127+
b.extends,
1128+
b.op_bytes,
1129+
b.evictions,
1130+
b.reuses,
1131+
b.fsyncs,
1132+
b.stats_reset
1133+
FROM pg_stat_get_io() b;
1134+
11201135
CREATE VIEW pg_stat_wal AS
11211136
SELECT
11221137
w.wal_records,

0 commit comments

Comments
 (0)