You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: 0004_pg_stat_statements_part_1.md
+20-17Lines changed: 20 additions & 17 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -4,14 +4,16 @@ Originally from: [tweet](https://twitter.com/samokhvalov/status/1708244676313317
4
4
5
5
# How to work with pg_stat_statments, part 1
6
6
7
-
There are two big areas of query optimization:
7
+
## 2 branches of query optimization
8
+
There are two big branches of query optimization:
8
9
1. "Micro" optimization: analysis and improvement of particular queries. Main tool: `EXPLAIN`.
9
10
2. "Macro" optimization: analysis of whole or large parts of workload, segmentation of it, studying characteristics, going from top to down, to identify and improve the parts that behave the worst. Main tools: `pg_stat_statements` (and additions or alternatives), wait event analysis, and Postgres logs.
10
11
11
12
Today we focus on how to read and use `pg_stat_statements`, starting from basics and proceeding to using the data from it for macro optimization.
Extension `pg_stat_statements` (for short, "pgss") became standard de-facto for macro-analysis.
16
18
17
19
It tracks all queries, aggregating them to query groups – called "normalized queries" – where parameters are
@@ -27,6 +29,20 @@ The view pg_stat_statements has 3 kinds of columns:
27
29
2. Descriptive columns: ID of database (`dbid`), user (`userid`), and the query itself (`query`).
28
30
3. Metrics. Almost all of them are cumulative: `calls`, `total_time`, `rows`, etc. Non-cumulative: `stddev_plan_time`, `stddev_exec_time`, `min_exec_time`, etc. In this post, we'll focus only on cumulative ones.
29
31
32
+
// Below I sometimes call normalized query "query group" or simply "group".
33
+
34
+
Let's mention some metrics that are usually most frequently used in macro optimization (full list: https://postgresql.org/docs/current/pgstatstatements.html#PGSTATSTATEMENTS-PG-STAT-STATEMENTS):
35
+
1.`calls` – how many query calls happened for this query group (normalized query)
36
+
2.`total_plan_time` and `total_exec_time` – aggregated duration for planning and execution for this group (again, remember: failed queries are not tracked, including those that failed on `statement_timeout`)
37
+
3.`rows` – how many rows returned by queries in this group
38
+
4.`shared_blks_hit` and `shared_blks_read` – number if hit and read operations from the buffer pool. Two important notes here:
39
+
- "read" here means a read from the buffer pool – it is not necessarily a physical read from disk, since data can be cached in the OS page cache. So we cannot say these reads are reads from disk. Some monitoring systems make this mistake, but there are cases that this nuance is essential for our analysis to produce correct results and conclusions.
40
+
- the names "blocks hit" and "blocks read" might be a little bit misleading, suggesting that here we talk about data volumes – number of blocks (buffers). While aggregation here definitely make sense, we must keep in mind that the same buffers may be read or hit multiple times. So instead of "blocks have been hit" it is better to say "block hits".
41
+
5.`wal_bytes` – how many bytes are written to WAL by queries in this group
42
+
43
+
There are many more other interesting metrics, it is recommended to explore all of them: https://postgresql.org/docs/current/pgstatstatements.html.
44
+
45
+
## Dealing with cumulative metrics in pgss
30
46
To read and interpret data from pgss, you need three steps:
31
47
1. Take two snapshots corresponding to two points of time.
32
48
2. Calculate the diff for each cumulative metric and for time difference for the two points in time
@@ -44,22 +60,9 @@ Step 3 can be also applied not to particular normalized queries on a single host
44
60
45
61
If your monitoring system supports pgss, you don't need to deal with working with snapshots manually – although, keep in mind that I personally don't know any monitoring that works with pgss perfectly, preserving all kinds of information discussed in this post (and I studied quite a few of Postgres monitoring tools).
46
62
47
-
// Below I sometimes call normalized query "query group" or simply "group".
48
-
49
-
Let's mention some metrics that are usually most frequently used in macro optimization (full list: https://postgresql.org/docs/current/pgstatstatements.html#PGSTATSTATEMENTS-PG-STAT-STATEMENTS):
50
-
1.`calls` – how many query calls happened for this query group (normalized query)
51
-
2.`total_plan_time` and `total_exec_time` – aggregated duration for planning and execution for this group (again, remember: failed queries are not tracked, including those that failed on `statement_timeout`)
52
-
3.`rows` – how many rows returned by queries in this group
53
-
4.`shared_blks_hit` and `shared_blks_read` – number if hit and read operations from the buffer pool. Two important notes here:
54
-
- "read" here means a read from the buffer pool – it is not necessarily a physical read from disk, since data can be cached in the OS page cache. So we cannot say these reads are reads from disk. Some monitoring systems make this mistake, but there are cases that this nuance is essential for our analysis to produce correct results and conclusions.
55
-
- the names "blocks hit" and "blocks read" might be a little bit misleading, suggesting that here we talk about data volumes – number of blocks (buffers). While aggregation here definitely make sense, we must keep in mind that the same buffers may be read or hit multiple times. So instead of "blocks have been hit" it is better to say "block hits".
56
-
5.`wal_bytes` – how many bytes are written to WAL by queries in this group
57
-
58
-
There are many more other interesting metrics, it is worth exploring them all.
59
-
60
-
Once you obtained 2 snapshots of pgss (remembering timestamp when they were collected), let's consider practical meaning of the three derivatives we discussed:
63
+
Assuming you successfully obtained 2 snapshots of pgss (remembering timestamp when they were collected) or use proper monitoring tool, let's consider practical meaning of the three derivatives we discussed.
61
64
62
-
Derivative 1. Time-based differentiation
65
+
## Derivative 1. Time-based differentiation
63
66
64
67
*`dM/dt`, where `M` is `calls` – the meaning is simple. It's QPS (queries per second). If we talk about particular group (normalized query), it's that all queries in this group have. `10,000` is pretty large so, probably, you need to improve the client (app) behavior to reduce it, `10` is pretty small (of course, depending on situation). If we consider this derivative for whole node, it's our "global QPS".
65
68
@@ -77,4 +80,4 @@ That's it for the part 1 of pgss-related howto, in next parts we'll talk about d
77
80
78
81
Let me know if it was useful, and please share with your colleagues and any people who work with
0 commit comments