@@ -30,11 +30,23 @@ PostgreSQL documentation
<title>Description</title>

<para>
- <application>pg_test_timing</application> is a tool to measure the timing overhead
- on your system and confirm that the system time never moves backwards.
+ <application>pg_test_timing</application> is a tool to measure the
+ timing overhead on your system and confirm that the system time never
+ moves backwards. It simply reads the system clock over and over again
+ as fast as it can for a specified length of time, and then prints
+ statistics about the observed differences in successive clock readings.
+ </para>
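A minimal sketch of the kind of measurement loop described above, assuming a POSIX clock_gettime() monotonic clock; this is illustrative only, not pg_test_timing's actual source:

<programlisting><![CDATA[
/*
 * Sketch of a clock-reading loop: read a monotonic clock as fast as
 * possible for about three seconds, check that time never moves
 * backwards, and report the average time per loop iteration.
 */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static int64_t
read_clock_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
}

int
main(void)
{
    const int64_t test_ns = 3 * INT64_C(1000000000);
    int64_t start = read_clock_ns();
    int64_t prev = start;
    int64_t loops = 0;

    while (prev - start < test_ns)
    {
        int64_t cur = read_clock_ns();

        if (cur < prev)
        {
            fprintf(stderr, "system clock moved backwards!\n");
            return 1;
        }
        prev = cur;
        loops++;
    }
    printf("Per loop time including overhead: %.2f ns\n",
           (double) (prev - start) / loops);
    return 0;
}
]]></programlisting>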
+ <para>
+ Smaller (but not zero) differences are better, since they imply both
+ more-precise clock hardware and less overhead to collect a clock reading.
Systems that are slow to collect timing data can give less accurate
<command>EXPLAIN ANALYZE</command> results.
</para>
+ <para>
+ This tool is also helpful to determine if
+ the <varname>track_io_timing</varname> configuration parameter is likely
+ to produce useful results.
+ </para>
</refsect1>

<refsect1>
@@ -59,6 +71,21 @@ PostgreSQL documentation
</listitem>
</varlistentry>

+ <varlistentry>
+ <term><option>-c <replaceable class="parameter">cutoff</replaceable></option></term>
+ <term><option>--cutoff=<replaceable class="parameter">cutoff</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the cutoff percentage for the list of exact observed
+ timing durations (that is, the changes in the system clock value
+ from one reading to the next). The list will end once the running
+ percentage total reaches or exceeds this value, except that the
+ largest observed duration will always be printed. The default
+ cutoff is 99.99.
+ </para>
+ </listitem>
+ </varlistentry>
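As a usage illustration (the specific values here are arbitrary, not part of the patch), the new option can be combined with the existing --duration option to run a five-second test and cut the detailed list off at 99.9%:

<screen>
pg_test_timing --duration=5 --cutoff=99.9
</screen>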
+
<varlistentry>
<term><option>-V</option></term>
<term><option>--version</option></term>
@@ -92,212 +119,92 @@ PostgreSQL documentation
<title>Interpreting Results</title>

<para>
- Good results will show most (>90%) individual timing calls take less than
- one microsecond. Average per loop overhead will be even lower, below 100
- nanoseconds. This example from an Intel i7-860 system using a TSC clock
- source shows excellent performance:
-
- <screen><![CDATA[
- Testing timing overhead for 3 seconds.
- Per loop time including overhead: 35.96 ns
- Histogram of timing durations:
-   < us   % of total      count
-      1    96.40465   80435604
-      2     3.59518    2999652
-      4     0.00015        126
-      8     0.00002         13
-     16     0.00000          2
- ]]></screen>
+ The first block of output has four columns, with rows showing a
+ shifted-by-one log2(ns) histogram of timing durations (that is, the
+ differences between successive clock readings). This is not the
+ classic log2(n+1) histogram as it counts zeros separately and then
+ switches to log2(ns) starting from value 1.
</para>
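In other words, a zero duration gets its own bucket, and a nonzero duration of d nanoseconds lands in bucket floor(log2(d)) + 1, whose printed upper bound is 2^bucket - 1. A sketch of that bucketing rule (illustrative only, not the tool's actual code):

<programlisting><![CDATA[
/*
 * Sketch of the shifted-by-one log2(ns) bucketing described above.
 */
#include <stdint.h>
#include <stdio.h>

/* 0 -> bucket 0; otherwise floor(log2(ns)) + 1 */
static int
duration_bucket(int64_t ns)
{
    int bucket = 0;

    while (ns > 0)
    {
        ns >>= 1;
        bucket++;
    }
    return bucket;
}

/* upper bound printed for a bucket: 0, 1, 3, 7, 15, 31, ... */
static int64_t
bucket_upper_bound(int bucket)
{
    return (INT64_C(1) << bucket) - 1;
}

int
main(void)
{
    /* a 16 ns reading falls in the "<= 31 ns" bucket */
    int b = duration_bucket(16);

    printf("16 ns -> bucket %d (<= %lld ns)\n",
           b, (long long) bucket_upper_bound(b));
    return 0;
}
]]></programlisting>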
-
<para>
- Note that different units are used for the per loop time than the
- histogram. The loop can have resolution within a few nanoseconds (ns),
- while the individual timing calls can only resolve down to one microsecond
- (us).
+ The columns are:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <simpara>nanosecond value that is >= the durations in this
+ bucket</simpara>
+ </listitem>
+ <listitem>
+ <simpara>percentage of durations in this bucket</simpara>
+ </listitem>
+ <listitem>
+ <simpara>running-sum percentage of durations in this and previous
+ buckets</simpara>
+ </listitem>
+ <listitem>
+ <simpara>count of durations in this bucket</simpara>
+ </listitem>
+ </itemizedlist>
</para>
-
- </refsect2>
- <refsect2>
- <title>Measuring Executor Timing Overhead</title>
-
<para>
- When the query executor is running a statement using
- <command>EXPLAIN ANALYZE</command>, individual operations are timed as well
- as showing a summary. The overhead of your system can be checked by
- counting rows with the <application>psql</application> program:
-
- <screen>
- CREATE TABLE t AS SELECT * FROM generate_series(1,100000);
- \timing
- SELECT COUNT(*) FROM t;
- EXPLAIN ANALYZE SELECT COUNT(*) FROM t;
- </screen>
+ The second block of output goes into more detail, showing the exact
+ timing differences observed. For brevity this list is cut off when the
+ running-sum percentage exceeds the user-selectable cutoff value.
+ However, the largest observed difference is always shown.
</para>
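A sketch of how that truncation can work, assuming the exact durations have already been tallied into a per-nanosecond array of counts (illustrative only, not pg_test_timing's actual code):

<programlisting><![CDATA[
/*
 * counts[ns] holds how many clock-reading differences of exactly ns
 * nanoseconds were seen, largest_ns is the largest observed duration,
 * and total is the number of readings.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static void
print_exact_durations(const int64_t *counts, int largest_ns,
                      int64_t total, double cutoff_pct)
{
    double running_pct = 0.0;

    printf("%6s %12s %10s %10s\n", "ns", "% of total", "running %", "count");
    for (int ns = 0; ns <= largest_ns; ns++)
    {
        if (counts[ns] == 0)
            continue;

        double pct = 100.0 * (double) counts[ns] / (double) total;
        bool past_cutoff = (running_pct >= cutoff_pct);

        running_pct += pct;
        /* past the cutoff, only the largest observed duration is shown */
        if (past_cutoff && ns != largest_ns)
            continue;
        printf("%6d %12.4f %10.4f %10lld\n",
               ns, pct, running_pct, (long long) counts[ns]);
    }
}

int
main(void)
{
    /* toy data: 90 readings of 15 ns, 9 of 16 ns, one outlier of 38 ns */
    int64_t counts[64] = {0};

    counts[15] = 90;
    counts[16] = 9;
    counts[38] = 1;
    print_exact_durations(counts, 38, 100, 99.0);
    return 0;
}
]]></programlisting>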
-
<para>
- The i7-860 system measured runs the count query in 9.8 ms while
- the <command>EXPLAIN ANALYZE</command> version takes 16.6 ms, each
- processing just over 100,000 rows. That 6.8 ms difference means the timing
- overhead per row is 68 ns, about twice what pg_test_timing estimated it
- would be. Even that relatively small amount of overhead is making the fully
- timed count statement take almost 70% longer. On more substantial queries,
- the timing overhead would be less problematic.
+ The example results below show that 99.99% of timing loops took between
+ 8 and 31 nanoseconds, with the worst case somewhere between 32768 and
+ 65535 nanoseconds. In the second block, we can see that typical loop
+ time is 16 nanoseconds, and the readings appear to have full nanosecond
+ precision.
</para>

- </refsect2>
-
- <refsect2>
- <title>Changing Time Sources</title>
<para>
- On some newer Linux systems, it's possible to change the clock source used
- to collect timing data at any time. A second example shows the slowdown
- possible from switching to the slower acpi_pm time source, on the same
- system used for the fast results above:
-
<screen><![CDATA[
- # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
- tsc hpet acpi_pm
- # echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
- # pg_test_timing
- Per loop time including overhead: 722.92 ns
+ Testing timing overhead for 3 seconds.
+ Per loop time including overhead: 16.40 ns
Histogram of timing durations:
-   < us   % of total      count
-      1    27.84870    1155682
-      2    72.05956    2990371
-      4     0.07810       3241
-      8     0.01357        563
-     16     0.00007          3
+   <= ns   % of total  running %      count
+       0       0.0000     0.0000          0
+       1       0.0000     0.0000          0
+       3       0.0000     0.0000          0
+       7       0.0000     0.0000          0
+      15       4.5452     4.5452    8313178
+      31      95.4527    99.9979  174581501
+      63       0.0001    99.9981        253
+     127       0.0001    99.9982        165
+     255       0.0000    99.9982         35
+     511       0.0000    99.9982          1
+    1023       0.0013    99.9994       2300
+    2047       0.0004    99.9998        690
+    4095       0.0000    99.9998          9
+    8191       0.0000    99.9998          8
+   16383       0.0002   100.0000        337
+   32767       0.0000   100.0000          2
+   65535       0.0000   100.0000          1
+
+ Observed timing durations up to 99.9900%:
+      ns   % of total  running %      count
+      15       4.5452     4.5452    8313178
+      16      58.3785    62.9237  106773354
+      17      33.6840    96.6078   61607584
+      18       3.1151    99.7229    5697480
+      19       0.2638    99.9867     482570
+      20       0.0093    99.9960      17054
+ ...
+   38051       0.0000   100.0000          1
]]></screen>
</para>

- <para>
- In this configuration, the sample <command>EXPLAIN ANALYZE</command> above
- takes 115.9 ms. That's 1061 ns of timing overhead, again a small multiple
- of what's measured directly by this utility. That much timing overhead
- means the actual query itself is only taking a tiny fraction of the
- accounted for time, most of it is being consumed in overhead instead. In
- this configuration, any <command>EXPLAIN ANALYZE</command> totals involving
- many timed operations would be inflated significantly by timing overhead.
- </para>
-
- <para>
- FreeBSD also allows changing the time source on the fly, and it logs
- information about the timer selected during boot:
-
- <screen>
- # dmesg | grep "Timecounter"
- Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
- Timecounter "i8254" frequency 1193182 Hz quality 0
- Timecounters tick every 10.000 msec
- Timecounter "TSC" frequency 2531787134 Hz quality 800
- # sysctl kern.timecounter.hardware=TSC
- kern.timecounter.hardware: ACPI-fast -> TSC
- </screen>
- </para>
-
- <para>
- Other systems may only allow setting the time source on boot. On older
- Linux systems the "clock" kernel setting is the only way to make this sort
- of change. And even on some more recent ones, the only option you'll see
- for a clock source is "jiffies". Jiffies are the older Linux software clock
- implementation, which can have good resolution when it's backed by fast
- enough timing hardware, as in this example:
-
- <screen><![CDATA[
- $ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
- jiffies
- $ dmesg | grep time.c
- time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
- time.c: Detected 2400.153 MHz processor.
- $ pg_test_timing
- Testing timing overhead for 3 seconds.
- Per timing duration including loop overhead: 97.75 ns
- Histogram of timing durations:
-   < us   % of total      count
-      1    90.23734   27694571
-      2     9.75277    2993204
-      4     0.00981       3010
-      8     0.00007         22
-     16     0.00000          1
-     32     0.00000          1
- ]]></screen></para>
-
</refsect2>
-
- <refsect2>
- <title>Clock Hardware and Timing Accuracy</title>
-
- <para>
- Collecting accurate timing information is normally done on computers using
- hardware clocks with various levels of accuracy. With some hardware the
- operating systems can pass the system clock time almost directly to
- programs. A system clock can also be derived from a chip that simply
- provides timing interrupts, periodic ticks at some known time interval. In
- either case, operating system kernels provide a clock source that hides
- these details. But the accuracy of that clock source and how quickly it can
- return results varies based on the underlying hardware.
- </para>
-
- <para>
- Inaccurate time keeping can result in system instability. Test any change
- to the clock source very carefully. Operating system defaults are sometimes
- made to favor reliability over best accuracy. And if you are using a virtual
- machine, look into the recommended time sources compatible with it. Virtual
- hardware faces additional difficulties when emulating timers, and there are
- often per operating system settings suggested by vendors.
- </para>
-
- <para>
- The Time Stamp Counter (TSC) clock source is the most accurate one available
- on current generation CPUs. It's the preferred way to track the system time
- when it's supported by the operating system and the TSC clock is
- reliable. There are several ways that TSC can fail to provide an accurate
- timing source, making it unreliable. Older systems can have a TSC clock that
- varies based on the CPU temperature, making it unusable for timing. Trying
- to use TSC on some older multicore CPUs can give a reported time that's
- inconsistent among multiple cores. This can result in the time going
- backwards, a problem this program checks for. And even the newest systems
- can fail to provide accurate TSC timing with very aggressive power saving
- configurations.
- </para>
-
- <para>
- Newer operating systems may check for the known TSC problems and switch to a
- slower, more stable clock source when they are seen. If your system
- supports TSC time but doesn't default to that, it may be disabled for a good
- reason. And some operating systems may not detect all the possible problems
- correctly, or will allow using TSC even in situations where it's known to be
- inaccurate.
- </para>
-
- <para>
- The High Precision Event Timer (HPET) is the preferred timer on systems
- where it's available and TSC is not accurate. The timer chip itself is
- programmable to allow up to 100 nanosecond resolution, but you may not see
- that much accuracy in your system clock.
- </para>
-
- <para>
- Advanced Configuration and Power Interface (ACPI) provides a Power
- Management (PM) Timer, which Linux refers to as the acpi_pm. The clock
- derived from acpi_pm will at best provide 300 nanosecond resolution.
- </para>
-
- <para>
- Timers used on older PC hardware include the 8254 Programmable Interval
- Timer (PIT), the real-time clock (RTC), the Advanced Programmable Interrupt
- Controller (APIC) timer, and the Cyclone timer. These timers aim for
- millisecond resolution.
- </para>
- </refsect2>
</refsect1>

<refsect1>
<title>See Also</title>

<simplelist type="inline">
<member><xref linkend="sql-explain"/></member>
+ <member><ulink url="https://wiki.postgresql.org/wiki/Pg_test_timing">Wiki
+ discussion about timing</ulink></member>
</simplelist>
</refsect1>
</refentry>