
Commit 5d50873

committed
Replace the BufMgrLock with separate locks on the lookup hashtable and
the freelist, plus per-buffer spinlocks that protect access to individual shared buffer headers. This requires abandoning a global freelist (since the freelist is a global contention point), which shoots down ARC and 2Q as well as plain LRU management. Adopt a clock sweep algorithm instead. Preliminary results show substantial improvement in multi-backend situations.
1 parent 5592a6c commit 5d50873
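The commit message names two mechanisms: per-buffer spinlocks that protect individual shared buffer headers, and a clock sweep that replaces LRU, ARC and 2Q list management. The C sketch below shows both in miniature. It is not PostgreSQL's bufmgr or freelist code. The `BufHdr` layout, the `usage_count` cap and the functions `clock_sweep()` and `pin_buffer()` are hypothetical. C11 `atomic_flag` stands in for PostgreSQL's own spinlock primitive. The clock hand is assumed to be serialized by a separate lock, which is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NBUFFERS        8   /* hypothetical pool size */
#define MAX_USAGE_COUNT 5   /* hypothetical cap on usage_count */

/* A shared buffer header guarded by its own spinlock (hypothetical layout). */
typedef struct BufHdr
{
    atomic_flag lock;        /* per-buffer spinlock */
    int         refcount;    /* pins currently held by backends */
    int         usage_count; /* bumped on access, decremented by the sweep */
} BufHdr;

BufHdr buffers[NBUFFERS];

/* The clock hand; assumed to be serialized by a separate lock (not shown). */
int next_victim = 0;

void buf_lock(BufHdr *b)
{
    /* Spin until test-and-set reports the flag was clear, i.e. we own it. */
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ;
}

void buf_unlock(BufHdr *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

void init_buffers(void)
{
    for (int i = 0; i < NBUFFERS; i++)
    {
        atomic_flag_clear(&buffers[i].lock);
        buffers[i].refcount = 0;
        buffers[i].usage_count = 0;
    }
    next_victim = 0;
}

/* Accessing a buffer takes only that buffer's header lock. */
void pin_buffer(BufHdr *b)
{
    buf_lock(b);
    b->refcount++;
    if (b->usage_count < MAX_USAGE_COUNT)
        b->usage_count++;
    buf_unlock(b);
}

void unpin_buffer(BufHdr *b)
{
    buf_lock(b);
    assert(b->refcount > 0);
    b->refcount--;
    buf_unlock(b);
}

/*
 * Clock sweep: an unpinned buffer with usage_count == 0 is the victim and is
 * returned with its header lock still held (the caller must buf_unlock it).
 * An unpinned buffer with usage_count > 0 is decremented and skipped, giving
 * it another lap to be reused.  Returns -1 after NBUFFERS consecutive pinned
 * buffers, i.e. a full lap in which everything was pinned.
 */
int clock_sweep(void)
{
    int trycounter = NBUFFERS;

    for (;;)
    {
        int     id = next_victim;
        BufHdr *b = &buffers[id];

        next_victim = (next_victim + 1) % NBUFFERS;
        buf_lock(b);
        if (b->refcount == 0)
        {
            if (b->usage_count == 0)
                return id;
            b->usage_count--;
            trycounter = NBUFFERS;  /* progress was made; restart the count */
        }
        else if (--trycounter == 0)
        {
            buf_unlock(b);
            return -1;
        }
        buf_unlock(b);
    }
}
```

In this model, an access takes only the accessed buffer's own header lock and bumps a counter. By contrast, an LRU, ARC or 2Q list must relink the buffer in a shared list on every access. That shared list is the global contention point the commit message describes.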


18 files changed: 1387 additions & 1909 deletions


doc/src/sgml/runtime.sgml

Lines changed: 71 additions & 36 deletions
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/runtime.sgml,v 1.306 2005/03/02 19:58:54 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/runtime.sgml,v 1.307 2005/03/04 20:21:05 tgl Exp $
 -->
 
 <chapter Id="runtime">
@@ -1379,9 +1379,7 @@ SET ENABLE_SEQSCAN TO OFF;
 Specifies the delay between activity rounds for the
 background writer. In each round the writer issues writes
 for some number of dirty buffers (controllable by the
-following parameters). The selected buffers will always be
-the least recently used ones among the currently dirty
-buffers. It then sleeps for <varname>bgwriter_delay</>
+following parameters). It then sleeps for <varname>bgwriter_delay</>
 milliseconds, and repeats. The default value is 200. Note
 that on many systems, the effective resolution of sleep
 delays is 10 milliseconds; setting <varname>bgwriter_delay</>
@@ -1393,46 +1391,97 @@ SET ENABLE_SEQSCAN TO OFF;
 </listitem>
 </varlistentry>
 
-<varlistentry id="guc-bgwriter-percent" xreflabel="bgwriter_percent">
-<term><varname>bgwriter_percent</varname> (<type>integer</type>)</term>
+<varlistentry id="guc-bgwriter-lru-percent" xreflabel="bgwriter_lru_percent">
+<term><varname>bgwriter_lru_percent</varname> (<type>floating point</type>)</term>
 <indexterm>
-<primary><varname>bgwriter_percent</> configuration parameter</primary>
+<primary><varname>bgwriter_lru_percent</> configuration parameter</primary>
 </indexterm>
 <listitem>
 <para>
-In each round, no more than this percentage of the currently
-dirty buffers will be written (rounding up any fraction to
-the next whole number of buffers). The default value is
-1. This option can only be set at server start or in the
+To reduce the probability that server processes will need to issue
+their own writes, the background writer tries to write buffers that
+are likely to be recycled soon. In each round, it examines up to
+<varname>bgwriter_lru_percent</> of the buffers that are nearest to
+being recycled, and writes any that are dirty.
+The default value is 1.0 (this is a percentage of the total number
+of shared buffers).
+This option can only be set at server start or in the
+<filename>postgresql.conf</filename> file.
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry id="guc-bgwriter-lru-maxpages" xreflabel="bgwriter_lru_maxpages">
+<term><varname>bgwriter_lru_maxpages</varname> (<type>integer</type>)</term>
+<indexterm>
+<primary><varname>bgwriter_lru_maxpages</> configuration parameter</primary>
+</indexterm>
+<listitem>
+<para>
+In each round, no more than this many buffers will be written
+as a result of scanning soon-to-be-recycled buffers.
+The default value is 5.
+This option can only be set at server start or in the
+<filename>postgresql.conf</filename> file.
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry id="guc-bgwriter-all-percent" xreflabel="bgwriter_all_percent">
+<term><varname>bgwriter_all_percent</varname> (<type>floating point</type>)</term>
+<indexterm>
+<primary><varname>bgwriter_all_percent</> configuration parameter</primary>
+</indexterm>
+<listitem>
+<para>
+To reduce the amount of work that will be needed at checkpoint time,
+the background writer also does a circular scan through the entire
+buffer pool, writing buffers that are found to be dirty.
+In each round, it examines up to
+<varname>bgwriter_all_percent</> of the buffers for this purpose.
+The default value is 0.333 (this is a percentage of the total number
+of shared buffers). With the default <varname>bgwriter_delay</>
+setting, this will allow the entire shared buffer pool to be scanned
+about once per minute.
+This option can only be set at server start or in the
 <filename>postgresql.conf</filename> file.
 </para>
 </listitem>
 </varlistentry>
 
-<varlistentry id="guc-bgwriter-maxpages" xreflabel="bgwriter_maxpages">
-<term><varname>bgwriter_maxpages</varname> (<type>integer</type>)</term>
+<varlistentry id="guc-bgwriter-all-maxpages" xreflabel="bgwriter_all_maxpages">
+<term><varname>bgwriter_all_maxpages</varname> (<type>integer</type>)</term>
 <indexterm>
-<primary><varname>bgwriter_maxpages</> configuration parameter</primary>
+<primary><varname>bgwriter_all_maxpages</> configuration parameter</primary>
 </indexterm>
 <listitem>
 <para>
-In each round, no more than this many dirty buffers will be
-written. The default value is 100. This option can only be
-set at server start or in the
+In each round, no more than this many buffers will be written
+as a result of the scan of the entire buffer pool. (If this
+limit is reached, the scan stops, and resumes at the next buffer
+during the next round.)
+The default value is 5.
+This option can only be set at server start or in the
 <filename>postgresql.conf</filename> file.
 </para>
 </listitem>
 </varlistentry>
 </variablelist>
 
 <para>
-Smaller values of <varname>bgwriter_percent</varname> and
-<varname>bgwriter_maxpages</varname> reduce the extra I/O load
+Smaller values of <varname>bgwriter_all_percent</varname> and
+<varname>bgwriter_all_maxpages</varname> reduce the extra I/O load
 caused by the background writer, but leave more work to be done
 at checkpoint time. To reduce load spikes at checkpoints,
-increase the values. To disable background writing entirely,
-set <varname>bgwriter_percent</varname> and/or
-<varname>bgwriter_maxpages</varname> to zero.
+increase these two values.
+Similarly, smaller values of <varname>bgwriter_lru_percent</varname> and
+<varname>bgwriter_lru_maxpages</varname> reduce the extra I/O load
+caused by the background writer, but make it more likely that server
+processes will have to issue writes for themselves, delaying interactive
+queries.
+To disable background writing entirely,
+set both <varname>maxpages</varname> values and/or both
+<varname>percent</varname> values to zero.
 </para>
 </sect3>

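The "about once per minute" figure for `bgwriter_all_percent` follows from the defaults: 0.333% of the pool per round, and one round every `bgwriter_delay` = 200 ms. This calculation assumes each round examines its full share, meaning the `bgwriter_all_maxpages` cap is not hit, and it ignores the time spent writing:

```latex
\frac{100\%}{0.333\%\ \text{per round}} \approx 300\ \text{rounds},
\qquad
300\ \text{rounds} \times 200\ \text{ms} = 60\ \text{s}
```

If more than `bgwriter_all_maxpages` dirty buffers fall within a round's share, the scan stops early and resumes in the next round. A full pass then takes correspondingly longer.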
@@ -3866,20 +3915,6 @@ plruby.bar = true # generates error, unknown class name
 </listitem>
 </varlistentry>
 
-<varlistentry id="guc-debug-shared-buffers" xreflabel="debug_shared_buffers">
-<term><varname>debug_shared_buffers</varname> (<type>integer</type>)</term>
-<indexterm>
-<primary><varname>debug_shared_buffers</> configuration parameter</primary>
-</indexterm>
-<listitem>
-<para>
-Number of seconds between ARC reports.
-If set greater than zero, emit ARC statistics to the log every so many
-seconds. Zero (the default) disables reporting.
-</para>
-</listitem>
-</varlistentry>
-
 <varlistentry id="guc-pre-auth-delay" xreflabel="pre_auth_delay">
 <term><varname>pre_auth_delay</varname> (<type>integer</type>)</term>
 <indexterm>

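The C sketch below roughly models the two scans that the new runtime.sgml parameters describe, running one background-writer round. The first scan is a circular pass over the whole pool that remembers where it stopped (`bgwriter_all_percent` / `bgwriter_all_maxpages`). The second starts at the buffers nearest to being recycled (`bgwriter_lru_percent` / `bgwriter_lru_maxpages`). This is not the commit's `BgBufferSync()`. Several details are assumptions: the `clock_hand` variable, the simulated `write_if_dirty()`, rounding up when converting a percentage to a buffer count, and the order of the two scans.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS 1000           /* hypothetical number of shared buffers */

typedef struct Buf
{
    bool dirty;
} Buf;

Buf pool[NBUFFERS];

/* Defaults as documented in runtime.sgml. */
double bgwriter_lru_percent = 1.0;
int    bgwriter_lru_maxpages = 5;
double bgwriter_all_percent = 0.333;
int    bgwriter_all_maxpages = 5;

int clock_hand = 0;       /* hypothetical: next buffer the sweep will consider */
int all_scan_pos = -1;    /* last buffer examined by the all-buffers scan */
int writes_issued = 0;    /* simulated I/O counter */

/* Simulated write: a real one would lock the header and issue the I/O. */
bool write_if_dirty(int id)
{
    if (!pool[id].dirty)
        return false;
    pool[id].dirty = false;
    writes_issued++;
    return true;
}

/* Convert a percentage of the pool to a buffer count, rounding up (assumed). */
int buffers_to_scan(double percent)
{
    assert(percent >= 0.0);
    double n = NBUFFERS * percent / 100.0;
    int    whole = (int) n;

    return (n > whole) ? whole + 1 : whole;
}

/* One round: both scans, each bounded by its own maxpages setting. */
void bg_buffer_sync(void)
{
    if (bgwriter_all_percent > 0.0 && bgwriter_all_maxpages > 0)
    {
        int to_scan = buffers_to_scan(bgwriter_all_percent);
        int written = 0;

        while (to_scan-- > 0)
        {
            all_scan_pos = (all_scan_pos + 1) % NBUFFERS;
            /* At the cap, stop; the next round resumes after all_scan_pos. */
            if (write_if_dirty(all_scan_pos) &&
                ++written >= bgwriter_all_maxpages)
                break;
        }
    }

    if (bgwriter_lru_percent > 0.0 && bgwriter_lru_maxpages > 0)
    {
        int to_scan = buffers_to_scan(bgwriter_lru_percent);
        int written = 0;
        int id = clock_hand;    /* start where recycling will happen next */

        while (to_scan-- > 0)
        {
            if (write_if_dirty(id) && ++written >= bgwriter_lru_maxpages)
                break;
            id = (id + 1) % NBUFFERS;
        }
    }
}
```

With the defaults and a 1000-buffer pool, one round in this model examines 4 buffers in the all-buffers scan and 10 near the clock hand, writing at most 5 from each.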
src/backend/catalog/index.c

Lines changed: 1 addition & 2 deletions
@@ -8,7 +8,7 @@
 *
 *
 * IDENTIFICATION
-* $PostgreSQL: pgsql/src/backend/catalog/index.c,v 1.244 2005/01/10 20:02:19 tgl Exp $
+* $PostgreSQL: pgsql/src/backend/catalog/index.c,v 1.245 2005/03/04 20:21:05 tgl Exp $
 *
 *
 * INTERFACE ROUTINES
@@ -1060,7 +1060,6 @@ setRelhasindex(Oid relid, bool hasindex, bool isprimary, Oid reltoastidxid)
 /* Send out shared cache inval if necessary */
 if (!IsBootstrapProcessingMode())
 CacheInvalidateHeapTuple(pg_class, tuple);
-BufferSync(-1, -1);
 }
 else if (dirty)
 {

src/backend/commands/dbcommands.c

Lines changed: 3 additions & 3 deletions
@@ -15,7 +15,7 @@
 *
 *
 * IDENTIFICATION
-* $PostgreSQL: pgsql/src/backend/commands/dbcommands.c,v 1.151 2005/02/26 18:43:33 tgl Exp $
+* $PostgreSQL: pgsql/src/backend/commands/dbcommands.c,v 1.152 2005/03/04 20:21:05 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -339,7 +339,7 @@ createdb(const CreatedbStmt *stmt)
 * up-to-date for the copy. (We really only need to flush buffers for
 * the source database, but bufmgr.c provides no API for that.)
 */
-BufferSync(-1, -1);
+BufferSync();
 
 /*
 * Close virtual file descriptors so the kernel has more available for
@@ -1201,7 +1201,7 @@ dbase_redo(XLogRecPtr lsn, XLogRecord *record)
 * up-to-date for the copy. (We really only need to flush buffers for
 * the source database, but bufmgr.c provides no API for that.)
 */
-BufferSync(-1, -1);
+BufferSync();
 
 #ifndef WIN32

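`BufferSync()` now takes no arguments. The comments in `createdb()` and `dbase_redo()` say the call is there so that dirty buffers reach disk before the directory copy. Below is a minimal C sketch of such an unbounded flush over per-buffer spinlocks. It is not bufmgr.c's code. The names `flush_all_buffers`, `mark_dirty` and `write_buffer` are hypothetical, and so is the `just_dirtied` flag, which keeps a page that is re-dirtied during its write from being marked clean. The sketch assumes a single flusher, and it deliberately does not hold the spinlock across the simulated write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NBUFFERS 16                 /* hypothetical pool size */

typedef struct BufHdr
{
    atomic_flag lock;               /* per-buffer spinlock */
    bool        dirty;              /* page differs from the on-disk copy */
    bool        just_dirtied;       /* re-dirtied while a write was in flight */
} BufHdr;

BufHdr buffers[NBUFFERS];
int    pages_written = 0;           /* simulated I/O counter */

static void buf_lock(BufHdr *b)
{
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ;                           /* spin */
}

static void buf_unlock(BufHdr *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

void init_buffers(void)
{
    for (int i = 0; i < NBUFFERS; i++)
    {
        atomic_flag_clear(&buffers[i].lock);
        buffers[i].dirty = buffers[i].just_dirtied = false;
    }
}

/* Called by a backend after modifying a page. */
void mark_dirty(BufHdr *b)
{
    buf_lock(b);
    b->dirty = b->just_dirtied = true;
    buf_unlock(b);
}

/* Stand-in for the real write path; no spinlock is held here. */
static void write_buffer(int id)
{
    assert(id >= 0 && id < NBUFFERS);
    pages_written++;
}

/* Write out every dirty buffer with no per-round limit; returns the count. */
int flush_all_buffers(void)
{
    int written = 0;

    for (int id = 0; id < NBUFFERS; id++)
    {
        BufHdr *b = &buffers[id];

        buf_lock(b);
        if (!b->dirty)
        {
            buf_unlock(b);
            continue;
        }
        b->just_dirtied = false;    /* detect changes made during the write */
        buf_unlock(b);

        write_buffer(id);

        buf_lock(b);
        if (!b->just_dirtied)
            b->dirty = false;       /* unchanged since the write began */
        buf_unlock(b);
        written++;
    }
    return written;
}
```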
src/backend/commands/vacuum.c

Lines changed: 1 addition & 2 deletions
@@ -13,7 +13,7 @@
 *
 *
 * IDENTIFICATION
-* $PostgreSQL: pgsql/src/backend/commands/vacuum.c,v 1.302 2005/02/26 18:43:33 tgl Exp $
+* $PostgreSQL: pgsql/src/backend/commands/vacuum.c,v 1.303 2005/03/04 20:21:06 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -36,7 +36,6 @@
 #include "commands/vacuum.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
-#include "storage/buf_internals.h"
 #include "storage/freespace.h"
 #include "storage/sinval.h"
 #include "storage/smgr.h"

src/backend/postmaster/bgwriter.c

Lines changed: 9 additions & 12 deletions
@@ -37,7 +37,7 @@
 *
 *
 * IDENTIFICATION
-* $PostgreSQL: pgsql/src/backend/postmaster/bgwriter.c,v 1.14 2005/02/19 23:16:15 tgl Exp $
+* $PostgreSQL: pgsql/src/backend/postmaster/bgwriter.c,v 1.15 2005/03/04 20:21:06 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -116,9 +116,6 @@ static BgWriterShmemStruct *BgWriterShmem;
 * GUC parameters
 */
 int BgWriterDelay = 200;
-int BgWriterPercent = 1;
-int BgWriterMaxPages = 100;
-
 int CheckPointTimeout = 300;
 int CheckPointWarning = 30;
 
@@ -274,7 +271,6 @@ BackgroundWriterMain(void)
 bool force_checkpoint = false;
 time_t now;
 int elapsed_secs;
-int n;
 long udelay;
 
 /*
@@ -365,16 +361,13 @@ BackgroundWriterMain(void)
 * checkpoints happen at a predictable spacing.
 */
 last_checkpoint_time = now;
-
-/* Nap for configured time before rechecking */
-n = 1;
 }
 else
-n = BufferSync(BgWriterPercent, BgWriterMaxPages);
+BgBufferSync();
 
 /*
-* Nap for the configured time or sleep for 10 seconds if there
-* was nothing to do at all.
+* Nap for the configured time, or sleep for 10 seconds if there
+* is no bgwriter activity configured.
 *
 * On some platforms, signals won't interrupt the sleep. To ensure
 * we respond reasonably promptly when someone signals us, break
@@ -383,7 +376,11 @@ BackgroundWriterMain(void)
 *
 * We absorb pending requests after each short sleep.
 */
-udelay = ((n > 0) ? BgWriterDelay : 10000) * 1000L;
+if ((bgwriter_all_percent > 0.0 && bgwriter_all_maxpages > 0) ||
+(bgwriter_lru_percent > 0.0 && bgwriter_lru_maxpages > 0))
+udelay = BgWriterDelay * 1000L;
+else
+udelay = 10000000L;
 while (udelay > 1000000L)
 {
 if (got_SIGHUP || checkpoint_requested || shutdown_requested)

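The new delay rule reads as follows: nap for `bgwriter_delay` if either scan has work configured, otherwise nap for 10 seconds. The hunk then begins a loop that sleeps in slices of at most one second, so that pending signals are noticed. The loop body is cut off in this view, so the C sketch below is a hypothetical stand-in rather than the bgwriter's code. `wakeup_requested` replaces the real `got_SIGHUP`, `checkpoint_requested` and `shutdown_requested` flags, and POSIX `nanosleep()` replaces whatever sleep primitive the real code uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <time.h>

/* Stand-ins for the GUC variables in the diff (defaults from runtime.sgml). */
int    BgWriterDelay = 200;             /* milliseconds */
double bgwriter_lru_percent = 1.0;
int    bgwriter_lru_maxpages = 5;
double bgwriter_all_percent = 0.333;
int    bgwriter_all_maxpages = 5;

/* Hypothetical flag that a signal handler would set. */
volatile sig_atomic_t wakeup_requested = 0;

/* Same rule as the new code: normal delay if either scan can write, else 10 s. */
long choose_udelay(void)
{
    if ((bgwriter_all_percent > 0.0 && bgwriter_all_maxpages > 0) ||
        (bgwriter_lru_percent > 0.0 && bgwriter_lru_maxpages > 0))
        return BgWriterDelay * 1000L;
    return 10000000L;
}

static void sleep_us(long usec)
{
    struct timespec ts;

    ts.tv_sec = usec / 1000000L;
    ts.tv_nsec = (usec % 1000000L) * 1000L;
    nanosleep(&ts, NULL);
}

/*
 * Sleep for udelay microseconds in slices of at most one second, checking
 * the wakeup flag between slices.  Returns true if woken before finishing.
 */
bool nap(long udelay)
{
    assert(udelay >= 0);
    while (udelay > 1000000L)
    {
        if (wakeup_requested)
            return true;
        sleep_us(1000000L);
        udelay -= 1000000L;
    }
    if (wakeup_requested)
        return true;
    if (udelay > 0)
        sleep_us(udelay);
    return false;
}
```

With both scans disabled, a 10-second nap becomes ten one-second slices, so a signal is acted on within about a second rather than after the full 10 seconds.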