Commit e324348

Document the interaction of write-barrier-enabled file systems, and BBU
caches, per June email thread.
1 parent 20be0d4 commit e324348

1 file changed, +35 -10 lines changed


doc/src/sgml/wal.sgml

Lines changed: 35 additions & 10 deletions
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/wal.sgml,v 1.66 2010/04/13 14:15:25 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/wal.sgml,v 1.67 2010/07/07 14:42:09 momjian Exp $ -->
 
 <chapter id="wal">
 <title>Reliability and the Write-Ahead Log</title>
@@ -48,21 +48,27 @@
 some later time. Such caches can be a reliability hazard because the
 memory in the disk controller cache is volatile, and will lose its
 contents in a power failure. Better controller cards have
-<firstterm>battery-backed</> caches, meaning the card has a battery that
+<firstterm>battery-backed unit</> (<acronym>BBU</>) caches, meaning
+the card has a battery that
 maintains power to the cache in case of system power loss. After power
 is restored the data will be written to the disk drives.
 </para>
 
 <para>
 And finally, most disk drives have caches. Some are write-through
-while some are write-back, and the
-same concerns about data loss exist for write-back drive caches as
-exist for disk controller caches. Consumer-grade IDE and SATA drives are
-particularly likely to have write-back caches that will not survive a
-power failure, though <acronym>ATAPI-6</> introduced a drive cache
-flush command (FLUSH CACHE EXT) that some file systems use, e.g. <acronym>ZFS</>.
-Many solid-state drives (SSD) also have volatile write-back
-caches, and many do not honor cache flush commands by default.
+while some are write-back, and the same concerns about data loss
+exist for write-back drive caches as exist for disk controller
+caches. Consumer-grade IDE and SATA drives are particularly likely
+to have write-back caches that will not survive a power failure,
+though <acronym>ATAPI-6</> introduced a drive cache flush command
+(<command>FLUSH CACHE EXT</>) that some file systems use, e.g.
+<acronym>ZFS</>, <acronym>ext4</>. (The SCSI command
+<command>SYNCHRONIZE CACHE</> has long been available.) Many
+solid-state drives (SSD) also have volatile write-back caches, and
+many do not honor cache flush commands by default.
+</para>
+
+<para>
 To check write caching on <productname>Linux</> use
 <command>hdparm -I</>; it is enabled if there is a <literal>*</> next
 to <literal>Write cache</>; <command>hdparm -W</> to turn off
@@ -82,6 +88,25 @@
 <literal>fsync_writethrough</> never do write caching.
 </para>
 
+<para>
+Many file systems that use write barriers (e.g. <acronym>ZFS</>,
+<acronym>ext4</>) internally use <command>FLUSH CACHE EXT</> or
+<command>SYNCHRONIZE CACHE</> commands to flush data to the platters on
+write-back-enabled drives. Unfortunately, such write barrier file
+systems behave suboptimally when combined with battery-backed unit
+(<acronym>BBU</>) disk controllers. In such setups, the synchronize
+command forces all data from the BBU to the disks, eliminating much
+of the benefit of the BBU. You can run the utility
+<filename>src/tools/fsync</> in the PostgreSQL source tree to see
+if you are affected. If you are affected, the performance benefits
+of the BBU cache can be regained by turning off write barriers in
+the file system or reconfiguring the disk controller, if that is
+an option. If write barriers are turned off, make sure the battery
+remains active; a faulty battery can potentially lead to data loss.
+Hopefully file system and disk controller designers will eventually
+address this suboptimal behavior.
+</para>
+
 <para>
 When the operating system sends a write request to the storage hardware,
 there is little it can do to make sure the data has arrived at a truly
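
The paragraph added in this commit suggests running src/tools/fsync to see whether a write-barrier file system is defeating the BBU cache. The underlying idea can be sketched in a few lines of Python: time a loop of small writes, each followed by fsync, and compare the resulting rate with what the hardware could physically sustain. This is an illustrative sketch only, not the PostgreSQL utility; the function name fsync_rate and its parameters are assumptions made for this example.

```python
import os
import sys
import tempfile
import time


def fsync_rate(directory, writes=200, block=8192):
    """Return the number of fsync'd writes per second in `directory`.

    Hypothetical helper for illustration; it is not part of PostgreSQL.
    """
    payload = b"\0" * block
    # Scratch file on the file system under test.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            # Rewrite the same block each time, then force it to storage.
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, payload)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    # Guard against a zero interval on very coarse clocks.
    return writes / max(elapsed, 1e-9)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{fsync_rate(target):.0f} fsync'd writes per second in {target}")
```

As a rough guide for rotating disks, a 7200 rpm drive completes 120 rotations per second, so a rate around that figure or below suggests each flush is reaching the platters, while a rate in the thousands suggests a cache (drive or BBU) is absorbing the writes. Results vary with file system, controller, and load, so treat the number as a hint rather than a verdict.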
