|
1 |
| -<!-- $PostgreSQL: pgsql/doc/src/sgml/wal.sgml,v 1.66 2010/04/13 14:15:25 momjian Exp $ --> |
| 1 | +<!-- $PostgreSQL: pgsql/doc/src/sgml/wal.sgml,v 1.67 2010/07/07 14:42:09 momjian Exp $ --> |
2 | 2 |
|
3 | 3 | <chapter id="wal">
|
4 | 4 | <title>Reliability and the Write-Ahead Log</title>
|
|
48 | 48 | some later time. Such caches can be a reliability hazard because the
|
49 | 49 | memory in the disk controller cache is volatile, and will lose its
|
50 | 50 | contents in a power failure. Better controller cards have
|
51 |
| - <firstterm>battery-backed</> caches, meaning the card has a battery that |
| 51 | + <firstterm>battery-backed unit</> (<acronym>BBU</>) caches, meaning |
| 52 | + the card has a battery that |
52 | 53 | maintains power to the cache in case of system power loss. After power
|
53 | 54 | is restored the data will be written to the disk drives.
|
54 | 55 | </para>
|
55 | 56 |
|
56 | 57 | <para>
|
57 | 58 | And finally, most disk drives have caches. Some are write-through
|
58 |
| - while some are write-back, and the |
59 |
| - same concerns about data loss exist for write-back drive caches as |
60 |
| - exist for disk controller caches. Consumer-grade IDE and SATA drives are |
61 |
| - particularly likely to have write-back caches that will not survive a |
62 |
| - power failure, though <acronym>ATAPI-6</> introduced a drive cache |
63 |
| - flush command (FLUSH CACHE EXT) that some file systems use, e.g. <acronym>ZFS</>. |
64 |
| - Many solid-state drives (SSD) also have volatile write-back |
65 |
| - caches, and many do not honor cache flush commands by default. |
| 59 | + while some are write-back, and the same concerns about data loss |
| 60 | + exist for write-back drive caches as exist for disk controller |
| 61 | + caches. Consumer-grade IDE and SATA drives are particularly likely |
| 62 | + to have write-back caches that will not survive a power failure, |
| 63 | + though <acronym>ATAPI-6</> introduced a drive cache flush command |
| 64 | + (<command>FLUSH CACHE EXT</>) that some file systems use, e.g. |
| 65 | + <acronym>ZFS</>, <acronym>ext4</>. (The SCSI command |
| 66 | + <command>SYNCHRONIZE CACHE</> has long been available.) Many |
| 67 | + solid-state drives (SSD) also have volatile write-back caches, and |
| 68 | + many do not honor cache flush commands by default. |
| 69 | + </para> |
| 70 | + |
| 71 | + <para> |
66 | 72 | To check write caching on <productname>Linux</> use
|
67 | 73 | <command>hdparm -I</>; it is enabled if there is a <literal>*</> next
|
68 | 74 | to <literal>Write cache</>; <command>hdparm -W</> to turn off
|
|
82 | 88 | <literal>fsync_writethrough</> never do write caching.
|
83 | 89 | </para>
|
84 | 90 |
|
| 91 | + <para> |
| 92 | + Many file systems that use write barriers (e.g. <acronym>ZFS</>, |
| 93 | + <acronym>ext4</>) internally use <command>FLUSH CACHE EXT</> or |
| 94 | + <command>SYNCHRONIZE CACHE</> commands to flush data to the platters on |
| 95 | + write-back-enabled drives. Unfortunately, such write barrier file |
| 96 | + systems behave suboptimally when combined with battery-backed unit |
| 97 | + (<acronym>BBU</>) disk controllers. In such setups, the synchronize |
| 98 | + command forces all data from the BBU to the disks, eliminating much |
| 99 | + of the benefit of the BBU. You can run the utility |
| 100 | + <filename>src/tools/fsync</> in the PostgreSQL source tree to see |
| 101 | + if you are affected. If you are affected, the performance benefits |
| 102 | + of the BBU cache can be regained by turning off write barriers in |
| 103 | + the file system or reconfiguring the disk controller, if that is |
| 104 | + an option. If write barriers are turned off, make sure the battery |
| 105 | + remains active; a faulty battery can potentially lead to data loss. |
| 106 | + Hopefully file system and disk controller designers will eventually |
| 107 | + address this suboptimal behavior. |
| 108 | + </para> |
| 109 | + |
85 | 110 | <para>
|
86 | 111 | When the operating system sends a write request to the storage hardware,
|
87 | 112 | there is little it can do to make sure the data has arrived at a truly
|
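The paragraph added at new lines 91-108 suggests running `src/tools/fsync` to see whether write barriers are defeating a BBU cache. As a rough illustration of the kind of measurement involved, here is a minimal Python sketch that rewrites one block and calls `fsync` after each write, then reports the observed rate. It is not the PostgreSQL tool and does not reproduce its output; the function name `fsyncs_per_second`, the 8192-byte block size, and the write count are assumptions chosen for illustration.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, writes=200, block_size=8192):
    """Rewrite one block and fsync it `writes` times; return the observed rate.

    Hypothetical helper for illustration only, not part of PostgreSQL.
    """
    block = b"\0" * block_size
    # Create the scratch file on the file system under test.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.lseek(fd, 0, os.SEEK_SET)  # overwrite the same block each time
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to push the data to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return writes / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Point this at the directory that would hold the WAL files.
    rate = fsyncs_per_second(".")
    print(f"{rate:.0f} fsyncs/sec")
```

Interpreting the number requires care. On a single rotating drive with no write cache, the rate is typically bounded by the rotation speed (on the order of a hundred or so per second for a 7200 rpm disk), so a far higher figure suggests some cache is absorbing the flushes. On a BBU controller, a rate that stays near the rotation-bound figure may indicate that barriers are forcing the cache out to the disks, which is the situation the new paragraph describes.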
|