|
27 | 27 | </para>
|
28 | 28 |
|
29 | 29 | <para>
|
30 |
| - While forcing data periodically to the disk platters might seem like |
| 30 | + While forcing data to the disk platters periodically might seem like |
31 | 31 | a simple operation, it is not. Because disk drives are dramatically
|
32 | 32 | slower than main memory and CPUs, several layers of caching exist
|
33 | 33 | between the computer's main memory and the disk platters.
|
|
48 | 48 | some later time. Such caches can be a reliability hazard because the
|
49 | 49 | memory in the disk controller cache is volatile, and will lose its
|
50 | 50 | contents in a power failure. Better controller cards have
|
51 |
| - <firstterm>battery-backed unit</> (<acronym>BBU</>) caches, meaning |
| 51 | + <firstterm>battery-backup units</> (<acronym>BBU</>s), meaning |
52 | 52 | the card has a battery that
|
53 | 53 | maintains power to the cache in case of system power loss. After power
|
54 | 54 | is restored the data will be written to the disk drives.
|
|
57 | 57 | <para>
|
58 | 58 | And finally, most disk drives have caches. Some are write-through
|
59 | 59 | while some are write-back, and the same concerns about data loss
|
60 |
| - exist for write-back drive caches as exist for disk controller |
| 60 | + exist for write-back drive caches as for disk controller |
61 | 61 | caches. Consumer-grade IDE and SATA drives are particularly likely
|
62 |
| - to have write-back caches that will not survive a power failure, |
63 |
| - though <acronym>ATAPI-6</> introduced a drive cache flush command |
64 |
| - (<command>FLUSH CACHE EXT</>) that some file systems use, e.g. |
65 |
| - <acronym>ZFS</>, <acronym>ext4</>. (The SCSI command |
66 |
| - <command>SYNCHRONIZE CACHE</> has long been available.) Many |
67 |
| - solid-state drives (SSD) also have volatile write-back caches, and |
68 |
| - many do not honor cache flush commands by default. |
| 62 | + to have write-back caches that will not survive a power failure. Many |
| 63 | + solid-state drives (SSD) also have volatile write-back caches. |
69 | 64 | </para>
|
70 | 65 |
|
71 | 66 | <para>
|
|
81 | 76 | a <literal>*</> next to <literal>Write cache</>. <command>hdparm -W</>
|
82 | 77 | can be used to turn off write caching. SCSI drives can be queried
|
83 | 78 | using <ulink url="http://sg.danny.cz/sg/sdparm.html"><application>sdparm</></ulink>.
|
84 |
| - for SCSI drives. Use <command>sdparm --get=WCE</command> to check |
| 79 | + Use <command>sdparm --get=WCE</command> to check |
85 | 80 | whether the write cache is enabled and <command>sdparm --clear=WCE</>
|
86 | 81 | to disable it.
|
87 | 82 | </para>
|
|
107 | 102 | <listitem>
|
108 | 103 | <para>
|
109 | 104 | On <productname>Windows</>, if <varname>wal_sync_method</> is
|
110 |
| - <literal>open_datasync</> (the default), write caching is disabled |
111 |
| - by unchecking <literal>My Computer\Open\{select disk drive}\Properties\Hardware\Properties\Policies\Enable write caching on the disk</>. |
112 |
| - Alternatively, set <varname>wal_sync_method</varname> to <literal>fsync</> or <literal>fsync_writethrough</>, which never do write caching. |
| 105 | + <literal>open_datasync</> (the default), write caching can be disabled |
| 106 | + by unchecking <literal>My Computer\Open\<replaceable>disk drive</>\Properties\Hardware\Properties\Policies\Enable write caching on the disk</>. |
| 107 | + Alternatively, set <varname>wal_sync_method</varname> to |
| 108 | + <literal>fsync</> or <literal>fsync_writethrough</>, which prevent |
| 109 | + write caching. |
113 | 110 | </para>
|
114 | 111 | </listitem>
|
115 | 112 |
|
116 | 113 | <listitem>
|
117 | 114 | <para>
|
118 |
| - On <productname>MacOS X</productname>, write caching can be disabled by |
| 115 | + On <productname>Mac OS X</productname>, write caching can be prevented by |
119 | 116 | setting <varname>wal_sync_method</> to <literal>fsync_writethrough</>.
|
120 | 117 | </para>
|
121 | 118 | </listitem>
|
122 | 119 | </itemizedlist>
|
123 | 120 |
|
124 | 121 | <para>
|
125 |
| - Many file systems that use write barriers (e.g. <acronym>ZFS</>, |
126 |
| - <acronym>ext4</>) internally use <command>FLUSH CACHE EXT</> or |
127 |
| - <command>SYNCHRONIZE CACHE</> commands to flush data to the platters on |
128 |
| - write-back-enabled drives. Unfortunately, such write barrier file |
129 |
| - systems behave suboptimally when combined with battery-backed unit |
| 122 | + Recent SATA drives (those following <acronym>ATAPI-6</> or later) |
| 123 | + offer a drive cache flush command (<command>FLUSH CACHE EXT</>), |
| 124 | + while SCSI drives have long supported a similar command |
| 125 | + <command>SYNCHRONIZE CACHE</>. These commands are not directly |
| 126 | + accessible to <productname>PostgreSQL</>, but some file systems |
| 127 | + (e.g., <acronym>ZFS</>, <acronym>ext4</>) can use them to flush |
| 128 | + data to the platters on write-back-enabled drives. Unfortunately, such |
| 129 | + file systems behave suboptimally when combined with battery-backup unit |
130 | 130 | (<acronym>BBU</>) disk controllers. In such setups, the synchronize
|
131 |
| - command forces all data from the BBU to the disks, eliminating much |
132 |
| - of the benefit of the BBU. You can run the utility |
| 131 | + command forces all data from the controller cache to the disks, |
| 132 | + eliminating much of the benefit of the BBU. You can run the utility |
133 | 133 | <filename>src/tools/fsync</> in the PostgreSQL source tree to see
|
134 | 134 | if you are affected. If you are affected, the performance benefits
|
135 |
| - of the BBU cache can be regained by turning off write barriers in |
| 135 | + of the BBU can be regained by turning off write barriers in |
136 | 136 | the file system or reconfiguring the disk controller, if that is
|
137 | 137 | an option. If write barriers are turned off, make sure the battery
|
138 |
| - remains active; a faulty battery can potentially lead to data loss. |
| 138 | + remains functional; a faulty battery can potentially lead to data loss. |
139 | 139 | Hopefully file system and disk controller designers will eventually
|
140 | 140 | address this suboptimal behavior.
|
141 | 141 | </para>
|
|
148 | 148 | ensure data integrity. Avoid disk controllers that have non-battery-backed
|
149 | 149 | write caches. At the drive level, disable write-back caching if the
|
150 | 150 | drive cannot guarantee the data will be written before shutdown.
|
| 151 | + If you use SSDs, be aware that many of these do not honor cache flush |
| 152 | + commands by default. |
151 | 153 | You can test for reliable I/O subsystem behavior using <ulink
|
152 | 154 | url="http://brad.livejournal.com/2116715.html"><filename>diskchecker.pl</filename></ulink>.
|
153 | 155 | </para>
|
|
157 | 159 | operations themselves. Disk platters are divided into sectors,
|
158 | 160 | commonly 512 bytes each. Every physical read or write operation
|
159 | 161 | processes a whole sector.
|
160 |
| - When a write request arrives at the drive, it might be for 512 bytes, |
161 |
| - 1024 bytes, or 8192 bytes, and the process of writing could fail due |
| 162 | + When a write request arrives at the drive, it might be for some multiple |
| 163 | + of 512 bytes (<productname>PostgreSQL</> typically writes 8192 bytes, or |
| 164 | + 16 sectors, at a time), and the process of writing could fail due |
162 | 165 | to power loss at any time, meaning some of the 512-byte sectors were
|
163 |
| - written, and others were not. To guard against such failures, |
| 166 | + written while others were not. To guard against such failures, |
164 | 167 | <productname>PostgreSQL</> periodically writes full page images to
|
165 | 168 | permanent WAL storage <emphasis>before</> modifying the actual page on
|
166 | 169 | disk. By doing this, during crash recovery <productname>PostgreSQL</> can
|
167 |
| - restore partially-written pages. If you have a battery-backed disk |
| 170 | + restore partially-written pages from WAL. If you have a battery-backed disk |
168 | 171 | controller or file-system software that prevents partial page writes
|
169 |
| - (e.g., ZFS), you can turn off this page imaging by turning off the |
| 172 | + (e.g., ZFS), you can safely turn off this page imaging by turning off the |
170 | 173 | <xref linkend="guc-full-page-writes"> parameter.
|
171 | 174 | </para>
|
172 | 175 | </sect1>
|
|