|
| 1 | +<!-- |
| 2 | +doc/src/sgml/ref/pg_rewind.sgml |
| 3 | +PostgreSQL documentation |
| 4 | +--> |
| 5 | + |
| 6 | +<refentry id="app-pgrewind"> |
| 7 | + <indexterm zone="app-pgrewind"> |
| 8 | + <primary>pg_rewind</primary> |
| 9 | + </indexterm> |
| 10 | + |
| 11 | + <refmeta> |
| 12 | + <refentrytitle><application>pg_rewind</application></refentrytitle> |
| 13 | + <manvolnum>1</manvolnum> |
| 14 | + <refmiscinfo>Application</refmiscinfo> |
| 15 | + </refmeta> |
| 16 | + |
| 17 | + <refnamediv> |
| 18 | + <refname>pg_rewind</refname> |
| 19 | + <refpurpose>synchronize a <productname>PostgreSQL</productname> data directory with another data directory that was forked from the first one</refpurpose> |
| 20 | + </refnamediv> |
| 21 | + |
| 22 | + <refsynopsisdiv> |
| 23 | + <cmdsynopsis> |
| 24 | + <command>pg_rewind</command> |
| 25 | + <arg rep="repeat"><replaceable>option</replaceable></arg> |
| 26 | + <group choice="plain"> |
| 27 | + <group choice="req"> |
| 28 | + <arg choice="plain"><option>-D </option></arg> |
| 29 | + <arg choice="plain"><option>--target-pgdata</option></arg> |
| 30 | + </group> |
| 31 | + <replaceable> directory</replaceable> |
| 32 | + <group choice="req"> |
| 33 | + <arg choice="plain"><option>--source-pgdata=<replaceable>directory</replaceable></option></arg> |
| 34 | + <arg choice="plain"><option>--source-server=<replaceable>connstr</replaceable></option></arg> |
| 35 | + </group> |
| 36 | + </group> |
| 37 | + </cmdsynopsis> |
| 38 | + </refsynopsisdiv> |
| 39 | + |
| 40 | + <refsect1> |
| 41 | + <title>Description</title> |
| 42 | + |
| 43 | + <para> |
| 44 | + <application>pg_rewind</> is a tool for synchronizing a PostgreSQL cluster |
| 45 | + with another copy of the same cluster, after the clusters' timelines have |
| 46 | + diverged. A typical scenario is to bring an old master server back online |
| 47 | + after failover, as a standby that follows the new master. |
| 48 | + </para> |
| 49 | + |
| 50 | + <para> |
| 51 | + The result is equivalent to replacing the target data directory with the |
| 52 | + source one. All files are copied, including configuration files. The |
| 53 | + advantage of <application>pg_rewind</> over taking a new base backup, or |
| 54 | + tools like <application>rsync</>, is that <application>pg_rewind</> does |
| 55 | + not require reading through all unchanged files in the cluster. That makes |
| 56 | + it a lot faster when the database is large and only a small portion of it |
| 57 | + differs between the clusters. |
| 58 | + </para> |
| 59 | + |
| 60 | + <para> |
| 61 | + <application>pg_rewind</> examines the timeline histories of the source |
| 62 | + and target clusters to determine the point where they diverged, and |
| 63 | + expects to find WAL in the target cluster's <filename>pg_xlog</> directory |
| 64 | + reaching all the way back to the point of divergence. In the typical |
| 65 | + failover scenario where the target cluster was shut down soon after the |
| 66 | + divergence, that is not a problem, but if the target cluster had run for a |
| 67 | + long time after the divergence, the old WAL files might not be present |
| 68 | + anymore. In that case, they can be manually copied from the WAL archive to |
| 69 | + the <filename>pg_xlog</> directory. Fetching missing files from a WAL |
| 70 | + archive automatically is currently not supported. |
| 71 | + </para> |
| 72 | + |
| 73 | + <para> |
| 74 | + When the target server is started up for the first time after running |
| 75 | + <application>pg_rewind</>, it will go into recovery mode and replay all |
| 76 | + WAL generated in the source server after the point of divergence. |
| 77 | + If some of the WAL was no longer available in the source server when |
| 78 | + <application>pg_rewind</> was run, and therefore could not be copied by |
| 79 | + the <application>pg_rewind</> session, it needs to be made available when the |
| 80 | + target server is started up. That can be done by creating a |
| 81 | + <filename>recovery.conf</> file in the target data directory with a |
| 82 | + suitable <varname>restore_command</>. |
| 83 | + </para> |
| 84 | + </refsect1> |
| 85 | + |
| 86 | + <refsect1> |
| 87 | + <title>Options</title> |
| 88 | + |
| 89 | + <para> |
| 90 | + <application>pg_rewind</application> accepts the following command-line |
| 91 | + arguments: |
| 92 | + |
| 93 | + <variablelist> |
| 94 | + <varlistentry> |
| 95 | + <term><option>-D</option></term> |
| 96 | + <term><option>--target-pgdata</option></term> |
| 97 | + <listitem> |
| 98 | + <para> |
| 99 | + This option specifies the target data directory that is synchronized |
| 100 | + with the source. The target server must be shut down cleanly before |
| 101 | + running <application>pg_rewind</application>. |
| 102 | + </para> |
| 103 | + </listitem> |
| 104 | + </varlistentry> |
| 105 | + |
| 106 | + <varlistentry> |
| 107 | + <term><option>--source-pgdata</option></term> |
| 108 | + <listitem> |
| 109 | + <para> |
| 110 | + Specifies the path to the data directory of the source server, to |
| 111 | + synchronize the target with. When <option>--source-pgdata</> is |
| 112 | + used, the source server must be cleanly shut down. |
| 113 | + </para> |
| 114 | + </listitem> |
| 115 | + </varlistentry> |
| 116 | + |
| 117 | + <varlistentry> |
| 118 | + <term><option>--source-server</option></term> |
| 119 | + <listitem> |
| 120 | + <para> |
| 121 | + Specifies a libpq connection string to connect to the source |
| 122 | + <productname>PostgreSQL</> server to synchronize the target with. |
| 123 | + The server must be up and running, and must not be in recovery mode. |
| 124 | + </para> |
| 125 | + </listitem> |
| 126 | + </varlistentry> |
| 127 | + |
| 128 | + <varlistentry> |
| 129 | + <term><option>-n</option></term> |
| 130 | + <term><option>--dry-run</option></term> |
| 131 | + <listitem> |
| 132 | + <para> |
| 133 | + Do everything except actually modifying the target directory. |
| 134 | + </para> |
| 135 | + </listitem> |
| 136 | + </varlistentry> |
| 137 | + |
| 138 | + <varlistentry> |
| 139 | + <term><option>-P</option></term> |
| 140 | + <term><option>--progress</option></term> |
| 141 | + <listitem> |
| 142 | + <para> |
| 143 | + Enables progress reporting. Turning this on will deliver an approximate |
| 144 | + progress report while copying data over from the source cluster. |
| 145 | + </para> |
| 146 | + </listitem> |
| 147 | + </varlistentry> |
| 148 | + |
| 149 | + <varlistentry> |
| 150 | + <term><option>--debug</option></term> |
| 151 | + <listitem> |
| 152 | + <para> |
| 153 | + Print verbose debugging output that is mostly useful for developers |
| 154 | + debugging <application>pg_rewind</>. |
| 155 | + </para> |
| 156 | + </listitem> |
| 157 | + </varlistentry> |
| 158 | + |
| 159 | + <varlistentry> |
| 160 | + <term><option>-V</option></term> |
| 161 | + <term><option>--version</option></term> |
| 162 | + <listitem><para>Display version information, then exit.</para></listitem> |
| 163 | + </varlistentry> |
| 164 | + |
| 165 | + <varlistentry> |
| 166 | + <term><option>-?</option></term> |
| 167 | + <term><option>--help</option></term> |
| 168 | + <listitem><para>Show help, then exit.</para></listitem> |
| 169 | + </varlistentry> |
| 170 | + |
| 171 | + </variablelist> |
| 172 | + </para> |
| 173 | + </refsect1> |
| 174 | + |
| 175 | + <refsect1> |
| 176 | + <title>Environment</title> |
| 177 | + |
| 178 | + <para> |
| 179 | + When the <option>--source-server</> option is used, |
| 180 | + <application>pg_rewind</application> also uses the environment variables |
| 181 | + supported by <application>libpq</> (see <xref linkend="libpq-envars">). |
| 182 | + </para> |
| 183 | + </refsect1> |
| 184 | + |
| 185 | + <refsect1> |
| 186 | + <title>Notes</title> |
| 187 | + |
| 188 | + <para> |
| 189 | + <application>pg_rewind</> requires that the <varname>wal_log_hints</> |
| 190 | + option is enabled in <filename>postgresql.conf</>, or that data checksums |
| 191 | + were enabled when the cluster was initialized with <application>initdb</>. |
| 192 | + <varname>full_page_writes</> must also be enabled. |
| 193 | + </para> |
| 194 | + |
| 195 | + <refsect2> |
| 196 | + <title>How it works</title> |
| 197 | + |
| 198 | + <para> |
| 199 | + The basic idea is to copy everything from the new cluster to the old |
| 200 | + cluster, except for the blocks that we know to be the same. |
| 201 | + </para> |
| 202 | + |
| 203 | + <procedure> |
| 204 | + <step> |
| 205 | + <para> |
| 206 | + Scan the WAL of the old cluster, starting from the last checkpoint |
| 207 | + before the point where the new cluster's timeline history forked off |
| 208 | + from the old cluster. For each WAL record, make a note of the data |
| 209 | + blocks that were touched. This yields a list of all the data blocks |
| 210 | + that were changed in the old cluster, after the new cluster forked off. |
| 211 | + </para> |
| 212 | + </step> |
| 213 | + <step> |
| 214 | + <para> |
| 215 | + Copy all those changed blocks from the new cluster to the old cluster. |
| 216 | + </para> |
| 217 | + </step> |
| 218 | + <step> |
| 219 | + <para> |
| 220 | + Copy all other files, such as clog and configuration files, from the new |
| 221 | + cluster to the old cluster; that is, everything except the relation files. |
| 222 | + </para> |
| 223 | + </step> |
| 224 | + <step> |
| 225 | + <para> |
| 226 | + Apply the WAL from the new cluster, starting from the checkpoint |
| 227 | + created at failover. (Strictly speaking, <application>pg_rewind</> |
| 228 | + doesn't apply the WAL; it just creates a backup label file indicating |
| 229 | + that when <productname>PostgreSQL</> is started, it will start replay |
| 230 | + from that checkpoint and apply all the required WAL.) |
| 231 | + </para> |
| 232 | + </step> |
| 233 | + </procedure> |
| 234 | + </refsect2> |
| 235 | + </refsect1> |
| 236 | + |
| 237 | +</refentry> |
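
The first three steps of the "How it works" procedure can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual pg_rewind implementation: clusters are modeled as in-memory dictionaries mapping file names to bytes, WAL records are reduced to an LSN plus the block they touched, and the names `WalRecord`, `changed_blocks`, `rewind`, and `BLOCK_SIZE` are all hypothetical. It also assumes every relation file exists with the same length in both clusters, and it does not model step 4 (WAL replay at startup).

```python
from dataclasses import dataclass

# Toy block size for illustration only (hypothetical; not PostgreSQL's real block size).
BLOCK_SIZE = 4


@dataclass(frozen=True)
class WalRecord:
    """Hypothetical, heavily simplified WAL record: position plus the block it touched."""
    lsn: int
    relfile: str
    block: int


def changed_blocks(target_wal, scan_start_lsn):
    """Step 1: collect every (relfile, block) touched in the old cluster's WAL
    at or after the scan start (the last checkpoint before the divergence)."""
    return {(r.relfile, r.block) for r in target_wal if r.lsn >= scan_start_lsn}


def rewind(source, target, target_wal, scan_start_lsn, relation_files):
    """Apply steps 1-3 to the in-memory `target` dict and return the touched blocks.

    Assumes each relation file is present in both dicts with identical length.
    """
    touched = changed_blocks(target_wal, scan_start_lsn)

    # Step 2: overwrite only the changed blocks with the source's version.
    for relfile, block in sorted(touched):
        start = block * BLOCK_SIZE
        data = bytearray(target[relfile])
        data[start:start + BLOCK_SIZE] = source[relfile][start:start + BLOCK_SIZE]
        target[relfile] = bytes(data)

    # Step 3: copy everything that is not a relation file wholesale.
    for name, data in source.items():
        if name not in relation_files:
            target[name] = data

    return touched
```

In this model, a relation block that the old cluster never modified after the scan start is never read from the source, which mirrors the reason the documentation gives for pg_rewind being faster than a full copy on large clusters.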