|
143 | 143 | </listitem>
|
144 | 144 | </itemizedlist>
|
145 | 145 | <para>If you have any data that must be present on one of the nodes only, you can exclude a particular table from replication, as follows:
|
146 |
| - <programlisting><function>mtm.make_table_local</function>('table_name') </programlisting> |
| 146 | +<programlisting>SELECT mtm.make_table_local('table_name') </programlisting> |
147 | 147 | </para>
|
148 | 148 | </sect2>
|
149 | 149 |
|
|
266 | 266 | of 2<replaceable>N</replaceable>+1 nodes can tolerate <replaceable>N</replaceable> node failures and stay alive if any
|
267 | 267 | <replaceable>N</replaceable>+1 nodes are alive and connected to each other.
|
268 | 268 | </para>
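The 2N+1 arithmetic above is a strict-majority rule. A minimal sketch of that check (an illustration of the stated rule, not the extension's actual code):

```python
def has_quorum(total_nodes: int, alive_connected: int) -> bool:
    """A cluster keeps accepting writes only while a strict majority
    of all configured nodes are alive and mutually connected."""
    return alive_connected >= total_nodes // 2 + 1

# A cluster of 2N+1 = 5 nodes (N = 2) tolerates N = 2 failures:
assert has_quorum(5, 3)      # N+1 = 3 connected nodes: quorum
assert not has_quorum(5, 2)  # only N = 2 connected: no quorum
```

Note that for an even cluster size the rule is unforgiving: `has_quorum(4, 2)` is false, which is exactly the case the quorum-settings section below addresses.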
|
| 269 | + <tip> |
| 270 | + <para> |
| 271 | + For clusters with an even number of nodes, you can override this |
| 272 | + behavior. For details, see <xref linkend="multimaster-quorum-settings">. |
| 273 | + </para> |
| 274 | + </tip> |
269 | 275 | <para>
|
270 | 276 | In case of a partial network split when different nodes have
|
271 | 277 | different connectivity, <filename>multimaster</filename> finds a
|
|
274 | 280 | C, but node B cannot access node C, <filename>multimaster</filename>
|
275 | 281 | isolates node C to ensure data consistency on nodes A and B.
|
276 | 282 | </para>
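The A/B/C scenario above amounts to keeping a largest fully connected subset of nodes. A brute-force sketch of that selection (illustrative only; multimaster's real algorithm may differ):

```python
from itertools import combinations

def largest_clique(nodes, link_ok):
    """Return the largest subset of nodes in which every pair can
    reach each other (brute force, fine for small clusters)."""
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(link_ok(a, b) for a, b in combinations(subset, 2)):
                return set(subset)
    return set()

# Node A reaches B and C, but the B<->C link is down:
links = {frozenset("AB"), frozenset("AC")}
keep = largest_clique("ABC", lambda a, b: frozenset((a, b)) in links)
# keep is a two-node clique; the remaining node is isolated.
```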
|
277 |
| - <note> |
278 |
| - <para> |
279 |
| - If you try to access a disconnected node, <filename>multimaster</filename> returns an error |
280 |
| - message indicating the current status of the node. To prevent stale reads, read-only queries are also forbidden. |
281 |
| - Additionally, you can break connections between the disconnected node and the clients using the |
282 |
| - <link linkend="mtm-break-connection"><varname>multimaster.break_connection</varname></link> variable. |
283 |
| - </para> |
284 |
| - </note> |
285 | 283 | <para>
|
286 |
| - If required, you can override this behavior for one of the nodes using the |
287 |
| - <link linkend="mtm-major-node"><varname>multimaster.major_node</varname></link> variable. |
288 |
| - In this case, the node will continue working even if it is isolated. |
| 284 | + If you try to access a disconnected node, <filename>multimaster</filename> returns an error |

| 285 | + message indicating the current status of the node. To prevent stale reads, read-only queries are also forbidden. |
| 286 | + Additionally, you can break connections between the disconnected node and the clients using the |
| 287 | + <link linkend="mtm-break-connection"><varname>multimaster.break_connection</varname></link> variable. |
289 | 288 | </para>
|
290 | 289 | <para>
|
291 | 290 | Each node maintains a data structure that keeps the information about the state of all
|
|
339 | 338 | <para>
|
340 | 339 | To use <filename>multimaster</filename>, you need to install
|
341 | 340 | <productname>&productname;</productname> on all nodes of your cluster. <productname>&productname;</productname> includes all the required dependencies and
|
342 |
| - extensions. |
| 341 | + extensions. |
343 | 342 | </para>
|
344 | 343 | <sect3 id="multimaster-setting-up-a-multi-master-cluster">
|
345 | 344 | <title>Setting up a Multi-Master Cluster</title>
|
@@ -606,6 +605,133 @@ SELECT mtm.get_cluster_state();
|
606 | 605 | <para><link linkend="multimaster-guc-variables">GUC Variables</link></para>
|
607 | 606 | </sect4>
|
608 | 607 | </sect3>
|
| 608 | + <sect3 id="multimaster-quorum-settings"> |
| 609 | + <title>Defining Quorum Settings for Clusters with an Even Number of Nodes</title> |
| 610 | + <para> |
| 611 | + By default, <filename>multimaster</filename> uses a majority-based |
| 612 | + algorithm to determine whether the cluster nodes have a quorum: a cluster |
| 613 | + can only continue working if the majority of its nodes are alive and can |
| 614 | + access each other. For clusters with an even number of nodes, this |
| 615 | + approach is not optimal. For example, if a network failure splits the |
| 616 | + cluster into equal parts, or one of the nodes fails in a two-node |
| 617 | + cluster, all the nodes stop accepting queries, even though at least |
| 618 | + half of the cluster nodes are running normally. |
| 619 | + </para> |
| 620 | + <para> |
| 621 | + To enable a smooth failover for such cases, you can modify the |
| 622 | + <filename>multimaster</filename> majority-based behavior using one |
| 623 | + of the following options: |
| 624 | + <itemizedlist spacing="compact"> |
| 625 | + <listitem> |
| 626 | + <para> |
| 627 | + <link linkend="setting-up-a-referee">Set up a standalone <firstterm>referee</> node</link> |
| 628 | + to assign the quorum status to a subset of nodes that constitutes half of the cluster. |
| 629 | + </para> |
| 630 | + </listitem> |
| 631 | + <listitem> |
| 632 | + <para> |
| 633 | + <link linkend="configuring-the-major-node">Choose the <firstterm>major node</></link> |
| 634 | + that continues working regardless of the status of other nodes. |
| 635 | + Use this option in two-node cluster configurations only. |
| 636 | + </para> |
| 637 | + </listitem> |
| 638 | + </itemizedlist> |
| 639 | + <important> |
| 640 | + <para> |
| 641 | + To avoid split-brain problems, do not use the major node together |
| 642 | + with a referee in the same cluster. |
| 643 | + </para> |
| 644 | + </important> |
| 645 | + </para> |
| 646 | + <sect4 id="setting-up-a-referee"> |
| 647 | + <title>Setting up a Standalone Referee Node</title> |
| 648 | + <para> |
| 649 | + A <firstterm>referee</> is a voting node used to determine which subset |
| 650 | + of nodes has a quorum if the cluster is split into equal parts. The |
| 651 | + referee node does not store any cluster data, so it is not |
| 652 | + resource-intensive and can be configured on virtually any system with |
| 653 | + <productname>&productname;</productname> installed. |
| 654 | + </para> |
| 655 | + <para> |
| 656 | + To set up a referee for your cluster: |
| 657 | +<orderedlist> |
| 658 | + <listitem> |
| 659 | + <para> |
| 660 | + Install <productname>&productname;</productname> on the node you are |
| 661 | + going to make a referee and create the <filename>referee</filename> |
| 662 | + extension: |
| 663 | + <programlisting> |
| 664 | +CREATE EXTENSION referee; |
| 665 | +</programlisting> |
| 666 | + </para> |
| 667 | + </listitem> |
| 668 | + <listitem> |
| 669 | + <para> |
| 670 | + Make sure the <filename>pg_hba.conf</filename> file allows |
| 671 | + access to the referee node. |
| 672 | + </para> |
| 673 | + </listitem> |
| 674 | + <listitem> |
| 675 | + <para> |
| 676 | + On all your cluster nodes, specify the referee connection string |
| 677 | + in the <filename>postgresql.conf</> file: |
| 678 | + <programlisting> |
| 679 | +multimaster.referee_connstring = <replaceable>connstring</> |
| 680 | +</programlisting> |
| 681 | +where <replaceable>connstring</> holds <link linkend="libpq-paramkeywords">libpq options</link> |
| 682 | +required to access the referee. |
| 683 | + </para> |
| 684 | + </listitem> |
| 685 | +</orderedlist> |
| 686 | +</para> |
| 687 | +<para> |
| 688 | +The first subset of nodes that connects to the referee wins the vote |
| 689 | +and continues working. The referee keeps the voting result until all the |
| 690 | +other cluster nodes get online again. Then the result is discarded, and |
| 691 | +a new winner can be chosen in case of another network failure. |
| 692 | +</para> |
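The grant-and-hold behavior just described can be modeled roughly as follows (a simplified sketch under the semantics stated above, not the `referee` extension's implementation):

```python
class Referee:
    """First partition to ask wins; the grant is held until the whole
    cluster is seen healthy again, then it is discarded."""

    def __init__(self, cluster_size):
        self.cluster_size = cluster_size
        self.winner = None  # frozenset of node ids, or None

    def request_quorum(self, partition):
        partition = frozenset(partition)
        if self.winner is None:
            self.winner = partition       # first come, first served
        return self.winner == partition   # only the winner gets quorum

    def report_all_online(self, online):
        """Discard the stored result once every node is back."""
        if len(online) == self.cluster_size:
            self.winner = None            # ready for the next failure
```

For example, after a 4-node cluster splits into {1, 2} and {3, 4}, whichever half reaches the referee first keeps working, and the grant is only reset once all four nodes reconnect.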
| 693 | + <para> |
| 694 | + To avoid split-brain problems, you can only have a single referee |
| 695 | + in your cluster. Do not set up a referee if you have already |
| 696 | + <link linkend="configuring-the-major-node">configured the major node</link>. |
| 697 | + </para> |
| 698 | + </sect4> |
| 699 | + <sect4 id="configuring-the-major-node"> |
| 700 | + <title>Configuring the Major Node</title> |
| 701 | + <para> |
| 702 | + If you configure one of the nodes to be the major one, this node |
| 703 | + will continue accepting queries even if it is isolated by a |
| 704 | + network failure or other nodes fail. This setting is useful |
| 705 | + in a two-node cluster configuration, or to quickly restore a |
| 706 | + single node in a broken cluster. |
| 707 | + </para> |
| 708 | + <important> |
| 709 | + <para> |
| 710 | + If your cluster has more than two nodes, promoting one of the |
| 711 | + nodes to the major status can lead to split-brain problems |
| 712 | + in case of network failures, and reduce the number of possible |
| 713 | + failover options. Consider |
| 714 | + <link linkend="setting-up-a-referee">setting up a standalone referee</link> |
| 715 | + instead. |
| 716 | + </para> |
| 717 | + </important> |
| 718 | + <para> |
| 719 | + To make one of the nodes major, enable the |
| 720 | + <literal>multimaster.major_node</literal> parameter on this node: |
| 721 | +<programlisting> |
| 722 | +ALTER SYSTEM SET multimaster.major_node TO on; |
| 723 | +SELECT pg_reload_conf(); |
| 724 | +</programlisting> |
| 725 | + </para> |
| 726 | + <para> |
| 727 | + Do not set the <varname>major_node</varname> parameter on more |
| 728 | + than one cluster node. When enabled on several nodes, it can |
| 729 | + cause the split-brain problem. If you have already set up a |
| 730 | + referee for your cluster, the <varname>major_node</varname> |
| 731 | + option is forbidden. |
| 732 | + </para> |
| 733 | + </sect4> |
| 734 | + </sect3> |
609 | 735 | </sect2>
|
610 | 736 | <sect2 id="multimaster-administration"><title>Multi-Master Cluster Administration</title>
|
611 | 737 | <itemizedlist>
|
@@ -795,7 +921,7 @@ SELECT mtm.stop_node(3);
|
795 | 921 | set to <literal>true</literal>:
|
796 | 922 | </para>
|
797 | 923 | <programlisting>
|
798 |
| -SELECT mtm.stop_node(3, drop_slot true); |
| 924 | +SELECT mtm.stop_node(3, true); |
799 | 925 | </programlisting>
|
800 | 926 | <para>
|
801 | 927 | This disables replication slots for node 3 on all cluster nodes and stops replication to
|
@@ -959,19 +1085,29 @@ pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable
|
959 | 1085 | </indexterm>
|
960 | 1086 | </term>
|
961 | 1087 | <listitem>
|
962 |
| - <para>Node with this flag continues working even if it cannot access the majority of other nodes. |
963 |
| - This is needed to break the symmetry if there is an even number of alive nodes in the cluster. |
964 |
| - For example, in a cluster of three nodes, if one of the nodes has crashed and |
965 |
| - the connection between the remaining nodes is lost, the node with <varname>multimaster.major_node</varname> = <literal>true</literal> will continue working. |
| 1088 | + <para>The node with this flag continues working even if it cannot access the majority of other nodes. |
| 1089 | + This may be required to break the symmetry in two-node clusters. |
966 | 1090 | </para>
|
967 | 1091 | <important>
|
968 |
| - <para>This parameter should be used with caution. Only one node in the cluster |
969 |
| - can have this parameter set to <literal>true</literal>. When set to <literal>true</literal> on several |
970 |
| - nodes, this parameter can cause the split-brain problem. |
| 1092 | + <para>Use this parameter with caution: it can cause the |
| 1093 | + split-brain problem if you enable it on clusters with more than two nodes, or set |
| 1094 | + it to <literal>true</literal> on more than one node. |
| 1095 | + Only one node in the cluster can be the major node. |
971 | 1096 | </para>
|
972 | 1097 | </important>
|
973 | 1098 | </listitem>
|
974 | 1099 | </varlistentry>
|
| 1100 | + <varlistentry id="mtm-referee-connstring" xreflabel="multimaster.referee_connstring"> |
| 1101 | + <term><varname>multimaster.referee_connstring</varname> |
| 1102 | + <indexterm><primary><varname>multimaster.referee_connstring</varname></primary> |
| 1103 | + </indexterm> |
| 1104 | + </term> |
| 1105 | + <listitem> |
| 1106 | + <para>Connection string to access the referee node. You must set this parameter |
| 1107 | + on all cluster nodes if the referee is set up. |
| 1108 | + </para> |
| 1109 | + </listitem> |
| 1110 | + </varlistentry> |
975 | 1111 | <varlistentry>
|
976 | 1112 | <term><varname>multimaster.max_workers</varname>
|
977 | 1113 | <indexterm><primary><varname>multimaster.max_workers</varname></primary>
|
|