Commit e444329

Committed by Liudmila Mantrova
DOC: major node and referee documentation for multimaster
1 parent d23f5b8 commit e444329

1 file changed: 157 additions, 21 deletions

doc/src/sgml/multimaster.sgml

@@ -143,7 +143,7 @@
 </listitem>
 </itemizedlist>
 <para>If you have any data that must be present on one of the nodes only, you can exclude a particular table from replication, as follows:
-<programlisting><function>mtm.make_table_local</function>('table_name') </programlisting>
+<programlisting>SELECT mtm.make_table_local('table_name');</programlisting>
 </para>
 </sect2>
 
@@ -266,6 +266,12 @@
 of 2<replaceable>N</replaceable>+1 nodes can tolerate <replaceable>N</replaceable> node failures and stay alive if any
 <replaceable>N</replaceable>+1 nodes are alive and connected to each other.
 </para>
+<tip>
+<para>
+For clusters with an even number of nodes, you can override this
+behavior. For details, see <xref linkend="multimaster-quorum-settings">.
+</para>
+</tip>
 <para>
 In case of a partial network split when different nodes have
 different connectivity, <filename>multimaster</filename> finds a
@@ -274,18 +280,11 @@
 C, but node B cannot access node C, <filename>multimaster</filename>
 isolates node C to ensure data consistency on nodes A and B.
 </para>
-<note>
-<para>
-If you try to access a disconnected node, <filename>multimaster</filename> returns an error
-message indicating the current status of the node. To prevent stale reads, read-only queries are also forbidden.
-Additionally, you can break connections between the disconnected node and the clients using the
-<link linkend="mtm-break-connection"><varname>multimaster.break_connection</varname></link> variable.
-</para>
-</note>
 <para>
-If required, you can override this behavior for one of the nodes using the
-<link linkend="mtm-major-node"><varname>multimaster.major_node</varname></link> variable.
-In this case, the node will continue working even if it is isolated.
+If you try to access a disconnected node, <filename>multimaster</filename> returns an error
+message indicating the current status of the node. To prevent stale reads, read-only queries are also forbidden.
+Additionally, you can break connections between the disconnected node and the clients using the
+<link linkend="mtm-break-connection"><varname>multimaster.break_connection</varname></link> variable.
 </para>
 <para>
 Each node maintains a data structure that keeps the information about the state of all
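The quorum rule in the two hunks above can be modeled in a few lines of Python. The rule is that 2N+1 nodes tolerate N failures. On a partial split, only a majority of nodes that are all connected to each other keeps working.

This is a hedged illustration, not multimaster's implementation. The function names are hypothetical, and network links are assumed to be symmetric. Ties between equally large connected subsets are broken by sorted node name. That tie-break is also an assumption, chosen only so that the documented A/B/C example comes out the same way.

```python
from itertools import combinations


def tolerated_failures(total_nodes: int) -> int:
    """Nodes that can fail while a strict majority survives (N for 2N+1 nodes)."""
    return (total_nodes - 1) // 2


def has_quorum(connected_nodes: int, total_nodes: int) -> bool:
    """Majority rule: strictly more than half of all nodes must be up and connected."""
    return 2 * connected_nodes > total_nodes


def largest_connected_subset(nodes, links):
    """Largest subset of nodes in which every pair can reach each other.

    Brute force, which is fine for small clusters. Ties between equally large
    subsets go to the one with the lowest sorted node names (hypothetical rule).
    """
    edges = {frozenset(pair) for pair in links}
    for size in range(len(nodes), 0, -1):
        for subset in combinations(sorted(nodes), size):
            if all(frozenset(pair) in edges for pair in combinations(subset, 2)):
                return set(subset)
    return set()


# Three-node example from the text: A reaches B and C, but B cannot reach C.
nodes = ["A", "B", "C"]
links = [("A", "B"), ("A", "C")]
survivors = largest_connected_subset(nodes, links)
print(sorted(survivors), has_quorum(len(survivors), len(nodes)))  # ['A', 'B'] True
```

With four nodes split two and two, `has_quorum(2, 4)` is `False` for both halves. This is the even-cluster case that the new tip refers to.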
@@ -339,7 +338,7 @@
 <para>
 To use <filename>multimaster</filename>, you need to install
 <productname>&productname;</productname> on all nodes of your cluster. <productname>&productname;</productname> includes all the required dependencies and
-extensions.
+extensions.
 </para>
 <sect3 id="multimaster-setting-up-a-multi-master-cluster">
 <title>Setting up a Multi-Master Cluster</title>
@@ -606,6 +605,133 @@ SELECT mtm.get_cluster_state();
 <para><link linkend="multimaster-guc-variables">GUC Variables</link></para>
 </sect4>
 </sect3>
+<sect3 id="multimaster-quorum-settings">
+<title>Defining Quorum Settings for Clusters with an Even Number of Nodes</title>
+<para>
+By default, <filename>multimaster</filename> uses a majority-based
+algorithm to determine whether the cluster nodes have a quorum: a cluster
+can only continue working if a majority of its nodes are alive and can
+access each other. For clusters with an even number of nodes, this
+approach is not optimal. For example, if a network failure splits the
+cluster into equal parts, or one of the nodes fails in a two-node
+cluster, all the nodes stop accepting queries, even though at least
+half of the cluster nodes are running normally.
+</para>
+<para>
+To enable a smooth failover in such cases, you can modify the
+<filename>multimaster</filename> majority-based behavior using one
+of the following options:
+<itemizedlist spacing="compact">
+<listitem>
+<para>
+<link linkend="setting-up-a-referee">Set up a standalone <firstterm>referee</> node</link>
+to assign the quorum status to a subset of nodes that constitutes half of the cluster.
+</para>
+</listitem>
+<listitem>
+<para>
+<link linkend="configuring-the-major-node">Choose the <firstterm>major node</></link>
+that continues working regardless of the status of other nodes.
+Use this option in two-node cluster configurations only.
+</para>
+</listitem>
+</itemizedlist>
+<important>
+<para>
+To avoid split-brain problems, do not use the major node together
+with a referee in the same cluster.
+</para>
+</important>
+</para>
+<sect4 id="setting-up-a-referee">
+<title>Setting up a Standalone Referee Node</title>
+<para>
+A <firstterm>referee</> is a voting node used to determine which subset
+of nodes has a quorum if the cluster is split into equal parts. The
+referee node does not store any cluster data, so it is not
+resource-intensive and can be configured on virtually any system with
+<productname>&productname;</productname> installed.
+</para>
+<para>
+To set up a referee for your cluster:
+<orderedlist>
+<listitem>
+<para>
+Install <productname>&productname;</productname> on the node you are
+going to make a referee and create the <filename>referee</filename>
+extension:
+<programlisting>
+CREATE EXTENSION referee;
+</programlisting>
+</para>
+</listitem>
+<listitem>
+<para>
+Make sure the <filename>pg_hba.conf</filename> file on the referee node
+allows connections from all cluster nodes.
+</para>
+</listitem>
+<listitem>
+<para>
+On all your cluster nodes, specify the referee connection string
+in the <filename>postgresql.conf</> file:
+<programlisting>
+multimaster.referee_connstring = '<replaceable>connstring</>'
+</programlisting>
+where <replaceable>connstring</> holds <link linkend="libpq-paramkeywords">libpq options</link>
+required to access the referee.
+</para>
+</listitem>
+</orderedlist>
+</para>
+<para>
+The first subset of nodes that connects to the referee wins the voting
+and continues working. The referee keeps the voting result until all the
+other cluster nodes are back online. Then the result is discarded, and
+a new winner can be chosen in case of another network failure.
+</para>
+<para>
+To avoid split-brain problems, you can only have a single referee
+in your cluster. Do not set up a referee if you have already
+<link linkend="configuring-the-major-node">configured the major node</link>.
+</para>
+</sect4>
+<sect4 id="configuring-the-major-node">
+<title>Configuring the Major Node</title>
+<para>
+If you configure one of the nodes to be the major one, this node
+will continue accepting queries even if it is isolated by a
+network failure or other nodes fail. This setting is useful
+in a two-node cluster configuration, or to quickly restore a
+single node in a broken cluster.
+</para>
+<important>
+<para>
+If your cluster has more than two nodes, promoting one of the
+nodes to the major status can lead to split-brain problems
+in case of network failures and reduce the number of possible
+failover options. Consider
+<link linkend="setting-up-a-referee">setting up a standalone referee</link>
+instead.
+</para>
+</important>
+<para>
+To make one of the nodes major, enable the
+<varname>multimaster.major_node</varname> parameter on this node:
+<programlisting>
+ALTER SYSTEM SET multimaster.major_node TO on;
+SELECT pg_reload_conf();
+</programlisting>
+</para>
+<para>
+Do not set the <varname>major_node</varname> parameter on more
+than one cluster node. When enabled on several nodes, it can
+cause the split-brain problem. If you have already set up a
+referee for your cluster, the <varname>major_node</varname>
+option is forbidden.
+</para>
+</sect4>
+</sect3>
 </sect2>
 <sect2 id="multimaster-administration"><title>Multi-Master Cluster Administration</title>
 <itemizedlist>
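The referee and major-node rules added in this hunk can be sketched as a small decision function. Everything here is hypothetical and for illustration only: the `Referee` class, `can_accept_queries()`, and the node names do not reflect how the `referee` extension or `multimaster` actually store or exchange votes.

The sketch encodes only what the text states:

- A strict majority always works.
- A subset that is exactly half of the cluster works only if it won the referee's vote.
- The first subset to ask wins, and the result is kept until all nodes are back online.
- The major node keeps working regardless of other nodes.
- A referee and a major node must not be combined.

```python
class Referee:
    """Hypothetical model of the referee's voting record."""

    def __init__(self, cluster_size: int):
        self.cluster_size = cluster_size
        self.winner = None  # frozenset of node names, or None if no vote is held

    def request_vote(self, subset: frozenset) -> bool:
        # The first subset that asks wins; later requests succeed only for the winner.
        if self.winner is None:
            self.winner = subset
        return self.winner == subset

    def report_online(self, online: frozenset) -> None:
        # Once every cluster node is online again, the stored result is discarded.
        if len(online) == self.cluster_size:
            self.winner = None


def can_accept_queries(subset, cluster_size, referee=None, major_node=None):
    """Decide whether a connected subset of nodes may keep working."""
    if referee is not None and major_node is not None:
        raise ValueError("do not combine a referee with a major node")
    if 2 * len(subset) > cluster_size:
        return True  # ordinary majority
    if major_node is not None and major_node in subset:
        return True  # the major node works regardless of other nodes
    if referee is not None and 2 * len(subset) == cluster_size:
        return referee.request_vote(frozenset(subset))  # exact half: ask the referee
    return False


ref = Referee(cluster_size=4)
left, right = frozenset({"n1", "n2"}), frozenset({"n3", "n4"})
print(can_accept_queries(left, 4, referee=ref))   # True: left reached the referee first
print(can_accept_queries(right, 4, referee=ref))  # False: the vote is already taken
ref.report_online(left | right)                   # all four nodes are back online
print(ref.winner)                                 # None: a new vote is possible
print(can_accept_queries({"n1"}, 2, major_node="n1"))  # True: two-node cluster, n1 is major
```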
@@ -795,7 +921,7 @@ SELECT mtm.stop_node(3);
 set to <literal>true</literal>:
 </para>
 <programlisting>
-SELECT mtm.stop_node(3, drop_slot true);
+SELECT mtm.stop_node(3, true);
 </programlisting>
 <para>
 This disables replication slots for node 3 on all cluster nodes and stops replication to
@@ -959,19 +1085,29 @@ pg_ctl -D <replaceable>datadir</replaceable> -l <replaceable>pg.log</replaceable
 </indexterm>
 </term>
 <listitem>
-<para>Node with this flag continues working even if it cannot access the majority of other nodes.
-This is needed to break the symmetry if there is an even number of alive nodes in the cluster.
-For example, in a cluster of three nodes, if one of the nodes has crashed and
-the connection between the remaining nodes is lost, the node with <varname>multimaster.major_node</varname> = <literal>true</literal> will continue working.
+<para>The node with this flag continues working even if it cannot access the majority of other nodes.
+This may be required to break the symmetry in two-node clusters.
 </para>
 <important>
-<para>This parameter should be used with caution. Only one node in the cluster
-can have this parameter set to <literal>true</literal>. When set to <literal>true</literal> on several
-nodes, this parameter can cause the split-brain problem.
+<para>Use this parameter with caution. It can cause the
+split-brain problem if you use it on clusters with more than two nodes or set
+it to <literal>true</literal> on more than one node.
+Only one node in the cluster can be the major node.
 </para>
 </important>
 </listitem>
 </varlistentry>
+<varlistentry id="mtm-referee-connstring" xreflabel="multimaster.referee_connstring">
+<term><varname>multimaster.referee_connstring</varname>
+<indexterm><primary><varname>multimaster.referee_connstring</varname></primary>
+</indexterm>
+</term>
+<listitem>
+<para>Connection string to access the referee node. You must set this parameter
+on all cluster nodes if a referee is set up.
+</para>
+</listitem>
+</varlistentry>
 <varlistentry>
 <term><varname>multimaster.max_workers</varname>
 <indexterm><primary><varname>multimaster.max_workers</varname></primary>
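The `multimaster.referee_connstring` value is a libpq keyword/value string. In `postgresql.conf`, a string value that contains spaces must be enclosed in single quotes, with any single quotes inside doubled.

The helpers below are a hypothetical sketch for building that line. They follow the libpq quoting rule: values that are empty or contain spaces are single-quoted, and `'` and `\` inside a value are escaped with a backslash. The sketch deliberately rejects backslashes at the `postgresql.conf` level instead of guessing how the configuration-file parser treats them. The host and database names are placeholders.

```python
def libpq_value(value: str) -> str:
    """Quote one libpq keyword/value parameter when it is empty or contains
    whitespace, a single quote, or a backslash."""
    if value and not any(ch.isspace() or ch in "'\\" for ch in value):
        return value
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_connstring(**params: str) -> str:
    """Join keyword=value pairs with spaces, as libpq expects."""
    return " ".join(f"{key}={libpq_value(value)}" for key, value in params.items())


def conf_line(connstring: str) -> str:
    """Render the postgresql.conf line; single quotes in the value are doubled."""
    if "\\" in connstring:
        # Assumption: this sketch does not handle backslash escapes in postgresql.conf.
        raise ValueError("backslashes are not handled by this sketch")
    return "multimaster.referee_connstring = '" + connstring.replace("'", "''") + "'"


# Placeholder values; use the host, port, user, and database of your referee.
conn = build_connstring(host="referee.example", port="5432", user="postgres", dbname="postgres")
print(conf_line(conn))
# multimaster.referee_connstring = 'host=referee.example port=5432 user=postgres dbname=postgres'
```

Because the same line must appear on every cluster node, generating it once and distributing it avoids quoting mismatches between nodes.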
