Commit a46fe17

Merge branch 'PGPROEE9_6_MULTIMASTER' of https://gitlab.postgrespro.ru/pgpro-dev/postgrespro into PGPROEE9_6_MULTIMASTER
2 parents b017b97 + 3977b21 commit a46fe17

File tree

2 files changed, +67 -21 lines changed


contrib/mmts/doc/administration.md

Lines changed: 63 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -169,21 +169,77 @@ After that configure and start multimaster from step 3 of the previous section. See
169169
170170
## Tuning configuration params
171171
172-
While multimaster is usable with default configuration optins several params may require tuning.
172+
While multimaster is usable with the default configuration, several parameters may require tuning.
173173
174-
* Hearbeat timeouts — multimaster periodically send heartbeat packets to check availability of neighbour nodes. ```multimaster.heartbeat_send_timeout``` defines amount of time between sending heartbeats, while ```multimaster.heartbeat_recv_timeout``` sets amount of time following which node assumed to be disconnected if no hearbeats were received during this time. It's good idea to set ```multimaster.heartbeat_send_timeout``` based on typical ping latencies between you nodes. Small recv/senv ratio decraeases time of failure detection, but increases probability of false positive failure detection, so tupical packet loss ratio between nodes should be taken into account.
174+
* Heartbeat timeouts — multimaster periodically sends heartbeat packets to check the availability of neighbour nodes. ```multimaster.heartbeat_send_timeout``` defines the interval between heartbeats, while ```multimaster.heartbeat_recv_timeout``` sets the amount of time after which a node is assumed to be disconnected if no heartbeats were received from it during that period. It's a good idea to set ```multimaster.heartbeat_send_timeout``` based on typical ping latencies between your nodes. A small recv/send ratio decreases failure detection time, but increases the probability of false positive failure detection, so the typical packet loss ratio between nodes should be taken into account.
175175
176-
* Min/max recovery lag — when node is disconnected from the cluster other nodes will keep to collect WAL logs for disconnected node until size of WAL log will grow to ```multimaster.max_recovery_lag```. Upon reaching this threshold WAL logs for disconnected node will be deleted, automatic recovery will be no longer possible and disconnected node should be cloned manually from one of alive node by ```pg_basebackup```. Increasing ```multimaster.max_recovery_lag``` increases amount of time while automatic recovery is possible, but also increasing maximum disk usage during WAL collection. On the other hand ```multimaster.min_recovery_lag``` sets difference between acceptor and donor nodes before switching ordanary recovery to exclusive mode, when commits on donor node are stopped. This step is necessary to ensure that no new commits will happend during node promotion from recovery state to online state and nodes will be at sync after that.
176+
* Min/max recovery lag — when a node is disconnected from the cluster, the other nodes keep collecting WAL for the disconnected node until its size grows to ```multimaster.max_recovery_lag```. Upon reaching this threshold the WAL kept for the disconnected node is deleted, automatic recovery is no longer possible, and the disconnected node has to be cloned manually from one of the alive nodes with ```pg_basebackup```. Increasing ```multimaster.max_recovery_lag``` extends the window during which automatic recovery is possible, but also increases the maximum disk usage during WAL collection. On the other hand, ```multimaster.min_recovery_lag``` sets the lag between the acceptor and donor nodes at which ordinary recovery switches to exclusive mode, in which commits on the donor node are stopped. This step is necessary to ensure that no new commits happen during the node's promotion from recovery state to online state, so that the nodes are in sync afterwards. A sample configuration fragment illustrating these parameters is given after this list.
177177
178178
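As a rough illustration, a ```postgresql.conf``` fragment tuning these parameters could look like the following. The values are examples only, not recommendations: the heartbeat timeouts mirror the millisecond values used in ```tests/reinit-mm.sh``` below, while the recovery lag values are placeholders, so check the parameter units in the multimaster documentation before changing them.

```
# Illustrative values only; tune to your network and workload.
multimaster.heartbeat_send_timeout = 250      # roughly the typical inter-node ping latency, ms
multimaster.heartbeat_recv_timeout = 2000     # node presumed disconnected after this much silence, ms
multimaster.min_recovery_lag = 100000         # placeholder: lag at which recovery switches to exclusive mode
multimaster.max_recovery_lag = 100000000      # placeholder: WAL kept for a disconnected node up to this size
```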
179179
## Monitoring
180180
181-
* `mtm.get_nodes_state()` -- show status of nodes on cluster
182-
* `mtm.get_cluster_state()` -- show whole cluster status
183-
* `mtm.get_cluster_info()` -- print some debug info
181+
Multimaster provides several views to check the current cluster state. To access these functions, the ```multimaster``` extension should be created explicitly. Run in psql:
184182
185-
Read description of all management functions at [functions](doc/functions.md)
183+
```sql
184+
CREATE EXTENSION multimaster;
185+
```
186+
187+
Then it is possible to check node-specific information via ```mtm.get_nodes_state()```:
188+
189+
```sql
190+
select * from mtm.get_nodes_state();
191+
```
192+
193+
and the status of the whole cluster can be seen through:
194+
195+
```sql
196+
select * from mtm.get_cluster_state();
197+
```
198+
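The same views can also be polled from outside psql, for instance from a cron job or a monitoring agent. A minimal sketch using ```psql```; the connection string here is only an example and should be replaced with one pointing at any live node of your cluster:

```
psql "dbname=mydb user=myuser host=node1" -c "select * from mtm.get_cluster_state();"
```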
199+
Read the description of all monitoring functions at [functions](doc/functions.md).
186200
187201
## Adding nodes to cluster
202+
203+
Multimaster is able to add/drop cluster nodes without a restart. To add a new node one should change the cluster configuration on the alive nodes, then load data onto the new node using ```pg_basebackup``` and start the node.
204+
205+
Suppose we have a working cluster of three nodes (```node1```, ```node2```, ```node3```) and want to add a new node, ```node4```, to the cluster.
206+
207+
1. First we need to figure out the connection string that will be used to access the new server. Let's assume that in our case it will be "dbname=mydb user=myuser host=node4". Run in psql connected to any live node:
208+
209+
```sql
210+
select * from mtm.add_node('dbname=mydb user=myuser host=node4');
211+
```
212+
213+
This will change the cluster configuration on all nodes and start replication slots for the new node.
214+
215+
1. After calling ```mtm.add_node()``` we can copy data from an alive node to the new node:
216+
217+
```
218+
node4> pg_basebackup -D ./datadir -h node1 -x
219+
```
220+
221+
1. ```pg_basebackup``` will copy the entire data directory from ```node1``` along with its configs. So we need to change ```postgresql.conf``` for ```node4```:
222+
223+
```
224+
multimaster.max_nodes = 4
225+
multimaster.node_id = 4
226+
multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user=myuser host=node2, dbname=mydb user=myuser host=node3, dbname=mydb user=myuser host=node4'
227+
```
228+
229+
1. Now we can just start postgres on the new node:
230+
231+
```
232+
node4> pg_ctl -D ./datadir -l ./pg.log start
233+
```
234+
235+
After startup the node will recover recent transactions and change its state to ONLINE. The node status can be checked via the ```mtm.get_nodes_state()``` view on any cluster node.
236+
237+
1. Now the cluster is using the new node, but we should also change ```multimaster.conn_strings``` and ```multimaster.max_nodes``` on the old nodes to ensure that the right configuration will be loaded in case of a postgres restart; see the example fragment after this list.
238+
239+
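For instance, assuming the connection strings from the example above, the relevant fragment of ```postgresql.conf``` on ```node1``` would end up as follows; only ```multimaster.node_id``` stays node-specific and keeps its old value on each existing node:

```
multimaster.max_nodes = 4
multimaster.node_id = 1
multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user=myuser host=node2, dbname=mydb user=myuser host=node3, dbname=mydb user=myuser host=node4'
```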
188240
## Excluding nodes from cluster
189241
242+
243+
244+
245+

contrib/mmts/tests/reinit-mm.sh

Lines changed: 4 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -2,25 +2,21 @@ n_nodes=3
22
export PATH=~/code/postgres_cluster/tmp_install/bin/:$PATH
33
ulimit -c unlimited
44
pkill -9 postgres
5-
pkill -9 arbiter
65

76
cd ~/code/postgres_cluster/contrib/mmts/
87
make clean && make install
9-
cd ~/code/postgres_cluster/contrib/raftable/
10-
make clean && make install
8+
119
cd ~/code/postgres_cluster/contrib/mmts/tests
1210

1311

14-
rm -fr node? *.log dtm
12+
rm -fr node? *.log
1513
conn_str=""
1614
sep=""
1715
for ((i=1;i<=n_nodes;i++))
1816
do
1917
port=$((5431 + i))
20-
raft_port=$((6665 + i))
2118
arbiter_port=$((7000 + i))
2219
conn_str="$conn_str${sep}dbname=regression user=stas host=127.0.0.1 port=$port arbiter_port=$arbiter_port sslmode=disable"
23-
raft_conn_str="$raft_conn_str${sep}${i}:localhost:$raft_port"
2420
sep=","
2521
initdb node$i
2622
pg_ctl -w -D node$i -l node$i.log start
@@ -50,20 +46,14 @@ do
5046
default_transaction_isolation = 'repeatable read'
5147
5248
multimaster.workers = 1
53-
multimaster.use_raftable = false
54-
multimaster.queue_size=52857600
55-
multimaster.ignore_tables_without_pk = 1
5649
multimaster.heartbeat_recv_timeout = 2000
5750
multimaster.heartbeat_send_timeout = 250
58-
multimaster.twopc_min_timeout = 40000000
59-
multimaster.min_2pc_timeout = 40000000
6051
multimaster.volkswagen_mode = 1
6152
multimaster.conn_strings = '$conn_str'
6253
multimaster.node_id = $i
63-
multimaster.max_nodes = 3
54+
multimaster.max_nodes = 4
6455
multimaster.arbiter_port = $arbiter_port
65-
raftable.id = $i
66-
raftable.peers = '$raft_conn_str'
56+
multimaster.min_2pc_timeout = 100000
6757
SQL
6858
cp pg_hba.conf node$i
6959
pg_ctl -w -D node$i -l node$i.log start
