4 Distributed Algorithms
4 Distributed Algorithms
Distributed algorithms
• Shared data
• Building distributed services
November 2005 Distributed systems: distributed algor 2
ithms
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems
• Physical clock
– counts oscillations occurring in a crystal at a
definitive frequency
1 2 3
P1 •a •c •g
0 3 4
P2 • •d •e
1 5
P3 •b •f
• Correct usage:
if x y then C(x) < C(y)
• Incorrect usage:
if C(x) < C(y) then x y
• Solution: Logical vector clocks
November 2005 Distributed systems: distributed algor 9
ithms
Logical clocks (cont.)
• Vector clocks for N processes:
– at process Pi: Vi[j] for j = 1, 2,…,N
– Properties:
(1,0,0) (2,0,0)
p1
a b m1
(2,1,0) (2,2,0)
Physical
p2
time
c d m2
(0,0,1) (2,2,2)
p3
e f
object
reference
message
a. Garbage collection garbage object
p1 p2
wait-for
b. Deadlock wait-for
p1 p2
activate
c. Termination passive passive
H = h1 h2 h3 … hn
consistent?
– Cut of the systems execution
0 1 2 3
e1 e1 e1 e1
p1
m1 m2
p2 Physical
0 1 2 time
e2 e 2 e 2
Inconsistent cut
Consistent cut
• Consistent cut C
e C, f e f C
• Consistent global state
corresponds to consistent cut
November 2005 Distributed systems: distributed algor 18
ithms
Global states (cont.)
• Model execution of a (distributed) system
S0 S1 S2 S3 …
p1 c2 p2
c1
c1 (empty)
2. Global state S 1
<$900, 0> p1 c2 (Order 10, $100), M p2 <$50, 2000>
c1 (empty)
3. Global state S 2
<$900, 0> p1 c2 (Order 10, $100), M p2 <$50, 1995>
c1 (five widgets)
4. Global state S 3
<$900, 5> p1 c2 (Order 10, $100) p2 <$50, 1995>
C1=<(five widgets)> c1 (empty) C2 = <>
(M = marker message)
November 2005 Distributed systems: distributed algor 26
ithms
Global states (cont.)
1. Global state S 0
<$1000, 0> p1 c2 (empty) p2 <$50, 2000>
c1 (empty)
4. Global state S 3
<$900, 5> p1 c2 (Order 10, $100) p2 <$50, 1995>
C1=<(five widgets)> c1 (empty) C2 = <>
5. Global state S 4
<$900, 5> p1 c2 (Order 10, $100) p2 <$50, 1995>
C1=<(five widgets)> c1 M C2 = <>
6. Global state S 5
<$900, 5> p1 c2 (Order 10, $100) p2 <$50, 1995>
C1=<(five widgets)> c1 (empty) C2 = <>
(M = marker message)
November 2005 Distributed systems: distributed algor 27
ithms
Global states (cont.)
• Observed state
– Corresponds to consistent cut
– Reachable!
actual execution e 0 ,e 1,...
Ssnap
pre-snap: e '0 ,e '1 ,...e 'R-1 post-snap: e ' R,e 'R+1 ,...
• Implementation
– Every T sec: multicast by P of “P is here”
– Maximum on message transmission time:
• Asynchronous system: estimate E
• Synchronous system: absolute bound A
User
Enter()
P1
P4
P2 P3
User
3
Enter()
P1
P4
P2 P3
User
3
P1
P4
P2 P3
User
3
enter()
P1
P4
P2 P3
P1
P4
P2 P3
enter()
P1
P4
P2 P3
enter()
P1
P4
P2 P3
enter() exit()
P1
P4
P2 P3
enter()
enter()
P1
P4
P2 P3
enter()
P1
P4
P2 P3
P6
P4
P5
P6
P4
P5
P6
P4
P5
– 3 processes
– P1 and P2 will request it concurrently
– P3 not interested in using resource
P1 P3
released released
Queue: Queue:
P2
released
Queue:
P1
<41,P1> P3
wanted 0
Queue: released
Queue:
<41,P1>
P2
released
Queue:
P1
<41,P1> P3
wanted 0
Queue: released
Queue:
<34,P2>
<41,P1> <34,P2>
P2
wanted 0
Queue:
P1
<41,P1> P3
wanted 0
Queue: released
<43,P3>
Queue:
<34,P2>
<41,P1> <34,P2>
P2
wanted 0
Queue:
P1
<41,P1> P3
wanted 0
Queue: released
<43,P3>
Queue:
<34,P2>
<41,P1> <34,P2>
P2 <45,P3>
wanted 0
Queue:
P1
<41,P1> P3
wanted 1
Queue: released
<43,P3>
Queue:
<34,P2>
<41,P1> <34,P2>
P2 <45,P3>
wanted 1
Queue:
P1
<41,P1> P3
wanted 1
Queue: released
<43,P3>
Queue:
<34,P2>
<41,P1> <34,P2>
P2 <45,P3>
wanted 1
Queue:
P1
P1
<41,P1> P3
wanted 1
Queue: released
<43,P3>
Queue:
<34,P2>
<41,P1> <34,P2>
P2 <45,P3>
wanted 1
Queue:
P1
P1
<41,P1> P3
wanted 1
Queue: released
<43,P3>
Queue:
<34,P2>
<41,P1> <34,P2>
P2 <45,P3>
wanted 1
Queue:
P1
P1
<41,P1> P3
wanted 1
Queue: released
<43,P3>
Queue:
<34,P2>
<41,P1> <34,P2>
P2 <45,P3>
held 2
Queue:
P1
P1
wanted 1 P3
Queue: released
Queue:
P2
held 2
Queue:
P1
P1
wanted 1 P3
Queue: released
Queue:
P2
released 0
Queue:
P2 P4
answer
P2 P4
answer
P2 P4
answer
P2 P4
answer
P2 P4
answer
P2 P4
answer
P2 P4
answer
P2 P4
answer
P2 P4
P11
P5
11
P21 P8
P11
P5
11
P21 P8
P11
P5
P21 P8
11
P11
P5
P21 P8
11
P11
P5
21
P21 P8
21 P11
P5
P21 P8
P11
P5
P21 P8
21
P11
P5 coordinator
21 P8
P21
P11
P5 coordinator
21 P8
P21
21 P11
P5 coordinator
P21 P8
P11
P5 coordinator
P21 P8
P11
P5 coordinator
P21 P8
P11
P5 coordinator 21
P21 P8
P11
P5 coordinator
P21 P8
21
P11
P5 coordinator
P21 P8
1. Using B-multicast
Message
processing
deliver
Hold-back
queue Delivery queue
When delivery
guarantees are
met
Incoming
messages
November 2005 Distributed systems: distributed algor 129
ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast
Data structures at process p:
Sgp : sequence number
Sgp ++
On IP-deliver (<m, S, <q, R>>) at q from p
November 2005 Distributed systems: distributed algor 130
ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast
On IP-deliver (<m, S, <q, R>>) at q from p
if S = Rgp + 1
then R-deliver (m)
Rgp ++
check hold-back queue
else if S > Rgp + 1
then store m in hold-back queue
request missing messages endif
endif
if R > Rg then request missing messages endif
November 2005 Distributed systems: distributed algor 131
ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast
• Correct?
– Integrity: seq numbers + checksums
– Validity: if missing messages are detected
– Agreement: if copy of message remains available
Q: 0 R: 0
P: -1 R: -1 P: -1 Q: -1
<> <>
November 2005 Distributed systems: distributed algor 134
ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast: example
• First multicast by P:
P: 1
Q: -1 R: -1
< mp0 > P: mp0, 0, <Q:-1, R:-1>
Q: 0 R: 0
P: -1 R: -1 P: -1 Q: -1
<> <>
November 2005 Distributed systems: distributed algor 135
ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast: example
• Arrival multicast by P at Q:
P: 1
Q: -1 R: -1
< mp0 > P: mp0, 0, <Q:-1, R:-1>
Q: 0 R: 0
P: 0 R: -1 P: -1 Q: -1
< mp0 > <>
November 2005 Distributed systems: distributed algor 136
ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast: example
• New state:
P: 1
Q: -1 R: -1
< mp0 >
Q: 0 R: 0
P: 0 R: -1 P: -1 Q: -1
< mp0 > <>
November 2005 Distributed systems: distributed algor 137
ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast: example
• Multicast by Q:
P: 1
Q: -1 R: -1
Q: mq0, 0, <P:0, R:-1> < mp0 >
Q: 1 R: 0
P: 0 R: -1 P: -1 Q: -1
< mp0 ,mq0 > <>
November 2005 Distributed systems: distributed algor 138
ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast: example
• Arrival of multicast by Q:
P: 1
Q: 0 R: -1
Q: mq0, 0, <P:0, R:-1> < mp0 ,mq0 >
Q: 1 R: 0
P: 0 R: -1 P: -1 Q: 0
< mp0 , ,mq0 > < mq0 >
November 2005 Distributed systems: distributed algor 139
ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast: example
• When to delete stored messages?
P: 1
Q: 0 R: -1
< mp0 ,mq0 >
Q: 1 R: 0
P: 0 R: -1 P: -1 Q: 0
< mp0 , ,mq0 > < mq0 >
November 2005 Distributed systems: distributed algor 140
ithms
Multicast communication(cont.)
Ordered multicast
• FIFO if a correct process P:
multicast(g, m);
multicast(g, m’);
then for all correct processes:
deliver(m’)
deliver(m) before deliver(m’)
• Total
November 2005 Distributed systems: distributed algor 141
ithms
Multicast communication(cont.)
Ordered multicast
• Total if p: deliver(m) deliver( m’)
then for all correct processes:
deliver(m’)
deliver(m) before deliver(m’)
C2
C3
P1 P2 P3
Sgp = 0; Rgq = -1
For process p to FO-multicast message m to group g
B-multicast ( g, <m, Sgp >)
Sgp ++
On B-deliver (<m, S >) at q from p
November 2005 Distributed systems: distributed algor 145
ithms
Multicast communication(cont.)
FIFO multicast
Algorithm 2 on top of any B-multicast
On B-deliver (<m, S >) at q from p
if S = Rgp + 1
then FO-deliver (m)
Rgp ++
check hold-back queue
else if S > Rgp + 1
then store m in hold-back queue endif
endif
• Problems?
– A single sequencer process
• bottleneck
• single point of failure
3 Agreed Seq
1
2 P1
3
P3
• Approach:
– Use vector timestamps
– Timestamp = count number of multicast
messages
Meaning?
2:1:v 2:1:w
p2 p3 p2 p3
3:1:u 3:1:x
p4 p4
Faulty processes are shown shaded