Distributed Systems:
Distributed algorithms
Overview of chapters
• Introduction
• Co-ordination models and languages
• General services
• Distributed algorithms
– Ch 10 Time and global states, 11.4-11.5
– Ch 11 Coordination and agreement, 12.1-12.5
• Shared data
• Building distributed services
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems
Logical clocks
• Problem: ordering of events
– requirement for many algorithms
– physical clocks cannot be used
• use causality:
– within a single process: observation
– between different processes: sending of a
message happens before receiving the same
message
Logical clocks (cont.)
• Formalization: happens-before relation
x → y
• Rules:
– if x happens before y in any process p
then x → y
– for any message m: send(m) → receive(m)
– if x → y and y → z
then x → z
• Implementation: logical clocks
Logical clocks (cont.)
• Logical clock
– counter appropriately incremented
– one counter per process
• Physical clock
– counts oscillations occurring in a crystal at a
definite frequency
Logical clocks (cont.)
• Rules for incrementing local logical clock
1 for each event (including send) in process p:
Cp := Cp + 1
2 when a process sends a message m, it
piggybacks on m the value of Cp
3 on receiving (m, t), a process q
• computes Cq := max (Cq, t)
• applies rule 1: Cq := Cq +1
Cq is logical time for event receive(m)
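As an illustration, a minimal sketch of these three rules in Python (class and method names are assumptions for the example, not part of the slides):

# Minimal sketch of a Lamport logical clock.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Rule 1: increment before each local event (including a send)."""
        self.time += 1
        return self.time

    def send_timestamp(self):
        """Rule 2: value to piggyback on an outgoing message."""
        return self.tick()

    def receive(self, t):
        """Rule 3: merge the piggybacked timestamp, then apply rule 1."""
        self.time = max(self.time, t)
        return self.tick()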
Logical clocks (cont.)
• Logical timestamps: example
[Figure: example timeline with three processes P1, P2 and P3; each event (a–g) is labelled with its Lamport timestamp.]
Logical clocks (cont.)
• C(x) logical clock value for event x
• Holds: if x → y then C(x) < C(y)
• Does not hold: if C(x) < C(y) then x → y
• Solution: logical vector clocks
Logical clocks (cont.)
• Vector clocks for N processes:
– at process Pi: Vi[j] for j = 1, 2,…,N
– Properties:
if x → y then V(x) < V(y)
if V(x) < V(y) then x → y
Logical clocks (cont.)
• Rules for incrementing logical vector clock
1 for each event (including send) in process Pi:
Vi[i] := Vi[i] + 1
2 when a process Pi sends a message m, it
piggybacks on m the value of Vi
3 on receiving (m, t), a process Pi
• apply rule 1
• Vi[j] := max(Vi[j] , t[j]) for j = 1, 2,…, N
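A minimal sketch of these rules in Python (names are illustrative assumptions):

# Minimal sketch of a vector clock for N processes.
class VectorClock:
    def __init__(self, n, i):
        self.v = [0] * n   # Vi[1..N]
        self.i = i         # index of the owning process

    def tick(self):
        """Rule 1: increment own entry for each local event (including send)."""
        self.v[self.i] += 1

    def send_timestamp(self):
        """Rule 2: copy of Vi to piggyback on the outgoing message."""
        self.tick()
        return list(self.v)

    def receive(self, t):
        """Rule 3: apply rule 1, then take the component-wise maximum with t."""
        self.tick()
        self.v = [max(a, b) for a, b in zip(self.v, t)]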
Logical clocks (cont.)
• Logical vector clocks : example
[Figure: three processes p1, p2 and p3 with events a–f against physical time; vector timestamps are (1,0,0) and (2,0,0) at p1, (2,1,0) and (2,2,0) at p2, (0,0,1) and (2,2,2) at p3; messages m1 and m2 carry the sender's vector.]
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems
Global states
• Detect global properties
– Distributed garbage collection: is an object still referenced by some process or by a message in transit?
– Distributed deadlock: do the wait-for relations between processes form a cycle?
– Distributed termination: are all processes passive, with no activation message in transit?
Global states (cont.)
• Local states & events
– Process Pi : eik = k-th event of Pi
sik = state of Pi before event k
– History of Pi :
hi = < ei0, ei1, ei2,…>
– Finite prefix of history of Pi :
hik = < ei0, ei1, ei2,…, eik >
Global states (cont.)
• Global states & events
– Global history
H = h1 ∪ h2 ∪ … ∪ hn
– Global state (when?)
S = ( s1p, s2q, …, snu)
consistent?
– Cut of the system's execution
C = h1c1 ∪ h2c2 ∪ … ∪ hncn
Global states (cont.)
• Example of cuts:
[Figure: processes p1 (events e1 0 … e1 3) and p2 (events e2 0 … e2 2) exchange messages m1 and m2; one cut shown is inconsistent, the other is consistent.]
Global states (cont.)
• Finite prefix of history of Pi :
hik = < ei0, ei1, ei2,…, eik >
• Cut of the system's execution
C = h1c1 ∪ h2c2 ∪ … ∪ hncn
• Consistent cut C:
∀ e ∈ C: f → e ⟹ f ∈ C
• Consistent global state
corresponds to a consistent cut
Global states (cont.)
• Model execution of a (distributed) system
S0 → S1 → S2 → S3 → …
– Series of transitions between consistent states
– Each transition corresponds to one single event
• Internal event
• Sending message
• Receiving message
– Simultaneous events: ordered arbitrarily
Global states (cont.)
• Definitions:
– Run = ordering of all events (in a global history)
consistent with each local history's
ordering
– Linearization = run that is also
consistent with the happens-before relation →
– S' reachable from S ⟺
∃ linearization: … S … S' …
Global states (cont.)
• Kinds of global state predicates:
– Stable: once true in a state S, it remains
true in every state S' reachable from S
– Safety (w.r.t. an undesirable property):
S0 = initial state of the system;
the property is false in every state S reachable from S0
– Liveness (w.r.t. a desirable property):
S0 = initial state of the system;
for every linearization from S0, the property eventually
becomes true in some state S reachable from S0
Global states (cont.)
• Snapshot algorithm of Chandy & Lamport
– Record consistent global state
– Assumptions:
• Neither channels nor processes fail
• Channels are unidirectional and provide FIFO-
ordered message delivery
• Graph of channels and processes is strongly
connected
• Any process may initiate a global snapshot
• Processes may continue their execution during the
snapshot
Global states (cont.)
• Snapshot algorithm of Chandy & Lamport
– Elements of algorithm
• Players: processes Pi with
– Incoming channels
– Outgoing channels
• Marker messages
• 2 rules
– Marker receiving rule
– Marker sending rule
– Start of algorithm
• A process acts as if it had received a marker message
Global states (cont.)
Marker receiving rule for process pi
On pi’s receipt of a marker message over channel c:
if (pi has not yet recorded its state) it
records its process state now;
records the state of c as the empty set;
turns on recording of messages arriving over other incoming
channels;
else
pi records the state of c as the set of messages it has received
over c since it saved its state.
end if
Marker sending rule for process pi
After pi has recorded its state, for each outgoing channel c:
pi sends one marker message over c
(before it sends any other message over c).
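A hedged sketch of the two rules at one process, in Python (the channel sets and the send callback are assumptions, not part of the slides):

# Illustrative sketch of the Chandy & Lamport marker rules at one process.
class SnapshotProcess:
    def __init__(self, state, incoming, outgoing, send):
        self.state = state              # current process state
        self.incoming = incoming        # incoming channel ids
        self.outgoing = outgoing        # outgoing channel ids
        self.send = send                # send(channel, message) callback
        self.recorded_state = None
        self.channel_state = {}         # channel id -> recorded messages
        self.recording = set()          # channels currently being recorded

    def initiate(self):
        """Start of the algorithm: act as if a marker had been received."""
        self.record_state()
        self.recording = set(self.incoming)

    def record_state(self):
        """Marker sending rule: record own state, then send a marker on every outgoing channel."""
        self.recorded_state = self.state
        for c in self.outgoing:
            self.send(c, "MARKER")

    def on_marker(self, c):
        """Marker receiving rule for a marker arriving on incoming channel c."""
        if self.recorded_state is None:
            self.record_state()
            self.channel_state[c] = []                      # state of c recorded as empty
            self.recording = set(self.incoming) - {c}       # record the other incoming channels
        else:
            self.recording.discard(c)                       # stop recording c

    def on_message(self, c, m):
        """Application message: if c is being recorded, add m to its recorded state."""
        if c in self.recording:
            self.channel_state.setdefault(c, []).append(m)
        # ... normal processing of m ...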
Global states (cont.)
• Example:
[Figure: processes p1 and p2 are connected by channel c2 (p1 → p2) and channel c1 (p2 → p1); p1 holds an account of $1000 and no widgets, p2 holds an account of $50 and 2000 widgets.]
Global states (cont.)
1. Global state S0: p1 = <$1000, 0>, p2 = <$50, 2000>; c1 = (empty), c2 = (empty)
2. Global state S1: p1 = <$900, 0>, p2 = <$50, 2000>; c1 = (empty), c2 = (Order 10, $100), M
3. Global state S2: p1 = <$900, 0>, p2 = <$50, 1995>; c1 = (five widgets), c2 = (Order 10, $100), M
4. Global state S3: p1 = <$900, 5>, p2 = <$50, 1995>; c1 = (empty), c2 = (Order 10, $100); recorded so far: C1 = <(five widgets)>, C2 = <>
(M = marker message)
Global states (cont.)
1. Global state S0: p1 = <$1000, 0>, p2 = <$50, 2000>; c1 = (empty), c2 = (empty)
4. Global state S3: p1 = <$900, 5>, p2 = <$50, 1995>; c1 = (empty), c2 = (Order 10, $100); C1 = <(five widgets)>, C2 = <>
5. Global state S4: p1 = <$900, 5>, p2 = <$50, 1995>; c1 = M, c2 = (Order 10, $100); C1 = <(five widgets)>, C2 = <>
6. Global state S5: p1 = <$900, 5>, p2 = <$50, 1995>; c1 = (empty), c2 = (Order 10, $100); C1 = <(five widgets)>, C2 = <>
(M = marker message)
Global states (cont.)
• Observed state
– Corresponds to consistent cut
– Reachable!
[Figure: the actual execution e0, e1, … leads from Sinit to Sfinal, with the recording beginning and ending somewhere in between; the recorded state Ssnap is reachable from Sinit by a permuted execution of pre-snap events e'0, e'1, … e'R-1 and leads on to Sfinal by the post-snap events e'R, e'R+1, ….]
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems
Failure detectors
• Properties
– Unreliable failure detector answers with:
• Suspected: no "P is here" received within T + E sec
• Unsuspected: otherwise
– Reliable failure detector answers with:
• Failed: no "P is here" received within T + A sec
• Unsuspected: otherwise
• Implementation
– Every T sec: P multicasts "P is here"
– Maximum message transmission time:
• Asynchronous system: estimate E
• Synchronous system: absolute bound A
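A minimal sketch of such a heartbeat-based detector in Python (the period T and estimate E are parameters; all names are assumptions):

# Illustrative sketch of an unreliable, heartbeat-based failure detector.
import time

class FailureDetector:
    def __init__(self, period_T, estimate_E):
        self.timeout = period_T + estimate_E
        self.last_seen = {}             # process id -> time of last "P is here"

    def on_heartbeat(self, pid):
        self.last_seen[pid] = time.monotonic()

    def status(self, pid):
        """'Suspected' if no heartbeat arrived within T + E seconds, else 'Unsuspected'."""
        last = self.last_seen.get(pid)
        if last is None or time.monotonic() - last > self.timeout:
            return "Suspected"
        return "Unsuspected"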
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems
Mutual exclusion
• Problem: how to temporarily grant a single
process a privilege?
– Privilege = the right to access a (shared)
resource
– resource = file, device, window,…
• Assumptions
– clients execute the mutual exclusion algorithm
– the resource itself might be managed by a
server
– Reliable communication
Mutual exclusion (cont.)
• Basic requirements:
– ME1 (safety): at most one process may access
the shared resource at any time
– ME2 (liveness): a process requesting access to the
shared resource is eventually granted it
– ME3 (ordering or fairness): access to the shared resource is
granted in happened-before order of the requests
Mutual exclusion (cont.)
• Solutions:
– central server algorithm
– distributed algorithm using logical clocks
– ring-based algorithm
– voting algorithm
• Evaluation
– Bandwidth (= #messages to enter and exit)
– Client delay (incurred by a process at enter and exit)
– Synchronization delay (delay between exit and enter)
Mutual exclusion (cont.)
central server algorithm
• Central server offering 2 operations:
– enter()
• if resource free
then operation returns without delay
else request is queued and return from operation is
delayed
– exit()
• if request queue is empty
then resource is marked free
else a queued request is selected and its delayed
return from enter() is executed (granting access)
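A minimal sketch of the server side in Python (reply(pid) stands for whatever mechanism unblocks that client's enter(); all names are assumptions):

# Illustrative sketch of the central-server mutual exclusion logic.
from collections import deque

class LockServer:
    def __init__(self, reply):
        self.reply = reply          # callback granting access to a client
        self.holder = None
        self.queue = deque()

    def enter(self, pid):
        if self.holder is None:
            self.holder = pid
            self.reply(pid)         # resource free: grant immediately
        else:
            self.queue.append(pid)  # otherwise queue the request

    def exit(self, pid):
        if self.queue:
            self.holder = self.queue.popleft()
            self.reply(self.holder) # grant to the next queued request
        else:
            self.holder = None      # mark the resource free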
Mutual exclusion (cont.)
central server algorithm
• Example (sequence shown over several slides):
– P3 calls enter(); the resource is free, so the server grants it (current user: 3, queue empty)
– P4 calls enter(); the resource is busy, so the server queues the request (queue: 4)
– P2 calls enter(); the server queues this request as well (queue: 4, 2)
– P3 calls exit(); the server grants the resource to the request at the head of the queue
– P4 becomes the current user (queue: 2)
Mutual exclusion (cont.)
central server algorithm
• Evaluation:
– ME3 not satisfied!
– Performance:
• single server is performance bottleneck
• Enter critical section: 2 messages
• Synchronization: 2 messages between exit of one process and
enter of next
– Failure:
• Central server is single point of failure
• what if a client, holding the resource, fails?
• Reliable communication required
Mutual exclusion (cont.)
ring-based algorithm
• All processes arranged in a
– unidirectional
– logical
ring
• token passed in ring
• process with token has access to resource
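A minimal sketch of one ring member in Python (send_to_next and use_resource are assumed names, not from the slides):

# Illustrative sketch of a process in the token-ring mutual exclusion algorithm.
class RingProcess:
    def __init__(self, send_to_next):
        self.send_to_next = send_to_next   # forwards a message to the ring neighbour
        self.wants_resource = False

    def on_token(self):
        if self.wants_resource:
            self.use_resource()            # access the shared resource while holding the token
            self.wants_resource = False
        self.send_to_next("TOKEN")         # pass the token on around the ring

    def use_resource(self):
        pass                               # critical section goes here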
Mutual exclusion (cont.)
ring-based algorithm
• Example (sequence shown over several slides): processes P1 … P6 form a unidirectional logical ring
– the token is at P2: P2 can use the resource
– P2 stops using the resource and forwards the token
– P3 does not need the resource and forwards the token
Mutual exclusion (cont.)
ring-based algorithm
• Evaluation:
– ME3 not satisfied
– efficiency
• high when high usage of resource
• high overhead when very low usage
– failure
• Process failure: loss of ring!
• Reliable communication required
Mutual exclusion (cont.)
distributed algorithm using logical clocks
• Distributed agreement algorithm
– multicast requests to all participating processes
– use resource when all other participants agree
(= reply received)
• Processes
– keep logical clock; included in all request
messages
– behave as finite state machine:
• released
• wanted
• held
Mutual exclusion (cont.)
distributed algorithm using logical clocks
• Ricart and Agrawala’s algorithm: process Pj
– on initialization:
• state := released;
– to obtain resource:
• state := wanted;
• T = logical clock value for next event;
• multicast request to other processes <T, Pj>;
• wait for n-1 replies;
• state := held;
Mutual exclusion (cont.)
distributed algorithm using logical clocks
• Ricart and Agrawala’s algorithm: process Pj
– on receipt of request <Ti, Pi> :
• if (state = held) or
(state = wanted and (T,Pj) < (Ti,Pi) )
then queue request from Pi
else reply immediately to Pi
– to release resource:
• state := released;
• reply to any queued requests;
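A hedged sketch of one participant in Python (the message transport and callback names are assumptions; waiting for the n-1 replies is left as a comment):

# Illustrative sketch of a Ricart and Agrawala participant.
class RicartAgrawala:
    RELEASED, WANTED, HELD = "released", "wanted", "held"

    def __init__(self, pid, peers, send):
        self.pid = pid
        self.peers = peers              # ids of the other participants
        self.send = send                # send(dest, message)
        self.state = self.RELEASED
        self.clock = 0                  # Lamport clock
        self.request_ts = None
        self.replies = 0
        self.deferred = []              # queued requests

    def request(self):
        self.state = self.WANTED
        self.clock += 1
        self.request_ts = (self.clock, self.pid)
        self.replies = 0
        for p in self.peers:
            self.send(p, ("request", self.request_ts))
        # ... wait until self.replies == len(self.peers), then set self.state = self.HELD

    def on_request(self, sender, ts):
        self.clock = max(self.clock, ts[0]) + 1
        if self.state == self.HELD or (self.state == self.WANTED and self.request_ts < ts):
            self.deferred.append(sender)        # defer the reply
        else:
            self.send(sender, ("reply",))       # reply immediately

    def on_reply(self):
        self.replies += 1

    def release(self):
        self.state = self.RELEASED
        for p in self.deferred:
            self.send(p, ("reply",))            # reply to any queued requests
        self.deferred = []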
Mutual exclusion (cont.)
distributed algorithm using logical clocks
• Ricart and Agrawala’s algorithm: example
– 3 processes
– P1 and P2 request the resource concurrently
– P3 is not interested in using the resource
• Run of the example (sequence shown over several slides):
– initially P1, P2 and P3 are released with empty queues
– P1 requests the resource and multicasts <41, P1>; P2 requests it concurrently and multicasts <34, P2>
– P3 is released, so it replies immediately to both requests
– P1 receives <34, P2>: since (34, P2) < (41, P1), P1 replies to P2
– P2 receives <41, P1>: its own request is earlier, so it queues P1's request
– P2 now has n - 1 = 2 replies (from P1 and P3) and enters state held; P1 keeps waiting with one reply
– when P2 releases the resource, it replies to P1's queued request
Mutual exclusion (cont.)
distributed algorithm using logical clocks
• Evaluation
– Performance:
• expensive algorithm:
2 * ( n - 1 ) messages to get resource
• Client delay: round trip
• Synchronization delay: 1 message to pass section to another process
• does not solve the performance bottleneck
– Failure
• each process must know all other processes
• Reliable communication required
Mutual exclusion (cont.)
voting algorithm
• Approach of Maekawa:
– Communication with subset of partners should suffice
– Candidate collects sufficient votes
• Voting set: Vi = voting set for pi
∀ i, j: Vi ∩ Vj ≠ ∅
– | Vi | = K
– Pj contained in M voting sets
– Optimal solution
• K ≈ √N
• M = K
Mutual exclusion (cont.)
voting algorithm
Maekawa's algorithm
On initialization
state := RELEASED;
voted := FALSE;
For pi to enter the critical section
state := WANTED;
Multicast request to all processes in Vi – {pi};
Wait until (number of replies received = (K – 1));
state := HELD;
On receipt of a request from pi at pj (i ≠ j)
if (state = HELD or voted = TRUE)
then
queue request from pi without replying;
else
send reply to pi;
voted := TRUE;
end if
Mutual exclusion (cont.)
voting algorithm
Maekawa's algorithm (cont.)
For pi to exit the critical section
state := RELEASED;
Multicast release to all processes in Vi – {pi};
On receipt of a release from pi at pj (i ≠ j)
if (queue of requests is non-empty)
then
remove head of queue – from pk, say;
send reply to pk;
voted := TRUE;
else
voted := FALSE;
end if
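A hedged sketch of one Maekawa participant in Python (the send callback and waiting for K-1 replies are assumptions, not part of the slides):

# Illustrative sketch of Maekawa's voting algorithm at process pi.
from collections import deque

class MaekawaProcess:
    RELEASED, WANTED, HELD = "released", "wanted", "held"

    def __init__(self, pid, voting_set, send):
        self.pid = pid
        self.voting_set = [p for p in voting_set if p != pid]   # Vi - {pi}
        self.send = send                  # send(dest, message)
        self.state = self.RELEASED
        self.voted = False
        self.votes = 0
        self.queue = deque()              # deferred requests

    def enter(self):
        self.state = self.WANTED
        self.votes = 0
        for p in self.voting_set:
            self.send(p, ("request", self.pid))
        # ... wait until self.votes == len(self.voting_set), then self.state = self.HELD

    def on_request(self, sender):
        if self.state == self.HELD or self.voted:
            self.queue.append(sender)     # queue without replying
        else:
            self.send(sender, ("reply", self.pid))
            self.voted = True

    def on_reply(self):
        self.votes += 1

    def exit(self):
        self.state = self.RELEASED
        for p in self.voting_set:
            self.send(p, ("release", self.pid))

    def on_release(self, sender):
        if self.queue:
            pk = self.queue.popleft()
            self.send(pk, ("reply", self.pid))
            self.voted = True
        else:
            self.voted = False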
Mutual exclusion (cont.)
voting algorithm
• Evaluation
– Properties
• ME1: OK
• ME2: not OK, deadlock possible
(solution: process requests in happened-before order)
• ME3: OK
– Performance
• Bandwidth: 2√N messages to enter + √N messages to exit
• Client delay: round trip
• Synchronization delay: round trip
– Failure:
• Crash of a process in another voting set can be tolerated
• Reliable communication required
Mutual exclusion (cont.)
• Discussion
– algorithms are expensive and not practical
– algorithms are extremely complex in the
presence of failures
– better solution in most cases:
• let the server, managing the resource, perform
concurrency control
• gives more transparency for the clients
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems
Elections
• Problem statement:
– select a process from a group of processes
– several processes may start election
concurrently
• Main requirement:
– unique choice
– Select process with highest id
• Process id
• <1/load, process id>
Elections (cont.)
• Basic requirements:
– E1 (safety): every participant pi has
electedi = ⊥ (not yet defined) or
electedi = P, where P is the process with the
highest id
– E2 (liveness): all processes pi participate and
eventually set electedi ≠ ⊥, or crash
Elections (cont.)
• Solutions:
– Bully election algorithm
– Ring based election algorithm
• Evaluation
– Bandwidth ( ~ total number of messages)
– Turnaround time (the number of serialized message
transmission times between initiation and termination
of a single run)
Elections (cont.)
Bully election
• Assumptions:
– each process has identifier
– processes can fail during an election
– communication is reliable
• Goal:
– surviving member with the largest identifier is
elected as coordinator
Elections (cont.)
Bully election
• Roles for processes:
– coordinator
• elected process
• has highest identifier, at the time of election
– initiator
• process starting the election for some reason
Elections (cont.)
Bully election
• Three types of messages:
– election message
• sent by an initiator of the election
to all other processes with a higher identifier
– answer message
• a reply message sent by the receiver of an election
message
– coordinator message
• sent by the process becoming the coordinator
to all other processes with lower identifiers
Elections (cont.)
Bully election
• Algorithm:
– send election message:
• process doing it is called initiator
• any process may do it at any time
• when a failed process is restarted, it starts an
election, even though the current coordinator is
functioning (bully)
– a process receiving an election message
• replies with an answer message
• will start an election itself (why?)
Elections (cont.)
Bully election
• Algorithm:
– actions of an initiator
• when it receives no answer message within a
certain time (2 Ttrans + Tprocess), it becomes coordinator
• when it has received an answer message
(so a process with a higher identifier is active)
but no coordinator message arrives within a further
timeout, it restarts the election
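A hedged sketch of the bully election at one process in Python (timeouts and the message transport are assumptions and are only indicated in comments):

# Illustrative sketch of the bully algorithm at one process.
class BullyProcess:
    def __init__(self, pid, all_ids, send):
        self.pid = pid
        self.all_ids = all_ids          # identifiers of all processes
        self.send = send                # send(dest_id, message)
        self.coordinator = None

    def start_election(self):
        higher = [p for p in self.all_ids if p > self.pid]
        if not higher:
            self.become_coordinator()
            return
        for p in higher:
            self.send(p, ("election", self.pid))
        # ... if no ("answer", ...) arrives within 2*Ttrans + Tprocess, call
        #     self.become_coordinator(); if an answer arrives but no
        #     ("coordinator", ...) follows within a further timeout, restart the election

    def on_election(self, sender):
        self.send(sender, ("answer", self.pid))
        self.start_election()           # a receiver of an election message starts its own

    def on_coordinator(self, pid):
        self.coordinator = pid

    def become_coordinator(self):
        self.coordinator = self.pid
        for p in self.all_ids:
            if p < self.pid:
                self.send(p, ("coordinator", self.pid))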
Elections (cont.)
Bully election
• Example: election of P2 after failure of P3 and P4 (sequence shown over several slides)
– P4 has failed; P1 initiates an election and sends election messages to the processes with higher identifiers
– P2 and P3 reply with answer messages and start elections of their own
– P3 replies to P2's election message, then P3 fails as well
– P1 times out waiting for a coordinator message and starts the election again
– P2 replies to P1 and starts a new election towards P3 and P4
– P2 receives no replies (P3 and P4 have failed), so P2 becomes coordinator
Elections (cont.)
Bully election
• Evaluation
– Correctness: E1 & E2 ok, if
• Reliable communication
• No process replaces crashed process
– Correctness: no guarantee for E1, if
• Crashed process is replaced by process with same id
• Assumed timeout values are inaccurate (= unreliable failure
detector)
– Performance
• Worst case: O(n²)
• Optimal: bandwidth: n-2 messages
turnaround: 1 message
Elections (cont.)
Ring based election
• Assumptions:
– processes arranged in a logical ring
– each process has an identifier: i for Pi
– processes remain functional and reachable during the
algorithm
Elections (cont.)
Ring based election
• Messages:
– forwarded over logical ring
– 2 types:
• election: used during election
contains identifier
• elected: used to announce new coordinator
• Process States:
– participant
– non-participant
Elections (cont.)
Ring based election
• Algorithm
– process initiating an election
• becomes participant
• sends election message to its neighbour
Elections (cont.)
Ring based election
[Figure: logical ring of processes P1, P5, P8, P11, P14 and P21; the initiator has become a participant and an election message carrying identifier 11 is travelling along the ring.]
Elections (cont.)
Ring based election
• Algorithm
– upon receiving an election message, a process
compares identifiers:
• Received: identifier in message
• own: identifier of process
– 3 cases:
• Received > own
• Received < own
• Received = own
Elections (cont.)
Ring based election
• Algorithm
– receive election message
• Received > own
– message forwarded
– process becomes participant
Elections (cont.)
Ring based election
[Figures: the election message carrying identifier 11 is forwarded along the ring; processes with smaller identifiers forward it unchanged and become participants.]
Elections (cont.)
Ring based election
• Algorithm
– receive election message
• Received > own
– message forwarded
– process becomes participant
• Received < own and process is non-participant
– substitutes own identifier in message
– message forwarded
– process becomes participant
Elections (cont.)
Ring based election
[Figures: the election message reaches P21; since 11 < 21 and P21 is a non-participant, P21 substitutes its own identifier and forwards an election message carrying 21, which the other processes forward around the ring.]
Elections (cont.)
Ring based election
• Algorithm:
– receive election message
• Received > own
– ...
• Received < own and process is non-participant
– ...
• Received = own
– identifier must be greatest
– process becomes coordinator
– new state: non-participant
– sends elected message to neighbour
Elections (cont.)
Ring based election
[Figure: the election message carrying 21 arrives back at P21; P21 recognizes its own identifier and becomes coordinator.]
Elections (cont.)
Ring based election
• Algorithm receive election message
• Received > own
– message forwarded
– process becomes participant
• Received < own and process is non-participant
– substitutes own identifier in message
– message forwarded
– process becomes participant
• Received = own
– identifier must be greatest
– process becomes coordinator
– new state: non-participant
– sends elected message to neighbour
Elections (cont.)
Ring based election
• Algorithm
– receive elected message
• participant:
– new state: non-participant
– forwards message
• coordinator:
– election process completed
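A hedged sketch of one ring member in Python (send_to_next is an assumed primitive forwarding a message to the neighbour in the logical ring):

# Illustrative sketch of the ring-based election at one process.
class RingElectionProcess:
    def __init__(self, pid, send_to_next):
        self.pid = pid
        self.send_to_next = send_to_next
        self.participant = False
        self.coordinator = None

    def start_election(self):
        self.participant = True
        self.send_to_next(("election", self.pid))

    def on_election(self, received_id):
        if received_id > self.pid:
            self.send_to_next(("election", received_id))    # forward unchanged
            self.participant = True
        elif received_id < self.pid and not self.participant:
            self.send_to_next(("election", self.pid))       # substitute own identifier
            self.participant = True
        elif received_id == self.pid:
            self.coordinator = self.pid                      # own id came back: highest
            self.participant = False
            self.send_to_next(("elected", self.pid))
        # else: received_id < own and already a participant: message is extinguished

    def on_elected(self, coordinator_id):
        if coordinator_id != self.pid:
            self.coordinator = coordinator_id
            self.participant = False
            self.send_to_next(("elected", coordinator_id))   # forward the announcement
        # else: the elected message has returned: election completed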
Elections (cont.)
Ring based election
[Figures: P21, now coordinator, sends an elected message carrying 21 around the ring; each process records P21 as coordinator, becomes a non-participant and forwards the message; the election is completed when the message returns to P21.]
Elections (cont.)
Ring based election
• Evaluation
– Why is condition
Received < own and process is non-participant
necessary? (see next slide for full algorithm)
– Number of messages:
• worst case: 3 * n - 1
• best case: 2 * n
– concurrent elections: messages are extinguished
• as soon as possible
• before winning result is announced
Elections (cont.)
Ring based election
• Algorithm receive election message
• Received > own
– message forwarded
– process becomes participant
• Received < own and process is non-participant
– substitutes own identifier in message
– message forwarded
– process becomes participant
• Received = own
– identifier must be greatest
– process becomes coordinator
– new state: non-participant
– sends elected message to neighbour
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems
Multicast communication
• Essential property:
– one multicast operation ≠ multiple individual sends
⟹ higher efficiency
⟹ stronger delivery guarantees
• Operations: g = group, m = message
– X-multicast(g, m)
– X-deliver(m)
• ≠ receive(m)
– X = additional property:
Basic, Reliable, FIFO, …
Multicast communication(cont.)
IP multicast
• Datagram operations
– with multicast IP address
• Failure model: cf. UDP
– Omission failures
– No ordering or reliability guarantees
Multicast communication(cont.)
Basic multicast
• = IP multicast + delivery guarantee
if multicasting process does not crash
• Straightforward algorithm: (with reliable send)
To B-multicast(g, m):
∀ p ∈ g: send(p, m)
On receive(m) at p:
B-deliver(m)
• Ex. practical algorithm using IP-multicast
Multicast communication(cont.)
Reliable multicast
• Properties:
– Integrity (safety)
• A correct process delivers a message at most once
– Validity (liveness)
• Correct process p multicasts m ⟹ p delivers m
– Agreement (liveness)
correct process p delivers m ⟹
all correct processes will deliver m
– Uniform agreement (liveness)
process p (correct or failing) delivers m ⟹
all correct processes will deliver m
Multicast communication(cont.)
Reliable multicast
• 2 algorithms:
1. Using B-multicast
2. Using IP-multicast + piggy backed acks
Multicast communication(cont.)
Reliable multicast
Algorithm 1 with B-multicast
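The algorithm figure is not reproduced in the slides; the usual construction, given here only as a hedged sketch in Python, re-multicasts every message on first delivery (mid is an assumed unique message id):

# Illustrative sketch of R-multicast built on top of B-multicast.
class RMulticast:
    def __init__(self, me, group, b_multicast, r_deliver):
        self.me = me
        self.group = group
        self.b_multicast = b_multicast      # B-multicast(group, message)
        self.r_deliver = r_deliver          # delivery upcall to the application
        self.received = set()               # ids of messages already seen

    def r_multicast(self, mid, m):
        self.b_multicast(self.group, (mid, m))

    def on_b_deliver(self, mid, m, sender):
        if mid not in self.received:
            self.received.add(mid)
            if sender != self.me:
                self.b_multicast(self.group, (mid, m))   # re-multicast before delivering
            self.r_deliver(m)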
Multicast communication(cont.)
Reliable multicast
Algorithm 1 with B-multicast
• Correct?
– Integrity
– Validity
– Agreement
• Efficient?
– No: each message is transmitted |g| times
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast
[Figure: incoming messages are placed in a hold-back queue; when the delivery guarantees are met, message processing moves them to the delivery queue, from which they are delivered to the application.]
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast
Data structures at process p:
Sgp : p's sequence number for group g
Rgq : sequence number of the latest message p has delivered from q
On initialization:
Sgp = 0
For process p to R-multicast message m to group g
IP-multicast (g, <m, Sgp , <q, Rgq > >)
Sgp ++
On IP-deliver (<m, S, <q, R>>) at q from p
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast
On IP-deliver (<m, S, <q, R>>) at q from p
if S = Rgp + 1
then R-deliver (m)
Rgp ++
check hold-back queue
else if S > Rgp + 1
then store m in hold-back queue
request missing messages endif
endif
if R > Rgq (for any piggybacked <q, R>) then request missing messages endif
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast
• Correct?
– Integrity: seq numbers + checksums
– Validity: if missing messages are detected
– Agreement: if copy of message remains available
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast: example
• 3 processes in group: P, Q, R
• State of a process (as drawn in the following slides):
– S: next sequence number
– Rq: sequence number already delivered from Q
– stored messages
• Presentation (example for P): S = 2; Q: 3, R: 5; stored = <>
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast: example (sequence shown over several slides)
• Initial state: P = (S: 0; Q: -1, R: -1; <>), Q = (S: 0; P: -1, R: -1; <>), R = (S: 0; P: -1, Q: -1; <>)
• First multicast by P: P IP-multicasts <mp0, 0, <Q: -1, R: -1>> and moves to (S: 1; Q: -1, R: -1; <mp0>)
• Arrival of P's multicast at Q: Q delivers mp0 and moves to (S: 0; P: 0, R: -1; <mp0>)
• Multicast by Q: Q IP-multicasts <mq0, 0, <P: 0, R: -1>> and moves to (S: 1; P: 0, R: -1; <mp0, mq0>)
• Arrival of Q's multicast: P delivers mq0 and moves to (S: 1; Q: 0, R: -1; <mp0, mq0>); R delivers mq0 and moves to (S: 0; P: -1, Q: 0; <mq0>); from the piggybacked <P: 0>, R notices that it has missed a message from P and requests it
• When to delete stored messages?
Multicast communication(cont.)
Ordered multicast
• FIFO: if a correct process P performs
multicast(g, m) and then
multicast(g, m'),
then for every correct process:
if it delivers m', it delivers m before m'
• Causal: if multicast(g, m) → multicast(g, m'),
then for every correct process:
if it delivers m', it delivers m before m'
• Total
Multicast communication(cont.)
Ordered multicast
• Total: if some correct process p delivers m before m',
then for every correct process:
if it delivers m', it delivers m before m'
• FIFO-Total = FIFO + Total
• Causal-Total = Causal + Total
• Atomic = Reliable + Total
Multicast communication(cont.)
Ordered multicast
[Figure: three processes P1, P2 and P3 delivering multicast messages over time. Notice the consistent ordering of the totally ordered messages T1 and T2, of the FIFO-related messages F1 and F2, and of the causally related messages C1 and C3 – and the otherwise arbitrary delivery ordering of messages.]
Multicast communication(cont.)
FIFO multicast
• Alg. 1: R-multicast using IP-multicast
– Correct?
• Sender assigns Sgp
• Receivers deliver in this order
• Alg. 2 on top of any B-multicast
Multicast communication(cont.)
FIFO multicast
Algorithm 2 on top of any B-multicast
Data structures at process p:
Sgp : sequence number
Rgq : sequence number of the latest message it has delivered from q
On initialization:
Sgp = 0; Rgq = -1
For process p to FO-multicast message m to group g
B-multicast ( g, <m, Sgp >)
Sgp ++
On B-deliver (<m, S >) at q from p
Multicast communication(cont.)
FIFO multicast
Algorithm 2 on top of any B-multicast
On B-deliver (<m, S >) at q from p
if S = Rgp + 1
then FO-deliver (m)
Rgp ++
check hold-back queue
else if S > Rgp + 1
then store m in hold-back queue endif
endif
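A minimal sketch of this FIFO layer in Python (B-multicast and the FO-deliver upcall are assumed primitives):

# Illustrative sketch of FIFO-ordered delivery on top of a basic multicast.
from collections import defaultdict

class FifoMulticast:
    def __init__(self, group, b_multicast, fo_deliver):
        self.group = group
        self.b_multicast = b_multicast
        self.fo_deliver = fo_deliver
        self.s = 0                                  # Sgp: own next sequence number
        self.r = defaultdict(lambda: -1)            # Rgq: latest delivered from q
        self.holdback = defaultdict(dict)           # q -> {sequence number: message}

    def fo_multicast(self, m):
        self.b_multicast(self.group, (m, self.s))
        self.s += 1

    def on_b_deliver(self, msg, q):
        m, seq = msg
        if seq == self.r[q] + 1:
            self.fo_deliver(m)
            self.r[q] += 1
            # check the hold-back queue for the next messages from q
            while self.r[q] + 1 in self.holdback[q]:
                self.fo_deliver(self.holdback[q].pop(self.r[q] + 1))
                self.r[q] += 1
        elif seq > self.r[q] + 1:
            self.holdback[q][seq] = m               # too early: hold back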
Multicast communication(cont.)
TOTAL multicast
• Basic approach:
– Sender: assign totally ordered identifiers instead of
per-process ids
– Receiver: deliver as for FIFO ordering
• Alg. 1: use a (single) sequencer process
• Alg. 2: participants collectively agree on the
assignment of sequence numbers
Multicast communication(cont.)
TOTAL multicast: sequencer process
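The algorithm figure is not reproduced in the slides; the usual sequencer scheme, given here only as a hedged sketch in Python, assumes that senders B-multicast each message with a unique id to the group and to the sequencer:

# Illustrative sketch of total ordering with a single sequencer.
class Sequencer:
    def __init__(self, group, b_multicast):
        self.group = group
        self.b_multicast = b_multicast
        self.next_seq = 0

    def on_message(self, msg_id):
        """Assign the next global sequence number to msg_id and announce it to the group."""
        self.b_multicast(self.group, ("order", msg_id, self.next_seq))
        self.next_seq += 1

class TotalOrderMember:
    def __init__(self, to_deliver):
        self.to_deliver = to_deliver
        self.next_to_deliver = 0
        self.pending = {}        # msg id -> message body
        self.orders = {}         # sequence number -> msg id

    def on_b_deliver_message(self, msg_id, body):
        self.pending[msg_id] = body
        self.try_deliver()

    def on_b_deliver_order(self, msg_id, seq):
        self.orders[seq] = msg_id
        self.try_deliver()

    def try_deliver(self):
        # deliver messages strictly in sequence-number order
        while (self.next_to_deliver in self.orders
               and self.orders[self.next_to_deliver] in self.pending):
            mid = self.orders.pop(self.next_to_deliver)
            self.to_deliver(self.pending.pop(mid))
            self.next_to_deliver += 1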
Multicast communication(cont.)
TOTAL multicast: sequencer process
• Correct?
• Problems?
– A single sequencer process
• bottleneck
• single point of failure
Multicast communication(cont.)
TOTAL multicast: ISIS algorithm
• Approach:
– Sender:
• B-multicasts message
– Receivers:
• Propose sequence numbers to sender
– Sender:
• uses returned sequence numbers to
generate agreed sequence number
Multicast communication(cont.)
TOTAL multicast: ISIS algorithm
[Figure: steps of the ISIS scheme among processes P1 … P4: (1) the sender multicasts the message, (2) each receiver returns a proposed sequence number, (3) the sender multicasts the agreed sequence number.]
Multicast communication(cont.)
TOTAL multicast: ISIS algorithm
Data structures at process p:
Agp : largest agreed sequence number
Pgp : largest proposed sequence number by P
On initialization:
Pgp = 0
For process p to TO-multicast message m to group g
B-multicast ( g, <m, i >) // i = unique id for m
On B-deliver (<m, i> ) at q from p
Pgq = max (Agq, Pgq) + 1
send (p, < i, Pgq >)
store <m, i, Pgq > in hold-back queue
Multicast communication(cont.)
TOTAL multicast: ISIS algorithm
On receive( i, P) at p from q
wait for all replies; a is the largest reply
B-multicast(g, < “order”, i, a >)
On B-deliver (<“order”, i, a > ) at q from p
Agq = max (Agq, a)
attach a to message i in hold-back queue
reorder messages in hold-back queue (increasing sequence numbers)
while message m in front of hold-back queue has been assigned
an agreed sequence number
do remove m from hold-back queue
TO-deliver (m)
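A hedged sketch of the receiver side of this scheme in Python (B-multicast, point-to-point send and the TO-deliver upcall are assumed primitives; the sender's collection of replies is only indicated in a comment):

# Illustrative sketch of the ISIS agreed-sequence-number scheme at one process.
class IsisProcess:
    def __init__(self, me, group, b_multicast, send, to_deliver):
        self.me, self.group = me, group
        self.b_multicast, self.send, self.to_deliver = b_multicast, send, to_deliver
        self.agreed = 0                    # Agq: largest agreed sequence number seen
        self.proposed = 0                  # Pgq: largest sequence number proposed
        self.holdback = {}                 # msg id -> [message, sequence number, final?]

    def to_multicast(self, m, i):
        self.b_multicast(self.group, ("msg", m, i))
        # ... collect the proposals <i, P> from all members, take the largest value a,
        #     then B-multicast(group, ("order", i, a))

    def on_message(self, m, i, sender):
        self.proposed = max(self.agreed, self.proposed) + 1
        self.send(sender, ("propose", i, self.proposed))
        self.holdback[i] = [m, self.proposed, False]        # provisional ordering

    def on_order(self, i, a):
        self.agreed = max(self.agreed, a)
        self.holdback[i][1:] = [a, True]                     # attach the agreed number
        self.try_deliver()

    def try_deliver(self):
        # deliver while the message at the front of the hold-back queue has a final number
        while self.holdback:
            i, (m, seq, final) = min(self.holdback.items(), key=lambda kv: kv[1][1])
            if not final:
                break
            del self.holdback[i]
            self.to_deliver(m)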
Multicast communication(cont.)
TOTAL multicast: ISIS algorithm
• Correct?
– Processes will agree on sequence number for
a message
– Sequence numbers are monotonically
increasing
– No process can prematurely deliver a
message
• Performance
– 3 serial messages!
• Total ordering
– not necessarily FIFO
– not necessarily causal
Multicast communication(cont.)
Causal multicast
• Limitations:
– Causal order only by multicast operations
– Non-overlapping, closed groups
• Approach:
– Use vector timestamps
– Timestamp = count number of multicast
messages
Multicast communication(cont.)
Causal multicast: vector timestamps
Meaning?
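The algorithm figure is not reproduced in the slides; as a hedged illustration, the standard vector-timestamp scheme for causal delivery could be sketched as follows in Python (B-multicast and the CO-deliver upcall are assumed primitives):

# Illustrative sketch of causally ordered multicast using vector timestamps.
class CausalMulticast:
    def __init__(self, i, n, group, b_multicast, co_deliver):
        self.i, self.group = i, group
        self.v = [0] * n                      # V[j] = multicasts delivered from process j
        self.b_multicast = b_multicast
        self.co_deliver = co_deliver
        self.holdback = []                    # entries (sender j, vector vj, message m)

    def co_multicast(self, m):
        self.v[self.i] += 1
        self.b_multicast(self.group, (m, self.i, list(self.v)))

    def on_b_deliver(self, m, j, vj):
        if j == self.i:
            self.co_deliver(m)                # own message: already counted at multicast time
            return
        self.holdback.append((j, vj, m))
        self.try_deliver()

    def try_deliver(self):
        progress = True
        while progress:
            progress = False
            for entry in list(self.holdback):
                j, vj, m = entry
                # deliver m when it is the next message from j and all messages it
                # causally depends on have already been delivered here
                if vj[j] == self.v[j] + 1 and all(
                        vj[k] <= self.v[k] for k in range(len(self.v)) if k != j):
                    self.holdback.remove(entry)
                    self.co_deliver(m)
                    self.v[j] += 1
                    progress = True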
Multicast communication(cont.)
Causal multicast: vector timestamps
• Correct?
– Message timestamps:
m carries V
m' carries V'
– Given
multicast(g, m) → multicast(g, m')
prove that
V < V'
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems
Consensus & related problems
• System model
– N processes pi
– Communication is reliable
– Processes may fail
• Crash
• Byzantine
– No message signing
• Message signing limits the harm a faulty process can do
• Problems
– Consensus
– Byzantine generals
– Interactive consistency
Consensus
• Problem statement
– each pi starts in the undecided state and proposes a value vi
– Message exchanges
– Finally: each process pi sets a decision variable di, enters
the decided state and may not change di
• Requirements:
– Termination: eventually each correct process pi sets di
– Agreement: pi and pj correct & in decided state ⟹ di = dj
– Integrity: if the correct processes all propose the same value d,
then any correct process in the decided state has
chosen d
Consensus
• Simple algorithm (no process failures)
– Collect processes in group g
– For each process pi:
• R-multicast(g, vi)
• Collect N values
• d = majority (v1,v2,...,vN)
• Problems with failures:
– Crash: detected? not in asynchronous systems
– Byzantine? Faulty process can send around
different values
Byzantine generals
• Problem statement
– Informal:
• agree to attack or to retreat
• commander issues the order
• lieutenants are to decide to attack or retreat
• all can be ‘treacherous’
– Formal
• One process proposes value
• Others are to agree
• Requirements:
– Termination: each process eventually decides
– Agreement: all correct processes select the same value
– Integrity: commander correct ⟹ other correct processes
select the value of the commander
Interactive consistency
• Problem statement
– Correct processes agree upon a vector of values (one
value for each process)
• Requirements:
– Termination: each process eventually decides
– Agreement: all correct processes select the same value
– Integrity: if pi is correct,
then all correct processes decide on vi as
the i-th component of their vector
Related problems & solutions
• Basic solutions:
– Consensus:
• Ci(v1, v2, ..., vN) = decision value of pi
– Byzantine generals
• j is commander, proposing v
• BGi (j, v) = decision value of pi
– Interactive consistency
• ICi(v1, v2, ..., vN)[j] = jth value in the decision vector of pi
with v1, v2, ..., vN values proposed by processes
Related problems & solutions
• Derived solutions:
– IC from BG:
• Run BG N times, once with each process as
commander
• ICi(v1, v2, ..., vN)[j] = BGi(j, vj)
– C from IC:
• Run IC and produce vector of values
• Derive single value with appropriate function
– BG from C:
Related problems & solutions
• Derived solutions:
– IC from BG:
– C from IC:
– BG from C:
• Commander pj sends value v to itself & other
processes
• Each process runs C with the values v1, v2, ...vN
they received
• BGi(j,v) = Ci(v1, v2, ...vN)
Consensus in a synchronous system
• Assumptions
– Use B-multicast
– f of N processes may fail
• Approach
– Proceed in f + 1 rounds
– In each round correct processes B-multicast values
• Variables
– Valuesir = set of proposed values known to process pi
at the beginning of round r
Consensus in a synchronous system
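The algorithm figure is not reproduced in the slides; a minimal sketch of the f + 1 round exchange in Python, assuming a hypothetical synchronous round primitive broadcast_round(values) that exchanges value sets with the other processes and returns the sets received in that round:

# Hedged sketch of round-based consensus tolerating up to f crash failures.
def synchronous_consensus(v_i, f, broadcast_round):
    values = {v_i}                           # Values_i: values known to this process
    sent = set()
    for r in range(1, f + 2):                # proceed in f + 1 rounds
        new = values - sent                  # only multicast values not sent before
        sent |= new
        for received in broadcast_round(new):
            values |= received               # accumulate values proposed by others
    # every correct process applies the same decision function to the same set,
    # e.g. taking the minimum of the proposed values
    return min(values)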
Consensus in a synchronous system
• Termination?
– Synchronous system
• Correct?
– Each process arrives at the same set of values at the end of the
final round
– Proof?
• Assume the sets differ: Pi has value v and Pj does not
⟹ some Pk managed to send v to Pi but crashed before
sending it to Pj; over f + 1 rounds this requires more than
f crashes – contradiction
• Agreement & Integrity?
– Processes apply the same function
Byzantine generals
in a synchronous system
• Assumptions
– Arbitrary failures
– f of N processes may be faulty
– Channels are reliable: no message injections
– Unsigned messages
Byzantine generals
in a synchronous system
• Impossibility with 3 processes
[Figure: two scenarios with three processes. Left: the commander p1 is correct and sends 1:v to p2 and p3, but the faulty p3 relays 3:1:u to p2. Right: the commander p1 is faulty and sends 1:w to p2 and 1:x to p3; p3 correctly relays 3:1:x. p2 cannot distinguish the two scenarios. Faulty processes are shown shaded.]
• Impossibility with N ≤ 3f processes
Byzantine generals
in a synchronous system
• Solution with one faulty process
– N = 4, f = 1
– 2 rounds of messages:
• Commander sends its value to each of the lieutenants
• Each of the lieutenants sends the value it received to its peers
– Lieutenant receives
• Value of commander
• N-2 values of peers
Use majority function
Byzantine generals
in a synchronous system
• Solution with one faulty process
[Figure: two scenarios with four processes (N = 4, f = 1). Left: the commander p1 is correct and sends v to all lieutenants; the faulty p3 relays a wrong value, but p2 and p4 still decide majority(v, v, u) = v. Right: the commander p1 is faulty and sends different values to the lieutenants; after the exchange each correct lieutenant holds the same set of values and therefore reaches the same decision. Faulty processes are shown shaded.]
Consensus & related problems
• Impossibility in asynchronous systems
– proven: no algorithm can guarantee to reach consensus
– No guaranteed solution to
• Byzantine generals problem
• Interactive consistency
• Reliable & totally ordered multicast
• Approaches for work around:
– Masking faults: restart crashed process and use persistent storage
– Use failure detectors: make failures fail-silent by discarding
messages from suspected processes
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems
Distributed Systems:
Distributed algorithms