
Distributed Systems:

Distributed algorithms

November 2005 Distributed systems: distributed algor 1


ithms
Overview of chapters
• Introduction
• Co-ordination models and languages
• General services
• Distributed algorithms
– Ch 10 Time and global states, 11.4-11.5
– Ch 11 Coordination and agreement, 12.1-12.5

• Shared data
• Building distributed services
November 2005 Distributed systems: distributed algor 2
ithms
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems

November 2005 Distributed systems: distributed algor 3


ithms
Logical clocks
• Problem: ordering of events
– requirement for many algorithms
– physical clocks cannot be used
• use causality:
– within a single process: observation
– between different processes: sending of a
message happens before receiving the same
message

November 2005 Distributed systems: distributed algor 4


ithms
Logical clocks (cont.)
• Formalization: happens-before relation
x → y
• Rules:
– if x happens before y in any process p
then x → y
– for any message m: send(m) → receive(m)
– if x → y and y → z
then x → z
• Implementation: logical clocks
November 2005 Distributed systems: distributed algor 5
ithms
Logical clocks (cont.)
• Logical clock
– counter appropriately incremented
– one counter per process

• Physical clock
– counts oscillations occurring in a crystal at a
definite frequency

November 2005 Distributed systems: distributed algor 6


ithms
Logical clocks (cont.)
• Rules for incrementing local logical clock
1 for each event (including send) in process p:
Cp := Cp + 1
2 when a process sends a message m, it
piggybacks on m the value of Cp
3 on receiving (m, t), a process q
• computes Cq := max (Cq, t)
• applies rule 1: Cq := Cq +1
Cq is logical time for event receive(m)
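As an illustration (not part of the original slides), these three rules fit in a few lines of Python; the class and method names below are made up for the sketch:

import itertools

class LamportClock:
    """Per-process logical clock (rules 1-3 above)."""
    def __init__(self):
        self.c = 0

    def local_event(self):          # rule 1: any event, including a send
        self.c += 1
        return self.c

    def send(self):                 # rule 2: piggyback the clock value on the message
        return self.local_event()

    def receive(self, t):           # rule 3: merge the sender's timestamp, then apply rule 1
        self.c = max(self.c, t)
        return self.local_event()

# example: p sends a message to q
p, q = LamportClock(), LamportClock()
t = p.send()                        # p's clock: 1
print(q.receive(t))                 # q's clock: max(0, 1) + 1 = 2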
November 2005 Distributed systems: distributed algor 7
ithms
Logical clocks (cont.)
• Logical timestamps: example

[Timeline diagram of Lamport timestamps: P1 has events a, c, g with timestamps 1, 2, 3; P2 has events with timestamps 0, 3 (d), 4 (e); P3 has events b, f with timestamps 1, 5.]

November 2005 Distributed systems: distributed algor 8


ithms
Logical clocks (cont.)
• C(x) logical clock value for event x

• Correct usage:
if x → y then C(x) < C(y)

• Incorrect usage:
if C(x) < C(y) then x → y
• Solution: Logical vector clocks
November 2005 Distributed systems: distributed algor 9
ithms
Logical clocks (cont.)
• Vector clocks for N processes:
– at process Pi: Vi[j] for j = 1, 2,…,N
– Properties:

if x → y then V(x) < V(y)

if V(x) < V(y) then x → y

November 2005 Distributed systems: distributed algor 10


ithms
Logical clocks (cont.)
• Rules for incrementing logical vector clock
1 for each event (including send) in process Pi:
Vi[i] := Vi[i] + 1
2 when a process Pi sends a message m, it
piggybacks on m the value of Vi
3 on receiving (m, t), a process Pi
• apply rule 1
• Vi[j] := max(Vi[j] , t[j]) for j = 1, 2,…, N
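A minimal Python sketch of these vector-clock rules, with illustrative names (a companion to the Lamport-clock sketch above):

class VectorClock:
    """Vector clock for process i out of n processes (rules 1-3 above)."""
    def __init__(self, i, n):
        self.i = i
        self.v = [0] * n

    def local_event(self):           # rule 1: increment own entry
        self.v[self.i] += 1

    def send(self):                  # rule 2: piggyback a copy of the whole vector
        self.local_event()
        return list(self.v)

    def receive(self, t):            # rule 3: apply rule 1, then take the element-wise max
        self.local_event()
        self.v = [max(a, b) for a, b in zip(self.v, t)]

def happened_before(v, w):
    """V(x) < V(y): every entry <= and at least one strictly smaller."""
    return all(a <= b for a, b in zip(v, w)) and v != w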

November 2005 Distributed systems: distributed algor 11


ithms
Logical clocks (cont.)
• Logical vector clocks : example

[Diagram: vector clock example. p1: events a = (1,0,0), b = (2,0,0); after b, p1 sends m1 to p2. p2: events c = (2,1,0), d = (2,2,0); after d, p2 sends m2 to p3. p3: events e = (0,0,1), f = (2,2,2). Horizontal axis: physical time.]

November 2005 Distributed systems: distributed algor 12


ithms
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems

November 2005 Distributed systems: distributed algor 13


ithms
Global states
• Detect global properties
[Diagram: three example global properties, each involving processes p1 and p2:
a. Garbage collection – is an object garbage? p1 holds no reference, but a message carrying a reference may still be in transit
b. Deadlock – p1 and p2 each wait-for the other
c. Termination – both processes are passive, but an activate message may still be in transit]

November 2005 Distributed systems: distributed algor 14


ithms
Global states (cont.)
• Local states & events
– Process Pi : eik events
sik state, before event k
– History of Pi :

hi = < ei0, ei1, ei2,…>

– Finite prefix of history of Pi :

hik = < ei0, ei1, ei2,…, eik >


November 2005 Distributed systems: distributed algor 15
ithms
Global states (cont.)
• Global states & events
– Global history

H = h1 ∪ h2 ∪ h3 ∪ … ∪ hn

– Global state (when?)

S = (s1p, s2q, …, snu)

consistent?
– Cut of the system's execution

C = h1c1 ∪ h2c2 ∪ … ∪ hncn


November 2005 Distributed systems: distributed algor 16
ithms
Global states (cont.)
• Example of cuts:

[Diagram: p1 has events e1^0 … e1^3 and p2 has events e2^0 … e2^2, with messages m1 and m2 exchanged between them (physical time on the horizontal axis). One of the two cuts shown is inconsistent, the other is consistent.]

November 2005 Distributed systems: distributed algor 17


ithms
Global states (cont.)
• Finite prefix of history of Pi :
hik = < ei0, ei1, ei2,…, eik >
• Cut of the system's execution
C = h1c1 ∪ h2c2 ∪ … ∪ hncn

• Consistent cut C
∀ e ∈ C: f → e ⇒ f ∈ C
• Consistent global state
corresponds to consistent cut
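As a small illustration (not from the slides), the consistency condition can be checked directly if the happened-before pairs are known; the event and pair representations below are assumptions of the sketch:

def is_consistent_cut(cut, happened_before_pairs):
    """cut: set of events; happened_before_pairs: set of (f, e) with f -> e.
    Consistent iff for every e in the cut, every f with f -> e is also in the cut."""
    return all(f in cut
               for (f, e) in happened_before_pairs
               if e in cut)

# example: receiving a message without its send is inconsistent
pairs = {("send_m1", "recv_m1")}
print(is_consistent_cut({"recv_m1"}, pairs))              # False
print(is_consistent_cut({"send_m1", "recv_m1"}, pairs))   # True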
November 2005 Distributed systems: distributed algor 18
ithms
Global states (cont.)
• Model execution of a (distributed) system

S0 → S1 → S2 → S3 → …

– Series of transitions between consistent states


– Each transition corresponds to one single event
• Internal event
• Sending message
• Receiving message
– Simultaneous events ⇒ order events arbitrarily
November 2005 Distributed systems: distributed algor 19
ithms
Global states (cont.)
• Definitions:
– Run = ordering of all events (in a global history)
consistent with each local history’s
ordering
– Linearization =
consistent run +
consistent with →
– S’ reachable from S
⇔ ∃ linearization: … S → … → S’ …

November 2005 Distributed systems: distributed algor 20


ithms
Global states (cont.)
• Kinds of global state predicates:
– Stable: α = true in S ⇒
∀S’, S → … → S’: α = true in S’

– Safety: α = undesirable property

S0 = initial state of system
∀S, S0 → … → S: α = false in S

– Liveness: β = desirable property

S0 = initial state of system
∃S, S0 → … → S: β = true in S
November 2005 Distributed systems: distributed algor 21
ithms
Global states (cont.)
• Snapshot algorithm of Chandy & Lamport
– Record consistent global state
– Assumptions:
• Neither channels nor processes fail
• Channels are unidirectional and provide FIFO-
ordered message delivery
• Graph of channels and processes is strongly
connected
• Any process may initiate a global snapshot
• Processes may continue their execution during the
snapshot

November 2005 Distributed systems: distributed algor 22


ithms
Global states (cont.)
• Snapshot algorithm of Chandy & Lamport
– Elements of algorithm
• Players: processes Pi with
– Incoming channels
– Outgoing channels
• Marker messages
• 2 rules
– Marker receiving rule
– Marker sending rule
– Start of algorithm
• A process acts as if it had received a marker message

November 2005 Distributed systems: distributed algor 23


ithms
Global states (cont.)
Marker receiving rule for process pi
On pi’s receipt of a marker message over channel c:
if (pi has not yet recorded its state) it
records its process state now;
records the state of c as the empty set;
turns on recording of messages arriving over other incoming
channels;
else
pi records the state of c as the set of messages it has received
over c since it saved its state.
end if

Marker sending rule for process pi


After pi has recorded its state, for each outgoing channel c:
pi sends one marker message over c
(before it sends any other message over c).
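A compact, illustrative Python sketch of the two rules; the send(channel, message) primitive and FIFO channels are assumptions of the sketch:

class SnapshotProcess:
    """Chandy-Lamport participant: records its own state and the state of every incoming channel."""
    def __init__(self, pid, incoming, outgoing, send):
        self.pid, self.incoming, self.outgoing, self.send = pid, incoming, outgoing, send
        self.state = None            # current application state (updated elsewhere)
        self.recorded_state = None
        self.channel_state = {}      # incoming channel -> recorded messages
        self.recording = {}          # incoming channel -> still recording?

    def record_state(self):          # marker sending rule (also used to initiate a snapshot)
        self.recorded_state = self.state
        self.channel_state = {c: [] for c in self.incoming}
        self.recording = {c: True for c in self.incoming}
        for c in self.outgoing:
            self.send(c, "MARKER")   # one marker per outgoing channel, before any other message

    def on_message(self, channel, msg):
        if msg == "MARKER":          # marker receiving rule
            if self.recorded_state is None:
                self.record_state()
            self.channel_state.setdefault(channel, [])
            self.recording[channel] = False        # state of this channel is now fixed
        elif self.recorded_state is not None and self.recording.get(channel):
            # message received over channel after the local state was saved
            self.channel_state[channel].append(msg)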
November 2005 Distributed systems: distributed algor 24
ithms
Global states (cont.)
• Example:

[Diagram: processes p1 and p2 connected by channels c1 and c2 (both currently empty). p1: account = $1000, no widgets; p2: account = $50, 2000 widgets.]

November 2005 Distributed systems: distributed algor 25


ithms
Global states (cont.)
1. Global state S 0
<$1000, 0> p1 c2 (empty) p2 <$50, 2000>

c1 (empty)

2. Global state S 1
<$900, 0> p1 c2 (Order 10, $100), M p2 <$50, 2000>

c1 (empty)

3. Global state S 2
<$900, 0> p1 c2 (Order 10, $100), M p2 <$50, 1995>

c1 (five widgets)

4. Global state S 3
<$900, 5> p1 c2 (Order 10, $100) p2 <$50, 1995>
C1=<(five widgets)> c1 (empty) C2 = <>

(M = marker message)
November 2005 Distributed systems: distributed algor 26
ithms
Global states (cont.)
1. Global state S 0
<$1000, 0> p1 c2 (empty) p2 <$50, 2000>

c1 (empty)

4. Global state S 3
<$900, 5> p1 c2 (Order 10, $100) p2 <$50, 1995>
C1=<(five widgets)> c1 (empty) C2 = <>

5. Global state S 4
<$900, 5> p1 c2 (Order 10, $100) p2 <$50, 1995>
C1=<(five widgets)> c1 M C2 = <>

6. Global state S 5
<$900, 5> p1 c2 (Order 10, $100) p2 <$50, 1995>
C1=<(five widgets)> c1 (empty) C2 = <>

(M = marker message)
November 2005 Distributed systems: distributed algor 27
ithms
Global states (cont.)
• Observed state
– Corresponds to consistent cut
– Reachable!
[Diagram: the actual execution e0, e1, … leads from Sinit to Sfinal, with recording beginning and ending somewhere in between. A reordered execution reaches the recorded state Ssnap via the pre-snap events e'0, e'1, …, e'R-1 and continues to Sfinal via the post-snap events e'R, e'R+1, ….]

November 2005 Distributed systems: distributed algor 28


ithms
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems

November 2005 Distributed systems: distributed algor 29


ithms
Failure detectors
• Properties
– Unreliable failure detector: answers with
• Suspected: no “P is here” received within T + E sec
• Unsuspected: otherwise
– Reliable failure detector: answers with
• Failed: no “P is here” received within T + A sec
• Unsuspected: otherwise

• Implementation
– Every T sec: multicast by P of “P is here”
– Maximum on message transmission time:
• Asynchronous system: estimate E
• Synchronous system: absolute bound A
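A minimal sketch of the unreliable variant in Python; the timeout bookkeeping and names are illustrative assumptions:

import time

class UnreliableFailureDetector:
    """Suspects a process if no 'P is here' heartbeat arrived within T + E seconds."""
    def __init__(self, T, E):
        self.timeout = T + E          # E = estimated maximum transmission delay
        self.last_seen = {}           # process id -> time of last heartbeat

    def on_heartbeat(self, pid):      # called whenever "pid is here" is received
        self.last_seen[pid] = time.time()

    def query(self, pid):
        last = self.last_seen.get(pid, 0)
        return "Unsuspected" if time.time() - last <= self.timeout else "Suspected"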

November 2005 Distributed systems: distributed algor 30


ithms
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems

November 2005 Distributed systems: distributed algor 31


ithms
Mutual exclusion
• Problem: how to give a single process
temporarily a privilege?
– Privilege = the right to access a (shared)
resource
– resource = file, device, window,…
• Assumptions
– clients execute the mutual exclusion algorithm
– the resource itself might be managed by a
server
– Reliable communication
November 2005 Distributed systems: distributed algor 32
ithms
Mutual exclusion (cont.)
• Basic requirements:
– ME1: at most one process might execute
in the shared resource at any
time
(Safety)
– ME2: a process requesting access to the
shared resource is eventually
granted it (Liveness)
– ME3: Access to the shared resource should be
granted in happened-before order
(Ordering or fairness)
November 2005 Distributed systems: distributed algor 33
ithms
Mutual exclusion (cont.)
• Solutions:
– central server algorithm
– distributed algorithm using logical clocks
– ring-based algorithm
– voting algorithm
• Evaluation
– Bandwidth (= #messages to enter and exit)
– Client delay (incurred by a process at enter and exit)
– Synchronization delay (delay between exit and enter)

November 2005 Distributed systems: distributed algor 34


ithms
Mutual exclusion (cont.)
central server algorithm
• Central server offering 2 operations:
– enter()
• if resource free
then operation returns without delay
else request is queued and return from operation is
delayed
– exit()
• if request queue is empty
then resource is marked free
else return for a selected request is executed
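A sketch of the server's bookkeeping in Python; the RPC layer that actually delays and sends the replies is assumed and not shown:

from collections import deque

class LockServer:
    """Central mutual-exclusion server: enter() grants or queues, exit() passes the grant on."""
    def __init__(self):
        self.holder = None            # client currently holding the resource
        self.waiting = deque()        # queued enter() requests

    def enter(self, client):
        if self.holder is None:
            self.holder = client
            return True               # grant: the enter() call returns without delay
        self.waiting.append(client)   # otherwise the reply is delayed
        return False

    def exit(self, client):
        assert client == self.holder
        if self.waiting:
            self.holder = self.waiting.popleft()
            return self.holder        # the reply for this queued request is now sent
        self.holder = None            # resource marked free
        return None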

November 2005 Distributed systems: distributed algor 35


ithms
Mutual exclusion (cont.)
central server algorithm
• Example (diagram sequence):
– P3 calls enter(); the resource is free, so access is granted immediately (user: 3, queue empty)
– P4 calls enter() and is queued (queue: 4)
– P2 also calls enter() and is queued behind P4 (queue: 4, 2)
– P3 calls exit(); the server grants the resource to P4 (user: 4, queue: 2)
Mutual exclusion (cont.)
central server algorithm
• Evaluation:
– ME3 not satisfied!
– Performance:
• single server is performance bottleneck
• Enter critical section: 2 messages
• Synchronization: 2 messages between exit of one process and
enter of next
– Failure:
• Central server is single point of failure
• what if a client, holding the resource, fails?
• Reliable communication required
November 2005 Distributed systems: distributed algor 46
ithms
Mutual exclusion (cont.)
ring-based algorithm
• All processes arranged in a
– unidirectional
– logical
ring

• token passed in ring

• process with token has access to resource


November 2005 Distributed systems: distributed algor 47
ithms
Mutual exclusion (cont.)
ring-based algorithm
• Example (diagram sequence): processes P1 … P6 arranged in a unidirectional logical ring
– P2 holds the token and can use the resource
– P2 stops using the resource and forwards the token
– P3 does not need the resource and forwards the token
Mutual exclusion (cont.)
ring-based algorithm
• Evaluation:
– ME3 not satisfied
– efficiency
• high when high usage of resource
• high overhead when very low usage
– failure
• Process failure: loss of ring!
• Reliable communication required

November 2005 Distributed systems: distributed algor 53


ithms
Mutual exclusion (cont.)
distributed algorithm using logical clocks
• Distributed agreement algorithm
– multicast requests to all participating processes
– use resource when all other participants agree
(= reply received)
• Processes
– keep logical clock; included in all request
messages
– behave as finite state machine:
• released
• wanted
• held
November 2005 Distributed systems: distributed algor 54
ithms
Mutual exclusion (cont.)
distributed algorithm using logical clocks
• Ricart and Agrawala’s algorithm: process Pj
– on initialization:
• state := released;
– to obtain resource:
• state := wanted;
• T = logical clock value for next event;
• multicast request to other processes <T, Pj>;
• wait for n-1 replies;
• state := held;
November 2005 Distributed systems: distributed algor 55
ithms
Mutual exclusion (cont.)
distributed algorithm using logical clocks
• Ricart and Agrawala’s algorithm: process Pj
– on receipt of request <Ti, Pi> :
• if (state = held) or
(state = wanted and (T,Pj) < (Ti,Pi) )
then queue request from Pi
else reply immediately to Pi
– to release resource:
• state := released;
• reply to any queued requests;
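The two slides above combined into one illustrative Python sketch; multicast and reply stand for an assumed messaging layer, and the logical clock follows the rules from earlier in this chapter:

RELEASED, WANTED, HELD = "released", "wanted", "held"

class RicartAgrawala:
    def __init__(self, pid, n, clock, multicast, reply):
        self.pid, self.n = pid, n
        self.clock = clock            # Lamport clock used for request timestamps
        self.multicast, self.reply = multicast, reply
        self.state = RELEASED
        self.T = None                 # timestamp of own pending request
        self.replies = 0
        self.queue = []               # deferred requests

    def request(self):
        self.state = WANTED
        self.T = self.clock.local_event()
        self.replies = 0
        self.multicast((self.T, self.pid))        # request sent to the other n-1 processes

    def on_reply(self):
        self.replies += 1
        if self.replies == self.n - 1:
            self.state = HELD

    def on_request(self, Ti, Pi):
        mine_first = self.state == WANTED and (self.T, self.pid) < (Ti, Pi)
        if self.state == HELD or mine_first:
            self.queue.append(Pi)                 # defer the reply
        else:
            self.reply(Pi)

    def release(self):
        self.state = RELEASED
        for p in self.queue:                      # reply to any queued requests
            self.reply(p)
        self.queue = []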
November 2005 Distributed systems: distributed algor 56
ithms
Mutual exclusion (cont.)
distributed algorithm using logical clocks
• Ricart and Agrawala’s algorithm: example

– 3 processes
– P1 and P2 will request it concurrently
– P3 not interested in using resource

November 2005 Distributed systems: distributed algor 57


ithms
Mutual exclusion (cont.)
distributed algorithm using logical clocks
• Ricart and Agrawala’s algorithm: example (diagram sequence)
– initially P1, P2 and P3 are all released with empty queues
– P1 multicasts its request <41, P1>; P2 concurrently multicasts <34, P2>; P3 is not interested and replies to both
– P1 receives <34, P2>, which is smaller than its own <41, P1>, and replies to P2; P2 receives <41, P1> while wanting with the smaller timestamp and therefore queues P1’s request
– P2 now has both replies and enters state held; P1 has only P3’s reply and keeps waiting
– when P2 releases the resource, it replies to the queued request of P1
Mutual exclusion (cont.)
distributed algorithm using logical clocks
• Evaluation
– Performance:
• expensive algorithm:
2 * ( n - 1 ) messages to get resource
• Client delay: round trip
• Synchronization delay: 1 message to pass section to another process
• does not solve the performance bottleneck
– Failure
• each process must know all other processes
• Reliable communication required

November 2005 Distributed systems: distributed algor 70


ithms
Mutual exclusion (cont.)
voting algorithm
• Approach of Maekawa:
– Communication with subset of partners should suffice
– Candidate collects sufficient votes
• Voting set: Vi = voting set for pi
∀ i, j: Vi ∩ Vj ≠ ∅
– | Vi | = K
– Pj contained in M voting sets
– Optimal solution
• K ~ √N
• M = K
(a sketch of one way to construct such voting sets follows)
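Under the assumption that N is a perfect square, a standard construction places the N processes in a √N × √N grid and takes Vi = pi's row ∪ pi's column, so that any two voting sets intersect; a Python sketch:

import math

def grid_voting_sets(n):
    """Voting sets from a sqrt(n) x sqrt(n) grid: Vi = row of i + column of i.
    Any two sets intersect because every row crosses every column."""
    k = int(math.isqrt(n))
    assert k * k == n, "sketch assumes N is a perfect square"
    sets = []
    for i in range(n):
        r, c = divmod(i, k)
        row = {r * k + j for j in range(k)}
        col = {j * k + c for j in range(k)}
        sets.append(row | col)        # |Vi| = 2*sqrt(N) - 1, i.e. K ~ sqrt(N)
    return sets

# every pair of voting sets overlaps
vs = grid_voting_sets(16)
assert all(vs[i] & vs[j] for i in range(16) for j in range(16))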

November 2005 Distributed systems: distributed algor 71


ithms
Mutual exclusion (cont.)
voting algorithm
On initialization
state := RELEASED; Maekawa’s
voted := FALSE; algorithm
For pi to enter the critical section
state := WANTED;
Multicast request to all processes in Vi – {pi};
Wait until (number of replies received = (K – 1));
state := HELD;
On receipt of a request from pi at pj (i ≠ j)
if (state = HELD or voted = TRUE)
then
queue request from pi without replying;
else
send reply to pi;
voted := TRUE;
end if
November 2005 Distributed systems: distributed algor 72
ithms
Mutual exclusion (cont.)
voting algorithm
For pi to exit the critical section Maekawa’s algorithm (cont.)
state := RELEASED;
Multicast release to all processes in Vi – {pi};

On receipt of a release from pi at pj (i ≠ j)


if (queue of requests is non-empty)
then
remove head of queue – from pk, say;
send reply to pk;
voted := TRUE;
else
voted := FALSE;
end if

November 2005 Distributed systems: distributed algor 73


ithms
Mutual exclusion (cont.)
voting algorithm
• Evaluation
– Properties
• ME1: OK
• ME2: NOK, deadlock possible
solution: process requests in happened-before (→) order
• ME3: OK
– Performance
• Bandwidth: on enter 2√N messages + on exit √N messages
• Client delay: round trip
• Synchronization delay: round trip
– Failure:
• Crash of process on another voting set can be tolerated
• Reliable communication required

November 2005 Distributed systems: distributed algor 74


ithms
Mutual exclusion (cont.)
• Discussion
– algorithms are expensive and not practical
– algorithms are extremely complex in the
presence of failures
– better solution in most cases:
• let the server, managing the resource, perform
concurrency control
• gives more transparency for the clients

November 2005 Distributed systems: distributed algor 75


ithms
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems

November 2005 Distributed systems: distributed algor 76


ithms
Elections
• Problem statement:
– select a process from a group of processes
– several processes may start election
concurrently
• Main requirement:
– unique choice
– Select process with highest id
• Process id
• <1/load, process id>

November 2005 Distributed systems: distributed algor 77


ithms
Elections (cont.)
• Basic requirements:
– E1: participant pi sets electedi = ⊥ or
electedi = P; P = process with
highest id (Safety)
– E2: all processes pi participate and set
electedi ≠ ⊥, or crash
(Liveness)

⊥ = not yet defined

November 2005 Distributed systems: distributed algor 78


ithms
Elections (cont.)
• Solutions:
– Bully election algorithm
– Ring based election algorithm
• Evaluation
– Bandwidth ( ~ total number of messages)
– Turnaround time (the number of serialized message
transmission times between initiation and termination
of a single run)

November 2005 Distributed systems: distributed algor 79


ithms
Elections (cont.)
Bully election
• Assumptions:
– each process has identifier
– processes can fail during an election
– communication is reliable
• Goal:
– surviving member with the largest identifier is
elected as coordinator

November 2005 Distributed systems: distributed algor 80


ithms
Elections (cont.)
Bully election
• Roles for processes:
– coordinator
• elected process
• has highest identifier, at the time of election
– initiator
• process starting the election for some reason

November 2005 Distributed systems: distributed algor 81


ithms
Elections (cont.)
Bully election
• Three types of messages:
– election message
• sent by an initiator of the election
to all other processes with a higher identifier
– answer message
• a reply message sent by the receiver of an election
message
– coordinator message
• sent by the process becoming the coordinator
to all other processes with lower identifiers
November 2005 Distributed systems: distributed algor 82
ithms
Elections (cont.)
Bully election
• Algorithm:
– send election message:
• process doing it is called initiator
• any process may do it at any time
• when a failed process is restarted, it starts an
election, even though the current coordinator is
functioning (bully)
– a process receiving an election message
• replies with an answer message
• will start an election itself (why?)
November 2005 Distributed systems: distributed algor 83
ithms
Elections (cont.)
Bully election
• Algorithm:
– actions of an initiator
• when not receiving an answer message within a
certain time (2Ttrans +Tprocess) becomes coordinator
• when having received an answer message
( a process with a higher identifier is active)
and not receiving a coordinator message (after x
time units)
will restart elections
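A sketch of this initiator and receiver behaviour in Python; the transport, the timers and the timeout handling are assumptions of the sketch:

class BullyProcess:
    """Initiator logic: challenge all higher ids; become coordinator if nobody answers."""
    def __init__(self, pid, all_ids, send):
        self.pid, self.all_ids, self.send = pid, all_ids, send
        self.coordinator = None
        self.got_answer = False

    def start_election(self):
        self.got_answer = False
        for p in (q for q in self.all_ids if q > self.pid):
            self.send(p, ("election", self.pid))
        # a timer of 2*T_trans + T_process should then call on_timeout()

    def on_timeout(self):
        if not self.got_answer:                    # nobody higher is alive
            self.coordinator = self.pid
            for p in (q for q in self.all_ids if q != self.pid):
                self.send(p, ("coordinator", self.pid))

    def on_message(self, kind, sender):
        if kind == "election":
            self.send(sender, ("answer", self.pid))
            self.start_election()                  # contend ourselves as well
        elif kind == "answer":
            self.got_answer = True                 # now wait for a coordinator message
        elif kind == "coordinator":
            self.coordinator = sender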
November 2005 Distributed systems: distributed algor 84
ithms
Elections (cont.)
Bully election
• Example (diagram sequence): election of P2 after failure of P3 and P4
– P4 has failed; P1 initiates an election and sends election messages to P2, P3 and P4
– P2 and P3 send answer messages and start elections of their own
– the election messages of P2 arrive; P3 replies, then P3 fails as well
– P1 times out without seeing a coordinator message and starts the election again
– P2 replies to P1 and starts its own election; receiving no answers, P2 becomes coordinator
Elections (cont.)
Bully election
• Evaluation
– Correctness: E1 & E2 ok, if
• Reliable communication
• No process replaces crashed process
– Correctness: no guarantee for E1, if
• Crashed process is replaced by process with same id
• Assumed timeout values are inaccurate (= unreliable failure
detector)
– Performance
• Worst case: O(n²) messages
• Best case (optimal): bandwidth n − 2 messages,
turnaround: 1 message

November 2005 Distributed systems: distributed algor 94


ithms
Elections (cont.)
Ring based election
• Assumptions:
– processes arranged in a logical ring

– each process has an identifier: i for Pi

– processes remain functional and reachable during the


algorithm

November 2005 Distributed systems: distributed algor 95


ithms
Elections (cont.)
Ring based election
• Messages:
– forwarded over logical ring
– 2 types:
• election: used during election
contains identifier
• elected: used to announce new coordinator
• Process States:
– participant
– non-participant

November 2005 Distributed systems: distributed algor 96


ithms
Elections (cont.)
Ring based election
• Algorithm
– process initiating an election
• becomes participant
• sends election message to its neighbour

November 2005 Distributed systems: distributed algor 97


ithms
Elections (cont.)
Ring based election
[Diagram: logical ring of processes P1, P5, P8, P11, P14, P21; P1 is the initiator; an election message carrying identifier 11 is in transit.]

November 2005 Distributed systems: distributed algor 98


ithms
Elections (cont.)
Ring based election
• Algorithm
– upon receiving an election message, a process
compares identifiers:
• Received: identifier in message
• own: identifier of process
– 3 cases:
• Received > own
• Received < own
• Received = own

November 2005 Distributed systems: distributed algor 99


ithms
Elections (cont.)
Ring based election
• Algorithm
– receive election message
• Received > own
– message forwarded
– process becomes participant

November 2005 Distributed systems: distributed algor 100


ithms
Elections (cont.)
Ring based election
[Diagrams (two slides): the election message carrying identifier 11 is forwarded along the ring.]
Elections (cont.)
Ring based election
• Algorithm
– receive election message
• Received > own
– message forwarded
– process becomes participant
• Received < own and process is non-participant
– substitutes own identifier in message
– message forwarded
– process becomes participant

November 2005 Distributed systems: distributed algor 103


ithms
Elections (cont.)
Ring based election
[Diagrams (four slides): when the election message reaches a process with a higher identifier, that process substitutes its own identifier; the message, now carrying 21, is forwarded around the ring.]
Elections (cont.)
Ring based election
• Algorithm:
– receive election message
• Received > own
– ...
• Received < own and process is non-participant
– ...
• Received = own
– identifier must be greatest
– process becomes coordinator
– new state: non-participant
– sends elected message to neighbour

November 2005 Distributed systems: distributed algor 108


ithms
Elections (cont.)
Ring based election
[Diagram: the election message carrying 21 arrives back at P21, whose own identifier it matches; P21 becomes coordinator.]
Elections (cont.)
Ring based election
• Algorithm receive election message
• Received > own
– message forwarded
– process becomes participant
• Received < own and process is non-participant
– substitutes own identifier in message
– message forwarded
– process becomes participant
• Received = own
– identifier must be greatest
– process becomes coordinator
– new state: non-participant
– sends elected message to neighbour
November 2005 Distributed systems: distributed algor 110
ithms
Elections (cont.)
Ring based election
• Algorithm
– receive elected message
• participant:
– new state: non-participant
– forwards message
• coordinator:
– election process completed

November 2005 Distributed systems: distributed algor 111


ithms
Elections (cont.)
Ring based election
[Diagrams (seven slides): the coordinator P21 sends an elected message carrying 21 around the ring; each process records the coordinator, becomes non-participant and forwards the message until it returns to P21.]
Elections (cont.)
Ring based election
• Evaluation
– Why is condition
Received < own and process is non-participant
necessary? (see next slide for full algorithm)
– Number of messages:
• worst case: 3 * n - 1
• best case: 2 * n
– concurrent elections: messages are extinguished
• as soon as possible
• before winning result is announced

November 2005 Distributed systems: distributed algor 119


ithms
Elections (cont.)
Ring based election
• Algorithm receive election message
• Received > own
– message forwarded
– process becomes participant
• Received < own and process is non-participant
– substitutes own identifier in message
– message forwarded
– process becomes participant
• Received = own
– identifier must be greatest
– process becomes coordinator
– new state: non-participant
– sends elected message to neighbour
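The complete set of rules above, as an illustrative Python sketch; send_to_neighbour is an assumed primitive that forwards a message along the unidirectional ring:

class RingElectionProcess:
    def __init__(self, pid, send_to_neighbour):
        self.pid = pid
        self.send = send_to_neighbour
        self.participant = False
        self.coordinator = None

    def start_election(self):
        self.participant = True
        self.send(("election", self.pid))

    def on_election(self, received):
        if received > self.pid:                       # forward unchanged
            self.participant = True
            self.send(("election", received))
        elif received < self.pid and not self.participant:
            self.participant = True                   # substitute own (higher) id
            self.send(("election", self.pid))
        elif received == self.pid:                    # own id came back: highest in the ring
            self.participant = False
            self.coordinator = self.pid
            self.send(("elected", self.pid))
        # received < own while already a participant: the message is extinguished

    def on_elected(self, received):
        self.coordinator = received
        if received != self.pid:                      # the coordinator stops the circulation
            self.participant = False
            self.send(("elected", received))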
November 2005 Distributed systems: distributed algor 120
ithms
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems

November 2005 Distributed systems: distributed algor 121


ithms
Multicast communication
• Essential property:
– 1 multicast operation ≠ multiple individual sends
→ higher efficiency
→ stronger delivery guarantees
• Operations: g = group, m = message
– X-multicast(g, m)
– X-deliver(m)
• ≠ receive(m)
– X stands for an additional property:
Basic, Reliable, FIFO, …
November 2005 Distributed systems: distributed algor 122
ithms
Multicast communication(cont.)
IP multicast
• Datagram operations
– with multicast IP address
• Failure model cfr UDP
– Omission failures
– No ordering or reliability guarantees

November 2005 Distributed systems: distributed algor 123


ithms
Multicast communication(cont.)
Basic multicast
• = IP multicast + delivery guarantee
if multicasting process does not crash
• Straightforward algorithm: (with reliable send)
To B-multicast(g, m):
∀p ∈ g: send(p, m)
On receive(m) at p:
B-deliver(m)

• Ex. practical algorithm using IP-multicast


November 2005 Distributed systems: distributed algor 124
ithms
Multicast communication(cont.)
Reliable multicast
• Properties:
– Integrity (safety)
• a correct process delivers a message at most once
– Validity (liveness)
• correct process p multicasts m ⇒ p delivers m
– Agreement (liveness)
• ∃ correct process p delivering m
⇒ all correct processes will deliver m
– Uniform agreement (liveness)
• ∃ process p (correct or failing) delivering m
⇒ all correct processes will deliver m

November 2005 Distributed systems: distributed algor 125


ithms
Multicast communication(cont.)
Reliable multicast
• 2 algorithms:

1. Using B-multicast

2. Using IP-multicast + piggybacked acks

November 2005 Distributed systems: distributed algor 126


ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 1 with B-multicast
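The algorithm itself is a figure in the original slides and is not reproduced here; the usual construction has every receiver re-B-multicast a message the first time it sees it. A hedged sketch of that construction (names and transport are assumptions):

class ReliableMulticast:
    """R-multicast built on B-multicast: each receiver echoes a message the first time it
    sees it, so if any correct process delivers m, all correct processes eventually do."""
    def __init__(self, group, b_multicast, r_deliver):
        self.group = group
        self.b_multicast = b_multicast
        self.r_deliver = r_deliver
        self.received = set()            # message ids already seen

    def r_multicast(self, msg_id, m):
        self.b_multicast(self.group, (msg_id, m))

    def on_b_deliver(self, msg_id, m, sender, self_id):
        if msg_id in self.received:
            return                        # duplicate: ignore (Integrity)
        self.received.add(msg_id)
        if sender != self_id:             # echo before delivering (Agreement)
            self.b_multicast(self.group, (msg_id, m))
        self.r_deliver(m)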

November 2005 Distributed systems: distributed algor 127


ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 1 with B-multicast
• Correct?
– Integrity
– Validity
– Agreement
• Efficient?
– NO: each message transmitted |g| times

November 2005 Distributed systems: distributed algor 128


ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast

[Diagram: incoming messages are placed in a hold-back queue; when the delivery guarantees are met, a message moves to the delivery queue and is delivered.]
November 2005 Distributed systems: distributed algor 129
ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast
Data structures at process p:
Sgp : sequence number

Rgq : sequence number of the latest message it has delivered from q


On initialization:
Sgp = 0
For process p to R-multicast message m to group g
IP-multicast (g, <m, Sgp , <q, Rgq > >)

Sgp ++
On IP-deliver (<m, S, <q, R>>) at q from p
November 2005 Distributed systems: distributed algor 130
ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast
On IP-deliver (<m, S, <q, R>>) at q from p
if S = Rgp + 1
then R-deliver (m)
Rgp ++
check hold-back queue
else if S > Rgp + 1
then store m in hold-back queue
request missing messages endif
endif
if R > Rgq then request missing messages endif
November 2005 Distributed systems: distributed algor 131
ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast
• Correct?
– Integrity: seq numbers + checksums
– Validity: if missing messages are detected
– Agreement: if copy of message remains available

November 2005 Distributed systems: distributed algor 132


ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast: example
• 3 processes in group: P, Q, R
• State of each process shown as:
– S: next sequence number
– Rq: latest sequence number already delivered from Q (similarly for the other peers)
– stored messages, listed between < >

November 2005 Distributed systems: distributed algor 133


ithms
Multicast communication(cont.)
Reliable multicast
Algorithm 2 with IP-multicast: example (diagram sequence)
• Initial state: every process has sequence number 0 and has delivered nothing from its peers (all R values are -1)
• P multicasts mp0 with sequence number 0 and piggybacked acks <Q:-1, R:-1>; P’s sequence number becomes 1
• The message arrives at Q: Q R-delivers mp0, sets Rp = 0 and stores mp0 (R has not yet received it)
• Q multicasts mq0 with sequence number 0 and piggybacked acks <P:0, R:-1>
• On arrival, P delivers mq0; R also delivers mq0, and the piggybacked <P:0> tells R that it has missed mp0 and can request it
• Remaining question (final slide of the example): when can stored messages be deleted?
Multicast communication(cont.)
Ordered multicast
• FIFO: if a correct process P executes
multicast(g, m);
multicast(g, m’);
then for all correct processes:
deliver(m’) ⇒
deliver(m) before deliver(m’)

• Causal: if multicast(g, m) → multicast(g, m’)
then for all correct processes:
deliver(m’) ⇒
deliver(m) before deliver(m’)

• Total: (next slide)
November 2005 Distributed systems: distributed algor 141
ithms
Multicast communication(cont.)
Ordered multicast
• Total if p: deliver(m)  deliver( m’)
then for all correct processes:
deliver(m’) 
deliver(m) before deliver(m’)

• FIFO-Total = FIFO + Total


• Causal-Total = Causal + Total
• Atomic = reliable + Total

November 2005 Distributed systems: distributed algor 142


ithms
Multicast communication(cont.)
Ordered multicast
[Diagram (caption): Notice the consistent ordering of totally ordered messages T1 and T2, the FIFO-related messages F1 and F2 and the causally related messages C1 and C3 – and the otherwise arbitrary delivery ordering of messages. Processes P1, P2, P3 are shown against time.]

November 2005 Distributed systems: distributed algor 143


ithms
Multicast communication(cont.)
FIFO multicast
• Alg. 1: R-multicast using IP-multicast
– Correct?
• Sender assigns Sgp
• Receivers deliver in this order

• Alg. 2 on top of any B-multicast


November 2005 Distributed systems: distributed algor 144
ithms
Multicast communication(cont.)
FIFO multicast
Algorithm 2 on top of any B-multicast
Data structures at process p:
Sgp : sequence number

Rgq : sequence number of the latest message it has delivered from q


On initialization:

Sgp = 0; Rgq = -1
For process p to FO-multicast message m to group g
B-multicast ( g, <m, Sgp >)

Sgp ++
On B-deliver (<m, S >) at q from p
November 2005 Distributed systems: distributed algor 145
ithms
Multicast communication(cont.)
FIFO multicast
Algorithm 2 on top of any B-multicast
On B-deliver (<m, S >) at q from p
if S = Rgp + 1
then FO-deliver (m)
Rgp ++
check hold-back queue
else if S > Rgp + 1
then store m in hold-back queue endif
endif

November 2005 Distributed systems: distributed algor 146


ithms
Multicast communication(cont.)
TOTAL multicast
• Basic approach:
– Sender: assign totally ordered identifiers instead of
process ids
– Receiver: deliver as for FIFO ordering

• Alg. 1: use a (single) sequencer process

• Alg. 2: participants collectively agree on the


assignment of sequence numbers
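An illustrative sketch of Alg. 1, the sequencer approach described on the next slides (the B-multicast transport is assumed):

class Sequencer:
    """Assigns global sequence numbers: on seeing a message it multicasts an order record."""
    def __init__(self, b_multicast, group):
        self.next_seq = 0
        self.b_multicast = b_multicast
        self.group = group

    def on_message(self, msg_id):
        self.b_multicast(self.group, ("order", msg_id, self.next_seq))
        self.next_seq += 1

class TotalOrderReceiver:
    """Holds messages back until the sequencer's order record says it is their turn."""
    def __init__(self, to_deliver):
        self.expected = 0
        self.pending = {}        # msg_id -> message body
        self.orders = {}         # sequence number -> msg_id
        self.to_deliver = to_deliver

    def on_message(self, msg_id, m):
        self.pending[msg_id] = m
        self._flush()

    def on_order(self, msg_id, seq):
        self.orders[seq] = msg_id
        self._flush()

    def _flush(self):
        # deliver in strictly increasing sequence-number order
        while self.expected in self.orders and self.orders[self.expected] in self.pending:
            mid = self.orders.pop(self.expected)
            self.to_deliver(self.pending.pop(mid))
            self.expected += 1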

November 2005 Distributed systems: distributed algor 147


ithms
Multicast communication(cont.)
TOTAL multicast: sequencer process

November 2005 Distributed systems: distributed algor 148


ithms
Multicast communication(cont.)
TOTAL multicast: sequencer process
• Correct?

• Problems?
– A single sequencer process
• bottleneck
• single point of failure

November 2005 Distributed systems: distributed algor 149


ithms
Multicast communication(cont.)
TOTAL multicast: ISIS algorithm
• Approach:
– Sender:
• B-multicasts message
– Receivers:
• Propose sequence numbers to sender
– Sender:
• uses returned sequence numbers to
generate agreed sequence number
November 2005 Distributed systems: distributed algor 150
ithms
Multicast communication(cont.)
TOTAL multicast: ISIS algorithm
[Diagram: (1) P1 multicasts the message to P2, P3 and P4; (2) each receiver sends back a proposed sequence number; (3) P1 multicasts the agreed sequence number.]

November 2005 Distributed systems: distributed algor 151


ithms
Multicast communication(cont.)
TOTAL multicast: ISIS algorithm
Data structures at process p:
Agp : largest agreed sequence number

Pgp : largest proposed sequence number by P


On initialization:
Pgp = 0
For process p to TO-multicast message m to group g
B-multicast ( g, <m, i >) /i = unique id for m
On B-deliver (<m, i> ) at q from p
Pgq = max (Agq, Pgq) + 1

send (p, < i, Pgq >)

store <m, i, Pgq > in hold-back queue


November 2005 Distributed systems: distributed algor 152
ithms
Multicast communication(cont.)
TOTAL multicast: ISIS algorithm
On receive( i, P) at p from q
wait for all replies; a is the largest reply
B-multicast(g, < “order”, i, a >)
On B-deliver (<“order”, i, a > ) at q from p
Agq = max (Agq, a)
attach a to message i in hold-back queue
reorder messages in hold-back queue (increasing sequence numbers)
while message m in front of hold-back queue has been assigned
an agreed sequence number
do remove m from hold-back queue
TO-deliver (m)
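The sender and receiver sides of the ISIS algorithm above, combined into one illustrative Python sketch (message transport and unique message ids are assumed; ties between equal sequence numbers are ignored in the sketch):

class IsisProcess:
    """ISIS total ordering: receivers propose sequence numbers, the sender picks the maximum
    and multicasts it back as the agreed number."""
    def __init__(self, b_multicast, send, to_deliver, group_size):
        self.A = 0                   # largest agreed sequence number seen
        self.P = 0                   # largest sequence number proposed by this process
        self.holdback = {}           # msg id -> [message, seq, agreed?]
        self.proposals = {}          # for own messages: msg id -> proposed seqs received
        self.b_multicast, self.send = b_multicast, send
        self.to_deliver, self.group_size = to_deliver, group_size

    # sender side
    def to_multicast(self, msg_id, m, group):
        self.proposals[msg_id] = []
        self.b_multicast(group, ("msg", msg_id, m))

    def on_proposal(self, msg_id, proposed, group):
        self.proposals[msg_id].append(proposed)
        if len(self.proposals[msg_id]) == self.group_size:      # all replies are in
            self.b_multicast(group, ("order", msg_id, max(self.proposals[msg_id])))

    # receiver side
    def on_message(self, msg_id, m, sender):
        self.P = max(self.A, self.P) + 1
        self.holdback[msg_id] = [m, self.P, False]
        self.send(sender, ("propose", msg_id, self.P))

    def on_order(self, msg_id, agreed):
        self.A = max(self.A, agreed)
        self.holdback[msg_id][1:] = [agreed, True]
        # deliver while the smallest-numbered message in the hold-back queue is agreed
        while self.holdback:
            mid = min(self.holdback, key=lambda k: self.holdback[k][1])
            m, seq, agreed_flag = self.holdback[mid]
            if not agreed_flag:
                break
            self.to_deliver(m)
            del self.holdback[mid]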
November 2005 Distributed systems: distributed algor 153
ithms
Multicast communication(cont.)
TOTAL multicast: ISIS algorithm
• Correct?
– Processes will agree on sequence number for
a message
– Sequence numbers are monotonically
increasing
– No process can prematurely deliver a
message
• Performance
– 3 serial messages!
• Total ordering
– not necessarily FIFO
– not necessarily causal
November 2005 Distributed systems: distributed algor 154
ithms
Multicast communication(cont.)
Causal multicast
• Limitations:
– Causal order only by multicast operations
– Non-overlapping, closed groups

• Approach:
– Use vector timestamps
– Timestamp = count number of multicast
messages
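The vector-timestamp algorithm the next slide refers to, sketched in Python under the stated limitations (closed group, causality created only by multicasts); transport and names are assumptions of the sketch:

class CausalMulticast:
    """CO-multicast: piggyback a vector counting the multicasts of each process; hold back a
    message from j until everything that happened-before it has been delivered."""
    def __init__(self, i, n, b_multicast, co_deliver, group):
        self.i, self.n = i, n
        self.V = [0] * n             # V[j] = number of messages delivered from process j
        self.b_multicast, self.co_deliver, self.group = b_multicast, co_deliver, group
        self.holdback = []           # list of (sender, timestamp, message)

    def co_multicast(self, m):
        self.V[self.i] += 1
        self.b_multicast(self.group, (self.i, list(self.V), m))

    def on_b_deliver(self, j, Vj, m):
        if j == self.i:
            self.co_deliver(m)       # own messages are delivered immediately
            return
        self.holdback.append((j, Vj, m))
        self._flush()

    def _deliverable(self, j, Vj):
        return Vj[j] == self.V[j] + 1 and all(
            Vj[k] <= self.V[k] for k in range(self.n) if k != j)

    def _flush(self):
        progress = True
        while progress:
            progress = False
            for entry in list(self.holdback):
                j, Vj, m = entry
                if self._deliverable(j, Vj):
                    self.holdback.remove(entry)
                    self.co_deliver(m)
                    self.V[j] = Vj[j]
                    progress = True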

November 2005 Distributed systems: distributed algor 155


ithms
Multicast communication(cont.)
Causal multicast: vector timestamps

Meaning?

November 2005 Distributed systems: distributed algor 156


ithms
Multicast communication(cont.)
Causal multicast: vector timestamps
• Correct?
– Message timestamps:
m has timestamp V
m’ has timestamp V’
– Given
multicast(g, m) → multicast(g, m’)
prove
V < V’

November 2005 Distributed systems: distributed algor 157


ithms
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems

November 2005 Distributed systems: distributed algor 158


ithms
Consensus & related problems
• System model
– N processes pi
– Communication is reliable
– Processes may fail
• Crash
• Byzantine
– No message signing
• Message signing limits the harm a faulty process can do
• Problems
– Consensus
– Byzantine generals
– Interactive consistency
November 2005 Distributed systems: distributed algor 159
ithms
Consensus
• Problem statement
– ∀ pi: undecided state & pi proposes vi
– Message exchanges
– Finally: each process pi sets decision variable di, enters
decided state and may not change di
• Requirements:
– Termination: eventually each correct process pi sets di
– Agreement: pi and pj correct & in decided state ⇒ di = dj
– Integrity: correct processes all propose same value d
⇒ any process in decided state has
chosen d
November 2005 Distributed systems: distributed algor 160
ithms
Consensus
• Simple algorithm (no process failures)
– Collect processes in group g
– For each process pi:
• R-multicast(g, vi)
• Collect N values
• d = majority (v1,v2,...,vN)
• Problems with failures:
– Crash: detected? not in asynchronous systems
– Byzantine? Faulty process can send around
different values
November 2005 Distributed systems: distributed algor 161
ithms
Byzantine generals
• Problem statement
– Informal:
• agree to attack or to retreat
• commander issues the order
• lieutenants are to decide to attack or retreat
• all can be ‘treacherous’
– Formal
• One process proposes value
• Others are to agree
• Requirements:
– Termination: each process eventually decides
– Agreement: all correct processes select the same value
– Integrity: commander correct  other correct processes
select value of commander
November 2005 Distributed systems: distributed algor 162
ithms
Interactive consistency
• Problem statement
– Correct processes agree upon a vector of values (one
value for each process)
• Requirements:
– Termination: each process eventually decides
– Agreement: all correct processes select the same value
– Integrity: if pi correct
then all correct processes decide on vi as
the i-th component of their vector

November 2005 Distributed systems: distributed algor 163


ithms
Related problems & solutions
• Basic solutions:
– Consensus:
• Ci(v1, v2, ..., vN) = decision value of pi
– Byzantine generals
• j is commander, proposing v
• BGi (j, v) = decision value of pi
– Interactive consistency
• ICi(v1, v2, ..., vN)[j] = jth value in the decision vector of pi
with v1, v2, ..., vN values proposed by processes

November 2005 Distributed systems: distributed algor 164


ithms
Related problems & solutions
• Derived solutions:
– IC from BG:
• Run BG N times once with each process as
commander
• ICi(v1, v2, ..., vN)[j] = BGi(j, vj)
– C from IC:
• Run IC and produce vector of values
• Derive single value with appropriate function
– BG from C:
November 2005 Distributed systems: distributed algor 165
ithms
Related problems & solutions
• Derived solutions:
– IC from BG:
– C from IC:
– BG from C:
• Commander pj sends value v to itself & other
processes
• Each process runs C with the values v1, v2, ...vN
they received
• BGi(j,v) = Ci(v1, v2, ...vN)
November 2005 Distributed systems: distributed algor 166
ithms
Consensus in a synchronous system
• Assumptions
– Use B-multicast
– f of N processes may fail
• Approach
– Proceed in f + 1 rounds
– In each round correct processes B-multicast values
• Variables
– Valuesir = set of proposed values known to process pi
at the beginning of round r
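The f + 1-round algorithm itself appears as a figure on the next slide; the sketch below simulates it in Python under simplifying assumptions (a crash silences a process for its whole crash round, partial sends are not modelled, and the decision rule is the minimum of the final value set):

def synchronous_consensus(proposals, f, crashed_in_round=None):
    """Sketch: in each of f+1 rounds every surviving process B-multicasts the values it
    learned in the previous round; afterwards each survivor decides with the same rule."""
    n = len(proposals)
    crashed_in_round = crashed_in_round or {}            # pid -> round in which it crashes
    values = [{v} for v in proposals]                     # Values_i at the start of round 1
    new = [set(vals) for vals in values]                  # values not yet sent on
    for r in range(1, f + 2):                             # rounds 1 .. f+1
        outgoing = [set(s) for s in new]
        new = [set() for _ in range(n)]
        for i in range(n):
            if crashed_in_round.get(i, f + 2) <= r:
                continue                                   # a crashed process sends nothing
            for j in range(n):
                fresh = outgoing[i] - values[j]
                values[j] |= fresh
                new[j] |= fresh
    alive = [i for i in range(n) if crashed_in_round.get(i, f + 2) > f + 1]
    return {i: min(values[i]) for i in alive}              # all survivors hold the same set

# example: 4 processes, at most 1 crash, p0 crashes in round 1
print(synchronous_consensus([3, 7, 7, 5], f=1, crashed_in_round={0: 1}))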
November 2005 Distributed systems: distributed algor 167
ithms
Consensus in a synchronous system

November 2005 Distributed systems: distributed algor 168


ithms
Consensus in a synchronous system
• Termination?
– Synchronous system
• Correct?
– Each process arrives at the same set of values at the end of the
final round
– Proof?
• Assume the sets are different ...
• Pi has value v & Pj doesn’t have value v
• ⇒ ∃ Pk that managed to send v to Pi but not to Pj, so Pk crashed
while doing so; repeating the argument requires a distinct crashed
process in each of the f + 1 rounds, contradicting ≤ f failures

• Agreement & Integrity?


– Processes apply the same function

November 2005 Distributed systems: distributed algor 169


ithms
Byzantine generals
in a synchronous system
• Assumptions
– Arbitrary failures
– f of N processes may be faulty
– Channels are reliable: no message injections
– Unsigned messages

November 2005 Distributed systems: distributed algor 170


ithms
Byzantine generals
in a synchronous system
• Impossibility with 3 processes
[Diagram: two scenarios with commander p1 and lieutenants p2, p3 (faulty processes shown shaded). Left: p1 sends 1:v to both lieutenants, but faulty p3 relays 3:1:u to p2. Right: faulty p1 sends 1:w to p2 and 1:x to p3, which relay 2:1:w and 3:1:x. p2 cannot distinguish the two scenarios.]
• Impossibility with N ≤ 3f processes


November 2005 Distributed systems: distributed algor 171
ithms
Byzantine generals
in a synchronous system
• Solution with one faulty process
– N = 4, f = 1
– 2 rounds of messages:
• Commander sends its value to each of the lieutenants
• Each of the lieutenants sends the value it received to its peers
– Lieutenant receives
• Value of commander
• N-2 values of peers
Use majority function

November 2005 Distributed systems: distributed algor 172


ithms
Byzantine generals
in a synchronous system
• Solution with one faulty process
[Diagram: two scenarios with N = 4, f = 1; faulty processes shown shaded. Left: lieutenant p3 is faulty; the correct lieutenants p2 and p4 receive v from the commander and from each other, so majority(v, u, v) = v. Right: the commander p1 is faulty and sends u, w, v to p2, p3, p4; after the exchange every correct lieutenant holds {u, v, w}, so they all reach the same (undefined) majority and still agree.]

November 2005 Distributed systems: distributed algor 173


ithms
Consensus & related problems
• Impossibility in asynchronous systems
– Proof exists: no algorithm is guaranteed to reach consensus
– No guaranteed solution to
• Byzantine generals problem
• Interactive consistency
• Reliable & totally ordered multicast
• Approaches for work around:
– Masking faults: restart crashed process and use persistent storage
– Use failure detectors: make failures fail-silent by discarding
messages from suspected processes

November 2005 Distributed systems: distributed algor 174


ithms
This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems

November 2005 Distributed systems: distributed algor 175


ithms
Distributed Systems:
Distributed algorithms

November 2005 Distributed systems: distributed algor 176


ithms
