Synchronization in Distributed Systems
Synchronization in Distributed Systems
Synchronization in Distributed Systems
Chapter 6
Background
Synchronization: coordination of actions between processes. Processes are usually asynchronous, (operate independent of events in other processes) Sometimes need to cooperate/synchronize
For mutual exclusion For event ordering (was message x from process P sent before or after message y from process Q?)
Introduction
Synchronization in centralized systems is primarily accomplished through shared memory
Event ordering is clear because all events are timed by the same clock
Clock Synchronization
Some applications rely on event ordering to be successful
See page 232 for some examples Event ordering is easier if you can accurately time-stamp events, but in a distributed system the clocks may not always be synchronized
Clock Skew
In a distributed system each computer has its own clock Each crystal will oscillate at slightly different rate. Over time, the software clock values on the different computers are no longer the same.
Clock Skew
Clock skew(offset): the difference between the times on two different clocks Clock drift : the difference between a clock and actual time Ordinary quartz clocks drift by ~ 1sec in 11-12 days. (10-6 secs/sec) High precision quartz clocks drift rate is somewhat better
Formalization
The distributed system consists of n processes, p1, p2, pn (e.g, a MPI group) Each pi executes on a separate processor No shared memory Each pi has a state si Process execution: a sequence of events
Changes to the local state Message Send or Receive
Two Versions
Lamports logical clocks: synchronizes logical clocks
Can be used to determine an absolute ordering among a set of events although the order doesnt necessarily reflect causal relations between events.
If a b, then a and b are causally related; i.e., event a potentially has a causal effect on event b.
Concurrent Events
Happens-before defines a partial order of events in a distributed system. Some events cant be placed in the order a and b are concurrent (a || b) if !(a b) and !(b a). If a and b arent connected by the happened-before relation, theres no way one could affect the other.
Logical Clocks
Needed: method to assign a timestamp to event a (call it C(a)), even in the absence of a global clock The method must guarantee that the clocks have certain properties, in order to reflect the definition of happens-before. Define a clock (event counter), Ci, at each process (processor) Pi. When an event a occurs, its timestamp ts(a) = C(a), the local clock value at the time the event takes place.
Correctness Conditions
If a and b are in the same process, and a b then C (a) < C (b) If a is the event of sending a message from Pi, and b is the event of receiving the message by Pj, then Ci (a) < Cj (b). The value of C must be increasing (time doesnt go backward).
Corollary: any clock corrections must be made by adding a positive number to a time.
Implementation Rules
Between any two successive events a & b in Pi, increment the local clock (Ci = Ci + 1)
thus Ci(b) = Ci(a) + 1
When a message m is sent from Pi, set its time-stamp tsm to Ci, the time of the send event after following previous step. When the message is received at Pj the local time must be greater than tsm . The rule is (Cj = max{Cj, tsm} + 1).
Figure 6-9. (a) Three processes, each with its own clock. The clocks run at different rates.
Message mi is received
Figure 6-10. The positioning of Lamports logical clocks in distributed systems Handling clock management as a middleware operation
P1
P2
e21
e22
e23
e24
e25
Which events are causally related? Which events are concurrent? eij represents event j on processor i
A total ordering of events can be obtained if we ensure that no two events happen at the same time (have the same timestamp). Why? So all processors can agree on an unambiguous order. How? Attach process number to low-order end of time, separated by decimal point; e.g., event at time 40 at process P1 is 40.1,event at time 40 at process P2 is 40.2
P1
e11
e12
e13
e14
e15
e16
e17
P2
e21
e22
e23
e24
e25
Message 1: add $100.00 Message 2: increment account by 1% The replica that sees the messages in the order m1, m2 will have a final balance of $1111 The replica that sees the messages in the order m2, m1 will have a final balance of $1110
The Problem
Site 1 has final account balance of $1,111 after both transactions complete and Site 2 has final balance of $1,100. Which is right? Either, from the standpoint of consistency. Problem: lack of consistency.
Both values should be the same
Solution: make sure both sites see/process all messages in the same order.
Implementation
When a process receives a message, put it in a local message queue, ordered by timestamp. Multicast an acknowledgement to all sites Each ack has a timestamp larger than the timestamp on the message it acknowledges The message queue at each site will eventually be in the same order
Implementation
Deliver a message to the application only when the following conditions are true:
The message is at the head of the queue The message has been acknowledged by all other receivers. This guarantees that no update messages with earlier timestamps are still in transit.
Acknowledgements are deleted when the message they acknowledge is processed. Since all queues have the same order, all sites process the messages in the same order.
Causality
Causally related events:
Event a may causally affect event b if a b Events a and b are causally related if either a b or b a. If neither of the above relations hold, then there is no causal relation between a & b. We say that a || b (a and b are concurrent)
In other words, you cannot look at the clock values of events on two different processors and decide which one happens before. Lamport clocks do not capture causality
m3
Figure 6-12.
Figure 5.4
Time Space P1 e11 . (1) e21 (1) e31 (1) e12 (2) e22 (3)
P2
P3
e32
(2)
e33 (3)
C(e11) < C(e22) and C(e11) < C(e32) but while e11 e22, we cannot say e11 e32 since there is no causal path connecting them. So, with Lamport clocks we can guarantee that if C(a) < C(b) then b a , but by looking at the clock values alone we cannot say whether or not the events are causally related.
Implementation Rules
IR1: Increment VCi[i] before each new event. IR2: When process i sends a message m it sets ms (vector) timestamp to VCi (after incrementing VCi[i]) IR3: When a process receives a message it does a component-by-component comparison of the message timestamp to its local time and picks the maximum of the two corresponding components. Adjust local components accordingly. Then deliver the message to the application.
Review
Physical clocks: hard to keep synchronized Logical clocks: can provide some notion of relative event occurrence Lamports logical time
happened-before relation defines causal relations logical clocks dont capture causality total ordering relation use in establishing totally ordered multicast
Vector clocks
Unlike Lamport clocks, vector clocks capture causality Have a component for each process in the system
P1
P2
(0, 1, 0) e21
(2, 2, 0) e22
(2, 3, 1) e23
(2,4,2)
(2, 5, 2)
e24
e25
(0, 0, 1) P3 e31
(0, 0, 2) e32
(0, 0, 3) e33
In Figure 5.5, VC(e24) = (2, 4, 2). Two events in P1 and two events in P3 happened before e24.
Even though P1 and P3 may have executed other events, they dont have a causal effect on e24.
Figure 5.4
Time
P2
P3
e32
(0, 0, 2)
e33 (0, 0, 3)
ts(e11) = (1, 0, 0) and ts(e32) = (0, 0, 2), which shows that the two events are concurrent. ts(e11) = (1, 0, 0) and ts(e22) = (2, 2, 0), which shows that e11 e22
In other words, if a message m is received from Pi, you should also have received every message that Pi received before it sent m; e.g.,
if m is sent by P1 and ts(m) is (3, 4, 0) and you are P3, you should already have received exactly 2 messages from P1 and at least 4 from P2 if m is sent by P2 and ts(m) is (4, 5, 1, 3) and if you are P3 and VC3 is (3, 3, 4, 3) then you need to wait for a fourth message from P2 and at least one more message from P1.
Figure 6-13. Enforcing Causal Communication P0 VC0 (1, 0, 0) m (1, 1, 0) VC1 m* P2 (0, 0, 0) VC2 (1, 0, 0) VC2 (1, 1, 0) VC2 VC0 (1, 1, 0)
P1
P1 received message m from P0 before sending message m* to P2; P2 must wait for delivery of m before receiving m* (Increment own clock only on message send) Before sending or receiving any messages, ones own clock is (0, 0, 0)
History
ISIS and Horus were middleware systems that supported the building of distributed environments through virtually synchronous process groups Provided both totally ordered and causally ordered message delivery.
Lightweight Causal and Atomic Group Multicast Birman, K., Schiper, A., Stephenson, P, ACM Transactions on Computer Systems, Vol 9, No. 3, August 1991, pp 272-314.
End-to-end argument: the application is better equipped to know which messages are causally related. But developers are now forced to do more work; re-inventing the wheel.