Challenges in Distributed Systems
1. Heterogeneity: Heterogeneity applies to the network, computer hardware, operating systems, and the implementations of different developers. A key component of a heterogeneous distributed system in a client-server environment is middleware.
2. Openness: The openness of a distributed system is determined primarily by the degree to which new resource-sharing services can be made available to users.
3. Scalability: The system should remain efficient even with a significant increase in the number of connected users and resources. Whether a program runs on 10 or 100 nodes, its performance should not vary noticeably.
4. Security: The security of an information system has three components: confidentiality, integrity, and availability. Encryption protects shared resources and keeps sensitive information secret when it is transmitted.
5. Failure Handling: When faults occur in hardware or software, programs may produce incorrect results or may stop before completing the intended computation, so corrective measures should be implemented to handle such cases.
6. Concurrency: Several clients may attempt to access a shared resource at the same time. Multiple users make requests on the same resources, i.e., read, write, and update.
7. Transparency: Transparency ensures that the distributed system is perceived as a single entity by users or application programmers, rather than as a collection of cooperating autonomous systems.
Examples Of Distributed Systems
Networks - The earliest example of a distributed system appeared in the 1970s, when Ethernet was invented and LANs (local area networks) were created. For the first time, computers could send messages to other systems with a local IP address.
Telecommunication networks - Telephone and cellular networks are also examples of distributed networks. Telephone networks have been around for over a century and started as an early example of a peer-to-peer network.
Distributed Real-time Systems - Many industries use real-time systems that are distributed locally and
globally.
Distributed artificial intelligence - Distributed artificial intelligence uses large-scale computing power and parallel processing to learn from and process very large data sets using multiple agents.
System Models
Architectural Models: An architectural model defines the way in which the components of the system are placed, how they interact with one another, and the way in which they are mapped onto the underlying network of computers.
● Concerned with placement of its parts and relationship among them.
● Example: client-server model, peer-to-peer model
● Abstracts the functions of the individual components.
● Defines patterns for distribution of data and workload.
● Defines patterns of communication among the components.
Fundamental Models:
Interaction model deals with communication details among the components and their timing and
performance details.
Failure model gives specification of faults and defines reliable communication and correct processes.
Security model specifies possible threats and defines the concept of secure channels.
Limitations of Distributed Systems
1. Absence of a Global Clock - Time is important because it is used for:
→ Ordering the events
→ Scheduling the processes
● No concept of global time
● It’s difficult to reason about the temporal ordering of events
● It’s difficult to design and debug algorithms in a distributed system
- Mutual exclusion
- Synchronization
- Deadlock
2. Absence of shared memory - All distributed systems have their own specific physical memory.
Impact of the absence of shared memory: (i) Debugging (ii) Fault tolerance (iii) Recovery from failure.
● “State” is distributed throughout system
● It is very difficult for each process to get a complete and coherent view of the global state
● Example: one person has two bank accounts and is in the process of transferring $50 between the two accounts
Lamport’s logical clock - It is a procedure to determine the order in which events occur. It provides a basis for the more advanced vector clock algorithm. Lamport's logical clock is needed because a distributed operating system has no global clock.
Algorithm:
Happened-before relation (->): a -> b means 'a' happened before 'b'.
Logical Clock: The conditions for the logical clocks are:
[C1]: If 'a' happened before 'b' within the same process i, then Ci(a) < Ci(b). [Ci -> logical clock of process i]
[C2]: If 'a' is the sending of a message by process i and 'b' is the receipt of that message by process j, then Ci(a) < Cj(b).
Implementation Rules [IR]:
[IR1]: If a -> b ['a' happened before 'b' within the same process i], then Ci(b) = Ci(a) + d, where d > 0 (usually d = 1).
[IR2]: On receiving a message m timestamped tm, process j sets Cj = max(Cj, tm + d), where tm is the clock value Ci(a) of the send event.
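A minimal sketch of rules [IR1] and [IR2] in Python, assuming d = 1 (the class name and the toy example are illustrative, not from the source):

# Sketch of Lamport's logical clock rules [IR1] and [IR2], assuming d = 1.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        # [IR1]: any event within the same process advances the clock by d
        self.time += 1
        return self.time

    def send(self):
        # a send is a local event; the returned timestamp travels as tm
        return self.local_event()

    def receive(self, tm):
        # [IR2]: on receiving a message timestamped tm, Cj = max(Cj, tm + d)
        self.time = max(self.time, tm + 1)
        return self.time

# Example: a process whose clock reads 2 receives a message stamped tm = 5;
# its clock jumps to max(2, 5 + 1) = 6, preserving the happened-before order.
p = LamportClock()
p.local_event()      # clock -> 1
p.local_event()      # clock -> 2
print(p.receive(5))  # 6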
Vector Clock: The vector clock is used to overcome the limitation of Lamport's logical clock, namely that from C(a) < C(b) alone we cannot conclude that 'a' happened before 'b'. Like Lamport's clock, the vector clock is a logical clock used to assign timestamps to events in a distributed system, and it also gives a partial ordering of the events. Here, instead of using a single integer value for the timestamp, we use a vector of integer values. If there are N processes in the group, each process maintains a vector with N elements.
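A short sketch of the vector clock rules for N processes (the class and helper names are illustrative):

# Sketch of a vector clock for N processes; pid is this process's own index.
class VectorClock:
    def __init__(self, n, pid):
        self.clock = [0] * n
        self.pid = pid

    def local_event(self):
        # increment own component on any internal or send event
        self.clock[self.pid] += 1
        return list(self.clock)

    def receive(self, other):
        # element-wise maximum with the incoming vector, then increment own slot
        self.clock = [max(a, b) for a, b in zip(self.clock, other)]
        self.clock[self.pid] += 1
        return list(self.clock)

def happened_before(u, v):
    # u -> v iff u <= v element-wise and u != v; otherwise they may be concurrent
    return all(a <= b for a, b in zip(u, v)) and u != v

# Example with N = 3: [1,0,0] -> [1,1,0], while [1,0,0] and [0,1,0] are concurrent.
print(happened_before([1, 0, 0], [1, 1, 0]))  # True
print(happened_before([1, 0, 0], [0, 1, 0]))  # False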
Causal Ordering Of Messages: It deals with the notion of maintaining the causal relationship that holds among "message send" events and the corresponding "message receive" events.
Causal Ordering Of Messages Protocol
Birman-Schiper-Stephenson Protocol (BSS) - This protocol enforces the causal ordering of messages. → Broadcast based → More messages → Smaller message size → Limited state information.
● Each process Pi increments its vector clock (by 1) before broadcasting a message m.
● Pj delays delivery of m until → Pj has delivered all messages that Pi sent before m, and → Pj has delivered all messages that Pi had received before sending m (a delivery test capturing this is sketched below).
● On delivery, Pj updates its vector clock.
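A hedged sketch of just the BSS delivery test, assuming each broadcast m from Pi carries Pi's vector timestamp vt_m; the buffering and broadcast machinery is omitted:

# Sketch of the BSS delivery test only (buffering and broadcast omitted).
def can_deliver(vt_m, vt_j, i):
    """m, sent by process i with vector timestamp vt_m, is deliverable at Pj
    (whose clock is vt_j) iff Pj has delivered every earlier message from Pi
    (vt_m[i] == vt_j[i] + 1) and every message m causally depends on
    (vt_m[k] <= vt_j[k] for all k != i)."""
    if vt_m[i] != vt_j[i] + 1:
        return False
    return all(vt_m[k] <= vt_j[k] for k in range(len(vt_m)) if k != i)

# Example: Pj has delivered one message from P0 and none from P1.
print(can_deliver([2, 0], [1, 0], 0))  # True:  next message from P0, in order
print(can_deliver([1, 1], [1, 0], 1))  # True:  P1's first message, deps met
print(can_deliver([2, 1], [1, 0], 1))  # False: depends on an undelivered P0 msg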
Schiper-Eggli-Sandoz Protocol - Instead of maintaining a vector clock based on the number of messages sent to each process, this protocol's vector clock may increment at any rate and carries no additional meaning related to the number of messages currently outstanding.
Global State - The global state of a distributed system is the set of local states of each individual process involved in the system, plus the state of the communication channels.
Need For Global State
(i) Distributed Deadlock Detection
(ii) Termination Detection
(iii) Checkpointing
Chandy-Lamport's Global State Recording Algorithm
1. Marker Sending Rule for a process P
→ P records its state.
→ For each outgoing channel c, P sends a marker along c before sending any further messages along c.
2. Marker Receiving Rule for a process Q (on receiving a marker along channel c)
→ If Q has not recorded its state: Q records its state, records the state of c as empty, and follows the marker sending rule.
→ Otherwise: Q records the state of c as the sequence of messages received along c after Q recorded its state and before Q received the marker.
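A simplified, single-threaded sketch of the two rules; the Process class, the channel names, and the send callback below are hypothetical stand-ins for real message plumbing:

# Sketch of the Chandy-Lamport marker rules (toy, single-threaded).
class Process:
    def __init__(self, name, in_channels):
        self.name = name
        self.state = None                                 # recorded local state
        self.recorded = False
        self.chan_state = {c: [] for c in in_channels}    # recorded channel msgs
        self.recording = {c: False for c in in_channels}

    def record_and_send_markers(self, out_channels, send):
        # Marker sending rule: record local state, then send a marker on every
        # outgoing channel before any further application message.
        self.state = self.snapshot_local_state()
        self.recorded = True
        for c in self.chan_state:
            self.recording[c] = True      # start recording all incoming channels
        for c in out_channels:
            send(c, "MARKER")

    def on_message(self, channel, msg, out_channels, send):
        # Marker receiving rule.
        if msg == "MARKER":
            if not self.recorded:
                self.record_and_send_markers(out_channels, send)
            # The marker's channel stops recording; its state is whatever was
            # collected so far (empty when this was the first marker seen).
            self.recording[channel] = False
        elif self.recording.get(channel):
            self.chan_state[channel].append(msg)

    def snapshot_local_state(self):
        return f"state-of-{self.name}"    # placeholder for real process state

sent = []
q = Process("Q", in_channels=["c1", "c2"])
q.on_message("c1", "MARKER", out_channels=["c3"], send=lambda c, m: sent.append((c, m)))
q.on_message("c2", "m5", out_channels=["c3"], send=None)   # in-flight msg on c2
print(q.state, q.chan_state, sent)   # state-of-Q {'c1': [], 'c2': ['m5']} [('c3', 'MARKER')]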
Termination Detection - It is an example of the use of a coherent view of a distributed system.
Basic Idea: A process is in one of two states, active or idle. One of the cooperating processes monitors the computation and is called the controlling agent.
(i) B(W) → Computation message carrying weight W.
(ii) C(W) → Control message carrying weight W.
Huang’s Termination Detection Algorithm
Rule 1: The controlling agent having weight W may send a computation message to a process P:
split W = W1 + W2, with W1 > 0 and W2 > 0;
set W = W1;
send B(W2) to P.
Rule 2: On receipt of B(W2), a process P having weight W sets W = W + W2. If P was idle, P becomes active.
Rule 3: An active process having weight W may become idle: it sends C(W) to the controlling agent and sets W = 0.
Rule 4: On receiving C(W2), the controlling agent sets W = W + W2. If W = 1, the computation has terminated.
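A minimal sketch of the four rules using exact fractions so weights always sum to 1; the class names and the even split in Rule 1 are illustrative choices:

# Sketch of Huang's weight-throwing rules with exact fractions.
from fractions import Fraction

class Controller:
    def __init__(self):
        self.weight = Fraction(1)

    def send_computation(self, proc):
        # Rule 1: split W into W1 + W2, keep W1, ship W2 with message B(W2).
        w2 = self.weight / 2
        self.weight -= w2
        proc.on_computation(w2)

    def on_control(self, w2):
        # Rule 4: absorb returned weight; W == 1 means the computation terminated.
        self.weight += w2
        return self.weight == 1

class Proc:
    def __init__(self):
        self.weight = Fraction(0)        # weight 0: idle

    def on_computation(self, w2):
        # Rule 2: receiving B(W2) adds W2; an idle process becomes active.
        self.weight += w2

    def go_idle(self, controller):
        # Rule 3: on becoming idle, return the whole weight via C(W), set W = 0.
        w, self.weight = self.weight, Fraction(0)
        return controller.on_control(w)

ctrl, p = Controller(), Proc()
ctrl.send_computation(p)   # controller holds 1/2, p holds 1/2
print(p.go_idle(ctrl))     # True: weights sum back to 1, computation terminated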
Distributed Mutual Exclusion: A mutual exclusion mechanism is a program object that prevents simultaneous access to a shared resource. This concept is used in concurrent programming together with a critical section, a piece of code in which a process accesses the shared resource.
Requirements of Mutual exclusion Algorithm:
● No Deadlock: Two or more sites should not endlessly wait for messages that will never arrive.
● No Starvation: Every site that wants to execute the critical section should get an opportunity to do so in finite time. No site should wait indefinitely to execute the critical section while other sites repeatedly execute it.
● Fairness: Each site should get a fair chance to execute the critical section. Requests to execute the critical section must be served in the order they are made, i.e., in the order of their arrival in the system.
● Fault Tolerance: In case of failure, the system should be able to recognize it by itself and continue functioning without any disruption.
Mutual Exclusion Algorithms: In these algorithms, a site communicates with a set of other sites to arbitrate which site should execute the CS next.
Mutual exclusion algorithms can be broadly classified into token-based and non-token-based algorithms.
1. Token Based Algorithm:
● A unique token is shared among all the sites.
● If a site possesses the unique token, it is allowed to enter its critical section
● This approach uses sequence numbers to order requests for the critical section.
● Each request for the critical section contains a sequence number, which is used to distinguish old requests from current ones.
● This approach ensures mutual exclusion, as the token is unique.
● Example: Suzuki-Kasami’s Broadcast Algorithm, Singhal’s Heuristic Algorithm, Raymond’s
Tree-Based Algorithm
2. Non-token based approach:
● A site communicates with other sites in order to determine which site should execute the critical section next. This requires the exchange of two or more successive rounds of messages among sites.
● This approach uses timestamps instead of sequence numbers to order requests for the critical section.
● Whenever a site makes a request for the critical section, it gets a timestamp. The timestamp is also used to resolve any conflict between critical section requests.
● All algorithms that follow the non-token-based approach maintain a logical clock. Logical clocks get updated according to Lamport’s scheme.
● Example: Lamport's algorithm, Ricart–Agrawala algorithm, Maekawa’s Algorithm
Basic Idea: In these algorithms, a site has three states: (i) requesting the CS, (ii) executing the CS, (iii) releasing the CS.
Lamport’s Non-Token Algorithm:
(i) Requesting the CS
● When a site Si wants to enter the critical section, it sends a request message REQUEST(tsi, i) to all other sites and places the request on request_queuei. Here, tsi denotes the timestamp of site Si
● When a site Sj receives the request message REQUEST(tsi, i) from site Si, it returns a timestamped REPLY message to site Si and places the request of site Si on request_queuej
(ii) Executing the CS
● A site Si can enter the critical section if it has received a message with timestamp larger than (tsi, i) from all other sites and its own request is at the top of request_queuei
(iii) Releasing the CS
● When a site Si exits the critical section, it removes its own request from the top of its request queue
and sends a timestamped RELEASE message to all other sites
● When a site Sj receives the timestamped RELEASE message from site Si, it removes the request of
Si from its request queue
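A single-threaded sketch of these three steps, assuming a toy setting where messages are delivered instantly and in order; the Site class and its direct method calls are hypothetical stand-ins for real REQUEST/REPLY/RELEASE messages:

# Sketch of Lamport's non-token algorithm (toy, instant in-order delivery).
import heapq

class Site:
    def __init__(self, sid, peers):
        self.sid, self.peers = sid, peers
        self.clock = 0
        self.queue = []               # request_queue, ordered by (timestamp, id)
        self.replies = set()

    def request_cs(self):
        # (i) Requesting: timestamp own request, queue it, send REQUEST to all.
        self.clock += 1
        ts = self.clock
        heapq.heappush(self.queue, (ts, self.sid))
        self.replies = set()
        for p in self.peers:
            p.on_request(ts, self.sid, self)

    def on_request(self, ts, sid, sender):
        self.clock = max(self.clock, ts) + 1
        heapq.heappush(self.queue, (ts, sid))
        sender.on_reply(self.clock, self.sid)      # timestamped REPLY

    def on_reply(self, ts, sid):
        self.clock = max(self.clock, ts) + 1
        self.replies.add(sid)

    def can_enter(self):
        # (ii) Executing: REPLYs from all peers and own request heads the queue.
        return (len(self.replies) == len(self.peers)
                and self.queue and self.queue[0][1] == self.sid)

    def release_cs(self):
        # (iii) Releasing: drop own request and send RELEASE to all peers.
        heapq.heappop(self.queue)
        for p in self.peers:
            p.on_release(self.sid)

    def on_release(self, sid):
        self.queue = [e for e in self.queue if e[1] != sid]
        heapq.heapify(self.queue)

s1, s2 = Site(1, []), Site(2, [])
s1.peers, s2.peers = [s2], [s1]
s1.request_cs()
print(s1.can_enter())   # True: S2 replied and S1's request heads the queue
s1.release_cs()
print(s2.queue)         # []: S2 removed S1's request on RELEASE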
Ricart–Agrawala Algorithm:
(i) Requesting the CS:
● When a site Si wants to enter the critical section, it sends a timestamped REQUEST message to all other sites.
● When a site Sj receives a REQUEST message from site Si, It sends a REPLY message to site Si if
and only if
→ Site Sj is neither requesting nor currently executing the critical section.
→ In case Site Sj is requesting, the timestamp of Site Si's request is smaller than its own request.
Otherwise the request is deferred by site Sj.
(ii) Executing CS:
● Site Si enters the critical section if it has received the REPLY message from all other sites.
(iii) Releasing the CS:
● Upon exiting the critical section, site Si sends a REPLY message to all the deferred requests.
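A minimal sketch of site Sj's REPLY-or-defer decision, with requests compared as (timestamp, site id) pairs to break ties; the function and parameter names are illustrative:

# Sketch of Sj's decision on receiving REQUEST from Si (Ricart-Agrawala).
def should_reply(sj_requesting, sj_executing, own_request, incoming_request):
    """Return True if Sj replies to Si immediately, False to defer."""
    if sj_executing:
        return False                  # executing the CS: always defer
    if not sj_requesting:
        return True                   # neither requesting nor executing
    # Both are requesting: the smaller (timestamp, id) pair wins.
    return incoming_request < own_request

print(should_reply(False, False, None, (3, 1)))    # True: Sj is idle
print(should_reply(True, False, (5, 2), (3, 1)))   # True: Si's request is older
print(should_reply(True, False, (2, 2), (3, 1)))   # False: Sj defers Si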
Maekawa’s Algorithm:
(i) Requesting the CS:
● When a site Si wants to enter the critical section, it sends a request message REQUEST(i) to all
other sites in the request set Ri.
● When a site Sj receives the request message REQUEST(i) from site Si, it returns a REPLY message to site Si if it has not sent a REPLY message to any site since it received the last RELEASE message. Otherwise, it queues up the request.
(ii) Executing the CS:
● A site Si can enter the critical section once it has received a REPLY message from every site in its request set Ri
(iii) Releasing the CS:
● When a site Si exits the critical section, it sends a RELEASE(i) message to all other sites in its request set Ri
● When a site Sj receives the RELEASE(i) message from site Si, it sends a REPLY message to the next site waiting in the queue and deletes that entry from the queue
● In case the queue is empty, site Sj updates its status to show that it has not sent any REPLY message since the receipt of the last RELEASE message
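A hedged sketch of one quorum member Sj's bookkeeping: at most one outstanding REPLY at a time, with later requests queued (class and method names are illustrative):

# Sketch of a single quorum member's grant/queue logic in Maekawa's algorithm.
from collections import deque

class QuorumMember:
    def __init__(self):
        self.granted_to = None        # site currently holding this member's REPLY
        self.waiting = deque()

    def on_request(self, i):
        # REPLY only if no REPLY is outstanding since the last RELEASE.
        if self.granted_to is None:
            self.granted_to = i
            return f"REPLY -> S{i}"
        self.waiting.append(i)
        return f"queued S{i}"

    def on_release(self, i):
        # Pass the grant to the next queued site, if any.
        self.granted_to = None
        if self.waiting:
            nxt = self.waiting.popleft()
            self.granted_to = nxt
            return f"REPLY -> S{nxt}"
        return "no outstanding REPLY"

m = QuorumMember()
print(m.on_request(1))   # REPLY -> S1
print(m.on_request(2))   # queued S2
print(m.on_release(1))   # REPLY -> S2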
Suzuki-Kasami’s Broadcast Algorithm: It is a token-based algorithm; a site is allowed to enter its CS only if it possesses the token.
(i) Requesting the CS:
● When a site Si wants to enter the critical section and it does not have the token, it increments its sequence number RNi[i] and sends a request message REQUEST(i, sn) to all other sites in order to request the token.
● Here sn is the updated value of RNi[i]
● When a site Sj receives the request message REQUEST(i, sn) from site Si, it sets RNj[i] to the maximum of RNj[i] and sn, i.e., RNj[i] = max(RNj[i], sn).
● After updating RNj[i], site Sj sends the token to site Si if it holds the token and RNj[i] = LN[i] + 1
(ii) Executing the CS:
● Site Si executes the critical section if it has acquired the token.
(iii) Releasing the CS:
After finishing the execution, site Si exits the critical section and does the following:
● sets LN[i] = RNi[i] to indicate that its critical section request RNi[i] has been executed
● For every site Sj whose ID is not present in the token queue Q, it appends Sj's ID to Q if RNi[j] = LN[j] + 1, to indicate that site Sj has an outstanding request.
● After the above update, if the queue Q is non-empty, it pops a site ID from Q and sends the token to the site indicated by the popped ID.
● If the queue Q is empty, it keeps the token
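A compact sketch of this bookkeeping for N = 3 sites, where the token is a plain (LN, Q) pair and "sending" it just means returning the next holder's ID (all names are illustrative):

# Sketch of Suzuki-Kasami request/release bookkeeping for N sites.
from collections import deque

N = 3
RN = [[0] * N for _ in range(N)]         # RN[j][i]: highest request sn j saw from i
token = {"LN": [0] * N, "Q": deque()}    # LN[i]: last executed sn of site i

def request(i):
    RN[i][i] += 1                        # new request with sn = RN[i][i]
    sn = RN[i][i]
    for j in range(N):                   # broadcast REQUEST(i, sn); each j updates
        RN[j][i] = max(RN[j][i], sn)
    return sn

def release(i):
    token["LN"][i] = RN[i][i]            # request RN[i][i] has been served
    for j in range(N):                   # queue sites with outstanding requests
        if j != i and RN[i][j] == token["LN"][j] + 1 and j not in token["Q"]:
            token["Q"].append(j)
    return token["Q"].popleft() if token["Q"] else None   # next token holder

request(1)        # S1 wants the CS while S0 holds the token
print(release(0)) # 1 -> the token goes to S1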
Performance Metrics for Mutual Exclusion
1. Response time: The time interval for which a request waits for the end of its critical section execution after its request messages have been sent out.
2. Synchronization Delay: The time required for the next process to enter the critical section after a
process leaves the critical section is known as Synchronization delay.
3. Message complexity: The number of messages needed to execute each critical section by the
process.
4. Throughput: Throughput is the rate at which the system executes requests for the critical
section.
Throughput = 1 / (Synchronization delay + Average critical section execution time)
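A tiny worked example with assumed values (2 time units of synchronization delay, 8 time units of average critical section execution):

# Worked example of the throughput formula with assumed values.
sd, e = 2, 8
print(1 / (sd + e))   # 0.1 critical-section executions per time unit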