DOS Lectures

DISTRIBUTED
COMPUTING
INTRODUCTION
• Advancements in microelectronic technology have
resulted in the availability of fast, inexpensive
processors, and advancements in communication
technology have resulted in the availability of cost-
effective and highly efficient computer networks.
• Computer architectures consisting of interconnected,

multiple processors are of two types:
 Tightly coupled systems.
 Loosely coupled systems.
Tightly Coupled Systems
(parallel processing systems )
• There is a single
system wide
primary memory
(address space) that
is shared by all the
processors.
• Any communication
between the
processors usually
takes place through
the shared memory.
Loosely Coupled Systems
(distributed computing systems )
• The processors do
not share memory,
and each processor
has its own local
memory.
• All physical
communication
between the
processors is done
by passing messages
across the network
that interconnects
the processors.
• Processors of loosely coupled systems can be located far
from each other to cover a wider geographical area.
• In tightly coupled systems, the number of processors that

can be usefully deployed is usually small and limited by the
bandwidth of the shared memory.
• This is not the case with distributed computing systems

that are more freely expandable and can have an almost
unlimited number of processors.
• For a particular processor in a distributed system, its own resources are local, whereas
the other processors and their resources are remote.
• A processor and its resources are referred to as a

node/site/machine of the distributed computing system.
TIME-SHARING SYSTEMS
• In the 1970s, computers started to use the concept of time sharing.
• Parallel advancements in hardware technology allowed reduction in

the size and increase in the processing speed of computers, causing
large-sized computers to be gradually replaced by smaller and
cheaper ones that had more processing capability than their
predecessors. These systems were called minicomputers.
• The advent of time-sharing systems was the first step toward

distributed computing systems.
• It introduced two important concepts used in distributed
computing systems:
 the sharing of computer resources simultaneously by many users,
 accessing of computers from a place different from the main
computer room.
Centralized Time-Sharing Systems
• Most of the processing of a user's job could be done at
the user's own computer, allowing the main computer
to be simultaneously shared by a larger number of
users.
• Shared resources such as files, databases, and software

libraries were placed on the main computer.
• Limitation: the terminals could not be placed very far

from the main computer room since ordinary cables
were used to connect the terminals to the main
computer.
Evolution of Distributed Computing Systems
• Advancements in computer networking
technology between the 1960s and early 1970s
produced two key networking technologies: the
LAN (local area network) and the WAN (wide area
network).
• In the 1990s there was another major advancement
in networking technology: ATM
(Asynchronous Transfer Mode) technology.
• ATM makes very high-speed networking
possible, providing data transmission rates up to
1.2 Gbps in both LAN and WAN environments.
• The availability of such high-bandwidth networks can allow distributed
computing systems to support multimedia applications.
• The merging of computer and networking technologies gave birth to

distributed computing systems in the late 1970s.
• The hardware issues of building such systems were fairly well understood.
• The major stumbling block was the availability of adequate software for making
these systems easy to use and for fully exploiting their power.
MODELS OF DISTRIBUTED COMPUTING SYSTEM
1. Minicomputer
2. Workstation
3. Workstation-server
4. Processor-pool
5. Hybrid
Minicomputer Model
• Is an extension of the centralized time-sharing system, which
consists of a few minicomputers interconnected by a
communication network.
• Each minicomputer has multiple users simultaneously logged on to

it.
• Each user is logged on to one specific minicomputer, with remote

access to other minicomputers.
• The network allows a user to access remote resources that are

available on some machine other than the one on to which the user
is currently logged.
• This model may be used when resource sharing with remote users
is desired.
• The early ARPANET is an example of a distributed computing
system based on the minicomputer model.
Distributed computing system based
on the minicomputer model
Workstation Model
• It consists of several workstations interconnected by a
communication network. Each workstation is equipped with
its own disk and serves as a single-user computer.
• In such an environment, at any one time a significant

proportion of the workstations are idle resulting in the waste
of CPU time.
• Therefore, the idea of the workstation model is to
interconnect all these workstations by a high-speed LAN.
• Advantage - idle workstations may be used to process jobs of

users who are logged onto other workstations and do not
have sufficient processing power at their own workstations to
get their jobs processed efficiently.
• In this model, a user logs onto one of the workstations, called the "home"
workstation, and submits jobs for execution.
• When the system finds that the user's workstation does not have sufficient
processing power for executing the processes of the submitted jobs efficiently,
it transfers one or more of the processes from the user's workstation to some
other workstation that is currently idle and gets the process executed there,
and finally the result of execution is returned to the user's workstation.
• The Sprite system and an experimental system developed at Xerox PARC are
examples of distributed computing systems based on the workstation model.
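The placement decision described above can be sketched as follows. This is only an illustration under stated assumptions: the `Workstation` record, the `pick_host` function, the load metric, and the 0.8 threshold are all hypothetical and are not taken from Sprite or the Xerox PARC system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workstation:
    name: str
    load: float          # assumed metric: fraction of CPU in use, 0.0 to 1.0
    owner_active: bool   # True if a user is logged on and working


def pick_host(home: Workstation, others: List[Workstation],
              busy_threshold: float = 0.8) -> Workstation:
    """Return the workstation on which a newly submitted process should run."""
    # Enough processing power at home: run the process locally.
    if home.load < busy_threshold:
        return home
    # Otherwise look for workstations that nobody is using right now.
    idle = [w for w in others if not w.owner_active]
    if not idle:
        return home  # no idle workstation found, so stay at home
    # Prefer the least loaded idle workstation.
    return min(idle, key=lambda w: w.load)
```

The sketch covers only the choice of a target machine. Transferring the process and returning its result to the home workstation are separate problems.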
Distributed computing system based
on the workstation model
Limitations of Workstation Model
• This model is not so simple to implement because of issues
such as:
1. How does the system find an idle workstation?
2. How is a process transferred from one workstation to
another to get it executed there?
3. What happens to a remote process if a user logs onto

a workstation that was idle until now and was being
used to execute a process of another workstation?
Workstation-Server Model
• The workstation model is a network of personal workstations, each
with its own disk and a local file system.
• A workstation with its own local disk is called a

diskful workstation and a workstation without a
local disk is called a diskless workstation.
• Diskless workstations are more popular in
network environments than diskful workstations,
making the workstation-server model more
popular than the workstation model for building
distributed computing systems.
Distributed computing system based on the workstation-server
model
• A distributed computing system based on the workstation-server
model consists of a few minicomputers and several workstations
interconnected by a communication network.
• One or more of the minicomputers are used for implementing the file system, and
the others may be used for providing other types of services, such as database service and
print service. Therefore, each minicomputer is used as a server
machine to provide one or more types of services.
• There are specialized machines (or the specialized workstations) for

running server processes for managing and providing access to
shared resources.
• For higher reliability and better scalability, multiple servers are used
for managing the resources of a particular type in a distributed
computing system.
• Example - there may be multiple file servers, each running on a

separate minicomputer and cooperating via the network, for
managing the files of all the users in the system.
ADVANTAGES OF WORKSTATION-SERVER MODEL
• Cheaper to use a few minicomputers equipped with large, fast disks that are
accessed over the network than a large number of diskful workstations, with each
workstation having a small, slow disk.
• Diskless workstations are also preferred to diskful workstations from a system

maintenance point of view. Backup and hardware maintenance are easier to
perform with a few large disks than with many small disks scattered all over a
building or campus.
• All files are managed by the file servers, so users have the flexibility to use any
workstation and access the files in the same manner irrespective of which
workstation the user is currently logged on to.
• The request-response protocol is mainly used to access the services of the server
machines.
• A user has guaranteed response time because workstations are not used for
executing remote processes.
• LIMITATION: the model does not utilize the processing capability of idle
workstations.
REQUEST-RESPONSE PROTOCOL
• Also known as the client-server model of communication.
• A client process sends a request to a server process for getting some service
such as reading a block of a file. The server executes the request and sends
back a reply to the client that contains the result of request processing.
• This model provides an effective general-purpose approach to the sharing

of information and resources in distributed computing systems.
• It can also be implemented in a variety of hardware and software

environments.
• It is even possible for both the client and server processes to be run on the
same computer.
• Some processes are both client and server processes. That is, a server
process may use the services of another server, appearing as a client to the
latter.
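The request-response exchange can be sketched in a few lines. The example below is a minimal illustration that assumes client and server run on the same computer, which the slide notes is possible. Python queues stand in for the network, and the `FILE_BLOCKS` table, `server`, `read_block`, and `demo` are hypothetical names.

```python
import queue
import threading

# Hypothetical file contents held by the server, keyed by (file name, block number).
FILE_BLOCKS = {("notes.txt", 0): b"hello", ("notes.txt", 1): b"world"}


def server(requests: queue.Queue) -> None:
    """Server process: execute each request and send back a reply."""
    while True:
        msg = requests.get()
        if msg is None:          # sentinel used only to stop the demo server
            break
        name, block_no, reply_to = msg
        data = FILE_BLOCKS.get((name, block_no))
        status = "ok" if data is not None else "not_found"
        reply_to.put((status, data))


def read_block(requests: queue.Queue, name: str, block_no: int):
    """Client process: send a request, then wait for the reply."""
    reply_box: queue.Queue = queue.Queue(maxsize=1)
    requests.put((name, block_no, reply_box))
    return reply_box.get(timeout=5)


def demo():
    requests: queue.Queue = queue.Queue()
    worker = threading.Thread(target=server, args=(requests,))
    worker.start()
    try:
        return [read_block(requests, "notes.txt", 0),
                read_block(requests, "missing.txt", 0)]
    finally:
        requests.put(None)
        worker.join()
```

Calling `demo()` issues one request that succeeds and one for a file the server does not hold. Each reply carries the result of processing the request.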
Processor-Pool Model
• It is based on the observation that most of the time a user does not need
any computing power but once in a while the user may need a very large
amount of computing power for a short time (e.g., when recompiling a
program consisting of a large number of files after changing a basic shared
declaration).
• Unlike the workstation-server model in which a processor is allocated to

each user, in the processor pool model the processors are pooled
together to be shared by the users as needed.
• The pool of processors consists of a large number of microcomputers and

minicomputers attached to the network.
• Each processor in the pool has its own memory to load and run a system
program or an application program of the distributed computing system.
• Amoeba, Plan 9, and the Cambridge Distributed Computing System are
examples of distributed computing systems based on the processor-pool model.
Distributed computing system based on
processor-pool model
• Compared to the workstation-server model, this model allows better
utilization of the available processing power of a distributed computing
system, as entire processing power of the system is available for use by the
currently logged-on users, whereas in the workstation-server model several
workstations may be idle at a particular time but they cannot be used for
processing the jobs of other users.
• Provides greater flexibility than the workstation-server model. The system's

services can be easily expanded without the need to install any more
computers; the processors in the pool can be allocated to act as extra servers
to carry any additional load arising from an increased user population or to
provide new services.
• Limitation: This model is usually not suitable for high performance interactive
applications, especially those using graphics or window systems. This is due to
the slow speed of communication between the computer on which the
application program of a user is being executed and the terminal via which the
user is interacting with the system. The workstation-server model is generally
considered to be more suitable for such applications.
Hybrid Model
• The workstation-server model is the most widely used model for building distributed
computing systems as a large number of computer users only perform simple
interactive tasks such as editing jobs, sending electronic mails, and executing small
programs.
• In a working environment that has groups of users who often perform jobs needing
massive computation, the processor-pool model is more suitable.

• To combine the advantages of both the workstation-server and processor-pool
models, a hybrid model came into existence.
• The hybrid model is based on the workstation-server model with the addition of a
pool of processors. The processors in the pool can be allocated dynamically for
computations that are too large for workstations or that require several computers
concurrently for efficient execution.
• Advantages – 1. Efficient execution of computation-intensive jobs; 2. Gives

guaranteed response to interactive jobs by allowing them to be processed on local
workstations of the users.
• Limitation: More expensive to implement than the workstation-server model or the
processor-pool model.
Limitations of Distributed Computing Systems
• Distributed computing systems are much more complex and difficult to
build than traditional centralized systems.
• Complexity is mainly due to:

 Using and managing a very large number of distributed resources,
 Handling the communication and security problems
• The performance and reliability of a distributed computing system
depend to a great extent on the performance and reliability of the
underlying communication network.
• Special software is needed to handle loss of messages during transmission

across the network or to prevent overloading of the network, which
degrades the performance and responsiveness to the users.
• Special software security measures are needed to protect the widely

distributed shared resources and services against intentional or accidental
violation of access control and privacy constraints.
Advantages of
Distributed Computing Systems
• Despite the increased complexity and the difficulty of
building distributed computing systems, the
installation and use of distributed computing systems
are rapidly increasing.
• The technical needs, the economic pressures, and the
major advantages that have led to the emergence and
popularity of distributed computing systems are described
next.
Inherently Distributed Applications
• Several applications are inherently distributed in nature
and require a distributed computing system for their
realization.
• Inherently distributed applications include a
computerized worldwide airline reservation system, a
computerized banking system, and a factory
automation system controlling robots and machines all
along an assembly line.
• These applications require that some processing power

be available at many distributed locations for
collecting, preprocessing and accessing data, resulting
in the need for distributed computing systems.
INFORMATION SHARING AMONG
DISTRIBUTED USERS
• The use of distributed computing systems by a

group of users to work cooperatively is known as
Computer-Supported Cooperative Working
(CSCW), or Groupware.
• Groupware applications depend heavily on the

sharing of data objects between programs running
on different nodes of a distributed computing
system.
RESOURCE SHARING
• Sharing of software resources such as software

libraries and databases as well as hardware
resources such as printers, hard disks, etc. can also
be done in a very effective way among all the
computers and the users of a single distributed
computing system.
BETTER PRICE-PERFORMANCE RATIO
• A small number of CPUs in a distributed computing
system based on the processor-pool model can be
effectively used by a large number of users from
inexpensive terminals, giving a fairly high price-
performance ratio as compared to either a
centralized time-sharing system or a personal
computer.
• They also facilitate resource sharing among

multiple computers.
Shorter Response Times and Higher Throughput
• Multiple processors of a distributed computing system can be utilized
properly for providing shorter response times and higher throughput than
a single-processor centralized system.
• If a particular computation can be partitioned into a number of

subcomputations that can run concurrently, in a distributed computing
system all the subcomputations can be simultaneously run with each one
on a different processor.
• To achieve better overall performance, a distributed computing system can distribute the load more
evenly among the multiple processors by moving jobs from currently
overloaded processors to lightly loaded ones.
• Example - in a DC system based on the workstation model, if a user

currently has two processes to run, out of which one is an interactive
process and the other is a process that can be run in the background, it
may be advantageous to run the interactive process on the home node of
the user and the other one on a remote idle node (if any node is idle).
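Moving jobs from overloaded processors to lightly loaded ones can be sketched as below. This is an illustration only: the job-count load measure, the `high` threshold, and the `rebalance` function are assumptions, and real systems use more elaborate load estimates and transfer policies.

```python
from typing import Dict, List, Tuple


def rebalance(jobs: Dict[str, int], high: int = 4) -> List[Tuple[str, str]]:
    """Move one job at a time from the busiest node to the lightest node.

    `jobs` maps a node name to its number of queued jobs and is updated in place.
    Returns the list of (source, destination) moves that were made.
    """
    moves: List[Tuple[str, str]] = []
    while True:
        busiest = max(jobs, key=jobs.get)
        lightest = min(jobs, key=jobs.get)
        # Stop when no node is overloaded or when a move would not even things out.
        if jobs[busiest] <= high or jobs[busiest] - jobs[lightest] <= 1:
            return moves
        jobs[busiest] -= 1
        jobs[lightest] += 1
        moves.append((busiest, lightest))
```

With `jobs = {"a": 7, "b": 1, "c": 1}` and the default threshold, three jobs leave node `a` and the loads end at 4, 3, and 2.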
Higher Reliability
• Reliability refers to the degree of tolerance against errors and component
failures in a system.
• A reliable system prevents loss of information even in the event of

component failures.
• The multiplicity of storage devices and processors in a distributed

computing system allows the maintenance of multiple copies of critical
information within the system and the execution of important
computations redundantly to protect them against catastrophic failures.
• If one of the processors fails, the computation can be successfully

completed at the other processor, and if one of the storage devices fails,
the information can still be used from the other storage device.
• The geographical distribution of the processors and other resources in a

distributed computing system limits the scope of failures caused by natural
disasters.
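Keeping multiple copies of critical information can be sketched as follows. The `Replica` class and the `write_all` and `read_any` helpers are hypothetical, and the sketch ignores how a replica that comes back up is brought up to date.

```python
from typing import Dict, List, Optional


class Replica:
    """One storage device holding a copy of the data (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.up = True   # set to False to simulate a failed device


def write_all(replicas: List[Replica], key: str, value: str) -> int:
    """Store the value on every working replica; return the number of copies made."""
    copies = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = value
            copies += 1
    return copies


def read_any(replicas: List[Replica], key: str) -> Optional[str]:
    """Return the value from the first working replica that holds the key."""
    for replica in replicas:
        if replica.up and key in replica.data:
            return replica.data[key]
    return None   # every copy is unavailable
```

If one device fails after a write, `read_any` still returns the value from the other copy. The information is lost to readers only when all replicas are down.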
• Availability refers to the fraction of time for which a system is available
for use. In comparison to a centralized system, a distributed computing
system also enjoys the advantage of increased availability.
• Example - If the processor of a centralized system fails the entire system

breaks down and no useful work can be performed. But in the case of a
DC system, a few parts of the system can be down without interrupting
the jobs of the users who are using the other parts of the system.
• If a workstation of a distributed computing system that is based on the
workstation-server model fails, only the user of that workstation is
affected. Other users of the system are not affected by this failure.
• In a distributed computing system based on the processor pool model, if

some of the processors in the pool are down at any moment, the system
can continue to function normally, simply with some loss in performance
that is proportional to the number of processors that are down.
• In this case, none of the users is affected, and the users may not even know
that some of the processors are down.
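The two quantities used above can be written as small formulas. The function names are hypothetical, and the proportional model for the processor pool is the simplification stated in the slide, not a measured result.

```python
def availability(uptime: float, downtime: float) -> float:
    """Fraction of time for which the system is available for use."""
    total = uptime + downtime
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime / total


def pool_capacity(total_processors: int, down: int) -> float:
    """Fraction of processing power left when `down` pool processors have failed.

    Assumes identical processors, so the loss is proportional to the number down.
    """
    if total_processors <= 0 or not 0 <= down <= total_processors:
        raise ValueError("invalid processor counts")
    return (total_processors - down) / total_processors
```

For example, a system that is up for 99 hours and down for 1 hour has an availability of 0.99. A pool of 20 processors with 5 down keeps 0.75 of its processing power.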
Extensibility and Incremental Growth
• Distributed computing systems are capable of incremental growth.
It is possible to gradually extend the power and functionality of a
distributed computing system by simply adding additional
resources (both hardware and software) to the system as and when
the need arises.
• Example - additional processors can be easily added to the system

to handle the increased workload of an organization that might
have resulted from its expansion.
• Extensibility is also easier in a distributed computing system

because addition of new resources to an existing system can be
performed without significant disruption of the normal functioning
of the system.
• Properly designed distributed computing systems that have the

property of extensibility and incremental growth are called open
distributed systems.
Better Flexibility in Meeting Users' Needs
• Different types of computers are suitable for performing different types of
computations.
• Example - computers with ordinary power are suitable for ordinary data
processing jobs, whereas high-performance computers are more suitable
for complex mathematical computations.
• In a centralized system, the users have to perform all types of

computations on the only available computer.
• A distributed computing system may have a pool of different types of

computers, in which case the most appropriate one can be selected for
processing a user's job depending on the nature of the job.
• In a distributed computing system that is based on the hybrid model,

interactive jobs can be processed at a user's own workstation and the
processors in the pool may be used to process non-interactive,
computation-intensive jobs.
DISTRIBUTED
OPERATING
SYSTEM
INTRODUCTION
• The operating systems used for Distributed Computing Systems (DCS) are
classified into two types: Network Operating Systems (NOS) and Distributed
Operating Systems (DOS).
• Difference between NOS and DOS:

 In a NOS the users view the DCS as a collection of distinct machines
connected by a communication subsystem. Users are aware of the fact
that multiple computers are being used. But a DOS hides the existence of
multiple computers and provides a single-system image to its users. It
makes a collection of networked machines act as a virtual uniprocessor.
 In NOS each computer of the DCS has its own local operating system and
there is essentially no coordination at all among the computers except for
the rule that when two processes of different computers communicate
with each other they must use a mutually agreed on communication
protocol. While with a DOS there is a single system wide operating system
and each computer of the DCS runs a part of this global operating system.
 Fault tolerance capability of a DOS is usually very high as compared to

that of a NOS.
What is Distributed Operating
Systems?
• A distributed operating system is one that looks to its
users like an ordinary centralized operating system but
runs on multiple, independent CPUs.
• The key concept here is transparency i.e., the use of

multiple processors should be invisible (transparent) to
the user.
• A distributed computing system that uses a network
operating system is referred to as a NETWORK SYSTEM,
whereas one that uses a distributed operating system
is referred to as a TRUE DISTRIBUTED SYSTEM.
MESSAGE PASSING
INTRODUCTION
• In a distributed system, processes executing on different
computers need to communicate with each other.
• Each computer of a distributed system may have a resource

manager process to monitor the current status of usage of
its local resources, and the resource managers of all the
computers might communicate with each other from time
to time to dynamically balance the system load among all
the computers.
• Therefore, a distributed operating system needs to provide
Inter-Process Communication (IPC) mechanisms to facilitate
such communication activities.
Inter-Process Communication (IPC) methods
• Shared-data approach
• Message-passing approach

• Message-passing provides a set of message-based IPC
protocols and does so by shielding the details of
complex network protocols and multiple
heterogeneous platforms from programmers.
• It enables processes to communicate by exchanging

messages.
• It allows programs to be written by using simple

communication primitives, such as send and receive.
• It serves as a suitable infrastructure for building other
higher level IPC systems, such as Remote Procedure
Call (RPC) and Distributed Shared Memory (DSM).
Desirable Features of a Good Message Passing System
• Simplicity (Clean and simple semantics of the IPC
protocols.)
• Uniform Semantics (semantics of remote communications

should be as close as possible to those of local
communications.)
• Efficiency (reducing the number of message exchanges

during the communication process. Optimizations normally
adopted for efficiency.)
• Reliability (Copes with failure problems and guarantees the
delivery of a message, using acknowledgments and
retransmissions on the basis of timeouts. Duplicate
messages may be sent in the event of failures or because of
timeouts, so they must be detected and handled.)
• Correctness (Atomicity ensures that every message sent to a group
of receivers will be delivered to either all of them or none of them.
Ordered delivery ensures that messages arrive at all receivers in an
order acceptable to the application. Survivability guarantees that
messages will be delivered correctly despite partial failures of
processes, machines, or communication links.)
• Flexibility (IPC primitives must have the flexibility to permit any

kind of control flow between the cooperating processes.)
• Security (Authentication of the Receiver/Sender of a message by

the Sender/Receiver; Encryption of a message before sending it
over the network).
• Portability (Easily construct a new IPC facility on another system by

reusing the basic design of the existing message-passing system.
The applications written by using the primitives of the IPC protocols
of the message-passing system should be portable.)
A message is a block of information.
Important issues to be considered in the design of an IPC protocol:
• Who is the sender?
• Who is the receiver?
• Is there one receiver or many receivers?
• Is the message guaranteed to have been accepted by its receiver(s)?
• Does the sender need to wait for a reply?
• What should be done if a catastrophic event such as a node crash or a

communication link failure occurs during the course of communication?
• What should be done if the receiver is not ready to accept the message:
Will the message be discarded or stored in a buffer? In the case of
buffering, what should be done if the buffer is full?
• If there are several outstanding messages for a receiver, can it choose the
order in which to service the outstanding messages?
SYNCHRONIZATION
• The send and receive primitives can each use one of two types of semantics:
blocking or nonblocking.
• The synchronization imposed on the communicating processes depends on which
of the two types is used for each primitive.
• In blocking send primitive, after execution of the send statement, the sending
process is blocked until it receives an acknowledgment from the receiver that the
message has been received.
• For nonblocking send primitive, after execution of the send statement, the
sending process is allowed to proceed with its execution as soon as the message
has been copied to a buffer.
• In blocking receive primitive, after execution of the receive statement, the

receiving process is blocked until it receives a message.
• For a nonblocking receive primitive, the receiving process proceeds with its
execution after execution of the receive statement, which returns control almost
immediately just after telling the kernel where the message buffer is.
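The difference between the primitives can be sketched with a local queue standing in for the kernel's message buffer. This is an assumption made for illustration: the function names are hypothetical, and Python's `queue.Queue` is used only because its `get` and `get_nowait` methods behave like a blocking and a nonblocking receive.

```python
import queue
import threading
from typing import Optional


def send_nonblocking(buffer: queue.Queue, msg: str) -> None:
    """Copy the message into the buffer and return at once."""
    buffer.put_nowait(msg)


def receive_nonblocking(buffer: queue.Queue) -> Optional[str]:
    """Return a buffered message if there is one, else None, without waiting."""
    try:
        return buffer.get_nowait()
    except queue.Empty:
        return None


def receive_blocking(buffer: queue.Queue, timeout: float = 5.0) -> str:
    """Wait until a message arrives (the timeout only keeps the demo from hanging)."""
    return buffer.get(timeout=timeout)


def demo() -> list:
    buffer: queue.Queue = queue.Queue()
    first = receive_nonblocking(buffer)     # nothing buffered yet, returns None
    # A sender that delivers its message a little later, from another thread.
    sender = threading.Timer(0.05, send_nonblocking, args=(buffer, "hello"))
    sender.start()
    second = receive_blocking(buffer)       # blocked until the message arrives
    sender.join()
    return [first, second]
```

A blocking send would additionally wait for the receiver's acknowledgment, which this sketch does not model.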
How does the receiving process know that a message has
arrived in the message buffer with a
nonblocking receive primitive?
• Polling. A test primitive is provided to allow the

receiver to check the buffer status. The receiver uses
this primitive to periodically poll the kernel to check if
the message is already available in the buffer.
• Interrupt. When the message has been filled in the

buffer and is ready for use by the receiver, a software
interrupt is used to notify the receiving process. This
method is highly efficient and allows maximum
parallelism, but its main drawback is that user-level
interrupts make programming difficult.
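Both notification methods can be sketched with a single-slot buffer. The `MessageBuffer` class is hypothetical: `test` plays the role of the test primitive used for polling, and a registered callback stands in for the software interrupt.

```python
import threading
from typing import Callable, Optional


class MessageBuffer:
    """Single-slot buffer standing in for a kernel message buffer."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._msg: Optional[str] = None
        self._handler: Optional[Callable[[str], None]] = None

    def test(self) -> bool:
        """Polling: report whether a message is available in the buffer."""
        with self._lock:
            return self._msg is not None

    def take(self) -> Optional[str]:
        """Remove and return the buffered message, or None if the buffer is empty."""
        with self._lock:
            msg, self._msg = self._msg, None
            return msg

    def on_arrival(self, handler: Callable[[str], None]) -> None:
        """Interrupt style: register a handler to run when a message arrives."""
        with self._lock:
            self._handler = handler

    def deliver(self, msg: str) -> None:
        """Called by the simulated kernel when a message has been filled in."""
        with self._lock:
            handler = self._handler
            if handler is None:
                self._msg = msg      # keep it until the receiver polls
        if handler is not None:
            handler(msg)             # notify the receiver immediately
```

With polling, the receiver must call `test` repeatedly. With a handler registered, delivery itself triggers the receiver's code, which avoids wasted checks but makes the control flow harder to follow.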
Synchronous Communication
• When both the send and receive primitives of a communication
between two processes use blocking semantics, the communication
is said to be synchronous; otherwise it is asynchronous.
• The sending process sends a message to the receiving process, then

waits for an acknowledgment.
• After executing the receive statement, the receiver remains blocked

until it receives the message sent by the sender.
• On receiving the message, the receiver sends an acknowledgment

message to the sender.
• The sender resumes execution only after receiving this

acknowledgment message.
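The sequence of steps in synchronous communication can be sketched with two threads. The queues stand in for the communication channel, and `synchronous_exchange` is a hypothetical name used only for this illustration.

```python
import queue
import threading
from typing import List


def synchronous_exchange(message: str) -> List[str]:
    """Run one synchronous send/receive and return the order of events."""
    data: queue.Queue = queue.Queue()   # carries the message
    acks: queue.Queue = queue.Queue()   # carries the acknowledgment
    log: List[str] = []

    def receiver() -> None:
        msg = data.get()                # blocking receive
        log.append(f"received {msg}")
        acks.put("ack")                 # acknowledge the message

    worker = threading.Thread(target=receiver)
    worker.start()
    data.put(message)                   # send the message
    acks.get(timeout=5)                 # sender stays blocked until the ack arrives
    log.append("sender resumed")
    worker.join()
    return log
```

The log always shows the receipt before the sender resumes, because the sender cannot proceed until the acknowledgment has been sent.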
ADVANTAGES
• As compared to asynchronous communication, synchronous
communication is simple and easy to implement.
• It contributes to reliability as it assures the sending process that its
message has been accepted before the sending process resumes
execution.
• If the message gets lost or is undelivered, no backward error
recovery is necessary for the sending process to establish a
consistent state and resume execution.
LIMITATIONS
• Main drawback of synchronous communication is that it limits
concurrency and is subject to communication deadlocks.
• Is less flexible than asynchronous communication because the
sending process always has to wait for an acknowledgment from
the receiving process even when this is not necessary.
FAILURE HANDLING
• A distributed system offers potential for parallelism, but it is also prone
to partial failures such as a node crash or a communication link
failure, which can cause the following problems:
• Loss of request message. This may happen either due to the failure
of communication link between the sender and receiver or because
the receiver's node is down at the time the request message
reaches there.
• Loss of response message. This may happen either due to the

failure of communication link between the sender and receiver or
because the sender's node is down at the time the response
message reaches there.
• Unsuccessful execution of the request. This happens due to the

receiver's node crashing while the request is being processed.
Possible failures: request message is lost, response message is lost,
receiver's computer crashed
• To cope with these problems, a reliable IPC protocol of a message-
passing system is designed based on the idea of internal
retransmissions of messages after timeouts and the return of an
acknowledgment message to the sending machine's kernel by the
receiving machine's kernel.
• Kernel of the sending machine is responsible for retransmitting the

message after waiting for a timeout period if no acknowledgment is
received from the receiver's machine within this time.
• The kernel of the sending machine frees the sending process only
when the acknowledgment is received.
• The time duration for which the sender waits before retransmitting
the request is slightly more than the approximate round-trip time
between the sender and the receiver nodes plus the average time
required for executing the request.
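The timeout rule and the retransmission loop can be sketched as follows. Several details are assumptions: the 10% safety margin, the cap on the number of attempts, and the `transmit` callable, which stands in for the kernel sending the message and waiting one timeout period for the acknowledgment.

```python
from typing import Callable


def retransmit_timeout(round_trip: float, avg_exec: float,
                       margin: float = 0.1) -> float:
    """Timeout slightly larger than round-trip time plus average execution time.

    `margin` is an assumed safety factor; the slides only say "slightly more".
    """
    return (round_trip + avg_exec) * (1.0 + margin)


def send_reliably(transmit: Callable[[], bool], max_tries: int = 5) -> int:
    """Retransmit until an acknowledgment arrives; return the attempts used.

    `transmit()` returns True if the acknowledgment arrived before the timeout.
    """
    for attempt in range(1, max_tries + 1):
        if transmit():
            return attempt
    raise TimeoutError(f"no acknowledgment after {max_tries} attempts")
```

With a round-trip time of 20 ms and an average execution time of 80 ms, the sketch waits about 110 ms before each retransmission.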
Four-message IPC protocol
for client-server communication
• The client sends a request message to the server.
• When the request message is received at the server's machine, the

kernel of that machine returns an acknowledgment message to the
kernel of the client machine.
• If the acknowledgment is not received within the timeout period,

the kernel of the client machine retransmits the request message.
• When the server finishes processing the client's request, it returns a

reply message (containing the result of processing) to the client.
• When the reply message is received at the client's machine, the
kernel of that machine returns an acknowledgment message to the
kernel of the server machine.
• If the acknowledgment message is not received within the timeout

period, the kernel of the server machine retransmits the reply
message.
• A problem occurs if processing a request takes a
long time.
• If the request message is lost, it will be

retransmitted only after the timeout period,
which has been set to a large value to avoid
unnecessary retransmissions of the request
message.
• If the timeout value is not set properly taking into

consideration the long time needed for request
processing, unnecessary retransmissions of the
request message will take place.
SOLUTION
• When the request message is received at the server's machine, the kernel
of that machine starts a timer. If the server finishes processing the client's
request and returns the reply message to the client before the timer
expires, the reply serves as the acknowledgment of the request message.
• Otherwise, a separate acknowledgment is sent by the kernel of the server

machine to acknowledge the request message. If an acknowledgment is
not received within the timeout period, the kernel of the client machine
retransmits the request message.
• When the reply message is received at the client's machine, the kernel of
that machine returns an acknowledgment message to the kernel of the
server machine. If the acknowledgment message is not received within
the timeout period, the kernel of the server machine retransmits the reply
message.
Three-message IPC protocol for
client-server communication
• When the server finishes processing the client's request, it returns a

reply message (containing the result of processing) to the client.
• The client remains blocked until the reply is received. If the reply is
not received within the timeout period, the kernel of the client
machine retransmits the request message.
• When the reply message is received at the client's machine, the
kernel of that machine returns an acknowledgment message to the
kernel of the server machine.
• If the acknowledgment message is not received within the timeout

period, the kernel of the server machine retransmits the reply
message.
Two-message IPC protocol for
client-server communication
• The client sends a request message to the server
and remains blocked until a reply is received
from the server.
• When the server finishes processing the client's

request, it returns a reply message (containing
the result of processing) to the client.
• If the reply is not received within the timeout

period, the kernel of the client machine
retransmits the request message.
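The client-side behaviour of the two-message protocol can be sketched as a retransmission loop. This is a minimal illustration, not a kernel implementation: `call_with_retransmission`, `send_request` and `make_lossy_transport` are hypothetical names, and an expired timeout is modelled by the transport returning `None`.

```python
def call_with_retransmission(send_request, max_attempts=5):
    """Client-side loop for the two-message protocol (sketch).

    send_request() is a hypothetical callable that transmits the request and
    returns the reply, or None if no reply arrived within the timeout period.
    """
    for attempt in range(1, max_attempts + 1):
        reply = send_request()
        if reply is not None:
            # The reply itself serves as the acknowledgment of the request.
            return reply, attempt
    raise TimeoutError("no reply after %d attempts" % max_attempts)


def make_lossy_transport(losses):
    """Return a fake transport that loses the first `losses` exchanges."""
    state = {"lost": 0}

    def send_request():
        if state["lost"] < losses:
            state["lost"] += 1
            return None  # request or reply lost, so the timeout expires
        return "result"

    return send_request


reply, attempts = call_with_retransmission(make_lossy_transport(losses=2))
print(reply, attempts)
```

Because the client cannot tell a lost request from a lost reply, the server may see the same request more than once, which is why duplicate handling matters for nonidempotent operations.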
Fault-tolerant communication between client and server
QUESTIONS
• Why are MAC Protocols needed?
• Write an algorithm to detect the loss of a token in
a token ring scheme for MAC.
• Suggest a priority based token ring scheme for

MAC that does not lead to the starvation of a
low priority site when higher priority sites
always have something to transmit.
• Explain how following can be achieved:
1. LAN Emulation over ATM.
2. IP over ATM
• Write the code for implementing a producer-

consumer pair of processes for the two cases:
1. They use single message buffer.
2. They use a buffer that can accommodate up to n
messages.
IDEMPOTENCY & HANDLING DUPLICATE REQUEST
• An idempotent operation produces the same

results without any side effects no matter how
many times it is performed with the same
arguments.
• Operations that do not necessarily produce

the same results when executed repeatedly
with the same arguments are said to be
nonidempotent.
A NONIDEMPOTENT ROUTINE
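A minimal sketch of such a routine, contrasted with an idempotent one. The names `get_sqrt` and `debit` and the starting balance are illustrative assumptions, not taken from the slides.

```python
import math

balance = 1000  # hypothetical account state held by the server


def get_sqrt(n):
    """Idempotent: same argument, same result, no change of state."""
    return math.sqrt(n)


def debit(amount):
    """Nonidempotent: every execution changes the balance again."""
    global balance
    if balance >= amount:
        balance -= amount
        return "success", balance
    return "error", balance


first = debit(100)
duplicate = debit(100)  # a retransmitted request executed a second time
print(get_sqrt(16), get_sqrt(16))
print(first, duplicate)
```

Executing the duplicate request debits the account twice, which is the situation exactly-once semantics is meant to prevent.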
• To implement exactly-once semantics a unique
identifier is used for every request that the client
makes.
• Before forwarding a request to a server for processing,

the kernel of the server machine checks to see if a
reply already exists in the reply cache for the request.
• If so, this is a duplicate request that has
already been processed.
• Then the previously computed result is extracted from

the reply cache and a new response message is sent to
the client. Otherwise, the request is a new one.
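The reply-cache check can be sketched as follows. `ServerKernel`, `deliver` and the request identifier format are hypothetical, and a real cache would also need a policy for discarding old entries, which is not modelled here.

```python
class ServerKernel:
    """Sketch of duplicate filtering with a reply cache."""

    def __init__(self, handler):
        self.handler = handler   # the server procedure
        self.reply_cache = {}    # request identifier -> reply already sent
        self.executions = 0      # how many times the server really ran

    def deliver(self, request_id, args):
        if request_id in self.reply_cache:
            # Duplicate request: resend the saved reply, do not re-execute.
            return self.reply_cache[request_id]
        self.executions += 1
        reply = self.handler(*args)
        self.reply_cache[request_id] = reply
        return reply


account = {"balance": 1000}


def debit(amount):
    account["balance"] -= amount
    return account["balance"]


kernel = ServerKernel(debit)
r1 = kernel.deliver("client7:req1", (100,))
r2 = kernel.deliver("client7:req1", (100,))  # retransmission of the same request
print(r1, r2, kernel.executions)
```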
MULTIDATAGRAM MESSAGES
• Networks have an upper bound on the size of
data that can be transmitted at a time. This
size is known as the maximum transfer unit
(MTU) of a network.
• A message whose size is greater than the MTU

has to be fragmented into multiples of the
MTU, and then each fragment has to be sent
separately.
• Each fragment is sent in a packet that has some control
information in addition to the message data. Each
packet is known as a datagram.
• Messages smaller than the MTU of the network can be

sent in a single packet and are known as single-
datagram messages.
• Messages larger than the MTU of the network have to

be fragmented and sent in multiple packets. Such
messages are known as multidatagram messages.
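Fragmentation can be sketched as below. The sketch assumes, for simplicity, that `mtu` is the number of data bytes each packet can carry (control information is kept outside that count); the field names `total` and `index` are hypothetical.

```python
def fragment(message: bytes, mtu: int):
    """Split a message into packets that each carry at most `mtu` data bytes."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    total = max(1, -(-len(message) // mtu))  # ceiling division
    return [
        {"total": total, "index": i, "data": message[i * mtu:(i + 1) * mtu]}
        for i in range(total)
    ]


single = fragment(b"hello", mtu=1000)      # single-datagram message
multi = fragment(b"x" * 2500, mtu=1000)    # multidatagram message
print(len(single), [len(p["data"]) for p in multi])
```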
Keeping Track of Lost and Out-of-Sequence Packets in
Multidatagram Messages
• The logical transfer of a message consists of physical transfer of
several packets.
• A message transmission is complete only when all the packets of

the message have been received by the process to which it is sent.
• For successful completion of a multidatagram message transfer,

reliable delivery of every packet is important.
• Stop-and-wait protocol is used for the purpose.
• But a separate acknowledgment packet for each request packet

leads to a communication overhead.
• A better approach is to use a single acknowledgment packet for all
the packets of a multidatagram message (called the blast protocol).
• When the blast protocol is used, a node crash or a
communication link failure may lead to
problems such as:
1. One or more packets of the multidatagram
message are lost in communication.
2. The packets are received out of sequence by
the receiver.
MECHANISM TO COPE WITH PROBLEMS
• Use a bitmap to identify the packets of a message.
• The header of each packet has two extra fields: one specifies the total
number of packets in the multidatagram message, and the other is the bitmap
field that specifies the position of this packet in the complete message.
• All packets have information about the total number of packets in the
message, so even in the case of out-of-sequence receipt of the packets, a
buffer area can be set by the receiver for the entire message and the received
packet can be placed in its proper position inside the buffer area.
• After a timeout, if all packets have not yet been received, a bitmap indicating
the unreceived packets is sent to the sender. The sender retransmits only
those packets that have not been received by the receiver. This technique is
called selective repeat.
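The receiver side of this mechanism can be sketched as follows. `Reassembler` and its methods are hypothetical names, and the bitmap is represented as a list of 0/1 values with 1 marking an unreceived packet.

```python
class Reassembler:
    """Receiver-side buffer for one multidatagram message (sketch)."""

    def __init__(self):
        self.slots = None

    def receive(self, total, position, data):
        if self.slots is None:
            # The first packet to arrive, whichever it is, tells us how
            # large a buffer to set aside for the entire message.
            self.slots = [None] * total
        self.slots[position] = data

    def missing_bitmap(self):
        """1 marks a packet that has not been received yet."""
        return [1 if slot is None else 0 for slot in self.slots]

    def message(self):
        if None in self.slots:
            return None
        return b"".join(self.slots)


parts = [b"AA", b"BB", b"CC", b"DD"]
rx = Reassembler()
for position in (2, 0, 3):           # out of sequence, packet 1 is lost
    rx.receive(len(parts), position, parts[position])

bitmap = rx.missing_bitmap()         # sent to the sender after the timeout
for position, bit in enumerate(bitmap):
    if bit:                          # selective repeat: resend only these
        rx.receive(len(parts), position, parts[position])
print(bitmap, rx.message())
```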
GROUP COMMUNICATION
• Depending on single or multiple senders and
receivers three types of group communication
are possible:
1. One to many (single sender and multiple

receivers).
2. Many to one (multiple senders and single

receiver).
3. Many to many (multiple senders and multiple

receivers).
One-to-Many Communication
• There are multiple receivers for a message
sent by a single sender.
• Also known as Multicast Communication.
• A special case of multicast communication is

Broadcast Communication, in which message
is sent to all processors connected to a
network.
Group Management
• Receiver processes of a message form a group.
Such groups are of two types: closed and open.
• In a closed group only the members of the group

can send a message to the group.
• An outside process cannot send a message to the

group as a whole, although it may send a
message to an individual member of the group.
• In an open group any process in the system can

send a message to the group as a whole.
• A group of processes working on a common
problem need not communicate with outside
processes and can form a closed group.
• A group of replicated servers meant for

distributed processing of client requests must
form an open group so that client processes can
send their requests to them.
• A flexible message-passing system with group

communication facility should support both types
of groups.
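The difference between the two group types comes down to one check at send time. A minimal sketch, with `Group` and `may_send` as hypothetical names:

```python
class Group:
    """A multicast group that is either closed or open (sketch)."""

    def __init__(self, members, closed):
        self.members = set(members)
        self.closed = closed

    def may_send(self, sender):
        # Closed group: only members may address the group as a whole.
        # Open group: any process in the system may.
        return (not self.closed) or sender in self.members


workers = Group({"w1", "w2", "w3"}, closed=True)     # common-problem group
replicas = Group({"s1", "s2"}, closed=False)         # replicated servers
print(workers.may_send("w1"), workers.may_send("client"),
      replicas.may_send("client"))
```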
• A message-passing system with group
communication facility provides the flexibility
to create and delete groups dynamically and
to allow a process to join or leave a group at
any time.
• The message-passing system has a
mechanism to manage the groups and their
membership information: a centralized
group server process.
• All requests to create a group, to delete a group, to
add a member to a group, or to remove a member
from a group are sent to this process.
• It is easy for the group server to maintain up-to-date

information of all existing groups and their exact
membership.
• Limitations
1. Suffers from the problems of poor reliability and poor
scalability common to all centralized techniques.
2. Replication of the group server may be done to solve
these problems but replication leads to the extra
overhead involved in keeping the group information of
all group servers consistent.
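A centralized group server can be sketched as a single table of groups that handles the four kinds of request. The class and method names are hypothetical; replication and failure handling are deliberately left out.

```python
class GroupServer:
    """Single process holding all group and membership information (sketch)."""

    def __init__(self):
        self.groups = {}  # group name -> set of member process ids

    def create(self, name):
        if name in self.groups:
            raise ValueError("group already exists: " + name)
        self.groups[name] = set()

    def delete(self, name):
        del self.groups[name]

    def join(self, name, pid):
        self.groups[name].add(pid)

    def leave(self, name, pid):
        self.groups[name].discard(pid)

    def members(self, name):
        return frozenset(self.groups[name])


server = GroupServer()
server.create("file-servers")
server.join("file-servers", "p1")
server.join("file-servers", "p2")
server.leave("file-servers", "p1")
print(sorted(server.members("file-servers")))
```

Since every request passes through this one object, membership is trivially up to date, but the object is also the single point of failure and the scalability bottleneck named in the limitations.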
Buffered and Unbuffered Multicast
• Multicast send cannot be synchronous, for two reasons:
 It is unrealistic to expect a sending process to

wait until all the receiving processes that belong
to the multicast group are ready to receive the
multicast message.
 The sending process may not be aware of all the

receiving processes that belong to the multicast
group.
• How a multicast message is treated on the receiving process's
side depends on whether the multicast mechanism is
buffered or unbuffered.
• For an unbuffered multicast the message is not

buffered for the receiving process and is lost if the
receiving process is not in a state ready to receive it.
• Message is received only by those processes of the

multicast group that are ready to receive it.
• For a buffered multicast, message is buffered for the

receiving processes, so each process of the multicast
group will receive the message.
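The two treatments can be compared in a small simulation. `Receiver`, `multicast` and `become_ready` are hypothetical names chosen for this sketch.

```python
class Receiver:
    def __init__(self, name, ready):
        self.name = name
        self.ready = ready
        self.buffer = []      # messages held until the process is ready
        self.delivered = []   # messages the process has actually received

    def become_ready(self):
        self.ready = True
        self.delivered.extend(self.buffer)
        self.buffer.clear()


def multicast(message, group, buffered):
    for receiver in group:
        if receiver.ready:
            receiver.delivered.append(message)
        elif buffered:
            receiver.buffer.append(message)
        # Unbuffered and not ready: the message is lost for this receiver.


a = Receiver("a", ready=True)
b = Receiver("b", ready=False)
multicast("m1", [a, b], buffered=False)
multicast("m2", [a, b], buffered=True)
b.become_ready()
print(a.delivered, b.delivered)
```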
Send-to-All Semantics
• A copy of the message is sent to each process of
the multicast group, and the message is buffered
until it is accepted by the process.
Bulletin-Board Semantics
• Message to be multicast is addressed to a channel
instead of being sent to every individual process of the
multicast group.
• Channel plays the role of a bulletin board.
• A receiving process copies the message from the
channel instead of removing it when it makes a receive
request on the channel.
• Multicast message remains available to other
processes as if it has been posted on the bulletin
board.
• Bulletin-board semantics is more flexible than
send-to-all semantics because it takes the following into account:
 The relevance of a message to a particular

receiver may depend on the receiver's state.
 Messages not accepted within a certain time

after transmission may no longer be useful;
their value may depend on the sender's state.
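Bulletin-board semantics can be sketched as a channel that receivers read from without removing anything. `Channel`, `post` and `receive` are hypothetical names; the `relevant` predicate stands in for a receiver deciding, from its own state, which messages it cares about. Expiry of old messages is not modelled.

```python
class Channel:
    """A channel acting as a bulletin board (sketch)."""

    def __init__(self):
        self.messages = []

    def post(self, message):
        self.messages.append(message)

    def receive(self, relevant=lambda message: True):
        # Copy matching messages; they stay posted for other processes.
        return [m for m in self.messages if relevant(m)]


board = Channel()
board.post(("load", 0.9))
board.post(("price", 12))

only_load = board.receive(lambda m: m[0] == "load")
everything = board.receive()
print(only_load, everything, len(board.messages))
```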
Flexible reliability in multicast
communication
• 0-reliable: No response is expected by the sender from any of
the receivers. Useful for applications using asynchronous multicast
in which the sender does not wait for any response after
multicasting the message.
• 1-reliable: The sender expects a response from any one of the receivers.
• m-out-of-n-reliable: The multicast group consists of n receivers and
the sender expects a response from m (1 < m < n) of the n receivers.
• All-reliable: The sender expects a response message from all the

receivers of the multicast group.
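The four degrees differ only in how many responses the sender waits for. A small sketch, where the function name and the string labels passed as `degree` are assumptions made for illustration:

```python
def responses_required(degree, n, m=None):
    """Number of responses the sender waits for, given n receivers."""
    if degree == "0-reliable":
        return 0
    if degree == "1-reliable":
        return 1
    if degree == "m-out-of-n-reliable":
        if m is None or not 1 < m < n:
            raise ValueError("m must satisfy 1 < m < n")
        return m
    if degree == "all-reliable":
        return n
    raise ValueError("unknown degree: " + degree)


print([responses_required(d, n=5, m=3) for d in
       ("0-reliable", "1-reliable", "m-out-of-n-reliable", "all-reliable")])
```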
Atomic Multicast
• Atomic multicast has an all-or-nothing property.
• When a message is sent to a group by atomic multicast, it is either

received by all the processes that are members of the group or else
it is not received by any of them.
• When a process fails, it is no longer a member of the multicast

group.
• When the process comes up after failure, it must join the group
afresh.
• Applications for which the degree of reliability requirement is 0-reliable,
1-reliable, or m-out-of-n-reliable do not need atomic multicast facility.
• Applications for which the degree of reliability requirement is all-reliable need

atomic multicast facility.
• A flexible message-passing system:
1. should support both atomic and nonatomic multicast facilities and
2. should provide the flexibility to the sender of a multicast message to specify in
the send primitive whether atomicity property is required or not for the
message being multicast.
Method to implement atomic multicast in
all-reliable communication
• Kernel of the sending machine sends the message to all members of the
group and waits for an acknowledgment from each member.
• After a timeout period, the kernel retransmits the message to all those
members from whom an acknowledgment message has not yet been
received.
• The timeout-based retransmission of the message is repeated until an

acknowledgment is received from all members of the group.
• When all acknowledgments have been received, the kernel confirms to

the sender that the atomic multicast process is complete.
• This method works fine only as long as the machines of the sender
process and the receiver processes do not fail during an atomic multicast
operation.
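The retransmission loop of this method can be sketched as below. `atomic_multicast` and `send` are hypothetical: `send(member, message)` is assumed to return `True` if an acknowledgment arrived before the timeout. The `max_rounds` cap is an addition for the sketch so that it terminates; the method as described simply repeats until all acknowledgments arrive.

```python
def atomic_multicast(message, members, send, max_rounds=10):
    """Retransmit to unacknowledged members until all have acknowledged."""
    pending = set(members)
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        acknowledged = {m for m in pending if send(m, message)}
        pending -= acknowledged  # next round goes only to the silent ones
    return not pending, rounds


attempts = {}


def flaky_send(member, message):
    """Fake transport: member p3 acknowledges only on its second attempt."""
    attempts[member] = attempts.get(member, 0) + 1
    return member != "p3" or attempts[member] >= 2


done, rounds = atomic_multicast("update", ["p1", "p2", "p3"], flaky_send)
print(done, rounds, attempts)
```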
• A fault-tolerant atomic multicast protocol
must ensure that a multicast will be delivered
to all members of the multicast group even in
the event of failure of the sender's machine or
a receiver's machine.
• Each message has a message identifier field to

distinguish it from all other messages and a
field to indicate that it is an atomic multicast
message.
• The sender sends the message to a multicast group.
• The kernel of the sending machine sends the

message to all members of the group and uses
timeout-based retransmissions.
• A process that receives the message checks its

message identifier field to see if it is a new
message. If not, it is simply discarded.
Many-to-One Communication
• Multiple senders send messages to a single receiver.
• The single receiver may be selective or nonselective.
• A selective receiver specifies a unique sender; a

message exchange takes place only if that sender
sends a message.
• A nonselective receiver specifies a set of senders, and

if any one sender in the set sends a message to this
receiver, a message exchange takes place.
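Both kinds of receiver can be expressed with one receive operation that takes the set of acceptable senders. The function and the mailbox representation are assumptions made for this sketch.

```python
def receive(mailbox, accepted_senders):
    """Take the first waiting message whose sender is acceptable, if any."""
    for i, (sender, _message) in enumerate(mailbox):
        if sender in accepted_senders:
            return mailbox.pop(i)
    return None  # no exchange takes place yet


mailbox = [("s1", "a"), ("s2", "b")]

selective = receive(mailbox, {"s2"})            # one specific sender
nonselective = receive(mailbox, {"s1", "s3"})   # any sender from a set
nothing = receive(mailbox, {"s1"})              # mailbox is now empty
print(selective, nonselective, nothing)
```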
Many-to-Many Communication
• Multiple senders send messages to multiple receivers.
• The one-to-many and many-to-one schemes are

implicit in this scheme.
• The issues related to one-to-many and many-to-one

schemes also apply to the many-to-many
communication scheme.
• An important issue related to many-to-many

communication scheme is that of ordered message
delivery.
REMOTE
PROCEDURE
CALL
(RPC)
Remote Procedure Call (RPC)
• An independently developed IPC (Inter Process
Communication) protocol is tailored specifically to one
application and does not provide a foundation on
which to build a variety of distributed applications.
• A need was felt for a general IPC protocol that can be

used for designing several distributed applications.
• RPC provides a valuable communication mechanism

that is suitable for building a fairly large number of
distributed applications.
THE RPC MODEL
• For making a procedure call, the caller places arguments to
the procedure in some well-specified location.
• Control is then transferred to the sequence of instructions

that constitutes the body of the procedure.
• The procedure is executed in a newly created execution

environment that includes copies of the arguments given in
the calling instruction.
• After the procedure's execution is over, control returns to

the calling point, returning a result.
Model of RPC
Features of RPC
• Simple call syntax.
• Well-defined interface. This property is used to support compile-

time type checking and automated interface generation.
• The clean and simple semantics of a procedure call.
• Generality - in single-machine computations procedure calls are the

most important mechanism for communication between parts of
the algorithm.
• Efficiency - Procedure calls are simple enough for communication to

be quite rapid.
• It can be used as an IPC mechanism to communicate between

processes on different machines as well as between different
processes on the same machine.
TRANSPARENCY OF RPC
• Syntactic transparency - a remote procedure
call should have exactly the same syntax as a
local procedure call.
• Semantic transparency - the semantics of a

remote procedure call are identical to those of
a local procedure call.
LIMITATIONS
• Achieving exactly the same semantics for RPC as for LPC is
almost impossible, mainly for the following reasons:
• In remote procedure calls, the called procedure is executed

in an address space that is disjoint from the calling program's
address space.
• Due to this reason, the called (remote) procedure cannot

have access to any variables or data values in the calling
program's environment.
• In the absence of shared memory, it is meaningless to pass

addresses in arguments.
• Remote procedure calls are more vulnerable to failure than local
procedure calls, since they involve two different processes and a
network and two different computers.
• Programs that make use of remote procedure calls must have the
capability of handling even those errors that cannot occur in local
procedure calls.
• The need for the ability to take care of the possibility of processor
crashes and communication problems of a network makes it more
difficult to obtain the same semantics for remote procedure calls as
for local procedure calls.
• Remote procedure calls consume much more time (100-1000 times

more) than local procedure calls. This is mainly due to the
involvement of a communication network in RPCs.
• Applications using RPCs must also have the capability to handle the
long delays that may possibly occur due to network congestion.
IMPLEMENTING RPC MECHANISM
• RPC involves a client process and a server process. Therefore, to
conceal the interface of the RPC system from both the client and
server processes, a separate stub procedure is associated with each
of the two processes.
• To hide the existence and functional details of the underlying

network, an RPC communication package (known as RPCRuntime) is
used on both the client and server sides.
• Implementation of an RPC mechanism involves the following five
elements:
• The client
• The client stub
• The RPCRuntime
• The server stub
• The server
• Client
 A user process that initiates a remote procedure call.
To make a remote procedure call the client makes a
perfectly normal local call that invokes a corresponding
procedure in the client stub.
• Client Stub
 The client stub is responsible for carrying out the
following two tasks:
1. On receipt of a call request from the client, it packs a
specification of the target procedure and the
arguments into a message and then asks the local
RPCRuntime to send it to the server stub.
2. On receipt of the result of procedure execution, it
unpacks the result and passes it to the client.
• RPCRuntime
 The RPCRuntime handles transmission of messages across the network
between client and server machines.
 It is responsible for retransmissions, acknowledgments, packet routing,

and encryption.
 The RPCRuntime on the client machine receives the call request message
from the client stub and sends it to the server machine.
 It also receives the message containing the result of procedure execution

from the server machine and passes it to the client stub.
 RPCRuntime on the server machine receives the message containing the

result of procedure execution from the server stub and sends it to the
client machine.
 It also receives the call request message from the client machine and
passes it to the server stub.
• Server
• On receiving a call request from the server

stub, the server executes the appropriate
procedure and returns the result of procedure
execution to the server stub.
• The beauty of the whole scheme is the total

ignorance on the part of the client that the
work was done remotely instead of by the
local kernel.
Implementation of RPC mechanism.
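The five elements can be sketched in a single process as follows. This is an illustration under stated assumptions only: JSON is used as a stand-in for the packing format, the RPCRuntime is reduced to a direct function call instead of a network, and all function names are hypothetical.

```python
import json


def add(a, b):
    """Server: the procedure that is actually executed."""
    return a + b


PROCEDURES = {"add": add}


def server_stub(call_bytes):
    """Server stub: unpack the call, invoke the server, pack the result."""
    call = json.loads(call_bytes)
    result = PROCEDURES[call["proc"]](*call["args"])
    return json.dumps({"result": result}).encode()


def rpc_runtime_send(call_bytes):
    """RPCRuntime stand-in: a real one would cross the network here."""
    return server_stub(call_bytes)


def remote_add(a, b):
    """Client stub: same call syntax as a local procedure."""
    call_bytes = json.dumps({"proc": "add", "args": [a, b]}).encode()
    reply_bytes = rpc_runtime_send(call_bytes)
    return json.loads(reply_bytes)["result"]


# Client: makes a perfectly normal local call.
print(remote_add(2, 3))
```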
STUB GENERATION
• Stubs can be generated in two ways:
1. Manually
RPC implementer provides a set of translation functions
from which a user can construct his or her own stubs. This
method is simple to implement and can handle very
complex parameter types.
2. Automatically
This method uses Interface Definition Language (IDL) that
is used to define the interface between a client and a
server. An interface definition is a list of procedure names
supported by the interface, together with the types of their
arguments and results. This is sufficient information for the
client and server to independently perform compile-time
type checking and to generate appropriate calling
sequences.
RPC MESSAGES
• Call messages that are sent by the client to
the server for requesting execution of a
particular remote procedure.
• Reply messages that are sent by the server to

the client for returning the result of remote
procedure execution.
COMPONENTS NECESSARY IN A CALL MESSAGE
• The identification information of the remote procedure to be executed.
• The arguments necessary for the execution of the procedure.
• A message identification field that consists of a sequence number (for

identifying lost messages and duplicate messages in case of system
failures and for properly matching reply messages to outstanding call
messages, in cases where the replies of several outstanding call messages
arrive out of order).
• A message type field that is used to distinguish call messages from reply
messages. For example, in an RPC system, this field may be set to 0 for all
call messages and set to 1 for all reply messages.
• A client identification field that may be used for two purposes: to allow
the server of the RPC to identify the client to whom the reply message has
to be returned and to allow the server to check the authentication of the
client process for executing the concerned procedure.
RPC call message format
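The components listed above can be written down as a record, and the role of the sequence number shown by matching replies that arrive out of order. Field names, the client identifier and the layout of the reply dictionaries are assumptions for this sketch, not a wire format from the slides.

```python
from dataclasses import dataclass

CALL, REPLY = 0, 1  # message type values, as in the example in the text


@dataclass(frozen=True)
class CallMessage:
    message_id: int       # sequence number
    client_id: str        # who to reply to and who to authenticate
    procedure_id: tuple   # (program, version, procedure) numbers
    arguments: tuple
    message_type: int = CALL


outstanding = {}
for seq, args in enumerate([(2, 3), (10, 4)], start=1):
    msg = CallMessage(seq, "client-7", (1, 1, 5), args)
    outstanding[msg.message_id] = msg

# Replies arrive out of order; the sequence number pairs each with its call.
replies = [
    {"message_id": 2, "type": REPLY, "result": 14},
    {"message_id": 1, "type": REPLY, "result": 5},
]
matched = [(outstanding.pop(r["message_id"]).arguments, r["result"])
           for r in replies]
print(matched)
```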
When a server receives a call message, it could face conditions such as:
• Call message is not intelligible to server. This may happen when a call
message violates the RPC protocol. The server will reject such calls.
• The server detects by scanning the client's identifier field that the client is
not authorized to use the service. The server will return an unsuccessful
reply without bothering to make an attempt to execute the procedure.
• The server finds that the remote program, version, or procedure number
specified in the remote procedure identifier field of the call message is
not available with it. Server will return an unsuccessful reply without
bothering to make an attempt to execute the procedure.
• Incompatible RPC interface being used by the client and server.
• An exception condition (such as division by zero) occurs while executing

the specified remote procedure.
• The specified remote procedure is executed successfully.

RPC REPLY MESSAGE FORMAT
SUCCESSFUL REPLY
UNSUCCESSFUL REPLY
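How a server might map the conditions above onto successful and unsuccessful replies can be sketched as follows. The reply layout (message identifier, type, status, then either a result or a reason) and all names are assumptions for illustration; the interface-compatibility check is omitted.

```python
REQUIRED = {"message_id", "client_id", "procedure", "arguments"}


def handle_call(call, procedures, authorized):
    """Return a successful or unsuccessful reply for one call message."""

    def reply(status, **body):
        return {"message_id": call.get("message_id"), "type": 1,
                "status": status, **body}

    if not REQUIRED.issubset(call):
        return reply("unsuccessful", reason="call not intelligible")
    if call["client_id"] not in authorized:
        return reply("unsuccessful", reason="client not authorized")
    if call["procedure"] not in procedures:
        return reply("unsuccessful", reason="procedure not available")
    try:
        result = procedures[call["procedure"]](*call["arguments"])
    except Exception as exc:  # e.g. division by zero in the procedure
        return reply("unsuccessful", reason="exception: " + type(exc).__name__)
    return reply("successful", result=result)


procs = {"divide": lambda a, b: a / b}
ok = handle_call({"message_id": 1, "client_id": "c1",
                  "procedure": "divide", "arguments": [6, 3]}, procs, {"c1"})
bad = handle_call({"message_id": 2, "client_id": "c1",
                   "procedure": "divide", "arguments": [1, 0]}, procs, {"c1"})
print(ok["status"], ok["result"], bad["status"], bad["reason"])
```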
