Distributed Systems and Cloud Computing

Examination Scheme

Course Code: ME-MCA-315
Course Name: Distributed Systems and Cloud Computing
Contact Hours: 03
Theory: CA 25, MSE 20, Total (CA + MSE) 50
End Sem. Exam: 50
Total: 100
Syllabus and Course Outcomes

Module 1: Introduction to Distributed Computing Concepts - Inter Process Communication, Remote Communication
  CO1: Explain the architecture, design issues, and communication mechanisms in distributed systems.
Module 2: Time & Coordination in Distributed Systems
Module 3: Distributed Shared Memory & Data Consistency
  CO2: Apply synchronization techniques and shared memory management in distributed environments.
Module 4: Distributed System Management - Resource Management, Process Management, Distributed File System
  CO3: Analyze process and resource management in distributed systems.
Module 5: Cloud Computing Foundations & Architectures - Cloud Computing Architecture
Module 6: Cloud Platforms, Security & Challenges - Cloud Service Providers, Implementation, Security and Compliance
  CO4: Analyze cloud computing and cloud models.
Reference Books
● Pradeep K. Sinha, Distributed Operating System: Concepts and Design
● Dr. Sunita Mahajan, Seema Shah, Distributed Computing
Module 1 - Introduction to Distributed
Computing Concepts
No. Topics
1. Introduction to Distributed Computing Concepts: basic concepts of distributed systems, distributed computing models, issues in designing distributed systems. CAP Theorem: trade-offs between Consistency, Availability, and Partition Tolerance. Scalability & fault tolerance in distributed systems.
2. Inter Process Communication: fundamental concepts related to inter-process communication, including the message passing mechanism and concepts of group communication.
3. Remote Communication: Remote Procedure Call (RPC), Remote Method Invocation (RMI)
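As a hedged illustration of how RPC makes a remote call look local, Python's standard `xmlrpc` modules can run a tiny server and client in one process (the function name and the in-process setup here are illustrative, not part of the syllabus):

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# The server registers an ordinary function; RPC marshals the arguments,
# ships them over the network, and unmarshals the reply.
def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
port = server.server_address[1]           # ephemeral port chosen by the OS
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the remote procedure is invoked like a local call.
proxy = ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.add(2, 3)                  # looks local, runs in the server
server.shutdown()
```

RMI (e.g. in Java) extends the same idea from procedures to methods invoked on remote objects.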
Basic Terminology of Computing
● Uses of a network
— Information sharing
— Resource sharing
— Higher reliability
— All of the above
● Response time is
— Time taken to complete task
— Time taken to give first response to request
Basic Terminology of Computing
● Centralized multi-user system will have
—One server – multiple terminals
—Multiple servers- multiple terminals
● Throughput is
—No. of tasks processed per unit of time
—Total no. of tasks executed
● Extensibility of a system means
—To expand the system over a network
—Ability to modify or add more components
Basic Terminology of Computing
● Fault tolerance means
—System gives correct result in spite of wrong program
—Continue operating properly even if components fail
● Mobility means
—Can access the system from any place
—System can be accessed from any device
● System availability means
—System will not fail when it is needed
—System will be ready to use all the time
Centralized Multi-user System
● All processing and data storage is managed by a single, central server.
● Most commonly used architecture, where a client sends a request to the
company/organization server and receives the response.
Problems:
● Single point of failure
● Difficult to expand
Distributed Computing System
● Distributed computing refers to a system where processing and data storage
is distributed across multiple devices or systems, rather than being handled
by a single central device.
● A collection of independent computers that appears to its users as a single
coherent system. (Andrew Tanenbaum).
Distributed Computing System
● A collection of heterogeneous nodes connected by one or more interconnection
networks which provides access to system-wide shared resources and services.
● It is basically a collection of interconnected processors covering wide geographical
area in which each processor has its own local memory and other peripherals.
● Communication between any two processors takes place by message passing
over the communication network.
Example of Distributed Computing System
Distributed Computing System
● A type of computing in which different components and objects comprising an
application can be located on different computers connected to a network.
● Ex. a word processing application might consist of an editor component on one
computer, a spell-checker object on a second computer, and a thesaurus on a third
computer. In some distributed computing systems, each of the three computers could
even be running a different operating system.
● The data used in a distributed processing environment is also distributed across
platforms.
Distributed Applications
● Automated banking systems
● Tracking roaming cellular phones
● Global positioning systems
● Passenger reservation system: railways and airlines
● Amazon, Flipkart
● Air-traffic control
● Avionics (fly-by-wire)
● Research Institutions
● The World Wide Web
Hardware Considerations
● Architecture of interconnected multiple processors are of two types:
○ Tightly Coupled System
○ Loosely Coupled System
Parallel vs Distributed Architecture
Parallel vs Distributed Architecture
● Tightly Coupled System (Shared Memory Architecture)
○ Single system wide primary memory.
○ Communication takes place through shared memory.
○ No. of systems are limited by bandwidth of shared memory.
○ Sometimes called Parallel Processing Systems.
● Loosely Coupled System (Distributed Memory architecture)
○ Each processor has its own local memory.
○ Communication is done by passing message across the network.
○ Scalable - It can have unlimited number of processors.
○ Processors may be geographically separated.
○ It is known as Distributed Computing System.
Parallel Systems vs. Distributed Systems

Memory:
- Parallel: tightly coupled shared memory.
- Distributed: distributed memory; message passing, RPC, and/or use of distributed shared memory.

Control:
- Parallel: global clock control.
- Distributed: no global clock control; synchronization algorithms needed.

Processor interconnection network:
- Parallel: order of Tbps; bus, mesh, tree, mesh of tree, and hypercube.
- Distributed: order of Gbps; Ethernet (bus), token ring, and SCI (ring).

Main focus:
- Parallel: performance; scientific computing.
- Distributed: performance (cost and scalability), reliability, information/resource sharing.
Network Operating System
Distributed Operating System
Network OS vs. Distributed OS

Single System Image:
- Network OS: No. The user is aware that multiple computers are being used. Ex. selection of a machine for executing a job is manual, via remote login. The user has to know the location of a resource to access it, and use different system calls to access local and remote resources.
- Distributed OS: Yes. Provides a virtual uniprocessor image to the user. Ex. a DOS dynamically and automatically allocates jobs to various machines. Single set of globally valid system calls.

Autonomy:
- Network OS: High. Local OS at each computer; computers communicate via a common communication protocol and shared file system. Each computer functions independently and manages its own processes and resources.
- Distributed OS: Low. A single system-wide OS; processes and resources are managed globally.

Fault Tolerance:
- Network OS: Unavailability grows as faulty machines increase.
- Distributed OS: Unavailability remains low even as faulty machines increase.
Distributed Computing System
Models
1. Minicomputer Model
MiniComputer Model
● Simple extension of centralized time-sharing system.
● Consists of a few minicomputers interconnected by a communication
network where each minicomputer usually has multiple users simultaneously
logged on to it.
● Several interactive terminals are connected to each minicomputer. Each user
logs on to one specific minicomputer that has remote access to other
minicomputers. Does not reflect uniprocessor image.
● Network allows a user to access remote resources that are available on
some machine other than the one on to which the user is currently logged.
● Used for resource sharing with remote users.
● Ex. ARPA Net
2. Workstation Model
Workstation Model
● WorkStation
○ A powerful, single-user computer, like a personal computer, but has a
more powerful microprocessor.
○ Each has its own local disk and a local file system – diskful workstation.
● Process migration
○ A user first logs on to his/her personal workstation.
○ If there are idle remote workstations, one or more processes are
migrated to one of them.
○ Result of execution migrated back to user’s workstation.
Workstation Model
● Issues to be resolved:
○ How to find an idle workstation
○ How to migrate a job
○ What if a user logs on to a remote machine that is executing a process of another
machine – run the two processes simultaneously, kill the remote process, or
migrate the process back to its home workstation?
● Examples – Sprite System, Xerox PARC
3. Workstation-Server Model
Workstation-Server Model
● Client WorkStation
○ Network of personal workstations each with its own disk and local file
system.(diskful workstations).
○ Local disk of diskful workstation used for storage of temporary files, etc.
○ Workstation without local disk is called diskless workstation.
● Server minicomputers
○ Each minicomputer is dedicated to one or more different types of services,
for managing & providing access to shared resources.
○ Multiple servers used for a service for better scalability and higher reliability.
● User logs on to his machine. Normal computation activities carried at home
workstation but services provided by special servers.
● No process migration involved. Request Response Protocol implemented.
Workstation-Server Model
● Advantages
○ Cheaper - few minicomputers vs. large no. of diskful workstations
○ Backup and hardware maintenance easier
○ Flexibility to access files from any file server.
○ No process migration
○ Guaranteed response time (no remote process execution)
● Disadvantage
○ Does not exploit idle workstations
● Client-Server model of communication
○ RPC (Remote Procedure Call) & RMI (Remote Method Invocation)
● Most widely used model for building distributed systems.
Example: V system.
4. Processor-Pool Model
Processor-Pool Model
● Based on the observation that most of the time a user does not need any computing
power but once in a while the user may need a very large amount of computing power
for a short time.
● Processors (microcomputers & minicomputers ) are pooled together to be shared by
users as needed.
● Each processor has its own memory to load and run a system program or an
application program of the DCS.
● Servers – the necessary number of processors is allocated to each user from
the pool by a run server.
Processor-Pool Model
● No concept of home machine. User logs on to system as whole.
● Better utilization of processing power but less interactivity
● Greater flexibility – processors can act as extra servers
● Unsuitable for high-performance interactive applications, as communication between
processor and terminal is slow
● Example – Amoeba, Cambridge Distributed System
Hybrid Model
● Combines advantage of both the workstation – server and processor - pool model
● Based on workstation – server model with additional pool of processors
● The processor in the pool can perform large computations
● Workstation-server model can perform user interactive jobs.
● Hybrid model is more expensive to implement.
Distributed Computing System Models
Think!!
Suppose a component of a distributed system suddenly crashes. How will this
inconvenience the users when one of the following happens:
1. The system uses processor-pool model and crashed component is a processor in the
pool.
2. In processor-pool model, a user terminal crashes.
3. The system uses a workstation-server model and server crashes.
4. In the workstation-server model, one of the client crashes.
Issues - Answer!!!
● Processor-Pool Model and Crashed Component is a Processor in the Pool
○ Reduced Processing Power
○ Task Reassignment
○ Load Balancing Issues
● Processor-Pool Model and a User Terminal Crashes
○ Loss of Session Data
○ Reconnection Efforts.
○ Minimal Impact on Others.
● Workstation-Server Model and Server Crashes
○ System-Wide Downtime.
○ Data Loss
○ Recovery Time.
● Workstation-Server Model and a Client Crashes
○ Loss of Session Data
○ Minimal Impact on Server and Other Clients
○ Reconnection Required
Issues in Distributed Computing Systems
Transparency
● How to achieve the single-system image, i.e., how to make a collection of
computers appear as a single computer.
● 8 transparencies identified -
○ Access Transparency
○ Location Transparency
○ Replication Transparency
○ Failure Transparency
○ Migration Transparency
○ Concurrency Transparency
○ Performance Transparency
○ Scaling Transparency
Transparency in a Distributed System
● Access Transparency
○ Allow uniform access to resources(local or remote).
○ Use global set of system calls & global resource naming facility (ex. URL).
● Location Transparency
○ Hide where a resource is located. Require system-wide, global unique resource naming facility.
○ Name transparency – Name of resource should not reveal its physical location. Allow
resource migration without changing name.
○ User Mobility – User should be able to freely log on to any machine in the system and access
a resource with same name.
Transparency in a Distributed System
● Replication Transparency
○ Naming of replicas – name the various replicas & map a user-supplied resource
name to the appropriate replica.
○ Replication control – how many replicas to keep, where to place them, and when
to create/delete them; the decision is taken by the system.
● Migration Transparency
○ Movement of object is handled automatically by system & following issues are
taken care of –
■ Migration decision made automatically by system.
■ Name of resource remains same on migration from one node to another
■ IPC ensures proper receipt of message by process, even if it further
migrates.
Transparency in a Distributed System
● Failure Transparency
○ Partial failure - masking partial failures in the system from the user. Ex. machine
failure/storage crash
○ Complete failure - Not achievable with the current state of the art DOS. Ex. Failure
of communication network often disrupts the work and is noticeable by the user.
● Concurrency Transparency
○ Each user feels that he/she is the sole user of the system. The resource sharing
mechanism should satisfy:
■ Event ordering property – all access requests to various system resources are
properly ordered to provide a consistent view to all users.
■ Mutual exclusion property – at any time, at most one process accesses a
shared resource; it is not used simultaneously by multiple processes.
■ No starvation property – every process that requests a resource eventually
gets it.
■ No deadlock property – no set of processes blocks forever waiting on each
other.
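A minimal sketch of the mutual-exclusion property, using threads as stand-ins for processes and a local lock as the resource-sharing mechanism (illustrative only; a distributed OS would use a distributed mutual-exclusion algorithm rather than a local lock):

```python
import threading

# Shared resource: a counter. The lock guarantees that at most one
# thread updates it at a time, giving a consistent final result.
counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(10_000):
        with lock:              # serialize access to the shared resource
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With mutual exclusion, 4 workers x 10,000 increments = 40,000.
```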
Transparency in a Distributed System
● Performance Transparency
○ System is automatically configured to improve performance as per load
varying in the system.
○ Processing capability of the system should be uniformly distributed
among the current jobs available.
● Scaling Transparency
○ System can expand in scale without disrupting activities of users
○ Open system architecture and scalable algorithms.
Reliability
● Faults
○ Fail stop
■ system stops functioning after changing to a state in which failure
is detected.
○ Byzantine failure
■ system continues to function but produces wrong result.
Reliability
● Fault avoidance
○ Occurrence of faults is minimized by making components more reliable & testing
thoroughly.
● Fault tolerance
○ Redundancy techniques
■ Tolerating K fail-stop failures needs K + 1 replicas.
■ Tolerating K Byzantine failures needs 2K + 1 replicas.
○ Distributed control
■ Distributed-control algorithms and protocols avoid a single point of failure.
● Fault detection and recovery
○ Atomic transactions – either all operations execute or none has any effect.
○ Stateless servers – the server keeps no client state, so recovery after a crash is
simpler.
○ Acknowledgement & timeout-based retransmission of messages.
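The 2K + 1 rule for Byzantine failures can be sanity-checked with a simple majority vote over replica replies (a toy sketch; `majority_vote` and the sample values are assumptions, not from the syllabus):

```python
from collections import Counter

def majority_vote(replies):
    """Return the value a strict majority of replicas agree on, else None."""
    value, count = Counter(replies).most_common(1)[0]
    return value if count > len(replies) // 2 else None

# K = 1 Byzantine replica tolerated with 2K + 1 = 3 replicas:
# the faulty replica's wrong answer is outvoted.
assert majority_vote([42, 42, 99]) == 42

# With only 2 replicas (fewer than 2K + 1), one liar already
# prevents a majority, so the correct value cannot be determined.
assert majority_vote([42, 99]) is None
```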
CAP Theorem
Introduction to CAP Theorem
● The CAP Theorem (also known as Brewer’s Theorem) is a fundamental principle
in distributed systems.
● It states that in the event of a network partition, a distributed system can
guarantee only two out of the following three properties:
○ Consistency
○ Availability
○ Partition Tolerance
● This theorem was first introduced by Eric Brewer in 2000 and formally proved
by Gilbert and Lynch in 2002.
● It serves as a crucial guideline for architects and engineers when designing
distributed databases and systems, especially in environments where
network failures are possible or even likely.
What is CAP in CAP Theorem?
● Consistency – each client always has the same view of the data.
● Availability – all clients can always read and write, but the data might not be the most recent.
● Partition Tolerance – the system continues to operate despite network failure.
What is CAP in CAP Theorem?
Each letter in CAP stands for a specific property:
Property | Description
Consistency (C) | Every read receives the most recent write or an error. All nodes see the same data at the same time.
Availability (A) | Every request receives a (non-error) response, without guarantee that it contains the most recent write.
Partition Tolerance (P) | The system continues to operate despite arbitrary network partitioning or failures between nodes.
Consistency
● Consistency defines that all clients see the same data simultaneously, no
matter which node they connect to in a distributed system.
● For eventual consistency, the guarantees are looser: it only guarantees that a
client will eventually see the same data on all the nodes at some point in the
future.
● All nodes in the system see the same data at the same time. This is because
the nodes are constantly communicating with each other and sharing
updates.
● Any changes made to the data on one node are immediately propagated to all
other nodes, ensuring that everyone has the same up-to-date information.
Availability
● Availability defines that all non-failing nodes in a distributed system return a
response for all read and write requests in a bounded amount of time, even if
one or more other nodes are down.
● Users can send requests and receive responses even when individual network
components fail. This implies that the system is available and functioning.
● Every request receives a response, whether successful or not. This is a crucial
aspect of availability, as it guarantees that users always get feedback.
Partition Tolerance
● Partition Tolerance defines that the system continues to operate despite
arbitrary message loss or failure in parts of the system. Distributed systems
guaranteeing partition tolerance can gracefully recover from partitions once
the partition heals.
● Addresses network failures, a common cause of partitions. It suggests that
the system is designed to function even when parts of the network become
unreachable.
● The system can adapt to arbitrary partitioning, meaning it can handle
unpredictable network failures without complete failure.
Trade-Offs in the CAP Theorem
According to the CAP theorem, a distributed system can only guarantee two out of the three properties at
the same time—not all three, especially during network partitions:
Model | Properties Guaranteed | Trade-off / Example
CA | Consistency, Availability | Not partition tolerant; works only if the network is reliable. Rare in distributed systems.
CP | Consistency, Partition Tolerance | May sacrifice availability during a partition. Example: some banking systems, HBase, MongoDB.
AP | Availability, Partition Tolerance | May sacrifice consistency during a partition. Example: NoSQL databases like Cassandra, DynamoDB.
CA (Consistency + Availability)
● The system guarantees that all nodes see the same data at the same time
(consistency) and that every request receives a response (availability)—but
only as long as the network is reliable and there are no partitions.
● Design strategies:
○ Use a centralized data store or tightly coupled cluster, ensuring all updates are immediately
visible to all nodes.
○ Master-slave replication or distributed locking can be used to prevent conflicting updates.
CA (Consistency + Availability) (cont.)
● Limitations: If a network partition occurs, the system cannot guarantee both
properties. It may become unavailable or inconsistent until the partition is
resolved.
● Use cases: Rare in large-scale distributed systems due to the inevitability of
network partitions. More common in single-site databases or tightly
controlled environments where network failures are extremely rare.
● Example: Traditional relational databases deployed on a single server or
within a single, highly reliable data center.
CP (Consistency + Partition Tolerance)
● What it means: The system ensures that all nodes see the same data, even if
some parts of the network cannot communicate (partition tolerance). To
maintain this, the system may become unavailable to some requests during a
partition.
● Design strategies:
○ Use distributed consensus protocols (like Paxos or Raft) to ensure all nodes agree on the data
state, even across partitions.
○ If a partition occurs, nodes may refuse to process requests that could compromise
consistency, effectively sacrificing availability.
CP (Consistency + Partition Tolerance) (cont.)
● Limitations: Users may experience downtime or errors during network
failures, as the system prefers to deny requests rather than risk
inconsistency.
● Use cases: Critical systems where data accuracy is paramount, and
temporary unavailability is preferable to serving incorrect data.
● Example: Banking systems (to prevent double-spending), databases like
MongoDB, HBase, and Redis in certain configurations.
AP (Availability + Partition Tolerance)
● What it means: The system remains available and continues to operate even if
network partitions occur (partition tolerance). However, it may serve stale or
inconsistent data during partitions.
● Design strategies:
○ Use eventual consistency: updates are propagated to all nodes over time, but not necessarily
immediately.
○ Implement conflict resolution mechanisms to reconcile differences when partitions heal.
AP (Availability + Partition Tolerance) (cont.)
● Limitations: There is a window of time when different nodes may return
different data, leading to temporary inconsistencies.
● Use cases: Applications where high availability is more important than
immediate consistency, and users can tolerate seeing slightly outdated data.
● Example: Social media newsfeeds, shopping carts during browsing, NoSQL
databases like Cassandra, DynamoDB, and CouchDB.
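A toy model of the CP vs. AP choice during a partition: a CP write refuses to proceed without a majority of replicas, while an AP write accepts whatever it can reach and lets replicas diverge (all names here are illustrative, not a real database API):

```python
class Replica:
    def __init__(self):
        self.value = None   # each replica's local copy of the data

def write(replicas, reachable, value, mode):
    """mode='CP': require a majority or refuse; mode='AP': write what we can."""
    if mode == "CP" and len(reachable) <= len(replicas) // 2:
        return False        # sacrifice availability to stay consistent
    for r in reachable:
        r.value = value     # AP path: accept the write; replicas may diverge
    return True

a, b, c = Replica(), Replica(), Replica()
# Partition: only replica `a` is reachable from this client.
assert write([a, b, c], [a], "x", mode="CP") is False  # CP denies the write
assert write([a, b, c], [a], "x", mode="AP") is True   # AP accepts it
assert b.value is None      # unreachable replica is now stale
```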
System Type | Consistency | Availability | Partition Tolerance | Typical Use Cases | Limitation During Partition
CA | Yes | Yes | No | Single-site databases, reliable LANs | Fails or becomes inconsistent
CP | Yes | No | Yes | Banking, financial, critical records | Some requests denied/unavailable
AP | No | Yes | Yes | Social media, e-commerce browsing | May serve stale/inconsistent data
Stateful vs. Stateless File Server
Stateful File Server
Stateless File Server
Flexibility - Monolithic vs. Micro Kernel
Flexibility
● Choosing appropriate kernel
○ Monolithic kernel
■ Kernel where the entire operating system is working in the kernel
space and in supervisor mode.
○ Micro kernel
■ Kernel is reduced to contain minimal facilities necessary, and the
other system services are implemented as user level server
processes. Each server process has its own address space & can be
programmed separately.
■ Highly modular, thus easy to design, modify & add new services.
■ Requires message passing & context switching
Micro Kernel: Advantages & Disadvantages
● Advantages
○ Modular, hence very flexible.
○ Each server is an independent process having its own address space.
○ Ease of design, maintenance & portability.
● Disadvantages
○ Requires message passing & context switching.
○ Slower than a monolithic kernel.
Performance
● Performance of a distributed system should be at least as good as
centralized system.
● Various performance metrics - response time, throughput, system
utilization, network capacity utilization.
Design Issues to Increase Performance
● Batch if possible
○ Transfer data across network in large chunks rather than as individual pages.
○ Piggybacking of acknowledgement.
● Cache data on client side whenever possible
○ Reduces contention on centralized resources.
○ Saves computation time & network bandwidth.
● Minimize copying of data
● Minimize network traffic
○ Migrate process closer to its resource.
○ Cluster processes that communicate frequently.
● Fine-grained parallelism: service requests from several clients simultaneously;
use threads.
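The "batch if possible" guideline above can be sketched as buffering small writes and transferring them in large chunks (the batch size and the `send` stand-in are assumptions for illustration):

```python
# Stand-in for a real network send; records each message sent.
sent = []

def send(message):
    sent.append(message)

buffer = []
BATCH = 4   # transfer data in chunks of 4 pages, not page by page

def write(page):
    buffer.append(page)
    if len(buffer) >= BATCH:   # batch full: one network message
        send(list(buffer))
        buffer.clear()

for p in range(8):
    write(p)
# 8 individual writes become only 2 network messages.
```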
Scalability
● Capability of a system to adapt to increased service load without disruption
of service or significant loss of performance.
○ Avoid centralized entities
○ Avoid centralized algorithms
○ Perform most operations on client workstations
Concept | Example
Centralized services | A single server for all users
Centralized data | A single online telephone book
Centralized algorithms | Doing routing based on complete information
Scalability
● Avoid centralized entities
○ Failure of centralized entity brings entire system down
○ Performance of centralized entity becomes system bottleneck
○ Capacity of network that connects centralized entity gets saturated
● Avoid centralized algorithms
○ Collects information from all nodes, process it on single node, then distributes
result to other nodes
● Perform most operations on client workstation
○ Server is common resource for several clients.
Heterogeneity
● Caused by interconnected sets of dissimilar hardware or software systems (ex. different
topologies, protocols, word lengths (32/64), etc.).
● Data and instruction formats depend on each machine architecture.
● If a system consists of K different machine types, each sender/receiver needs K - 1
translators, and adding a new data format becomes difficult.
● Instead, use an intermediate standard data format.
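A minimal sketch of the intermediate standard data format idea: network byte order (big-endian) serves as the standard, and each host converts only to and from it, instead of translating pairwise between every machine type (the value is illustrative):

```python
import struct

# Network byte order (big-endian, "!") is the classic intermediate format:
# each host converts its native representation to/from this one standard.
native_value = 0x12345678

wire = struct.pack("!I", native_value)        # host -> standard (htonl-style)
assert wire == b"\x12\x34\x56\x78"            # byte layout is machine-independent

received = struct.unpack("!I", wire)[0]       # standard -> host
assert received == native_value
```

With one standard format, supporting a new machine type means writing one converter, not one per existing machine type.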
Security
● Difficult due to lack of a single point of control & use of insecure networks
for data communication
● Security concerns:
○ Messages may be stolen, plagiarized or changed by an intruder.
○ Ensure message received by intended receiver & sent by genuine
sender.
● Cryptography used for security.
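As a hedged sketch of the second concern, a message authentication code (MAC) lets the receiver verify that a message came from a sender holding the shared key and was not changed in transit (the key and messages here are illustrative):

```python
import hmac
import hashlib

# Sender and receiver share a secret key (distribution of the key is
# a separate problem, not shown here).
key = b"shared-secret"
message = b"transfer 100 to account 42"

# Sender computes a tag over the message and sends both.
tag = hmac.new(key, message, hashlib.sha256).digest()

# Receiver recomputes the tag; a genuine, unmodified message matches.
assert hmac.compare_digest(
    tag, hmac.new(key, message, hashlib.sha256).digest())

# Any tampering by an intruder changes the tag, so it is detected.
tampered = b"transfer 900 to account 42"
assert not hmac.compare_digest(
    tag, hmac.new(key, tampered, hashlib.sha256).digest())
```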
Middleware
● Middleware is an additional layer of software, placed on top of a network OS,
that hides the heterogeneity of the underlying platforms and improves
distribution transparency.
● It offers a higher level of abstraction.
● It is placed in the middle between applications & NOS.
Distributed System as Middleware
Distributed Computing Environment (DCE)
● It is an integrated set of services and tools that can be installed as a
coherent environment on top of existing OS and serve as a platform for
building and running distributed application.
● It runs on many different kinds of computers, operating systems, and networks
produced by different vendors.
● It hides differences between machines by automatically performing data-type
conversions, thus making the heterogeneous nature of the system transparent
to application programmers.
DCE Based Distributed System
DCE Applications
DCE Software
Operating system and networking
Questions
1. List & explain various distributed computing system models.
2. Differentiate – Network OS vs. Distributed OS.
3. Differentiate between service, server, and client. Discuss the relative advantages/
disadvantages of using single/multiple servers.
4. Explain replication transparency.
5. Explain the CAP Theorem and its trade-offs.
6. Distributed Computing Environment – advantages.
7. How is the existence of multiple computers in a distributed system made
invisible to provide a uniprocessor image?
8. Short notes – DOS, processor-pool model.
9. Issues in designing a DOS.