A distributed system is a collection of independent computers that are used jointly to perform a single task or to provide a single service.

Most distributed systems are built based on a particular paradigm (or model):
⮚ Shared memory
⮚ Distributed objects
⮚ Distributed file system
⮚ Shared documents
⮚ Distributed coordination
⮚ Agents

Network OS:
Properties:
⮚ No single system image
⮚ Individual nodes are highly autonomous
⮚ All distribution of tasks is explicit to the user
Examples: Linux, Windows

Middleware OS:
⮚ Similar to a uniprocessor OS, but with a system-independent interface for distributed programming
⮚ Improves transparency (e.g., hides heterogeneity)
⮚ Provides services (e.g., naming services)
⮚ Provides a programming model (e.g., distributed objects)

The distributed nature of a system gives rise to the following challenges:
➜ Transparency
➜ Flexibility
➜ Reliability
➜ Performance
➜ Scalability

Flexibility:
➜ Build a system out of (only) required components
➜ Extensibility: components/services can be changed or added
➜ Openness of interfaces and specification: allows reimplementation and extension; interoperability
➜ Separation of policy and mechanism
➜ Standardized internal interfaces

Techniques for Scaling:
➜ Decentralisation
➜ Hiding communication latencies (asynchronous communication, reduce communication)
➜ Distribution (spreading data and control around)
➜ Replication (making copies of data and processes)

Decentralisation: avoid centralising
⮚ Services (e.g., a single server)
⮚ Data (e.g., central directories)
⮚ Algorithms (e.g., based on complete information)
With regards to algorithms:
➜ Do not require any machine to hold complete system state
➜ Allow nodes to make decisions based on local information
➜ Algorithms must survive failure of nodes
➜ No assumption of a global clock

MISCELLANEOUS ‘RULES OF THUMB’:
➜ Trade-offs: many of the challenges impose conflicting requirements; for example, better scalability can cause worse overall performance. You have to make trade-offs: what is more important?
➜ Separation of Concerns: split a problem into individual concerns and address each separately.
➜ End-to-End Argument: some communication functions can only be reliably implemented at the application level.

Several key principles underlie all distributed systems:
➜ Concurrency
➜ Communication
➜ Synchronization
➜ Replication and Consistency
➜ Naming
➜ Fault Tolerance
➜ Security
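One of the scaling techniques listed above, hiding communication latency with asynchronous communication, can be sketched by overlapping remote requests instead of issuing them one after another. A minimal illustration; the `fetch` function and server names are made up, with `time.sleep` standing in for network latency:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(server: str) -> str:
    """Simulate a remote request with 0.1 s of network latency."""
    time.sleep(0.1)
    return f"reply from {server}"

servers = ["s1", "s2", "s3", "s4"]

# Synchronous: the latencies add up (~0.4 s in total here).
start = time.perf_counter()
sync_replies = [fetch(s) for s in servers]
sync_elapsed = time.perf_counter() - start

# Asynchronous: the requests overlap, so total wait is ~0.1 s.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(servers)) as pool:
    async_replies = list(pool.map(fetch, servers))
async_elapsed = time.perf_counter() - start

print(f"sequential: {sync_elapsed:.2f}s, overlapped: {async_elapsed:.2f}s")
```

The same effect can be achieved with message queues or `asyncio`; the point is only that the caller does not block on each round trip in turn.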
Policy vs. Mechanism (rule of thumb): a system should build mechanisms that allow flexible application of policies. Avoid built-in policies.

System Architecture:
⮚ Placement of machines
⮚ Placement of software on machines

Software Architecture:
Logical organization and roles of software components:
⮚ Layered
⮚ Object-oriented
⮚ Data-centered
⮚ Event-based

Combination of architectures. Examples:
⮚ Superpeer networks
⮚ Collaborative distributed systems
⮚ Edge-server systems

STRUCTURED OVERLAY:
⮚ Nodes have an identifier and a range
⮚ Data has an identifier
⮚ A node is responsible for the data that falls in its range
⮚ A search is routed to the appropriate node

UNSTRUCTURED OVERLAY:
⮚ Data is stored at random nodes
⮚ Partial view: a node’s list of neighbours
⮚ Nodes exchange partial views with their neighbours to keep them up to date

STATEFUL VS STATELESS SERVERS:
Stateful:
⮚ Keeps persistent information about clients
⮚ Improved performance
⮚ Expensive crash recovery
⮚ Must track clients

Stateless:
⮚ Does not keep state of clients
⮚ Soft-state design: limited client state
⮚ Can change its own state without informing clients
⮚ No cleanup after a crash
⮚ Easy to replicate
⮚ Increased communication

Communication can be classified along three axes:
➀ Data-oriented vs control-oriented communication
➁ Synchronous vs asynchronous communication
➂ Transient vs persistent communication

Data-Oriented vs Control-Oriented Communication:
Data-oriented communication:
➜ Facilitates data exchange between threads
➜ Shared address space, shared memory & message passing
Control-oriented communication:
➜ Associates a transfer of control with communication
➜ Active messages, remote procedure call (RPC) & remote method invocation (RMI)

Code Mobility:
Weak vs Strong Mobility:
⮚ Weak mobility: transfer only code
⮚ Strong mobility: transfer code and execution segment
Sender vs Receiver initiated migration:
⮚ Sender initiated: send a program to a compute server
⮚ Receiver initiated: download applets
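The structured-overlay idea above (each node owns a range of the identifier space, and searches are routed to the owner) is commonly realised with consistent hashing. A minimal sketch; the ring size, node names, and use of SHA-1 are illustrative assumptions:

```python
import bisect
import hashlib

def h(key: str) -> int:
    """Map a name to a point on a 2^16 identifier ring (illustrative size)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % 2**16

class Ring:
    def __init__(self, nodes):
        # Each node sits on the ring at the hash of its name; a node's range
        # is everything between its predecessor's point and its own.
        self.points = sorted((h(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        """Route a data identifier to the first node clockwise from it."""
        ids = [p for p, _ in self.points]
        i = bisect.bisect_right(ids, h(key)) % len(self.points)
        return self.points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.lookup("some-file.txt")  # the node whose range covers the key
```

Adding or removing a node only changes ownership of the adjacent range, which is what makes this kind of overlay scale.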
COMMUNICATION ABSTRACTIONS:
Abstractions above simple message passing make communication easier for the programmer. Provided by higher-level APIs:
➜ Service discovery
➜ Event notification

COMMUNICATION MODES:
➀ Remote Procedure Call (RPC) & Remote Method Invocation (RMI)
➁ Message-Oriented Communication
➂ Group Communication
➃ Streams

REMOTE PROCEDURE CALL (RPC):
Idea: replace the I/O-oriented message-passing model by execution of a procedure call on a remote node [BN84]:
➜ Synchronous: based on blocking messages
➜ Message-passing details hidden from the application
➜ Procedure call parameters used to transmit data
➜ Client calls a local “stub” which does the messaging and marshalling

GROUP-BASED COMMUNICATION
Used for:
➜ Replication of services
➜ Replication of data

GOSSIP-BASED COMMUNICATION

Data Streams:
➜ Sequence of data units
➜ Can apply to discrete and continuous media (e.g., a TCP connection is a stream)
➜ Simple stream: a single sequence of data
➜ Complex stream: several related streams (substreams) (e.g., audio and video)

Transmission modes:
➜ Asynchronous: no timing constraints (e.g., file transfer)
➜ Synchronous: maximum end-to-end delay for each data unit (e.g., sending sampled data, which must not be too old when it reaches the destination)
➜ Isochronous: maximum and minimum end-to-end delay (e.g., video)

Timing model of a distributed system.
Affected by:
➜ Execution speed/time of processes
➜ Communication delay
➜ Clocks & clock drift
➜ (Partial) failure
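The RPC model described earlier in this section (the client calls a local stub, which marshals the arguments into a message and unmarshals the reply) can be sketched as follows. The JSON wire format and the in-process `server_dispatch` standing in for the network are illustrative assumptions:

```python
import json

# Illustrative server-side procedure table (the "remote" node).
PROCEDURES = {"add": lambda a, b: a + b}

def server_dispatch(request: bytes) -> bytes:
    """Server stub: unmarshal the request, run the procedure, marshal the result."""
    msg = json.loads(request)
    result = PROCEDURES[msg["proc"]](*msg["args"])
    return json.dumps({"result": result}).encode()

def add_stub(a: int, b: int) -> int:
    """Client stub: looks like a local call, hides marshalling and messaging."""
    request = json.dumps({"proc": "add", "args": [a, b]}).encode()  # marshal
    reply = server_dispatch(request)    # stand-in for the network round trip
    return json.loads(reply)["result"]  # unmarshal

print(add_stub(2, 3))  # prints 5; the caller never sees the message exchange
```

Real RPC systems generate both stubs from an interface definition and add transport, binding, and failure handling; the marshalling/unmarshalling shape stays the same.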
With coordination you should agree on values:
➜ Agree on a global value
➜ Agree on environment
➜ Agree on state

Main issues of coordination and synchronization:
• Time and Clocks
• Global State
• Concurrency Control

EVALUATING DISTRIBUTED ALGORITHMS
General Properties:
➜ Performance
  number of messages exchanged
  response/wait time
  delay
  throughput: 1/(delay + execution time)
  complexity: O()
➜ Efficiency
  resource usage: memory, CPU, etc.
➜ Scalability
➜ Reliability
  number of points of failure (low is good)

NETWORK TIME PROTOCOL (NTP) Synchronization Modes:
➜ Multicast: for LANs; low accuracy
➜ Procedure Call: clients poll; reasonable accuracy
➜ Symmetric: between peer servers; highest accuracy
Read about LOGICAL CLOCKS – Lamport’s logical clock
Read about VECTOR CLOCKS
Read about CHANDY & LAMPORT’S SNAPSHOTS

DISTRIBUTED MUTUAL EXCLUSION
Requirements:
➀ Safety: at most one process may execute the critical section at a time
➁ Liveness: requests to enter and exit the critical section eventually succeed
➂ Ordering: requests are processed in happened-before ordering

METHOD 1: CENTRAL SERVER
METHOD 2: TOKEN RING
METHOD 3: USING MULTICAST AND LOGICAL CLOCKS

Algorithm by Ricart & Agrawala:
➜ Processes pi maintain a Lamport clock and can communicate pairwise
➜ Processes are in one of three states:
  1. Released: outside the critical section
  2. Wanted: waiting to enter the critical section
  3. Held: inside the critical section
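Ricart & Agrawala timestamps requests with a Lamport clock. A minimal sketch of the clock rules it relies on (increment on local events and sends, max-merge on receives); the two-process scenario is illustrative:

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: increment the clock."""
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        """Message receive: jump past the sender's timestamp, then increment."""
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
t_send = p1.tick()           # p1 multicasts a request stamped 1
t_recv = p2.receive(t_send)  # p2's clock advances to 2
```

Breaking timestamp ties by process id gives a total order on requests consistent with happened-before, which is exactly the ordering property the mutual-exclusion requirements above ask for.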
SYNCHRONISATION AND OTHER COORDINATION:
• Multicast
• Elections
• Transactions

Examples of Multicast use:
➜ Fault Tolerance
➜ Service Discovery
➜ Performance
➜ Event or Notification propagation

Multicast ISSUES
Performance:
➜ Bandwidth
➜ Delay
Efficiency:
➜ Avoid sending a message over a link multiple times (stress)
➜ Distribution tree
➜ Hardware support (e.g., Ethernet broadcast)
Network-level vs Application-level:
➜ Network routers understand multicast
➜ Applications (or middleware) send unicasts to group members
➜ Overlay distribution tree

Types of Multicast:
➜ BASIC MULTICAST
➜ FIFO MULTICAST
➜ CAUSAL MULTICAST
➜ TOTALLY ORDERED MULTICAST

Properties of Multicast:
➜ Group membership: open vs closed group
➜ Reliability
➜ Ordering

Totally ordered multicast can be:
➜ Sequencer-based
➜ Agreement-based: each receiver determines the order independently
➜ Delivery based on sender timestamp ordering
Other possibilities:
➜ Moving sequencer
➜ Logical clock based
➜ Token based
➜ Physical clock ordering
Hybrid Ordering:
➜ FIFO + Total
➜ Causal + Total

Research the Bully Algorithm:
Three types of messages in the Bully algorithm:
• Election: announce an election
• Answer: response to an election
• Coordinator: announce the elected coordinator
Read about the Election RING ALGORITHM

ACID PROPERTIES OF TRANSACTIONS:
• atomic
• consistent
• isolated
• durable

CLASSIFICATION OF TRANSACTIONS:
• Flat
• Nested
• Distributed

REPLICATION ISSUES:
➜ Updates
➜ Replica placement
➜ Redirection/Routing

Replica INCONSISTENCY:
➜ Staleness
➜ Operation order
➜ Conflicting data

TYPES OF DATA-CENTRIC CONSISTENCY MODELS:
➜ STRICT CONSISTENCY
➜ SEQUENTIAL CONSISTENCY
➜ CAUSAL CONSISTENCY
➜ WEAK CONSISTENCY
➜ RELEASE CONSISTENCY
➜ ENTRY CONSISTENCY

CLIENT-CENTRIC CONSISTENCY MODELS
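Client-centric models like the ones above are guarantees per client rather than per data store; monotonic reads is the easiest to sketch: a client tracks the newest version it has seen and must not be served older data, even after switching replicas. The `Replica`/`Client` classes and version numbers here are illustrative assumptions:

```python
class Replica:
    def __init__(self, version, value):
        self.version, self.value = version, value

class Client:
    """Monotonic reads: a client never observes data older than it has
    already seen, even when its reads are served by different replicas."""
    def __init__(self):
        self.last_seen = 0

    def read(self, replica):
        if replica.version < self.last_seen:
            # A real system would redirect the read or wait for the replica
            # to catch up; here we just signal the violation.
            raise RuntimeError("replica too stale for this client")
        self.last_seen = replica.version
        return replica.value

fresh, stale = Replica(2, "new"), Replica(1, "old")
c = Client()
v1 = c.read(fresh)      # sees version 2
try:
    c.read(stale)       # switching to a stale replica would break the guarantee
    violated = False
except RuntimeError:
    violated = True
```

The other client-centric guarantees (monotonic writes, read-your-writes, writes-follow-reads) are enforced with the same pattern of carrying version state with the client session.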
CONSISTENCY PROTOCOLS:
Primary-Based Protocols:
➜ Remote-write protocols
➜ Local-write protocols
Replicated-Write Protocols:
➜ Active Replication
➜ Quorum-Based Protocols

PUSH vs PULL
REPLICA PLACEMENT
DYNAMIC REPLICATION

FAULT TOLERANCE TOPICS:
• Failure
• Reliable Communication
• Process Resilience
• Recovery

DEPENDABILITY DEPENDS ON:
• Availability: the system is ready to be used immediately
• Reliability: the system can run continuously without failure
• Safety: when a system (temporarily) fails to operate correctly, nothing catastrophic happens
• Maintainability: how easily a failed system can be repaired

CATEGORISING FAULTS AND FAILURES
• Types of Faults:
  • Transient Fault: occurs once, then disappears
  • Intermittent Fault: occurs, vanishes, reoccurs, vanishes, etc.
  • Permanent Fault: persists until the faulty component is replaced
• Types of Failures:
  • Process Failure: the process proceeds incorrectly or not at all
  • Storage Failure: “stable” secondary storage is inaccessible
  • Communication Failure: communication link or node failure

FAILURE MODELS:
• Crash Failure: a server halts, but works correctly until it halts
  • Fail-Stop: the server stops in a way that clients can tell it has halted
  • Fail-Resume: the server stops, then resumes execution at a later time
  • Fail-Silent: clients do not know the server has halted
• Omission Failure: a server fails to respond to incoming requests
  • Receive Omission: fails to receive incoming messages
  • Send Omission: fails to send messages
• Timing Failure: a server’s response lies outside the specified time interval
• Response Failure: a server’s response is incorrect
  • Value Failure: the value of the response is wrong
  • State-Transition Failure: the server deviates from the correct flow of control
• Arbitrary Failure: a server may produce arbitrary responses at arbitrary times (aka Byzantine failure)

FAULT TOLERANCE Techniques:
➜ Prevention: prevent or reduce the occurrence of faults
➜ Prediction: predict the faults that can occur and deal with them
➜ Masking: hide the occurrence of the fault
➜ Recovery: restore an erroneous state to an error-free state

STUDY Byzantine Fault Tolerance

DS CHALLENGES
Transparency
Flexibility
Dependability
Performance
Scalability
Three designs of replication:
➜ Explicit replication: The client
explicitly writes files to multiple
servers (not transparent).
➜ Lazy file replication: Server
automatically copies files to other
servers after file is written.
➜ Group file replication: WRITEs
simultaneously go to a group of
servers.
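Group file replication is often combined with the quorum rule from the quorum-based protocols above: a WRITE succeeds once W of the N replicas acknowledge it, and a READ consults R replicas, with R + W > N so that every read quorum overlaps every write quorum. A minimal sketch, with in-memory dictionaries standing in for real servers:

```python
N, W, R = 3, 2, 2   # R + W > N: read and write quorums must overlap

# Each replica maps key -> (version, value); illustrative in-memory servers.
replicas = [dict() for _ in range(N)]

def write(key, value, version):
    """A WRITE goes to the group but succeeds at W acknowledgements."""
    acks = 0
    for rep in replicas:
        rep[key] = (version, value)
        acks += 1
        if acks >= W:        # remaining replicas may lag behind
            break
    return acks >= W

def read(key):
    """Ask any R replicas; at least one must have seen the latest write."""
    answers = [rep[key] for rep in replicas[-R:] if key in rep]
    return max(answers)[1]   # the highest version number wins

write("x", "v1", version=1)
print(read("x"))  # prints v1, even though one replica never saw the write
```

The read deliberately queries the *last* R replicas while the write stopped after the *first* W, to show that the overlap, not full replication, is what guarantees fresh reads.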
Read about case studies:
➜ Network File System (NFS)
➜ Andrew File System (AFS) & Coda
➜ Google File System (GFS)