Distributed Snapshot
Introduction
• Sometimes you want to collect the current state of the distributed computation, called a
Distributed Snapshot.
• Computation is a sequence of atomic actions that transforms a given initial state to
the final state.
• While such actions are totally ordered in a sequential process, they are only partially
ordered in a Distributed System.
• It is customary to reason about the properties of a program in terms of states and
state transitions.
• In a distributed snapshot, the state is also called the global state.
• The global state of a distributed computation is the set of local states of all individual
processes involved in the computation plus the states of the communication channels.
• Since the local physical clocks are never perfectly synchronized, the components of a
global state can never be recorded at the same time.
• In an asynchronous distributed system, actions are not related to time.
Requirement of Global States
Distributed Garbage Collection:
• An object is considered to be garbage if there are no longer any references to it
anywhere in the distributed system.
• It is based on reference counting and must include the state of the communication
channels.
Distributed Deadlock Detection:
• A deadlock occurs when each of a collection of processes waits for another process to send
it a message; detection looks for a cycle in the “waits-for” relationship.
Distributed Debugging:
• We need to collect the values of distributed variables at the same time.
Properties of Consistent Snapshot
• The global state of a distributed system is a collection of the local states of the
processes and the channels.
• The operating system cannot know the current state of all processes in the DS.
• A process can only know the current state of the processes on its local system.
• The state of remote processes is known only through information received in messages.
State:
• Sequence of messages that have been sent and received along channels incident with
the process.
Snapshot:
• Records the state of a process.
Distributed Snapshot:
• A collection of snapshots, one for each process.
Global State
• We can define the global state with the union of all the snapshots.
• For the global state to be consistent, if a process state records the receipt of a message,
then the state of the sending process must record the sending of that message.
• A global snapshot is useful for adapting any centralized algorithm to run over a DS.
• It consists of all local states and messages in transit.
• Global state of a distributed system consist of the local state of each process, together
with the messages that are currently in transit (sent but not delivered).
• A distributed snapshot should reflect a consistent state.
• The notion of the global state can be graphically represented by what is called a cut.
• A diagram of the execution can show both consistent and non-consistent cuts.
Cut:
• A cut is a set of cut events, one per node, each of which captures the state of the node
on which it occurs.
• In a consistent cut it is possible for the sending of a message to occur before the cut while
the receiving of that message occurs after the cut.
• That means, when considering consistent cuts, we also have to record and save the
‘in transit’ messages, i.e. messages sent before the cut but received after the cut; a small
consistency check is sketched below.
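• The following is a minimal sketch (with illustrative names such as Message and is_consistent that are not from the source) of how a cut can be tested for consistency: every message whose receive event lies inside the cut must also have its send event inside the cut.

```python
# Minimal sketch: a cut maps each process to the number of events it
# contributes; it is consistent if no recorded receive lacks its send.
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    sender: str        # process that sent the message
    receiver: str      # process that received it
    send_index: int    # position of the send event on the sender
    recv_index: int    # position of the receive event on the receiver

def is_consistent(cut: dict, messages: list) -> bool:
    """cut maps each process id to the number of events included in the cut."""
    for m in messages:
        received_before_cut = m.recv_index < cut[m.receiver]
        sent_before_cut = m.send_index < cut[m.sender]
        # Inconsistent: the receive is inside the cut but the send is not.
        if received_before_cut and not sent_before_cut:
            return False
    return True

# Example: P1's first event (index 0) sends a message received as P2's event 1.
msgs = [Message("P1", "P2", 0, 1)]
print(is_consistent({"P1": 1, "P2": 2}, msgs))  # True: send and receive both inside
print(is_consistent({"P1": 0, "P2": 2}, msgs))  # False: receive inside, send outside
```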
Example of a cut:
• Consider this solution for the common problem of deadlock detection.
• System has 3 processes P1, P2, P3.
• An external process P0 sends a message to each process.
• Each process on getting this message reports its local state.
• This global state thus collected at P0 is a cut.
• P0 uses this information to create a wait-for graph, as sketched below.
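• A minimal sketch of the cycle check P0 might run on the collected wait-for graph is given below; the function name has_cycle and the graph representation are assumptions for illustration.

```python
# Depth-first search for a cycle in a wait-for graph: a cycle signals a
# resource deadlock under the AND model.
def has_cycle(wait_for: dict) -> bool:
    WHITE, GREY, BLACK = 0, 1, 2
    color = {p: WHITE for p in wait_for}

    def dfs(p):
        color[p] = GREY
        for q in wait_for.get(p, set()):
            if color.get(q, WHITE) == GREY:      # back edge => cycle
                return True
            if color.get(q, WHITE) == WHITE and dfs(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and dfs(p) for p in wait_for)

# P1 waits for P2, P2 waits for P3, P3 waits for P1: deadlock.
print(has_cycle({"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}))  # True
print(has_cycle({"P1": {"P2"}, "P2": {"P3"}, "P3": set()}))   # False
```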
Chandy-Lamport Algorithm
Algorithm:
• Initiator process P0 records its state locally
• Marker sending rule for process Pi:
After Pi has recorded its state, for each outgoing channel Chij, Pi sends one marker
message over Chij
• Marker receiving rule for process Pi:
On receipt of a marker over channel Chij, process Pi acts as follows:
if (Pi has not yet recorded its state)
Pi records its process state now;
Pi records the state of Chij as the empty set;
Pi starts recording messages arriving over its other incoming channels;
else (Pi has already recorded its state)
Pi records the state of Chij as the set of all messages it has received over Chij since it
saved its state.
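• A simplified, single-process sketch of these marker rules is given below; the class and method names (Process, on_marker, on_message) are assumptions, and a real implementation would also need an actual message transport between processes.

```python
# Sketch of the per-process Chandy-Lamport bookkeeping described above.
class Process:
    def __init__(self, pid, incoming_channels, outgoing_channels):
        self.pid = pid
        self.incoming = incoming_channels          # e.g. ["C21", "C31"]
        self.outgoing = outgoing_channels          # e.g. ["C12", "C13"]
        self.state = None                          # local application state
        self.recorded_state = None
        self.channel_state = {}                    # channel -> recorded messages
        self.recording = set()                     # channels being recorded

    def record_state(self):
        """Record the local state and send a marker on every outgoing channel."""
        self.recorded_state = self.state
        for ch in self.outgoing:
            self.send(ch, "MARKER")
        # Start recording on every incoming channel.
        self.recording = set(self.incoming)

    def on_marker(self, channel):
        if self.recorded_state is None:
            # First marker seen: record state; this channel's state is empty.
            self.record_state()
            self.channel_state[channel] = []
            self.recording.discard(channel)
        else:
            # Already recorded: stop recording this channel.
            self.recording.discard(channel)

    def on_message(self, channel, msg):
        if channel in self.recording:
            # In-transit message: part of the channel's recorded state.
            self.channel_state.setdefault(channel, []).append(msg)
        # ... then process msg as usual ...

    def send(self, channel, msg):
        pass  # placeholder for the actual transport layer
```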
Lai - Yang Algorithm
• The Lai-Yang algorithm fulfills the role of a marker in a non-FIFO system by using a
coloring scheme on computation messages that works as follows:
1: Every process is initially white and turns red while taking a snapshot. The
equivalent of the "Marker Sending Rule" is executed when a process turns red.
2: Every message sent by a white (red) process is colored white (red).
3: Thus, a white (red) message is a message that was sent before (after) the sender
of the message recorded its local snapshot.
4: Every white process takes its snapshot at its convenience, but no later than the
instant it receives a red message. Every white process records a history of all white
messages sent or received by it along each channel.
5: When a process turns red, it sends the histories along with its snapshot to the
initiator process that collects the global snapshot.
6: The initiator process evaluates transit(LSi, LSj) to compute the state of a channel Cij
as given below:
SCij = {white messages sent by Pi on Cij} − {white messages received by Pj on Cij}
• Thus, when a white process receives a red message, it records its local snapshot
before processing the message.
• This ensures that no message sent by a process after recording its local snapshot is
processed by the destination process before the destination records its local snapshot.
• Thus, an explicit marker message is not required in this algorithm and the marker is
piggybacked on computation messages using a coloring scheme.
• Though marker messages are not required in the algorithm, each process has to
record the entire message history on each channel as part of the local snapshot.
• Thus the space requirements of the algorithm may be large.
• Lai and Yang describe how the size of the local storage and snapshot recording can be
reduced by storing only the messages sent and received since the previous snapshot
recording, assuming that the previous snapshot is still available.
• This approach can be very useful for applications that require repeated snapshots of
a distributed system.
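• A rough sketch of the Lai-Yang coloring rules is given below; the class and function names are illustrative assumptions, and the channel-state computation follows the SCij formula above.

```python
# Sketch: messages piggyback the sender's color; a white process snapshots
# before processing its first red message; the initiator computes channel
# states as white-sent minus white-received.
class LYProcess:
    def __init__(self, pid):
        self.pid = pid
        self.color = "white"
        self.state = None
        self.snapshot = None
        self.white_sent = {}        # channel -> white messages sent
        self.white_received = {}    # channel -> white messages received

    def take_snapshot(self):
        # Equivalent of the Marker Sending Rule: turn red and record state.
        self.color = "red"
        self.snapshot = self.state

    def send(self, channel, payload):
        msg = (self.color, payload)           # piggyback the color
        if self.color == "white":
            self.white_sent.setdefault(channel, []).append(payload)
        return msg                            # hand to the transport layer

    def receive(self, channel, msg):
        color, payload = msg
        if color == "red" and self.color == "white":
            # Snapshot before processing the first red message.
            self.take_snapshot()
        if color == "white" and self.color == "white":
            self.white_received.setdefault(channel, []).append(payload)
        # ... process payload ...

def channel_state(sender: "LYProcess", receiver: "LYProcess", channel: str):
    """SCij = white messages sent by Pi on Cij minus those received by Pj."""
    sent = sender.white_sent.get(channel, [])
    received = receiver.white_received.get(channel, [])
    return [m for m in sent if m not in received]
```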
Distributed Debugging
• Here we examine the problem of recording a system’s global state so that we may make
useful statements about whether a transitory state – as opposed to a stable state –
occurred in an actual execution.
• In other words, we want to determine whether a given transitory condition held at some
point in an execution.
• This is what we require when debugging a DS.
• Practical example is a DS controlling a system of pipes in a factory where we are
interested in whether all the valves (controlled by different processes) were open at
some time.
• In this example, we cannot in general observe the states of the valves simultaneously.
• The challenge is to monitor the system’s execution over time – to capture ‘trace’
information rather than a single snapshot – so that we can establish post hoc whether the
required safety condition was or may have been violated.
• Chandy and Lamport’s snapshot algorithm collects state in a distributed fashion, and the
processes in the system could send the state they gather to a monitor process for
collection.
• The algorithm we describe next (Marzullo and Neiger [1991]) is centralized.
• The observed processes send their states to a process called a monitor, which
assembles globally consistent states from what it receives.
• We consider the monitor to lie outside the system, observing its execution.
• Our aim is to determine cases where a given global state predicate φ was definitely
True at some point in the execution we observed, and cases where it was possibly True.
• The notion ‘possibly’ arises as a natural concept because we may extract a consistent
global state S from an executing system and find that φ(S) is True.
• No single observation of a consistent global state allows us to conclude whether a
non-stable predicate ever evaluated to True in the actual execution.
Notations
• φ : a global state predicate
• Possibly φ : There is a consistent global state S through which a linearization of H
(history) passes such that φ(S) is True.
• Definitely φ : For all linearizations of H, there is a consistent global state S through which
the linearization passes such that φ(S) is True.
• Evaluating possibly φ entails a search through all consistent global states derived from
the observed execution.
• Processes may react to what condition is being tested for and respond accordingly.
• The monitor must assemble consistent Global States against which it evaluates φ.
• The monitor uses the vector clock values of each message to determine the consistent
Global state.
For possibly φ
• The monitor process starts at the initial state and steps through all consistent states
reachable from that point, evaluating φ at each state. When φ is True, it stops.
For definitely φ
• The monitor process must attempt to find a set of states through which all linearizations
must pass, and at each of which φ is True. A brute-force check of possibly φ is sketched below.
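• The sketch below (with assumed inputs states, clocks and phi gathered by the monitor) enumerates all cuts, keeps only the consistent ones using vector clocks, and evaluates φ at each; it illustrates possibly φ by brute force rather than the exact state-lattice walk described above.

```python
# Brute-force possibly-phi: enumerate cuts, filter by vector-clock
# consistency, evaluate phi on each consistent global state.
from itertools import product

def consistent(cut, clocks):
    """cut[i] = number of events of process i included in the cut.
    clocks[i][k] = vector clock of the (k+1)-th event of process i.
    Consistent iff no included event depends on an excluded one."""
    for i, ci in enumerate(cut):
        if ci == 0:
            continue
        last = clocks[i][ci - 1]          # VC of the last included event of Pi
        if any(last[j] > cj for j, cj in enumerate(cut) if j != i):
            return False
    return True

def possibly(phi, states, clocks):
    """states[i][c] = local state of process i after its first c events
    (states[i][0] is the initial state); phi takes a tuple of local states."""
    ranges = [range(len(s)) for s in states]
    for cut in product(*ranges):
        if consistent(cut, clocks):
            if phi(tuple(states[i][c] for i, c in enumerate(cut))):
                return True
    return False

# Two processes: P1's only event sends a message that P2's only event receives.
states = [[{"x": 0}, {"x": 1}], [{"y": 0}, {"y": 1}]]
clocks = [[[1, 0]], [[1, 1]]]
print(possibly(lambda s: s[0]["x"] == 1 and s[1]["y"] == 0, states, clocks))  # True
print(possibly(lambda s: s[0]["x"] == 0 and s[1]["y"] == 1, states, clocks))  # False
```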
Termination Detection Algorithm
Huang’s Algorithm:
• Huang's algorithm is used to detect the termination of a distributed computation.
• Huang's algorithm can be described by the following rules (a code sketch follows the list):
a. Initially, all processes are idle.
b. A distributed task is started by a process sending a computational message to
another process. The initial process that sends this message is the "controlling agent". The
initial weight of the controlling agent is usually 1.
c. The following rules are applied throughout the computation,
i. A process sending a message splits its current weight between itself and the
message.
ii. A process receiving a message adds the weight of the message to its own weight.
iii. Upon becoming idle, a process sends a message containing its entire weight
back to the controlling agent.
iv. Termination occurs when the controlling agent has a weight of 1 and is in the idle
state.
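• A toy, single-machine simulation of these weight-throwing rules is sketched below; the class names and the use of exact fractions are assumptions, chosen so that the weights always sum to 1.

```python
# Huang's weight-throwing rules, simulated with exact fractions.
from fractions import Fraction

class HuangProcess:
    def __init__(self, pid, controller=None):
        self.pid = pid
        self.weight = Fraction(0)
        self.controller = controller      # the controlling agent
        self.idle = True

    def send_computation(self, target):
        # Rule i: split the current weight between itself and the message.
        half = self.weight / 2
        self.weight -= half
        target.receive_computation(half, self.controller or self)

    def receive_computation(self, weight, controller):
        # Rule ii: add the message's weight; the process becomes active.
        self.weight += weight
        self.controller = controller
        self.idle = False

    def become_idle(self):
        # Rule iii: return the entire weight to the controlling agent.
        self.idle = True
        if self.controller is not None:
            self.controller.weight += self.weight
            self.weight = Fraction(0)

class ControllingAgent(HuangProcess):
    def __init__(self):
        super().__init__("C")
        self.weight = Fraction(1)         # rule b: initial weight is 1

    def terminated(self):
        # Rule iv: termination when the weight is back to 1 and the agent is idle.
        return self.idle and self.weight == 1

# Usage: the agent starts a computation at P1, which spawns work at P2;
# the agent itself stays idle in this toy run.
agent, p1, p2 = ControllingAgent(), HuangProcess("P1"), HuangProcess("P2")
agent.send_computation(p1)
p1.send_computation(p2)
p1.become_idle()
p2.become_idle()
print(agent.terminated())   # True once every weight has been returned
```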
Dijkstra – Scholten Algorithm
Diffusion Algorithms:
• In a diffusion algorithm A, the activity begins at some pre-specified source node pi and
diffuses through some portion of the network via messages.
• A global state of A is said to be quiescent provided that no process is enabled to
perform any step and there are no messages in the channels.
Algorithm:
• Execute A as usual, adding acks for all messages.
• Messages of A are treated like search messages in AsynchSpanningTree.
• When a process receives an external input, it becomes the root, and begins executing
A.
• When any non-root process receives its first A message, it designates the sender as its
parent in the tree, and begins participating in A.
• The root process acks every message immediately.
• Other processes ack all but their first message immediately; a non-root process acks its
first message (to its parent) only once it is idle and all of its own messages have been
acknowledged, at which point it leaves the tree.
• Termination is detected when the root is idle and all of its messages have been
acknowledged (see the sketch below).
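• The sketch below illustrates this bookkeeping with assumed class and method names; it is a toy simulation in which message delivery is a direct method call rather than real asynchronous communication.

```python
# Dijkstra-Scholten bookkeeping: each node tracks its parent and a deficit
# of unacknowledged messages; it leaves the tree only when idle with no deficit.
class DSNode:
    def __init__(self, name):
        self.name = name
        self.parent = None       # sender of the first A-message received
        self.deficit = 0         # own messages sent but not yet acknowledged
        self.active = False
        self.is_root = False

    def on_external_input(self):
        # The process receiving external input becomes the root and starts A.
        self.is_root = True
        self.active = True

    def send_message(self, target):
        self.deficit += 1
        target.on_message(self)

    def on_message(self, sender):
        if not self.is_root and self.parent is None:
            self.parent = sender         # first message: sender becomes parent
            self.active = True
        else:
            sender.on_ack()              # every other message is acked at once
        # ... perform the work of algorithm A here ...

    def on_ack(self):
        self.deficit -= 1
        self._maybe_detach()

    def finish_work(self):
        self.active = False
        self._maybe_detach()

    def _maybe_detach(self):
        # Ack the parent (leave the tree) only when idle with no deficit;
        # the root instead announces termination at that point.
        if not self.active and self.deficit == 0:
            if self.parent is not None:
                parent, self.parent = self.parent, None
                parent.on_ack()
            elif self.is_root:
                print(f"termination detected at root {self.name}")

# Usage: root engages p, p finishes and acks, then the root finishes.
root, p = DSNode("r"), DSNode("p")
root.on_external_input()
root.send_message(p)     # p joins the tree with root as its parent
p.finish_work()          # p is idle with no deficit -> acks the root
root.finish_work()       # root idle, deficit 0 -> termination detected
```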
Traversal Algorithm:
• A traversal algorithm is a centralized wave algorithm, i.e. there is one initiator, which
sends around a token.
• Passing the token represents a combined send and receive event.
• In each computation, the token first visits all processes.
• Finally, a decide event happens at the initiator, who at that time holds the token.
• In traversal algorithms, the father of a non-initiator is the neighbor from which it
received the token first.
Wave Algorithm
Echo Algorithm:
• The echo algorithm is a centralized wave algorithm for undirected graphs.
1. Initiator sends a message to all neighbors.
2. The father of a non initiator is the neighbor from which it receives the first
message.
3. A non initiator sends a message to all neighbors except its father.
4. When a non initiator has received a message from all neighbors, it sends a
message to its father.
5. When the initiator has received a message from all neighbors, it decides.
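• The following simulation-style sketch of the five rules above uses an adjacency-list graph and a message queue; the function name echo and the message representation are assumptions for illustration.

```python
# Simulation of the echo algorithm on an undirected graph: explorer messages
# spread outward, echoes return along the spanning tree to the initiator.
from collections import deque

def echo(graph, initiator):
    """Returns the spanning tree (node -> father) built by the echo waves."""
    father = {initiator: None}
    pending = {n: set(graph[n]) for n in graph}   # neighbors still to reply
    queue = deque((initiator, nbr, "explore") for nbr in graph[initiator])

    while queue:
        sender, node, kind = queue.popleft()
        if kind == "explore" and node not in father:
            # Rule 2: the first explorer's sender becomes the father.
            father[node] = sender
            # Rule 3: forward the wave to all neighbors except the father.
            for nbr in graph[node]:
                if nbr != sender:
                    queue.append((node, nbr, "explore"))
        # Any message (explorer or echo) counts as a reply from that neighbor.
        pending[node].discard(sender)
        # Rules 4 and 5: when all neighbors have replied, echo to the father
        # (or decide, if this node is the initiator).
        if not pending[node]:
            if father[node] is not None:
                queue.append((node, father[node], "echo"))
                pending[node] = {None}    # avoid echoing twice
            elif node == initiator:
                return father
    return father

g = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(echo(g, "A"))   # e.g. {'A': None, 'B': 'A', 'C': 'A'}
```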
• The echo algorithm consists of two phases or waves: a forward wave of explorer
messages which spreads through the net, and a backward wave of echo messages which
is created when the explorer wave front hits the border of the net.
• The explorer wave travels from the initiator to the border of the distributed system
and can be used to disseminate information, the Echo wave travels from the border of
the system back to the initiator and can be used to collect information from the system.
• During the propagation of the first explorer wave, a spanning tree is constructed
which consists of all the predecessor nodes stored in the parameter "pred" at each node.
• The Echo wave travels back along this spanning-tree to the initiator.
• If it reaches the initiator, the algorithm is terminated.
• Given a general graph with intelligent nodes which can communicate along its edges,
the first idea is that message passing is the fundamental operation of any echo
algorithm.
• Traversal of the graph therefore means passing messages from one node to another.
• For any particular node i which starts the execution of an echo algorithm, the
messages originating from i form a family, sharing the identity of i in common.
Distribution of list among the nodes in a mesh network :
• Each node in the network should be given a unique number.
• The numbers are distributed by a node S that starts the algorithm and does not initially
know which nodes are present in the network.
Phase 1:
1. S starts an echo algorithm.
2. Each leaf node returns the value 1 in its ECHO.
3. Each ECHO sent back on a link that is not the father link (FL) returns the value 0.
4. Each node registers the return values of the ECHOs for the corresponding links.
5. When a node has received all ECHO messages, it returns an ECHO on its FL containing
the sum of all its return values + 1 (for itself).
6. When the initiating node S has received ECHOs on all its links, it knows how many
nodes are present in the network at that moment.
Phase 2:
7. The initiating node creates as many unique identifiers as there are nodes in the
network and sends them on its links to its neighbors.
Each link's message gets as many identifiers as were reported in the corresponding
ECHO.
8. Each other node will get a message with unique identifiers on its FL (the FL of
Phase 1).
The node keeps one of the identifiers and sends the rest on its links according to
the corresponding registered number of nodes. Each link gets as many identifiers as
were indicated in its ECHO.
9. Echo messages can optionally be sent back so the initiator node can get a
confirmation that the algorithm has terminated.
Unit –III
Resource deadlock:
• A set of deadlocked processes, where each process waits for a resource held by
another process (e.g. a data object in a database, an I/O resource on a server).
• This uses AND condition.
• AND condition : A process that requires resources for execution can proceed when it
has acquired all those resources.
• The condition for deadlock in a system using the AND condition is the existence of a
cycle.
Resource V/s Communication Deadlock
Communication deadlock:
• Set of deadlocked processes, where each process waits to receive messages
(communication) from other processes in the set.
• This uses OR condition.
• OR condition: A process that requires resources for execution can proceed when it has
acquired at least one of those resources.
• The condition for deadlock in a system using the OR condition is the existence of a
knot.
• A knot (K) consists of a set of nodes such that for every node A in K, all nodes in K and
only the nodes in K are reachable from node A.
Detection of Resource Deadlock
• The state of process - resource interaction in distributed systems can be modeled by
a bi-partite directed graph called a resource allocation graph.
• The nodes of the graph are the processes and resources of a system, and the edges of the
graph depict assignments and pending requests.
• A pending request is represented by a request edge directed from the node of a
requesting process to the node of the requested resource.
• A resource assignment is represented by an assignment edge directed from the node
of an assigned resource to the node of the assigned process.
• A system is deadlocked if its resource allocation graph contains a directed cycle (AND
model) or a knot (OR model); a knot-detection sketch is given below.
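• The sketch below (with assumed helper names reachable and has_knot) tests a graph for a knot, the deadlock condition under the OR model; a cycle check for the AND model was sketched earlier with the wait-for graph example.

```python
# A node lies in a knot iff it can reach itself and every node it can reach
# has exactly the same reachability set.
def reachable(graph, start):
    """All nodes reachable from `start` via one or more edges."""
    seen, stack = set(), list(graph.get(start, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(graph.get(n, []))
    return seen

def has_knot(graph):
    for v in graph:
        rv = reachable(graph, v)
        if v in rv and all(reachable(graph, u) == rv for u in rv):
            return True
    return False

# P1 -> P2 -> P3 -> P1 with an escape edge P2 -> P4 (P4 waits for nothing):
g = {"P1": ["P2"], "P2": ["P3", "P4"], "P3": ["P1"], "P4": []}
print(has_knot(g))   # False: a cycle exists, but P4 offers a way out (OR model)
g["P4"] = ["P1"]     # now every reachable node leads back into the set
print(has_knot(g))   # True: a knot, so deadlock even under the OR model
```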