UNIT IV CONSENSUS AND RECOVERY
Part-A
1) Outline the issues in failure recovery. [April/May 2024]
2) What are the advantages and disadvantages of asynchronous computing? [April/May 2024]
3) What is a checkpoint in a recovery system? How is it useful? [Nov/Dec 2023]
4) State the issues in failure recovery. State the means to identify a failure at an early stage. [Nov/Dec 2023]
Part-B
April/May 2024:
Nov/Dec 2023
Real-Life Analogy
• Agreement: Imagine a group of friends trying to decide where to go for dinner. They
discuss options, and if they can’t agree on a restaurant, they default to a familiar spot
they all accept (a pre-agreed fallback).
• Consensus: Now imagine they’re trying to pick a restaurant, but they can only choose
one that someone explicitly suggested. They need to discuss and vote until they all
agree on one of the proposed options.
i) Failure models: In distributed systems, failure models define how faulty processes behave,
impacting the ability to reach consensus. Key models include:
1. Fail-stop model: A process may crash during execution, potentially leaving some tasks
incomplete.
2. Send/Receive omission: Processes may fail to send or receive messages, leading to
communication gaps.
3. Byzantine failure model: Faulty processes can behave arbitrarily, sending incorrect or
misleading information.
The complexity of reaching consensus depends on the failure model, with Byzantine failures
being the hardest to manage due to their unpredictability and potential for malicious behavior.
iii) Network Connectivity: The system has full logical connectivity, i.e., each process can
communicate with any other by direct message passing.
iv) Sender Identification: Every message received by a process contains the identity of the
sender. This feature helps mitigate some risks of Byzantine faults (faulty behavior where
processes can send false or malicious messages) by ensuring the recipient can at least identify
the source. While the content may be tampered with, the sender's identity is still revealed by
the underlying network protocols.
v) Multiple Messages and Scheduling: When a process is expected to receive multiple
messages from the same sender during a round, the system assumes a scheduling mechanism
that handles the messages in sub-rounds. This ensures each message is uniquely identifiable,
even when multiple messages are sent in a short span of time.
Example:
This ensures that the transaction steps occur in the correct sequence, even in the face of network
delays or faults, preventing issues like lost updates or deadlocks.
viii) Agreement Variable: The agreement variable is the value that processes must agree on in
consensus algorithms. It can be boolean (true/false) or multi-valued. For simplicity in some
algorithms, the boolean form is used, although the results can extend to other data types.
This example illustrates the Byzantine Generals Problem, a classic issue in distributed
systems that highlights the challenge of achieving consensus in the presence of faulty or
malicious actors. Four generals of an army, camped around the enemy fort, need to agree on a
common plan of action (whether to attack or retreat), even though some of them may be traitors.
The challenge is to determine whether the generals can still reach agreement despite the
presence of Byzantine actors, and if so, under what conditions. This situation reflects the core
problem of fault tolerance in distributed systems, where consensus protocols must be devised
to ensure agreement, even when some processes behave maliciously or fail.
The diagram shows four generals (G1, G2, G3, and G4) involved in the Byzantine Generals
Problem, where each general is sending potentially conflicting or misleading messages to the
others. The arrows between the generals indicate the messages being exchanged, and the
numbers along the arrows represent the values being communicated (either 0 or 1).
This example illustrates how Byzantine behavior (traitorous generals) can lead to confusion,
as the messages received by each general are inconsistent. This inconsistency makes it difficult
to determine which messages are reliable, ultimately challenging the generals' ability to reach
consensus on a common decision (whether to attack or not).
The problem shown in the diagram reflects the core difficulty in achieving agreement in
distributed systems when some participants (generals) act maliciously or unpredictably. It
highlights the importance of robust consensus protocols that can overcome the
uncertainty introduced by Byzantine failures.
The Byzantine Agreement Problem involves a designated source process with an initial
value, and the goal is for all non-faulty processes to agree on the source’s value, even if some
processes are faulty (Byzantine). The key conditions are:
• Agreement: All non-faulty processes must agree on the same value.
• Validity: If the source is non-faulty, the agreed-upon value must be the source's initial value.
• Termination: Each non-faulty process must eventually decide on a value.
If the source is faulty, the non-faulty processes can agree on any value. Faulty processes'
decisions are irrelevant, and the system ensures that all correct processes reach a consensus.
Use Case:
Aviation Systems: Byzantine agreement is used in safety-critical systems like flight control
computers in aircraft, where some computers might provide incorrect data due to faults, but the
majority of systems need to agree on a single course of action to ensure safe flight.
2. Consensus Problem:
The Consensus Problem differs from Byzantine agreement in that each process has its own
initial value. The goal is for all non-faulty processes to agree on a single value, based on their
initial values. The conditions are:
• Agreement: All non-faulty processes must agree on the same (single) value.
• Validity: If all non-faulty processes have the same initial value, the agreed-upon value must be that value.
• Termination: Each non-faulty process must eventually decide on a value.
In this case, all correct processes must come to a consensus on one value, even though each
process may initially propose a different value.
Use Case:
3. Interactive Consistency Problem:
The Interactive Consistency Problem extends the consensus problem by requiring that non-
faulty processes agree on a set of values, one for each process (including possibly faulty
ones). The conditions are:
• Agreement: All non-faulty processes must agree on the same array of values
A = [v1, v2, ..., vn].
• Validity: If process i is non-faulty, the agreed value for that process vi must be the
same as its initial value. For faulty processes, non-faulty processes can agree on any
value.
• Termination: Each non-faulty process must eventually decide on the entire array A.
In this problem, instead of agreeing on a single value, the goal is to agree on a set of values
corresponding to the initial value of each process.
Use Case:
Sensor Networks: In sensor networks where sensors collect data from different sources, some
sensors might be faulty or provide incorrect readings. The system needs to achieve interactive
consistency to ensure all non-faulty nodes agree on the values received from each sensor,
ensuring accurate aggregation of data (e.g., for climate control systems in smart buildings).
The formal difference between the agreement and consensus problems is that:
• In the agreement problem, only a single process (source) has an initial value that all
other processes must agree on.
• In the consensus problem, all processes start with their own initial values and must
agree on one value.
Despite this formal distinction, the terms "agreement" and "consensus" are often used
interchangeably in the literature because they address similar goals—achieving agreement
among non-faulty processes.
Reductions:
• Since these problems are equivalent, a solution to one problem can be transformed into
a solution for the others, implying that they share core characteristics, such as how non-
faulty processes reach a decision, handle failures, and ensure agreement.
Overview of results
Table 14.1 summarizes the results and lower bounds on solving the consensus problem under
different failure modes, for both synchronous and asynchronous systems. It is worth
understanding the relation between the consensus problem and the problem of attaining
common knowledge of the agreement value. For the “no failure” case, consensus is attainable.
Further, in a synchronous system, common knowledge of the consensus value is also
attainable, whereas in the asynchronous case, concurrent common knowledge of the consensus
value is attainable. Consensus is not solvable in asynchronous systems even if one process can
fail by crashing. To circumvent this impossibility result, weaker variants of the consensus
problem are studied.
In a failure-free system, consensus can be achieved relatively easily by having processes share
their information, compute a decision collectively, and then disseminate that decision across
the system. Here's a breakdown of how this process works:
Mechanism:
1. Collection and Decision: Each process broadcasts its value to all other processes;
once every value is available, each process applies the same agreed-upon
deterministic function, for example:
▪ Max function: The decision is the maximum value among all the
broadcast values.
▪ Min function: The decision is the minimum value among all the
broadcast values.
2. Distribution of Decision:
o Once the decision is made, it must be distributed to all processes to ensure that
the system reaches agreement.
o This can be done through different communication protocols:
▪ Token Circulation: Processes are arranged in a logical ring, and a token
is passed around the ring carrying the decision.
▪ Three-phase Tree-based Broadcast–Convergecast–Broadcast:
Processes form a tree-like structure where decisions are broadcast,
collected, and re-broadcast.
▪ Direct Communication: Each process communicates directly with all
other processes.
Asynchronous Systems:
Steps:
1. Broadcasting: In each round, process Pi broadcasts its value xi to all other processes,
if it hasn’t broadcast it before.
2. Receiving and Updating: Each process updates its local value to the minimum of its
own value and the values it receives from other processes in that round.
3. Termination: After f+1 rounds, each process outputs its local value as the consensus
value.
Algorithm 14.1 is a consensus algorithm for n processes that can tolerate up to f crash failures.
The algorithm runs for f+1 rounds to ensure that consensus is reached even if processes crash.
Each process starts with an initial value; in each round, it broadcasts its current value (if not
already broadcast) and updates it to the minimum of the values received, as described in the
steps above.
Conditions:
• Agreement: At least one round will be failure-free. In that round, all non-failed
processes will broadcast and update to the same minimum value, ensuring they agree
on the same consensus value.
• Validity: Since processes send only their correct values before crashing, the decided
value is the initial value of some process.
• Termination: After f+1 rounds, each non-failed process outputs its final value as the
consensus value.
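The following Python sketch simulates the f+1-round algorithm under a simplified crash model (process identifiers and the crash schedule are invented; the simulation treats crashes as all-or-nothing within a round, whereas in the real model a crashing process may reach only some peers mid-round, which is exactly why f+1 rounds are needed to guarantee one failure-free round).

```python
def crash_tolerant_consensus(values, crash_round):
    """Sketch of the f+1-round min-consensus under crash failures.
    values: {pid: initial value}.  crash_round: {pid: round in which the
    process crashes}.  Crashes here are 'clean' (all-or-nothing sends)."""
    f = len(crash_round)
    local = dict(values)
    sent = {pid: set() for pid in values}           # values already broadcast
    for rnd in range(1, f + 2):                     # rounds 1 .. f+1
        inbox = []
        for pid in values:
            alive = crash_round.get(pid, f + 2) > rnd
            if alive and local[pid] not in sent[pid]:
                inbox.append(local[pid])            # broadcast current value once
                sent[pid].add(local[pid])
        for pid in values:
            if crash_round.get(pid, f + 2) > rnd:   # still alive: adopt minimum
                local[pid] = min([local[pid]] + inbox)
    # Surviving (non-crashed) processes output their local minimum.
    return {pid: v for pid, v in local.items() if pid not in crash_round}

# P2 crashes in round 1; with f = 1 the survivors agree after 2 rounds.
print(crash_tolerant_consensus({"P1": 5, "P2": 1, "P3": 9}, {"P2": 1}))
```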
In a system of n processes, the Byzantine agreement problem (as also the other variants of the
agreement problem) can be solved in a synchronous system only if the number of Byzantine
processes f satisfies n > 3f.
This establishes that to solve the Byzantine agreement problem, the number of processes n
must be greater than 3f, meaning that the system requires more than three times the number
of potentially faulty processes to achieve consensus.
In the first round, the commander Pc sends its value to the other three lieutenants, as shown by
dotted arrows. In the second round, each lieutenant relays to the other two lieutenants the value
it received from the commander in the first round. At the end of the second round, a lieutenant
takes the majority of the values it received (i) directly from the commander in the first round,
and (ii) from the other two lieutenants in the second round. The majority gives a correct
estimate of the “commander’s” value. Consider Figure 14.4(a) where the commander is a
traitor. The values that get transmitted in the two rounds are as shown. All three lieutenants
take the majority of (1, 0, 0) which is “0,” the agreement value.
In Figure 14.4(b), lieutenant Pd is malicious. Despite its behavior as shown, lieutenants Pa and
Pb agree on “0,” the value of the commander.
• For each process in the system (except the commander), the process receives a
value from the commander or other processes, and it forwards this value to others.
This continues as long as f>0, and at each level, the recursion depth decreases as f
is reduced.
• Once the recursion reaches f=0, each process stops forwarding values. Instead, it
looks at the values it received from other processes.
• Each process then uses majority voting to decide the final value:
o If a process received conflicting values (e.g., 1 from one source, 0 from
another), it uses the majority value.
o If no value was received from a particular process, the algorithm assumes a
default value (usually 0).
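A compact Python sketch of this recursive majority-voting scheme (in essence, Lamport's OM(f) algorithm for oral messages) is shown below; the traitor's behaviour, sending 1 to the first recipient and 0 to the rest, is one arbitrary choice for the demo and not part of the algorithm.

```python
from collections import Counter

def majority(values, default=0):
    """Majority value; ties or empty input fall back to the default
    value (the text's assumed default, usually 0)."""
    if not values:
        return default
    (top, cnt), = Counter(values).most_common(1)
    return top if 2 * cnt > len(values) else default

def om(commander, lieutenants, value, f, faulty):
    """Recursive OM(f) sketch: returns {lieutenant: decided value}.
    A faulty commander sends 1 to the first lieutenant and 0 to the
    rest -- one arbitrary Byzantine behaviour chosen for the demo."""
    if commander in faulty:
        received = {p: (1 if i == 0 else 0) for i, p in enumerate(lieutenants)}
    else:
        received = {p: value for p in lieutenants}
    if f == 0:
        return received                       # recursion bottom: no relaying
    decided = {}
    for p in lieutenants:
        votes = [received[p]]                 # value straight from commander
        for q in lieutenants:
            if q != p:
                # q relays what it received, acting as commander of OM(f-1)
                sub = om(q, [r for r in lieutenants if r != q],
                         received[q], f - 1, faulty)
                votes.append(sub[p])
        decided[p] = majority(votes)          # majority over all copies held
    return decided

# Like Figure 14.4(a): the commander is the traitor; the three loyal
# lieutenants still agree (majority of (1, 0, 0) is 0).
print(om("Pc", ["Pa", "Pb", "Pd"], 1, 1, faulty={"Pc"}))
# Like Figure 14.4(b): lieutenant Pd is the traitor; Pa and Pb agree on 0.
print(om("Pc", ["Pa", "Pb", "Pd"], 0, 1, faulty={"Pd"}))
```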
Real-World Applications:
1. Blockchain Systems: Blockchain platforms like Hyperledger Fabric use Byzantine
fault-tolerant algorithms (such as PBFT) to ensure consensus on transaction blocks,
even in the presence of malicious nodes.
The Byzantine Agreement Tree Algorithm is an approach used to solve the Byzantine
Agreement problem in synchronous systems by ensuring that non-faulty nodes agree on the
same value despite a subset of faulty nodes.
Explanation:
Algorithm:
4) Phase-King Algorithm:
Objective:
Complexity:
The algorithm requires f + 1 phases, with two sub-rounds in each phase, and
(f + 1)(n − 1)(n + 1) messages.
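A runnable Python sketch of the phase-king structure is given below (it assumes n > 4f; the modelled Byzantine behaviour, always sending 1, is one arbitrary choice, since a real faulty process could send different values to different peers).

```python
def phase_king(values, faulty, f):
    """Sketch of a phase-king run (requires n > 4f).  Each of the f+1
    phases has two sub-rounds: an all-to-all exchange, then a tiebreak
    broadcast by that phase's king."""
    pids = sorted(values)
    n = len(pids)
    v = dict(values)
    for phase in range(f + 1):
        # Sub-round 1: all-to-all broadcast; each process tallies votes.
        maj, cnt = {}, {}
        for p in pids:
            recv = [1 if q in faulty else v[q] for q in pids]
            ones = sum(recv)
            maj[p] = 1 if 2 * ones > n else 0
            cnt[p] = max(ones, n - ones)
        # Sub-round 2: the phase king broadcasts its majority value.
        king = pids[phase]
        tiebreak = 1 if king in faulty else maj[king]
        for p in pids:
            if p not in faulty:
                # keep own majority only if it is overwhelming (> n/2 + f)
                v[p] = maj[p] if cnt[p] > n // 2 + f else tiebreak
    return {p: v[p] for p in pids if p not in faulty}

# n = 5, f = 1: P3 is Byzantine; the four correct processes agree on 0.
print(phase_king({"P1": 0, "P2": 0, "P3": 1, "P4": 0, "P5": 1},
                 faulty={"P3"}, f=1))
```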
Rollback recovery works by saving the state of processes periodically (called checkpoints), so
that in the event of failure, the system can restart from the last saved state, minimizing the
loss of work.
Key Concepts:
Example: A multiplayer game where all players' states are saved at the same time to
ensure no player has an inconsistent game state after a crash.
Example:
By using logs and message replay, the system avoids unnecessary rollbacks and maintains
efficiency during recovery.
Example:
• In a stock trading platform, the system logs non-deterministic events like stock price
updates and user trade actions. If a process fails, it rolls back to the last checkpoint and
replays the logged events (e.g., received stock price updates) to restore the state without
losing trades made by users.
• Online transaction processing in banking systems (fund transfers, deposits, and
payments)
The primary focus is on ensuring reliability and availability in distributed systems, where faults
can occur. Different techniques, like coordinated checkpointing and log-based rollback
recovery, aim to mitigate the trade-offs of checkpointing and avoid the domino effect.
Checkpointing:
• Local checkpoint: Each process periodically saves its state, known as a local
checkpoint, which is stored on stable storage for recovery in case of failure.
• Checkpointing is represented by the symbol “ |” in process-line diagrams.
• A process may maintain multiple checkpoints or a single one at a time, depending on
the method used. It can roll back to any of its checkpoints if necessary.
Rollback-Recovery Protocols:
• These protocols ensure that, after a failure, the system's internal state is consistent
with its observable behavior before the failure.
• The protocols maintain information about both internal interactions (between
processes) and external interactions (with the outside world).
• A system is considered to recover correctly if its state after recovery is consistent with
its behavior prior to the failure.
A global state in a distributed system is the collection of all individual process states and the
states of communication channels. A consistent global state is one that could occur during a
correct, failure-free execution of the distributed computation.
• Consistent state (a): Message m1 has been sent but not yet received, which is still valid
since every received message has a corresponding send event.
• Inconsistent state (b): Process P2 has received m2, but P1’s state does not show that
it sent the message, making the state inconsistent.
A consistent global checkpoint is one where no message is recorded as received by any process
before the sender of that message has taken its local checkpoint (i.e., there is no receive(m1)
event without the corresponding send(m1) event).
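This rule can be checked mechanically. The following small Python sketch (the logical-time representation of checkpoints and messages is an assumption made for the example) flags a global checkpoint as inconsistent when some receive is recorded without its corresponding send.

```python
def is_consistent(checkpoints, messages):
    """Check the no-receive-without-send rule for a global checkpoint.
    checkpoints: {pid: local (logical) time of that process's checkpoint}.
    messages: (sender, send_time, receiver, recv_time) tuples."""
    for sender, st, receiver, rt in messages:
        send_recorded = st <= checkpoints[sender]
        recv_recorded = rt <= checkpoints[receiver]
        if recv_recorded and not send_recorded:
            return False        # orphan: receive recorded, send is not
    return True

# Like state (a): m1 sent before P1's checkpoint, not yet received -- fine.
print(is_consistent({"P1": 5, "P2": 5}, [("P1", 3, "P2", 7)]))    # True
# Like state (b): P2 recorded receiving m2, but P1's checkpoint predates
# the send -- inconsistent.
print(is_consistent({"P1": 2, "P2": 5}, [("P1", 3, "P2", 4)]))    # False
```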
Distributed applications often interact with external entities (like printers or ATMs) to either
receive inputs or deliver outputs. However, these external entities, referred to as outside world
processes (OWPs), cannot roll back actions if a failure occurs. For example, once a printer
prints a character or an ATM dispenses cash, those actions cannot be undone.
• Output Commit Problem: Before sending output to the OWP, the system must ensure
that the state from which the output is sent can be recovered after any future failure.
• Handling Input Messages (input reproducibility): Input messages from the OWP
might not be reproducible during recovery, so these must be saved to stable storage
before the application processes them. This ensures they can be replayed in case of a
failure.
In a distributed system with multiple processes that communicate through messages, a process
failure and subsequent recovery can lead to inconsistencies in the handling of messages. These
inconsistencies arise because the rollback of processes for recovery may involve undoing the
sending or receiving of certain messages. The messages affected by the rollback fall into
several categories, as illustrated in the example from Figure 13.3
1. In-Transit Messages:
• These are messages that have been sent but not yet received at the time of recovery. In
Figure 13.3, messages m2 and m5 are examples of in-transit messages.
• In-transit messages do not cause inconsistencies as long as the system guarantees their
delivery(reliable). In systems with reliable communication channels, they must be
delivered after recovery. In systems with lossy communication channels, they might be
lost.
2. Lost Messages:
• These occur when the sending of a message is not undone by the rollback, but the
receiving of it is undone. In this case, the sender does not roll back to a state before the
send operation, while the receiver does roll back to a state before it receives the
message.
• In the example, m1 is a lost message because its receipt was undone during recovery.
3. Delayed Messages:
• These are messages whose receipt is not recorded because the receiving process was
either down or the message arrived after the rollback.
• For instance, m2 and m5 in the example are delayed messages. They arrived after the
receiving process had rolled back.
4. Orphan Messages:
• Orphan messages occur when the receipt of a message is recorded, but the send
operation is not because the sending process rolled back to a state before it sent the
message.
• Orphan messages do not arise if the system is rolled back to a consistent global state
(where all process states are aligned).
5. Duplicate Messages:
• These occur due to message logging and replaying during process recovery. After a
rollback, both the sending and receiving of a message may be undone. When the sender
process restarts, it resends the message, potentially causing a duplicate.
• In the example, m5 could become a duplicate if it is resent after rollback.
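The taxonomy above boils down to two questions per message: did the rollback undo its send event, and did it undo its receive event? A small Python sketch (function and label names invented) makes the classification explicit.

```python
def classify(send_undone, recv_undone):
    """Classify a message after rollback by whether the rollback undid
    its send and/or its receive (labels follow the taxonomy above)."""
    if not send_undone and not recv_undone:
        return "normal / in-transit"        # both events still recorded
    if not send_undone and recv_undone:
        return "lost"                       # receive undone, send kept
    if send_undone and not recv_undone:
        return "orphan"                     # receive kept, send undone
    return "duplicate risk"                 # both undone; replay may resend

print(classify(send_undone=False, recv_undone=True))   # lost, like m1
print(classify(send_undone=True,  recv_undone=True))   # duplicate risk, like m5
```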
In the context of failure recovery in a distributed system, several issues arise regarding message
handling when processes fail and rollback to a prior checkpoint.
During recovery, the system must handle various types of messages that are left in abnormal
states, such as delayed, lost, orphan, and duplicate messages.
The scenario below involves three processes Pi, Pj, and Pk exchanging messages over fault-
free, FIFO communication channels. The failure recovery issues are illustrated in Figure 13.4,
where process Pi fails, and the system must restore consistency.
Key Points:
1. Process Rollback and Orphan Messages:
o After Pi fails, it rolls back to checkpoint Ci1, and Pj rolls back to Cj1 to handle
the orphan message H (receive event recorded at Pj, but send event undone at
Pi).
o Pk then rolls back to Ck1 to eliminate another orphan message I created by
Pj's rollback.
2. Types of Messages:
o Normal Messages: These are messages whose send and receive events have
been recorded, and they remain consistent after the rollback. In this example,
A, B, and J are normal messages.
o Vanished Messages: Messages whose send and receive events have been fully
undone by the rollback. In the example, G, H, and I are vanished messages.
o Delayed Messages: Messages that were in transit during the failure. For
instance, C was in transit and might arrive at different times during or after
recovery. Each scenario needs proper handling.
o Lost Messages: Messages whose send events are recorded but the receive
events are undone due to the rollback. D is a lost message because it was sent
by Pj, but the receive event at Pi was undone.
o Delayed Orphan Messages: Messages like E and F, where the send events
have been undone by rollback but may still arrive at their destination. These
must be discarded since the system will regenerate them.
3. Handling of Problematic Messages:
o Lost Messages: Process logs are used to replay lost messages, but this can lead
to duplicate messages, such as when Pj resends J, which Pk has already
received.
o Duplicate Messages: Duplicate messages arise from the replay of logs during
recovery and must be handled to maintain system consistency.
o Delayed Messages: Messages in transit during failure, like C, must be carefully
managed depending on when they arrive (before, during, or after recovery).
4. Overlapping Failures:
o Overlapping failures complicate the recovery process. For example, if Pj starts
recovery in response to Pi's failure but then fails itself, it may "forget" about
Pi's failure. This introduces further inconsistencies, requiring mechanisms to
address overlapping failures.
----------------------------------------------------------------------------------------------------------------
Checkpoint-based Recovery
This method does not depend on the Piecewise Deterministic (PWD) assumption
and avoids logging or replaying non-deterministic events. While simpler than log-based
recovery, it does not guarantee that the pre-failure execution can be deterministically
regenerated after a rollback. Checkpoint-based recovery falls into three categories:
uncoordinated, coordinated, and communication-induced checkpointing.
1. Uncoordinated Checkpointing:
Each process independently decides when to take checkpoints without coordinating with other
processes. This autonomy reduces runtime overhead and allows processes to checkpoint at
convenient times. However, it introduces several issues:
• Domino Effect: During recovery, multiple processes might need to roll back to earlier
checkpoints, potentially losing a significant amount of useful work.
• Slow Recovery: Processes must iterate through checkpoints to find a globally
consistent state, increasing recovery time.
• Useless Checkpoints: Some checkpoints may never contribute to recovery and incur
unnecessary overhead.
• Multiple Checkpoints: Processes must store several checkpoints and periodically run
garbage collection to discard obsolete ones.
• Not Suitable for Frequent Output Commit Applications: These applications require
global coordination, reducing the benefits of autonomous checkpointing.
Figure 13.5 (checkpoint index and checkpoint interval) illustrates the dependencies between
checkpoints in a system with uncoordinated checkpointing.
2. Coordinated Checkpointing:
Types:
i) Blocking
ii) Non-blocking
i) Blocking Coordinated Checkpointing:
1. Initiation: The coordinator initiates checkpointing by taking its own checkpoint and
broadcasting a request for all processes to take a checkpoint.
2. Checkpoint and Acknowledgment: Upon receiving this request, each process halts execution,
flushes its communication channels (to clear any in-transit messages), takes a tentative
checkpoint, and sends an acknowledgment to the coordinator.
3. Commit Phase: Once the coordinator gathers acknowledgments from all processes, it sends a
commit message. Each process then finalizes the checkpoint (making the tentative one
permanent), discards any old checkpoints, and resumes normal execution.
ii) Non-Blocking Coordinated Checkpointing:
In a distributed system, different processes communicate with each other. If one process, say
P0, sends a message after it takes a checkpoint, but another process, say P1, hasn't yet received
the checkpoint request, then P1 might receive the message before taking its checkpoint. This
leads to an inconsistent state:
• Process P0's checkpoint will not show the message as having been sent.
• Process P1's checkpoint will show the message as received.
This inconsistency complicates recovery, as different processes have different views of what
has happened in the system.
Solutions to such inconsistent checkpoints exist for both FIFO and non-FIFO channels.
The problem can be avoided if the communication channels between processes are FIFO
(First-In-First-Out). In this case:
• Each process sends a checkpoint request (marker) before sending any message after
its checkpoint.
• When a process receives the checkpoint request, it takes its own checkpoint before
processing any further messages.
1. An initiator process begins the checkpointing by taking a checkpoint and sending a marker
to all the other processes it communicates with.
2. When a process receives this marker, it:
o Takes a checkpoint (if it hasn’t already),
o Sends a marker to all the processes it communicates with before sending any other
messages.
3. The system continues to operate while the checkpointing process occurs in the background,
ensuring that each process records a consistent global state.
This protocol works seamlessly when channels are reliable and FIFO, meaning messages are received
in the order they were sent.
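A minimal Python sketch of this marker-based, non-blocking scheme over reliable FIFO channels follows (class and method names are invented, and channel delivery is abstracted to an outbox).

```python
class Process:
    """Non-blocking, marker-based checkpointing sketch for reliable
    FIFO channels: checkpoint on the first marker seen, and queue the
    marker on every outgoing channel before any later message."""
    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = peers            # processes this one sends to
        self.state = 0                # running application state
        self.checkpointed = False
        self.outbox = []              # (dest, payload), FIFO per channel

    def take_checkpoint(self):
        if not self.checkpointed:
            self.saved_state = self.state   # local checkpoint to stable store
            self.checkpointed = True

    def initiate(self):
        # Initiator: checkpoint first, then markers before anything else.
        self.take_checkpoint()
        for q in self.peers:
            self.outbox.append((q, "MARKER"))

    def receive(self, msg):
        if msg == "MARKER":
            if not self.checkpointed:       # first marker on any channel:
                self.initiate()             # checkpoint and relay markers
        else:
            self.state += msg               # ordinary application message

p = Process("P1", peers=["P2", "P3"])
p.receive(5)                                # normal computation
p.receive("MARKER")                         # forced consistent checkpoint
print(p.saved_state, p.outbox)              # 5, markers queued to P2 and P3
```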
3. Communication-induced checkpointing:
i) Model-based checkpointing
ii) Index-based checkpointing
i) Model-based checkpointing:
How it works:
Processes monitor their communication patterns and take forced checkpoints whenever there's a risk
of creating an inconsistent recovery state. No coordination messages are exchanged; instead, decision-
making happens locally based on system behavior.
Key Idea:
A process takes a forced checkpoint when it detects that its communication might lead to
an inconsistent state later. This ensures that checkpoints always lead to a domino-effect-free
recovery, where only necessary rollbacks happen.
In the MRS model (Mark, Send, and Receive), the goal is to avoid the domino effect by
ensuring that message-sending events always occur before message-receiving events within
the same checkpoint interval.
Example:
Imagine a group of people playing a multiplayer online game. Every time one player fights
another, the game ensures both players’ progress is saved (checkpointed) so that if one player
disconnects, both players can resume from the same point after recovery. The game
automatically forces checkpoints before big events like battles, ensuring the game doesn’t
get confused about the state of each player.
Best for:
Systems with complex communication patterns where dynamic, real-time interactions
between processes need to be closely monitored, like multiplayer gaming or real-time
simulations.
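As a concrete illustration of the MRS rule, here is a minimal Python sketch (class and method names are invented): a process that wants to send after having received a message in its current checkpoint interval takes a forced checkpoint first, so that sends again precede receives.

```python
class MRSProcess:
    """Model-based (MRS) forced checkpoints: within one checkpoint
    interval every send must precede every receive, so a send that
    would follow a receive forces a new checkpoint first."""
    def __init__(self):
        self.received_in_interval = False
        self.checkpoints = 0

    def take_checkpoint(self):
        self.checkpoints += 1
        self.received_in_interval = False      # a fresh interval begins

    def on_receive(self, msg):
        self.received_in_interval = True

    def before_send(self):
        if self.received_in_interval:          # a send after a receive would
            self.take_checkpoint()             # break MRS: force a checkpoint

p = MRSProcess()
p.on_receive("m1")
p.before_send()                                # forces a checkpoint
print(p.checkpoints)                           # 1
```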
ii) Index-Based Checkpointing:
• How it works:
Checkpoints are assigned increasing numbers (indexes), and processes force
checkpoints when they receive a message with a higher index than their current
checkpoint. This way, all processes maintain checkpoints that are globally consistent.
• Key Idea:
Processes only take forced checkpoints when their checkpoint index is lower than the
one attached to a received message, ensuring all processes stay in sync. This reduces
unnecessary checkpoints.
• Example (easy explanation):
Imagine several bank branches processing transactions. Each branch takes a
snapshot (checkpoint) after a few transactions. If one branch finishes processing
and sends an update (with a checkpoint number) to another branch, the receiving
branch checks its own checkpoint number. If its checkpoint is behind, it takes a
new one to keep both branches on the same page. This prevents confusion about
which transactions have been processed if a failure happens.
• Best for:
Systems where checkpoints can be tracked using a simple number system, like
transactional databases or financial systems.
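A minimal sketch of the index rule follows (class and method names are invented; delivery is a direct call): every message piggybacks the sender's checkpoint index, and a receiver whose index is lower takes a forced checkpoint before delivering the message.

```python
class IndexedProcess:
    """Index-based communication-induced checkpointing sketch: each
    message piggybacks the sender's checkpoint index; a receiver whose
    index is lower takes a forced checkpoint before delivering."""
    def __init__(self):
        self.index = 0                      # current checkpoint index

    def take_checkpoint(self, new_index=None):
        self.index = new_index if new_index is not None else self.index + 1

    def send(self, payload):
        return (self.index, payload)        # piggyback the index

    def deliver(self, message):
        idx, payload = message
        if idx > self.index:
            self.take_checkpoint(idx)       # forced checkpoint, then deliver
        return payload

a, b = IndexedProcess(), IndexedProcess()
a.take_checkpoint()                          # branch A checkpoints: index 1
b.deliver(a.send("txn update"))              # B is behind, forced to index 1
print(b.index)                               # 1
```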
Types of Checkpoints:
Assumptions:
Phase 1:
1. Initiation: A process (Pi) initiates the algorithm by taking a tentative checkpoint and
sends requests to all other processes to do the same.
2. Response Handling: Each process either takes a tentative checkpoint and informs Pi
of success, or it replies with a "no" if it cannot take a tentative checkpoint.
o If Pi gets success responses from all processes, it decides to make all
checkpoints permanent.
o If any process fails, Pi discards the tentative checkpoints.
Phase 2:
1. Decision Broadcast: Pi sends its decision to all processes, informing them whether the
tentative checkpoints should be made permanent or discarded.
2. Action: Upon receiving Pi's decision:
o If the decision is to make the checkpoints permanent, each process converts its
tentative checkpoint to a permanent one.
o If the decision is to discard the checkpoints, each process discards its tentative
checkpoint.
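The two phases can be sketched in a few lines of Python (class and method names are invented, and message exchanges are abstracted to direct calls).

```python
class CkptProcess:
    """Process stub for the two-phase checkpoint protocol sketch."""
    def __init__(self, can_checkpoint=True):
        self.can = can_checkpoint           # "no" if busy, low on space, ...
        self.permanent = 0                  # count of permanent checkpoints
        self.tentative = False

    def try_tentative(self):                # phase 1: tentative checkpoint
        self.tentative = self.can
        return self.can

    def commit(self):                       # phase 2: make it permanent
        if self.tentative:
            self.permanent += 1
            self.tentative = False

    def abort(self):                        # phase 2: discard it
        self.tentative = False

def coordinated_checkpoint(processes):
    """All-or-nothing: commit only if every tentative checkpoint succeeded.
    The list comprehension (not a bare all(...)) makes every process
    attempt its tentative checkpoint, with no short-circuiting."""
    ok = all([p.try_tentative() for p in processes])
    for p in processes:
        p.commit() if ok else p.abort()
    return ok

procs = [CkptProcess(), CkptProcess(), CkptProcess(can_checkpoint=False)]
print(coordinated_checkpoint(procs))        # False: all tentatives discarded
```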
Key Benefits
• Avoids the domino effect: Ensures that processes do not have to roll back repeatedly,
thereby avoiding cascading rollbacks.
• Consistency: All or none of the processes advance their checkpoints, ensuring global
consistency.
These two aspects ensure that the algorithm avoids situations that could lead to an
inconsistent global state, such as the domino effect or orphan messages.
Optimization
While the protocol guarantees correctness, it may result in unnecessary checkpoints being
taken, which can be inefficient. Checkpointing is computationally expensive, so the goal is
to minimize unnecessary checkpoints.
Fig. 13.11 Note: Checkpoint Z2 can be avoided, as the process is not involved in sending or
receiving message M; thus the number of checkpoints can be reduced.
Phase 1:
1. Initiation:
o A process Pi initiates the rollback recovery by sending a message to all other
processes in the system, requesting them to check whether they are willing to
restart from their most recent permanent checkpoints.
2. Responses from Processes:
o Each process replies either "yes" or "no":
▪ Yes: The process agrees to roll back to its previous checkpoint.
▪ No: The process is unable or unwilling to roll back, possibly due to being
involved in another ongoing checkpointing or recovery process.
3. Decision:
o If Pi receives "yes" responses from all processes, it decides that all processes
should roll back to their previous checkpoints.
o If any process replies "no", Pi aborts the rollback attempt and may try again
later.
Phase 2:
1. Decision Propagation:
o Once Pi makes a decision, it propagates this decision to all other processes:
▪ If the decision is to roll back, the processes must restart from their
previous permanent checkpoints.
▪ If the rollback is aborted, the processes do not change their state, and no
rollback occurs.
2. Process Actions:
o Upon receiving the decision from Pi, each process performs the appropriate
action:
▪ If instructed to roll back, the process rolls back to its last permanent
checkpoint and resumes execution from there.
▪ If the rollback is aborted, the process continues from its current state.
3. Message Sending Restriction:
o While waiting for the rollback decision, processes are not allowed to send
messages related to the underlying computation. This restriction is necessary to
ensure that no inconsistent messages are sent during the rollback process.
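The rollback decision mirrors the checkpoint protocol and can be sketched the same way (class and method names invented; the restriction on sending computation messages while waiting is assumed to be enforced by the messaging layer and is not modelled here).

```python
class RecProcess:
    """Process stub for the two-phase rollback sketch."""
    def __init__(self, willing=True):
        self.willing = willing      # "no" if already in another recovery
        self.state = "running"

    def willing_to_rollback(self):
        return self.willing

    def restart_from_checkpoint(self):
        self.state = "restored"     # resume from last permanent checkpoint

def coordinated_rollback(processes):
    """Roll everyone back only if all answer 'yes'; otherwise abort."""
    answers = [p.willing_to_rollback() for p in processes]
    if all(answers):
        for p in processes:
            p.restart_from_checkpoint()
        return True
    return False                    # abort; the initiator may retry later

procs = [RecProcess(), RecProcess()]
print(coordinated_rollback(procs), [p.state for p in procs])
```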
Correctness
1. Consistent Restart:
o All processes restart from a consistent global state, as the checkpointing
algorithm ensures that all checkpoints form a consistent set.
o Since either all processes roll back to their previous checkpoints or none do,
no inconsistent state can arise during the recovery process.
2. No Inconsistent Messages:
o The restriction on message sending during the recovery process ensures that no
messages are sent based on an inconsistent state, preventing issues such as
orphan messages.
Assumptions:
The system uses reliable, FIFO-ordered communication channels with infinite buffers.
Message transmission delays are arbitrary but finite.
To facilitate recovery after a process failure and restore the system to a consistent state, two
types of log storage are maintained, volatile log and stable log. Accessing the volatile log takes
less time than accessing the stable log, but the contents of the volatile log are lost if the
corresponding processor fails. The contents of the volatile log are periodically flushed to the
stable storage.
Algorithm:
Explanation (Figure 13.14):
When process Y fails, it rolls back to its latest checkpoint and broadcasts ROLLBACK
messages carrying the number of messages its restored state records as sent: ROLLBACK(Y, 2)
to process X and ROLLBACK(Y, 1) to process Z.
@node X: X computes rcvd = 3, the number of messages it has received from Y, while the
ROLLBACK message carries sent = 2. Since sent < rcvd (2 < 3), X holds an orphan message
and rolls back to event ex2, the latest event at which it had received only two messages from
Y. X's restored state is then consistent with checkpoint 2 of process Y.
@node Z: Z computes rcvd = 2 while the ROLLBACK message carries sent = 1. Since
sent < rcvd (1 < 2), Z likewise rolls back to its latest event consistent with Y's restored state.
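The per-node check can be sketched as follows (a minimal sketch; the representation of a node's history as a list of received-from-Y counts at successive local events is an assumption made for the example).

```python
def rollback_on_failure(sent_by_failed, rcvd_from_failed, events):
    """Message-count rollback step: on ROLLBACK(Y, c) a process compares
    c (messages Y's restored state shows as sent) with how many it has
    received from Y, and rolls back past any orphan messages.
    events: receive counts from Y recorded at successive local events."""
    if sent_by_failed < rcvd_from_failed:
        # Roll back to the latest event where the count of messages
        # received from Y does not exceed what Y actually sent.
        for i in range(len(events) - 1, -1, -1):
            if events[i] <= sent_by_failed:
                return i              # index of the event to restore
    return len(events) - 1            # no orphans: keep the current state

# X received 3 messages from Y, but Y's rollback says only 2 were sent:
# X must roll back to its latest event with rcvd <= 2 (here, ex2).
print(rollback_on_failure(sent_by_failed=2, rcvd_from_failed=3,
                          events=[0, 1, 2, 3]))   # -> 2
```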