Distributed DBMS Advanced Concepts
● Serializability
● Mutual consistency means that all the values of a replicated data item are
identical.
● So, for a global schedule to be serializable, it must satisfy the following
conditions:
○ 1. Each local schedule must be serializable.
○ 2. Two conflicting operations must appear in the same relative order in all
of the local schedules in which they both appear (illustrated in the sketch
after this list).
○ 3. A transaction that updates a replicated data item must run at every site
that holds a copy of that item.
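Condition 2 can be checked mechanically. The following small sketch is illustrative (the operation encoding and function names are assumptions, not from the slides): operations are (transaction, op, item) tuples, and two operations conflict if they touch the same item from different transactions and at least one is a write.

```python
# Check that every pair of conflicting operations appears in the same
# relative order in all local schedules in which both appear.

def conflicts(a, b):
    return a[2] == b[2] and a[0] != b[0] and "W" in (a[1], b[1])

def same_conflict_order(schedules):
    seen = {}                                    # conflicting pair -> order first seen
    for sched in schedules:
        for i, a in enumerate(sched):
            for b in sched[i + 1:]:
                if conflicts(a, b):
                    key = frozenset([a, b])
                    if seen.setdefault(key, (a, b)) != (a, b):
                        return False             # pair appears in the opposite order
    return True

s1 = [("T1", "W", "x"), ("T2", "W", "x")]        # T1 before T2 at site 1
s2 = [("T2", "W", "x"), ("T1", "W", "x")]        # T2 before T1 at site 2
print(same_conflict_order([s1, s2]))             # False -> not globally serializable
```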
Distributed Concurrency Control
● Serializability
● For example, we may have two sites that replicate a data item x:
● T₁: Read(x); x ← x + 5; Write(x)
● T₂: Read(x); x ← x * 10; Write(x)
Cont…
● Both transactions need to run on both sites, so these are the possible
schedules that might have been produced:
● Site 1: S₁ = {R₁(x), W₁(x), R₂(x), W₂(x)}
● Site 2: S₂ = {R₂(x), W₂(x), R₁(x), W₁(x)}
● Each schedule is serializable in its own local context, but they produce
different results (for example, starting from x = 1, S₁ yields (1 + 5) × 10 = 60
while S₂ yields (1 × 10) + 5 = 15) and thus violate mutual consistency
(reproduced in the sketch below).
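The divergence is easy to reproduce. A minimal sketch (the function name and starting value are assumptions) that applies the two local orders to the same initial value:

```python
# Each local schedule is serially correct on its own, yet the replicas
# of x end up with different values: mutual consistency is violated.

def run_schedule(x, order):
    """Apply T1 (x <- x + 5) and T2 (x <- x * 10) serially in the given order."""
    for t in order:
        x = x + 5 if t == "T1" else x * 10
    return x

initial = 1
site1 = run_schedule(initial, ["T1", "T2"])   # S1: T1 then T2 -> (1 + 5) * 10 = 60
site2 = run_schedule(initial, ["T2", "T1"])   # S2: T2 then T1 -> (1 * 10) + 5 = 15
print(site1, site2)                           # 60 15: the replicas now disagree
```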
Locking Protocols
● Centralized 2PL
● A single site maintains all the locking information and there is only one lock
manager that can grant and release locks.
● This type of protocol operates as follows:
● The transaction coordinator (TC) at S1 creates sub-transactions and ensures
consistency is maintained. If a data item being updated has replicas, all of its
replicas must also be updated: the coordinator requests exclusive locks on all
copies, updates them, and then releases the locks.
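As a hedged illustration of the single lock manager (the class and method names are invented, not from any particular DBMS), the core bookkeeping might look like this:

```python
# A single site holds all lock state; coordinators ask it for locks on
# every copy of an item before updating, then release them.

class LockManager:
    def __init__(self):
        self.exclusive = {}          # item -> holder transaction id
        self.shared = {}             # item -> set of reader transaction ids

    def acquire(self, txn, item, exclusive=False):
        """Grant the lock if compatible; return False so the caller can wait/retry."""
        if exclusive:
            if item in self.exclusive or self.shared.get(item):
                return False         # conflicts with an existing reader or writer
            self.exclusive[item] = txn
        else:
            if item in self.exclusive:
                return False         # conflicts with an existing writer
            self.shared.setdefault(item, set()).add(txn)
        return True

    def release(self, txn, item):
        if self.exclusive.get(item) == txn:
            del self.exclusive[item]
        self.shared.get(item, set()).discard(txn)
```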
Cont…
● When a transaction wishes to read or write a data item that is replicated at n sites, it must send a lock
request to more than half of the n sites where the item is stored.
● The transaction cannot proceed until it obtains locks on a majority of the copies. If the transaction does
not receive a majority within a certain timeout period, it cancels its request and informs all sites of the
cancellation. If it receives a majority, it informs all sites that it has the lock.
● Any number of transactions can simultaneously hold a shared lock on a majority of the copies; however,
only one transaction can hold an exclusive lock on a majority of the copies.
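The majority rule above can be sketched as follows (try_lock and release_lock are assumed per-site RPC stubs, and the timeout value is arbitrary; this is an illustration, not a full protocol):

```python
import time

def request_majority_lock(item, replicas, try_lock, release_lock, timeout_s=2.0):
    """Return the granting sites if a majority of copies was locked, else None."""
    granted = []
    deadline = time.monotonic() + timeout_s
    for site in replicas:
        if time.monotonic() > deadline:
            break                                  # timed out waiting for replies
        if try_lock(site, item):
            granted.append(site)
    if len(granted) > len(replicas) // 2:          # strict majority of the n copies
        return granted                             # caller informs all sites it holds the lock
    for site in granted:                           # no majority: cancel the request everywhere
        release_lock(site, item)
    return None
```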
Locking Protocols
● Majority locking
● Advantages
○ Avoids the drawbacks of centralized control.
● Disadvantages
○ The protocol is more complicated, and deadlock detection is more
complex.
Distributed Deadlock Management
In a DDBMS it is also necessary to construct a global wait-for graph (WFG) that
is the union of all the local WFGs.
Deadlock Detection
Three common methods:
- Centralized
- Hierarchical
- Distributed
Centralized
In centralized deadlock detection, a single site constructs the global WFG from
the local WFGs it receives. If this global WFG contains a cycle that does not
involve the T_ext node (for example, T₃ → T₁ → T₂ → T₃), we can conclude that
deadlock exists and an appropriate recovery protocol must be invoked.
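Cycle detection on a WFG is a standard graph search. The sketch below is illustrative (the dictionary encoding and function name are assumptions, not from the slides); it flags the example cycle while ignoring the external T_ext node:

```python
# Detect a cycle in a global wait-for graph, ignoring the external node.
# The WFG maps each transaction to the transactions it is waiting for.

def has_deadlock(wfg, external="T_ext"):
    visiting, done = set(), set()

    def dfs(node):
        if node == external or node in done:
            return False
        if node in visiting:
            return True                    # back edge found: a cycle, hence deadlock
        visiting.add(node)
        if any(dfs(nxt) for nxt in wfg.get(node, [])):
            return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(dfs(n) for n in list(wfg))

# The cycle from the slides: T3 -> T1 -> T2 -> T3
print(has_deadlock({"T3": ["T1"], "T1": ["T2"], "T2": ["T3"]}))  # True
```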
Distributed Deadlock Detection
● Local sites transmit their local WFGs up the hierarchy to check for
deadlock.
● Sites receiving these WFGs merge the information into their local WFGs
and check for cycles, excluding the T_ext node.
● If a cycle is detected, it signifies a deadlock. In this case, one or more
transactions are rolled back and restarted, along with their agents.
● If the entire global WFG is constructed without detecting a cycle, it
indicates the absence of deadlock in the entire system.
● Distributed deadlock detection methods are potentially more robust than
centralized or hierarchical methods.
● However, they require significant intersite communication to coordinate
deadlock detection and resolution across distributed sites.
Distributed DBMS - Database Recovery
1. Transaction Failures
Transactions can fail for a number of reasons. Failure can be due to an error in
the transaction caused by incorrect input data, as well as the detection of a
present or potential deadlock.
Furthermore, some concurrency control algorithms do not permit a
transaction to proceed, or even to wait, if the data it attempts to access
are currently being accessed by another transaction.
This might also be considered a failure. The usual approach to take in cases of
transaction failure is to abort the transaction, thus resetting the database to
its state prior to the start of that transaction.
2. Site (System) Failures
The reasons for system failure can be traced back to a hardware or a software failure.
A system failure is always assumed to result in the loss of main-memory contents.
Therefore, any part of the database that was in main memory buffers is lost as a result of
a system failure. However, the database that is stored in secondary storage is assumed
to be safe and correct.
In distributed database terminology, system failures are typically referred to as site
failures, since they result in the failed site being unreachable from other sites in the
distributed system.
We typically differentiate between partial and total failures in a distributed system.
Total failure refers to the simultaneous failure of all sites in the distributed system;
partial failure indicates the failure of only some sites while the others remain
operational.
3. Media Failures
Media failure refers to the failure of the secondary storage devices that store
the database. Such failures may be due to operating system errors, as well as to
hardware faults such as head crashes or controller failures.
The important point is that all or part of the database that is on the secondary
storage is considered to be destroyed and inaccessible. Duplexing of disk
storage and maintaining archival copies of the database are common
techniques that deal with this sort of catastrophic problem.
Media failures are frequently treated as problems local to one site and
therefore not specifically addressed in the reliability mechanisms of
distributed DBMSs.
4. Communication Failures
There are a number of types of communication failures. The most common ones are the
errors in the messages, improperly ordered messages, lost messages, and communication line
failures. The first two errors are the responsibility of the computer network; we will not
consider them further.
Therefore, in our discussions of distributed DBMS reliability, we expect the underlying
computer network hardware and software to ensure that two messages sent from a process
at some originating site to another process at some destination site are delivered without
error and in the order in which they were sent.
Lost or undeliverable messages are typically the consequence of communication line failures
or (destination) site failures. If a communication line fails, in addition to losing the message(s)
in transit, it may also divide the network into two or more disjoint groups.
This is called network partitioning. If the network is partitioned, the sites in each partition
may continue to operate. In this case, executing transactions that access data stored in
multiple partitions becomes a major issue.
Figure below shows an example of network partitioning in which following the failure of the link connecting
sites S1 and S2, sites (S1, S4, S5) are partitioned from sites (S2, S3). In some cases it is difficult to distinguish
whether a communication link or a site has failed. For example, suppose that site S1 cannot communicate
with site S2 within a fixed (timeout) period. It could be that:
• site S2 is currently very busy and has not had time to respond to the message;
• the communication link between S1 and S2 has failed; or
• site S2 has itself failed.
How failures affect recovery
As with local recovery, distributed recovery aims to maintain the atomicity and
durability of distributed transactions. To ensure the atomicity of the global transaction,
the DDBMS must ensure that subtransactions of the global transaction either all
commit or all abort.
If the DDBMS detects that a site has failed or become inaccessible, it needs to carry out
the following steps (a sketch in code follows this list):
• Abort any transactions that are affected by the failure.
• Flag the site as failed to prevent any other site from trying to use it.
• Check periodically to see whether the site has recovered or, alternatively, wait for the
failed site to broadcast that it has recovered.
• On restart, the failed site must initiate a recovery procedure to abort any partial
transactions that were active at the time of the failure.
• After local recovery, the failed site must update its copy of the database to make it
consistent with the rest of the system.
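A hedged sketch of these steps (abort, mark_failed, and has_recovered are assumed stubs, and representing a transaction as a (txn_id, participant_sites) pair is purely illustrative):

```python
import time

def handle_site_failure(site, active_txns, abort, mark_failed, has_recovered):
    # 1. Abort any transaction with a subtransaction at the failed site.
    for txn_id, participants in active_txns:
        if site in participants:
            abort(txn_id)
    # 2. Flag the site as failed so that no other site tries to use it.
    mark_failed(site)
    # 3. Check periodically until the site recovers (or until it broadcasts
    #    that it has recovered); local recovery then runs at the failed site.
    while not has_recovered(site):
        time.sleep(5)
```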
Two-Phase Commit
Introduction
The two-phase commit (2PC) protocol must cope with failures such as the loss of
messages. It ensures that all the sites involved in a global transaction are
ready to commit before any changes become permanent.
The site at which the transaction originates acts as the coordinator (transaction
manager); the remaining sites act as participants (resource managers).
2PC protocol
Voting phase.
Decision phase.
2PC protocol
Participating sites: S2, S3, S4 (S1 acts as the coordinator).
Voting phase
The coordinator site writes [T, Prepare] to its log and sends this message to
all participating sites, asking whether they are ready to commit.
If a site is ready, it writes [T, Ready] to its log and sends this message to the
coordinator site.
If a site is not ready, it writes [T, Not Ready] to its log and sends this message
to the coordinator site.
Decision phase
If the coordinator site receives a [T, Not Ready] message from any one
participant, it writes [T, Abort] to its own log and sends the Abort
message to the participant sites. The participant sites write the
same record into their logs and abort the transaction. All this is done
to ensure atomic commitment of transactions.
Key takeaways:
The coordinator aborts a transaction if and only if at least one participant
votes to abort it.
The coordinator commits a transaction if and only if all the participant sites vote to
commit it.
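As a minimal sketch of the flow just described (the send_prepare/send_decision stubs and the list-based log are assumptions; a real DDBMS forces log records to stable storage), the coordinator's side of 2PC can be written as:

```python
# send_prepare(site, txn) is assumed to return "Ready", "Not Ready",
# or None on timeout; send_decision(site, txn, decision) delivers the
# global decision to one participant.

def two_phase_commit(txn, participants, send_prepare, send_decision, log):
    # Voting phase: write [T, Prepare] and collect votes from every participant.
    log.append((txn, "Prepare"))
    votes = {site: send_prepare(site, txn) for site in participants}
    # Decision phase: unanimous Ready -> Commit; any Not Ready or timeout -> Abort.
    decision = "Commit" if all(v == "Ready" for v in votes.values()) else "Abort"
    log.append((txn, decision))            # the decision is logged before it is sent
    for site in participants:
        send_decision(site, txn, decision)
    return decision
```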
Termination Protocols for 2PC
If the coordinator fails and all participants detect this, they can elect a new
coordinator to resolve the blocking.
Recovery process for 2PC
There are three stages at which the coordinator can fail. At any of these stages:
- The coordinator cannot make a global commit decision without the participants' votes.
● If the participants detect the failure of the coordinator (by timing out), they can elect a new site to act as
coordinator.
● One election protocol is for the sites to have an agreed-upon linear ordering.
● We assume that site Si has order i in the sequence, the lowest-numbered operational site becoming the
new coordinator, and that each site knows the identification and ordering of the other sites in the
system, some of which may also have failed.
Cont…
● One election protocol asks each operational participant to send a message to the sites with a greater
identification number.
● That means, site Si would send a message to sites Si+1, Si+2, . . . , Sn in that order.
● If a site Sk receives a message from a lower-numbered participant, then Sk knows that it is not to be the
new coordinator and stops sending messages.
● This protocol is relatively efficient and most participants stop sending messages quite quickly.
Cont…
● Eventually, each participant will know whether there is an operational participant with a lower number.
● If there is not, the site becomes the new coordinator.
● If the newly elected coordinator also times out during this process, the election protocol is invoked again.
● A site that finds no operational participant with a lower number announces itself as the new coordinator
to all higher-numbered sites, regardless of whether another election is already in progress.
Bully Election Algorithm
● Any participant can trigger a request for an election; however, a participant can issue only one such
request at a time. The algorithm operates by identifying all the non-faulty participants and electing the
participant with the largest identifier as the coordinator.
● There can be three kinds of messages that participants would exchange between each other during the
bully algorithm:
a) Election message
b) OK message
c) Coordinator message
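As a hedged illustration of this rule (the function names and the liveness probe are assumptions; a real implementation exchanges the Election, OK, and Coordinator messages above with timeouts), a bully election from one site's point of view might look like:

```python
# Sketch of a bully election as seen from site my_id: message all
# higher-numbered sites; if none is alive, declare myself coordinator.

def bully_election(my_id, all_ids, is_alive):
    """Return the identifier of the site that wins the election."""
    higher_alive = [s for s in all_ids if s > my_id and is_alive(s)]
    if not higher_alive:
        return my_id          # no live higher site: send Coordinator messages
    # Some higher site answered OK; the election continues among those
    # sites, and the largest live identifier ultimately wins.
    return max(higher_alive)

sites = [1, 2, 3, 4, 5]
alive = {1: True, 2: True, 3: True, 4: True, 5: False}    # old coordinator 5 failed
print(bully_election(2, sites, lambda s: alive[s]))        # -> 4
```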
Cont…
● There are several different communication topologies (ways of exchanging messages) that can be employed
to implement 2PC:
a. Centralized 2PC
b. Linear 2PC
● Linear 2PC can be improved if the voting process adopts the forward linear chaining of messages while the
decision process adopts the centralized topology, so that site n can broadcast the global decision to all
participants in parallel.
c. Distributed 2PC
● Each participant waits for messages from the other sites before deciding whether to commit or abort the
transaction.
● This in effect eliminates the need for the decision phase of the 2PC protocol, as the participants can reach
a decision consistently, but independently.