Distributed DBMS Advanced Concepts

Distributed transaction management allows transactions to access data across multiple local databases. There are two types of transactions: local transactions involve a single database, while global transactions involve multiple databases. A global transaction coordinator divides global transactions into sub-transactions and distributes them to local transaction managers. The local managers execute the sub-transactions and return results to the coordinator. Distributed concurrency control and deadlock detection are more complex than in a centralized system due to the need to coordinate across multiple sites.


Distributed Transaction Management.

● Just as in a centralized DBMS, the ACID properties of any transaction executed
in a distributed DBMS must be maintained.

● There are two types of transactions in a distributed DBMS:

● Local transactions - involve read, write, and update operations on data in
only one local database.

● Global transactions – involve read, write, and update operations on data in
several local databases in the distributed system.
Distributed Transaction Management.

● The local databases at a distributed site all have transaction manager
modules, scheduler modules, recovery manager modules, and buffer
manager modules to handle transactions, concurrency control, and
recovery, just as in centralized DBMSs.
● In addition to these, there is a global transaction manager/transaction
coordinator to coordinate the various transactions on the site, both global
and local.
● There is also a data communications component to handle communications
between various sites.
Distributed Transaction Management.
How is a global transaction executed?
- The transaction coordinator at the first site (S1) starts the execution of every
transaction submitted at that site.
- The transaction coordinator divides these transactions into sub-transactions.
- The data communications component distributes these sub-transactions to the
other sites (S2, S3, …, Sn).
- The transaction coordinators at the other sites manage the execution of these
sub-transactions.
- Results are transmitted back to the first transaction coordinator via the data
communications component.
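A minimal sketch of this flow, assuming hypothetical TransactionCoordinator and LocalTransactionManager classes and synchronous message passing (all names are illustrative, not part of any particular DBMS):

```python
# Illustrative sketch of global transaction coordination; names are hypothetical.

class LocalTransactionManager:
    """Executes a sub-transaction at its site and returns the result."""
    def __init__(self, site_id):
        self.site_id = site_id

    def execute(self, sub_transaction):
        # A real system would run these operations against the local database.
        return f"site {self.site_id}: executed {sub_transaction}"


class TransactionCoordinator:
    """Coordinator at the first site (S1): splits the transaction and gathers results."""
    def __init__(self, local_managers):
        self.local_managers = local_managers  # site_id -> LocalTransactionManager

    def run_global_transaction(self, operations_by_site):
        results = {}
        # Divide the global transaction into sub-transactions, one per site,
        # and hand each to the local transaction manager at that site.
        for site_id, sub_txn in operations_by_site.items():
            results[site_id] = self.local_managers[site_id].execute(sub_txn)
        # Results come back to the coordinator via the data communications component.
        return results


# Example: a global transaction touching data at sites S2 and S3.
managers = {s: LocalTransactionManager(s) for s in ("S2", "S3")}
coordinator = TransactionCoordinator(managers)
print(coordinator.run_global_transaction({"S2": ["read(x)", "write(x)"],
                                          "S3": ["read(y)"]}))
```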
Distributed Concurrency Control.
Requirements for concurrency control in a DDBMS (the mechanism should):
- Be able to handle all types of failures.
- Satisfy system performance requirements by allowing parallelism.
- Incur modest computational and storage overhead.
- Perform satisfactorily in a network environment that has significant
communication delay.
- Place few constraints on the structure of atomic actions.

Distributed Concurrency Control.
● Serializability
● In a distributed DBMS, we have two types of schedules:
● 1. Local schedules – produced at each local database.
● 2. Global schedule – the union of all the local schedules in the system.

● If data in a DDBMS is not replicated at other sites, serializability can be
achieved in the same way as in a centralized DBMS.
● However, if data is replicated, the schedules in the various local databases
might each be serializable, but mutual consistency of the entire database may not
be guaranteed.

Distributed Concurrency Control.

● Serializability
● Mutual consistency means that all the values of a replicated data item are
identical.
● So, for a global schedule to be serializable, it must satisfy the following
conditions:
○ 1. Each local schedule must be serializable.
○ 2. Two conflicting operations must appear in the same relative order in all
of the local schedules in which they both appear.
○ 3. A transaction that updates a replicated data item must run at every site
that holds a copy of that item.
Distributed Concurrency Control.
● Serializability
● For example, we may have two sites that both replicate a data item x:
● T₁: Read(x)
● x ← x + 5
● Write(x)
● T₂: Read(x)
● x ← x * 10
● Write(x)
Cont…

● Both transactions need to run on both sites, so these are the possible
schedules that might have been produced:
● Site 1: S₁ = {R₁(x), W₁(x), R₂(x), W₂(x)}
● Site 2: S₂ = {R₂(x), W₂(x), R₁(x), W₁(x)}
● Each schedule is serializable at its own site, but the two sites apply the
transactions in different orders, so they produce different results and violate
mutual consistency.
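To see the divergence concretely, assume x starts at 100 at both sites (the starting value is arbitrary and chosen only for illustration):

```python
# Replay the two local schedules on the same starting value of x.
x_start = 100                   # assumed initial value, for illustration only

x_site1 = (x_start + 5) * 10    # Site 1 runs T1 before T2 -> 1050
x_site2 = (x_start * 10) + 5    # Site 2 runs T2 before T1 -> 1005

print(x_site1, x_site2)         # 1050 1005 -> the two replicas of x now disagree
```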

Locking Protocols

● Centralized 2PL
● A single site maintains all the locking information and there is only one lock
manager that can grant and release locks.
● The protocol proceeds as follows:
● The transaction coordinator (TC) at S1 creates the sub-transactions and ensures
consistency is maintained. If a data item being updated has replicas, all of its
replicas must also be updated: the coordinator requests exclusive locks on all copies,
updates them, and then releases the locks.

Cont…

● The local transaction managers involved in the global transaction request
and release locks from the centralized lock manager using the normal rules
for two-phase locking.
● The centralized lock manager checks that a request for a lock on a data item
is compatible with the locks that currently exist. If it is, the lock manager
sends a message back to the originating site acknowledging that the lock has
been granted. Otherwise, it puts the request in a queue until the lock can be
granted.
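A minimal sketch of this grant-or-queue logic, assuming only shared (S) and exclusive (X) lock modes; the class and method names are illustrative, not a real DBMS component:

```python
from collections import defaultdict, deque

# Illustrative centralized lock manager: shared (S) locks are compatible with each
# other, while an exclusive (X) lock is incompatible with every other lock.

class CentralLockManager:
    def __init__(self):
        self.granted = defaultdict(list)   # item -> [(txn, mode), ...]
        self.waiting = defaultdict(deque)  # item -> queue of (txn, mode)

    def _compatible(self, item, mode):
        held = [m for _, m in self.granted[item]]
        return not held or (mode == "S" and all(m == "S" for m in held))

    def request(self, txn, item, mode):
        """Grant the lock if it is compatible with the current holders, else queue it."""
        if self._compatible(item, mode):
            self.granted[item].append((txn, mode))
            return "GRANTED"               # acknowledgement sent to the originating site
        self.waiting[item].append((txn, mode))
        return "QUEUED"                    # held until the lock can be granted

    def release(self, txn, item):
        self.granted[item] = [(t, m) for t, m in self.granted[item] if t != txn]
        # Wake up queued requests that are now compatible.
        while self.waiting[item] and self._compatible(item, self.waiting[item][0][1]):
            self.granted[item].append(self.waiting[item].popleft())


lm = CentralLockManager()
print(lm.request("T1", "x", "X"))  # GRANTED
print(lm.request("T2", "x", "S"))  # QUEUED until T1 releases x
lm.release("T1", "x")              # T2's shared lock is now granted
```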

Locking Protocols
● Centralized 2PL
● Advantages
○ - Simple implementation.
○ - Simple deadlock handling.
● Disadvantages
○ - Lock manager site becomes a bottleneck.
○ - System is vulnerable to lock manager site failure.

Locking Protocols
● Primary copy 2PL
● One replica of a data item is chosen as the primary copy, and all other
replicas become slave copies.
● The site holding the primary copy is called the primary site, and all other
sites holding replicas are called slave sites.
● The site that manages the locks for a primary copy need not itself hold that copy.
● When a transaction needs to update a data item, it requests the lock at the
primary site of that data item. If the lock is granted, it implicitly obtains locks on
all of its replicas. If not, the request is delayed until it can be granted.

Locking Protocols
● The primary copy can then be updated, and the change propagated to all
slave copies.
● The propagation should be carried out as soon as possible to prevent any further
transactions from reading out-of-date values.

● Primary Copy 2PL
● Disadvantages
○ - Deadlock handling is more complex due to multiple lock managers.
○ - If the primary site fails, the primary copy of a data item is inaccessible,
even if all other replicas are available.
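A small sketch of how an update lock request could be routed under primary copy 2PL, assuming a simple item-to-primary-site mapping and a trivial per-site lock manager (both are assumptions made for the example):

```python
# Illustrative primary copy 2PL routing: each data item has one primary site, and an
# exclusive lock on the primary copy implicitly covers all of its slave copies.

class SiteLockManager:
    def __init__(self, site_id):
        self.site_id = site_id
        self.exclusively_locked = set()    # items exclusively locked at this site

    def request_exclusive(self, txn, item):
        if item in self.exclusively_locked:
            return "DELAYED"               # wait until the current holder releases it
        self.exclusively_locked.add(item)
        return "GRANTED"


primary_site = {"x": "S1", "y": "S2"}      # item -> primary site (assumed placement)
lock_managers = {"S1": SiteLockManager("S1"), "S2": SiteLockManager("S2")}

def request_update_lock(txn, item):
    """Route the exclusive lock request to the primary site of the item."""
    return lock_managers[primary_site[item]].request_exclusive(txn, item)

print(request_update_lock("T1", "x"))  # GRANTED at S1; slave copies are implicitly locked
print(request_update_lock("T2", "x"))  # DELAYED until T1 releases the primary copy
```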

Locking Protocols
● Distributed 2PL
● Each site has its own lock manager which is responsible for managing locks
on the site’s data items.
● If the data is not replicated, the protocol is equivalent to primary copy 2PL.
● If data is replicated, distributed 2PL implements a read-one-write-all
(ROWA) replica control protocol. This means that any copy of a replicated
item can be used for a read operation, but all copies must be exclusively
locked before an item can be updated.
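A sketch of the ROWA rule, assuming a fixed replica placement and very simple per-site lock tables (both are assumptions made only for this example):

```python
# Illustrative read-one-write-all (ROWA) locking: a read locks any single replica,
# while a write must obtain exclusive locks on every replica.

replicas = {"x": ["S1", "S2", "S3"]}                      # assumed replica placement
shared_locks = {s: set() for s in ("S1", "S2", "S3")}     # per-site shared-lock tables
exclusive_locks = {s: set() for s in ("S1", "S2", "S3")}  # per-site exclusive-lock tables

def read_lock(item):
    """Lock a single copy (here simply the first replica) for reading."""
    site = replicas[item][0]
    shared_locks[site].add(item)
    return [site]

def write_lock(item):
    """Try to lock every copy exclusively before the item can be updated."""
    for site in replicas[item]:
        if item in shared_locks[site] or item in exclusive_locks[site]:
            return None          # conflict: a real protocol would wait for the lock
    for site in replicas[item]:
        exclusive_locks[site].add(item)
    return replicas[item]

print(read_lock("x"))    # ['S1'] -> reading needs only one copy
print(write_lock("x"))   # None   -> blocked, S1 still holds a shared lock on x
```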



Locking Protocols
● Distributed 2PL
● Advantages
○ - Work is distributed, and the protocol can remain robust in the presence of failures.
● Disadvantages
○ - Deadlock detection is more complicated due to multiple lock managers.

Locking Protocols
● Majority locking
● This protocol is an extension of distributed 2PL to avoid the need to lock all copies of a replicated item
before an update. Again, the system maintains a lock manager at each site to manage the locks for all data
at that site.

● When a transaction wishes to read or write a data item that is replicated at n sites, it must send a lock
request to more than half of the n sites where the item is stored.

● The transaction cannot proceed until it obtains locks on a majority of the copies. If the transaction does
not receive a majority within a certain timeout period, it cancels its request and informs all sites of the
cancellation. If it receives a majority, it informs all sites that it has the lock.

● Any number of transactions can simultaneously hold a shared lock on a majority of the copies; however,
only one transaction can hold an exclusive lock on a majority of the copies.
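A sketch of the majority voting rule, assuming each replica site answers lock requests independently and that timeout and cancellation handling is left out (the vote-collecting helper is illustrative):

```python
# Illustrative majority locking: a transaction must obtain the lock on more than
# half of the n copies of an item before it may proceed.

def request_majority_lock(item, sites, send_lock_request):
    """Ask every site holding a copy; succeed only with more than n/2 grants."""
    n = len(sites)
    granted = [s for s in sites if send_lock_request(s, item)]
    if len(granted) > n // 2:
        return granted            # inform all sites that the lock is now held
    # No majority: a real implementation would cancel the request at the
    # granting sites once the timeout expires.
    return None

# Example: 3 copies of x; S2 is down or refuses, the other two grant the lock.
answers = {"S1": True, "S2": False, "S3": True}
print(request_majority_lock("x", ["S1", "S2", "S3"],
                            lambda site, item: answers[site]))  # ['S1', 'S3'] -> majority
```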


Locking Protocols
● Majority locking
● Advantages
○ Avoids the drawbacks of centralized control.
● Disadvantages
○ The protocol is more complicated, and deadlock detection is more
complex.

Distributed Deadlock Management.

Any locking-based concurrency control algorithm (and some timestamp-based
algorithms) that requires transactions to wait may result in deadlocks.

In a distributed environment, deadlock detection is more complicated if
lock management is not centralized.
Example (A timeline of actions taken by 3 transactions)
Wait Graphs for Individual Sites
Combined Wait Graphs

This demonstrates that in a DDBMS it is also necessary to construct a global WFG
that is the union of all local WFGs.
Deadlock Detection

3 common methods

- Centralized
- Hierarchical
- Distributed
Centralized

● A single site is appointed as the Deadlock Detection Coordinator (DDC).

● The DDC has the responsibility of constructing and maintaining the global
WFG.
● Each lock manager periodically transmits its local WFG to the DDC. The
DDC builds the global WFG and checks for cycles.
● If cycles exist, the DDC breaks the cycles by selecting transactions to be
rolled back and restarted. All sites involved are informed of the rollbacks
and restarts.
● The main disadvantage is that failure of the central DDC site would cause
problems for the entire system.
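A minimal sketch of what the DDC does with the local WFGs it receives: take their union and search the result for a cycle (the graph representation and the depth-first search are plain illustrations, not a production detector):

```python
# Illustrative global WFG construction and cycle detection at the DDC.

def merge_wfgs(local_wfgs):
    """Union of local wait-for graphs; an edge Ti -> Tj means Ti waits for Tj."""
    global_wfg = {}
    for wfg in local_wfgs:
        for txn, waits_for in wfg.items():
            global_wfg.setdefault(txn, set()).update(waits_for)
    return global_wfg

def find_cycle(wfg):
    """Depth-first search for a cycle; returns one deadlocked cycle or None."""
    def dfs(node, stack):
        if node in stack:
            return stack[stack.index(node):]      # cycle found
        for nxt in wfg.get(node, ()):
            cycle = dfs(nxt, stack + [node])
            if cycle:
                return cycle
        return None

    for start in wfg:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None

# Local WFGs from two sites: T1 waits for T2 at site 1, T2 waits for T1 at site 2.
site1 = {"T1": {"T2"}}
site2 = {"T2": {"T1"}}
print(find_cycle(merge_wfgs([site1, site2])))     # ['T1', 'T2'] -> global deadlock
```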
Hierarchical Deadlock Detection.
- The sites within a distributed network are organized into a hierarchical
structure consisting of different levels.
- Each level represents a specific degree of responsibility and authority in the
deadlock detection process.
- At the lowest level (individual sites), local deadlock detection is performed. Each
site is responsible for detecting deadlocks within its own resources and
transactions; this helps identify deadlocks on a smaller scale.
- At the highest level we have the global deadlock detector, which identifies
deadlocks that span the entire distributed network.
- This approach reduces the need for frequent communication between all sites
for deadlock detection. Each site sends its local WFG to its parent node in
the hierarchy.
Hierarchical Deadlock Detection.

- While this is more efficient in terms of communication, it introduces
complexity in implementing and maintaining the structure, as well as in
handling failures at different levels of the hierarchy.
Distributed Deadlock Detection
- One distributed deadlock detection algorithm involves introducing an
external node, called Text, into the local WFG for each site.
- When a transaction at one site creates an agent at another site, an edge is
added to the local WFG from the transaction to the Text node.
- Simultaneously, an edge is added to the local WFG at the receiving site
from the Text node to the newly created agent.
Distributed Deadlock Detection
● If a local WFG at a site contains a cycle that does not involve the Text node, it
indicates a deadlock situation confined to that site.
● In such cases, the local site can autonomously resolve the deadlock by rolling back
and restarting one or more transactions along with their associated agents.
● A global deadlock is potentially present if the local WFG contains a cycle that includes
the Text node.
● However, the existence of such a cycle does not confirm a global deadlock because
the Text nodes may represent different agents.
● To determine the presence of a global deadlock, the local WFGs from various sites
need to be merged into a global WFG.
● To avoid unnecessary communication between sites, a timestamp-based strategy is
employed.
● Each transaction is assigned a timestamp, and a rule is established that allows a site
to transmit its local WFG to another site only if the timestamp of the waiting
transaction is greater than the timestamp of the transmitting transaction.
● This prevents sites from needlessly exchanging their WFGs.
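A small sketch of the local check described above, assuming each site keeps its WFG as an adjacency mapping and marks remote agents with a special "Text" node; the DFS cycle finder from the earlier sketch is repeated so the snippet stays self-contained:

```python
# Illustrative local deadlock check with an external node "Text".
# An edge Ti -> "Text" means Ti is waiting on an agent at another site;
# an edge "Text" -> Tj means a remote transaction is waiting on local agent Tj.

def find_cycle(wfg):
    """Depth-first search for a cycle; returns one cycle or None."""
    def dfs(node, stack):
        if node in stack:
            return stack[stack.index(node):]
        for nxt in wfg.get(node, ()):
            cycle = dfs(nxt, stack + [node])
            if cycle:
                return cycle
        return None
    for start in wfg:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None

def classify_local_wfg(wfg):
    """Return 'LOCAL_DEADLOCK', 'POSSIBLE_GLOBAL_DEADLOCK', or 'NO_DEADLOCK'."""
    cycle = find_cycle(wfg)
    if cycle is None:
        return "NO_DEADLOCK"
    if "Text" in cycle:
        # The cycle passes through the external node: the local WFG must be merged
        # with the WFGs of the neighbouring sites before declaring a global deadlock.
        return "POSSIBLE_GLOBAL_DEADLOCK"
    return "LOCAL_DEADLOCK"       # cycle confined to this site: resolve it locally

# Site WFG where T1 waits for T2 locally and T2 waits on a remote agent.
site_wfg = {"T1": {"T2"}, "T2": {"Text"}, "Text": {"T1"}}
print(classify_local_wfg(site_wfg))   # POSSIBLE_GLOBAL_DEADLOCK
```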
Distributed Deadlock Detection

This global WFG contains a cycle that does not involve the Text node (T3 → T1 →
T2 → T3), so we can conclude that deadlock exists and an appropriate recovery
protocol must be invoked.
Distributed Deadlock Detection

● Local sites transmit their local WFGs up the hierarchy to check for
deadlock.
● Sites receiving these WFGs merge the information into their local WFGs
and check for cycles, excluding the Text node.
● If a cycle is detected, it signifies a deadlock. In this case, one or more
transactions are rolled back and restarted, along with their agents.
● If the entire global WFG is constructed without detecting a cycle, it
indicates the absence of deadlock in the entire system.
● Distributed deadlock detection methods are potentially more robust than
centralized or hierarchical methods.
● However, they require significant intersite communication to coordinate
deadlock detection and resolution across distributed sites.
Distributed DBMS - Database Recovery

● Distributed recovery is more complicated than centralized database recovery because
failures can also occur at the communication links or at a remote site.

● In a DDBMS, besides the traditional failure types that occur in a centralized
system (such as media and power failures), new types of failure may occur, e.g.,
communication link failure, network partitioning, delayed/lost messages, and remote
site failure.
● Four types of failure can be distinguished in a distributed DBMS:
● • transaction failure
● • site failure
● • media failure
● • communication failure.
1. Transaction Failures

Transactions can fail for a number of reasons. Failure can be due to an error in
the transaction caused by incorrect input data, or to the detection of a
present or potential deadlock.
Furthermore, some concurrency control algorithms do not permit a
transaction to proceed, or even to wait, if the data it attempts to access
is currently being accessed by another transaction.
This might also be considered a failure. The usual approach in cases of
transaction failure is to abort the transaction, thus resetting the database to
its state prior to the start of that transaction.
2. Site (System) Failures

System failures can be traced back to hardware or software faults.
A system failure is always assumed to result in the loss of main memory contents.
Therefore, any part of the database that was in main memory buffers is lost as a result of
a system failure. However, the database that is stored in secondary storage is assumed
to be safe and correct.
In distributed database terminology, system failures are typically referred to as site
failures, since they result in the failed site being unreachable from other sites in the
distributed system.
We typically differentiate between partial and total failures in a distributed system.
Total failure refers to the simultaneous failure of all sites in the distributed system;
partial failure indicates the failure of only some sites while the others remain
operational.
3. Media Failures
Media failure refers to the failures of the secondary storage devices that store
the database. Such failures may be due to operating system errors, as well as to
hardware faults such as head crashes or controller failures.
The important point is that all or part of the database that is on the secondary
storage is considered to be destroyed and inaccessible. Duplexing of disk
storage and maintaining archival copies of the database are common
techniques that deal with this sort of catastrophic problem.
Media failures are frequently treated as problems local to one site and
therefore not specifically addressed in the reliability mechanisms of
distributed DBMSs.
4. Communication Failures
There are a number of types of communication failures. The most common ones are the
errors in the messages, improperly ordered messages, lost messages, and communication line
failures. The first two errors are the responsibility of the computer network; we will not
consider them further.
Therefore, in our discussions of distributed DBMS reliability, we expect the underlying
computer network hardware and software to ensure that two messages sent from a process
at some originating site to another process at some destination site are delivered without
error and in the order in which they were sent.
Lost or undeliverable messages are typically the consequence of communication line failures
or (destination) site failures. If a communication line fails, in addition to losing the message(s)
in transit, it may also divide the network into two or more disjoint groups.
This is called network partitioning. If the network is partitioned, the sites in each partition
may continue to operate. In this case, executing transactions that access data stored in
multiple partitions becomes a major issue.
Figure below shows an example of network partitioning in which following the failure of the link connecting
sites S1 and S2, sites (S1, S4, S5) are partitioned from sites (S2, S3). In some cases it is difficult to distinguish
whether a communication link or a site has failed. For example, suppose that site S1 cannot communicate
with site S2 within a fixed (timeout) period. It could be that:

• site S2 has crashed or the network has gone down;

• the communication link has failed;

• the network is partitioned;

• site S2 is currently very busy and has not had time to respond to the message.
How failures affect recovery
As with local recovery, distributed recovery aims to maintain the atomicity and
durability of distributed transactions. To ensure the atomicity of the global transaction,
the DDBMS must ensure that subtransactions of the global transaction either all
commit or all abort.
If the DDBMS detects that a site has failed or become inaccessible, it needs to carry out
the following steps:
•Abort any transactions that are affected by the failure.
•Flag the site as failed to prevent any other site from trying to use it.
•Check periodically to see whether the site has recovered or, alternatively, wait for the
failed site to broadcast that it has recovered.
•On restart, the failed site must initiate a recovery procedure to abort any partial
transactions that were active at the time of the failure.
•After local recovery, the failed site must update its copy of the database to make it
consistent with the rest of the system.
Two-Phase Commit
Introduction

A distributed database is in many ways more complex than a
centralized one. Some of the failures that may be experienced in a
distributed DBMS include:

- Loss of a message

- Failure of a site that is hosting a subtransaction

- Failure of a communication link


2 phase commit protocol

In a distributed DBMS, all global transactions must satisfy the
atomicity principle so as to ensure consistency after updating records
at all sites.

This protocol ensures that all the sites involved in a global transaction are
ready to commit before any changes are made permanent.

There are two modules in this protocol:

Coordinator/transaction manager

Participant/resource manager
2 phase commit protocol

A 2PC protocol consists of two main phases:

Voting protocol.

Decision protocol.
2 phase commit protocol

Voting protocol - this is where the participant sites decide whether
they are ready to commit a transaction or not.

Decision protocol - this is where the coordinator site decides
whether the transaction should be committed or aborted.
Voting Phase
Example:

A transaction T is to be initiated at sites S1, S2, S3, and S4.

Coordinating site: S1

Participating sites: S2, S3, S4

The coordinator site writes [T, Prepare] to its log and sends this message
to all participating sites, asking whether they are ready to commit.

If a site is ready, it writes [T, Ready] to its log and sends that message to the
coordinator site.

If a site is not ready, it writes [T, Not Ready] to its log and sends that message to
the coordinator site, as shown in the previous diagram.
Voting phase
Decision phase

If the coordinator site receives a [T, Ready] message from all
participants, it writes [T, Commit] to its own log and sends that
message to the participant sites. The participant sites write the
same message into their logs and finally perform the commit.

If the coordinator site receives a [T, Not Ready] message from any one
participant, it writes [T, Abort] to its own log and sends that
message to the participant sites. The participant sites write the
same message into their logs and finally abort the commit
process. All this is done to ensure that transactions are always
committed atomically.
Decision Phase
Key takeaways:

The coordinator aborts a transaction if and only if at least one participant
votes to abort it.

A coordinator commits a transaction if and only if all the participant sites vote to
commit it.
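A compact sketch of the coordinator's side of 2PC, assuming synchronous, reliable message exchange with each participant; logging, timeouts, and recovery are deliberately omitted, and the participant model is illustrative:

```python
# Illustrative two-phase commit, coordinator side. Each participant is modelled as
# a callable that is sent PREPARE and returns its vote for the voting phase.

def two_phase_commit(transaction, participants):
    """participants: dict of site -> vote function returning 'READY' or 'NOT READY'."""
    # Phase 1 (voting): send PREPARE and collect the votes.
    votes = {site: ask_vote(transaction) for site, ask_vote in participants.items()}

    # Phase 2 (decision): commit only if every participant voted READY.
    decision = "COMMIT" if all(v == "READY" for v in votes.values()) else "ABORT"

    # The global decision is then sent to all participants, which apply it and acknowledge.
    return decision, votes

participants = {
    "S2": lambda txn: "READY",
    "S3": lambda txn: "READY",
    "S4": lambda txn: "NOT READY",   # a single NOT READY vote forces a global abort
}
print(two_phase_commit("T", participants))  # ('ABORT', {...})
```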
Termination Protocols for 2PC

A termination protocol is invoked when a coordinator or participant
times out because an expected message has not arrived.

For a coordinator, the relevant states are WAITING and DECIDED:

Timeout in the WAITING state: the coordinator is waiting for all participants'
votes; it cannot commit, but it can decide to globally abort.

Timeout in the DECIDED state: the coordinator is waiting for
acknowledgments; it resends the global decision to the sites that have not yet
acknowledged it.
Termination in a participant
The simplest termination protocol is to leave the participant process blocked until
communication with the coordinator is re-established.

For a participant, the relevant states are INITIAL and PREPARED:

Timeout in the INITIAL state: the participant is waiting for a PREPARE message; it can
unilaterally abort the transaction, or send an ABORT message to the coordinator.

Timeout in the PREPARED state: the participant is waiting for the global commit/abort
instruction and is unable to change its vote. It can use the cooperative termination
protocol to contact other participants for the decision.

The cooperative termination protocol reduces the likelihood of blocking but does not
eliminate it entirely.

If the coordinator fails and all participants detect it, they can elect a new
coordinator to resolve the block.
Recovery process for 2PC
There are three states in which the coordinator can fail. They include:

1)Failure in INITIAL state:

-Coordinator hasn't started the commit procedure

-Recovery initiates the commit procedure.

2)Failure in WAITING state:

-Coordinator sent the PREPARE message.

-Hasn't received all responses or abort responses.

-Recovery restarts the commit procedure.

3)Failure in DECIDED state:

-Coordinator instructed participants to globally abort or commit.

-If all acknowledgments received, successful completion.

-Otherwise, initiate the previously discussed termination protocol.


Participant Failure
1)Failure in INITIAL State:

-Participant hasn't voted on the transaction.

-Can unilaterally abort the transaction on recovery.

-Coordinator can't make a global commit decision without the participant's vote.

2)Failure in PREPARED State:

-Participant has sent its vote to the coordinator.

-Recovery follows the previously discussed termination protocol.

3)Failure in ABORTED/COMMITTED States:

-Participant has completed the transaction.

-No further action is needed upon restart.


Election Protocols

● If the participants detect the failure of the coordinator (by timing out), they can elect a new site to act as
coordinator.

● One election protocol is for the sites to have an agreed-upon linear ordering.
● We assume that site Si has order i in the sequence, with the lowest-numbered operational site
becoming the coordinator, and that each site knows the identification and ordering of the other
sites in the system, some of which may also have failed.
Cont…

● One election protocol asks each operational participant to send a message to the sites with a greater
identification number.

● That means, site Si would send a message to sites Si+1, Si+2, . . . , Sn in that order.
● If a site Sk receives a message from a lower-numbered participant, then Sk knows that it is not to be the
new coordinator and stops sending messages.

● This protocol is relatively efficient and most participants stop sending messages quite quickly.
Cont…

● Eventually, each participant will know whether there is an operational participant with a lower number.
● If there is not, the site becomes the new coordinator.
● If the newly elected coordinator also times out during this process, the election protocol is invoked again.
● When a failed site recovers, it starts the election again; if there are no operational sites with a lower
number, it forces all higher-numbered sites to let it become the new coordinator, regardless of whether
there is already a new coordinator.
Bully Election Algorithm

● Any participant can trigger a request for an election; however, a participant can only issue a single
request at a time. The algorithm operates by identifying all the non-faulty participants and
electing the participant with the largest identifier as the coordinator.

● There can be three kinds of messages that participants would exchange between each other during the
bully algorithm:
a) Election message

b) OK message

c) Coordinator message
Cont…

● Suppose there are n different participants with
unique identifiers ranging from 0 to n-1.

● Given that 5 is the highest ID amongst the nodes, it
is the leader. Assuming that the leader crashes and
node 2 is the first to notice the breakdown of the
leader, the node with ID 2 initiates an election.
Cont…

● Accordingly, the participant with ID 2 sends an election
message to all nodes with an ID greater than its own.

● Participants 3 and 4 both receive the election message.
However, since node 5 has crashed, it neither receives the
message nor responds. Participants 3 and 4 accordingly initiate
elections of their own by broadcasting election messages to
those with IDs greater than their own respective IDs.
Cont…

● Moreover, they respond with an OK message to the
node that sent them the election request, since they
have higher IDs than the sender. This
means that 3 and 4 confirm to participant 2
that they are alive and non-crashed.
Cont…

● Node 4 receives the election message and
accordingly responds with an OK message to node 3
to confirm its operating state. As in the previous
case, node 5 does not respond as it is unavailable.

● Node 4 has already broadcast an election message
to node 5 and received no response. It therefore concludes
that node 5 has crashed, and that the node with
the highest ID is now node 4.
Cont…

● Node 4 determines that it is the node with the
highest ID, and sends a coordinator message to all
of the alive nodes.

● Consequently, all nodes are updated with the new
leader.
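A short sketch of this walkthrough, assuming node IDs 0 to 5 and a known set of alive nodes; message passing is simulated with direct membership checks, so this is an illustration of the idea rather than a real network implementation:

```python
# Illustrative bully election: a node sends ELECTION to every higher-ID node; if no
# higher-ID node answers, it declares itself coordinator and notifies all alive nodes.

def bully_election(initiator, node_ids, alive):
    """Return the elected coordinator, starting the election at `initiator`."""
    candidate = initiator
    while True:
        higher_alive = [n for n in node_ids if n > candidate and n in alive]
        if not higher_alive:
            # No higher-ID node responded with OK: the candidate becomes coordinator
            # and would broadcast a COORDINATOR message to all alive nodes.
            return candidate
        # A higher-ID node replies OK and takes over the election.
        candidate = min(higher_alive)

nodes = [0, 1, 2, 3, 4, 5]
alive = {0, 1, 2, 3, 4}                  # node 5, the old leader, has crashed
print(bully_election(2, nodes, alive))   # 4 becomes the new coordinator
```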
Communication topologies for 2PC

● There are several different communication topologies (ways of exchanging messages) that can be employed
to implement 2PC.
a. Centralized 2PC

● In this topology, all communication is funneled through the coordinator.


Cont…

● A number of improvements to the centralized 2PC protocol have been
proposed that attempt to improve its overall performance, either by
reducing the number of messages that need to be exchanged or by
speeding up the decision-making process.
● These improvements depend upon adopting different ways of exchanging
messages.
b. Linear 2PC

● In linear 2PC, sites are ordered 1, 2, . . . , n, where site 1 is the
coordinator and the remaining sites are the participants.

● The 2PC protocol is implemented by a forward chain of
communication from the coordinator to participant n for the
voting phase and a backward chain of communication from
participant n back to the coordinator for the decision phase.
Cont…

● Linear 2PC can be improved if the voting process adopts the forward linear chaining of messages while the
decision process adopts the centralized topology, so that site n can broadcast the global decision to all
participants in parallel.
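A tiny sketch of the linear chain, assuming each site simply combines its own vote with what it received from its predecessor (a pure illustration of the message flow):

```python
# Illustrative linear 2PC: votes flow forward along the chain of sites, and the
# global decision flows backward from the last participant to the coordinator.

def linear_2pc(votes_in_site_order):
    """votes_in_site_order: 'READY'/'NOT READY' per site, site 1 (coordinator) first."""
    # Forward chain (voting): each site passes READY on only if it and all of
    # its predecessors voted READY.
    accumulated = "READY"
    for vote in votes_in_site_order:
        if vote != "READY":
            accumulated = "NOT READY"
    # Site n turns the accumulated vote into the global decision, which is then
    # passed backward along the chain to the coordinator.
    return "COMMIT" if accumulated == "READY" else "ABORT"

print(linear_2pc(["READY", "READY", "READY"]))      # COMMIT
print(linear_2pc(["READY", "NOT READY", "READY"]))  # ABORT
```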
c. Distributed 2PC

● Uses a distributed topology.


● The coordinator sends the PREPARE message to all participants, which in turn send their decision to all
other sites.

● Each participant waits for messages from the other sites before deciding whether to commit or abort the
transaction.

● This in effect eliminates the need for the decision phase of the 2PC protocol, as the participants can reach
a decision consistently, but independently.
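A sketch of this idea, assuming every site ends up with the same set of broadcast votes and applies the same rule to them (the names are illustrative):

```python
# Illustrative distributed 2PC: after the coordinator's PREPARE, every participant
# broadcasts its vote to all sites, and each site then decides independently.

def local_decision(all_votes):
    """Every site applies the same rule to the same votes, so the decision is
    reached consistently but independently at each site."""
    return "COMMIT" if all(v == "READY" for v in all_votes.values()) else "ABORT"

votes = {"S1": "READY", "S2": "READY", "S3": "READY"}   # votes broadcast by every site
print({site: local_decision(votes) for site in votes})  # every site decides COMMIT
```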
