Distributed Database Systems
① Delivery Modes
A. Pull-only:
▪ The transfer of data from servers to clients is initiated by a client pull. When a client request is received at a server, the server responds by locating the requested information.
▪ Updates are carried out at a server without notification to clients.
B. Push-only:
▪ The transfer of data from servers to clients is initiated by a server push in the absence of any specific request from clients.
▪ The usefulness of server push depends heavily upon the accuracy of a server to predict the needs of clients.
C. Hybrid:
▪ Combines the client-pull and server-push mechanisms.
▪ The transfer of information from servers to clients is first initiated by a client pull (by posing the query), and the subsequent transfer of updated information to clients is initiated by a server push.
② Frequency
A. Periodic:
▪ Data are sent from the server to clients at regular intervals.
▪ Both pull and push can be performed in periodic fashion. Periodic delivery is carried out on a regular and pre-specified repeating schedule.
B. Conditional:
▪ Data are sent from servers whenever certain conditions are satisfied.
▪ Such conditions can be as simple as a given time span or as complicated as event-condition-action rules.
▪ Conditional delivery is mostly used in hybrid or push-only delivery systems.
C. Ad-hoc or irregular:
▪ Ad-hoc delivery is irregular and is performed mostly in a pure pull-based system.
▪ Data are pulled from servers to clients in an ad-hoc fashion whenever clients request it.
③ Communication Methods
❖ Promises of DDBSs
For a system to deal with queries over a distributed, fragmented and replicated database, it needs to be able to handle a number of different types of transparency:
❖ Client/Server Systems
➢ The general idea is very simple: distinguish the functionality that needs to be provided and divide these functions into two classes: server functions and client functions.
➢ This provides a two-level architecture, which makes it easier to manage the complexity of modern DBMSs and the complexity of distribution.
Distributed Database Servers Architecture
➢ Each application server is dedicated to one or a few applications, while database servers operate in the multiple-server fashion.
Client/Server Reference Architecture
Overview of Relational DBMS
➢ In the first relation, EMP, ENO represents the relation's primary key, and the values of ENO come from the domain of all valid employee numbers, say D1.
➢ Note that each attribute of each relation does not have to come from a distinct domain. Various attributes within a relation, or from a number of relations, may be defined over the same domain.
➢ Each relation in any database must have a key. The key of a relation is “the minimum non-empty subset of its attributes such that the values of the attributes comprising the key uniquely identify each tuple of the relation”. The attributes that make up a key are called prime attributes. Each relation has at least one primary key.
➢ Sometimes, there may be more than one possibility for the key. These alternative keys are called candidate keys, and one of the candidate keys is chosen as the primary key.
➢ The number of attributes of a relation defines its degree, and the number of tuples of the relation defines its cardinality.
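To make these terms concrete, the following is a minimal Python sketch that computes the degree and cardinality of a relation and checks whether an attribute set is a key; the EMP tuples and attribute names are illustrative assumptions, not taken from the original figures.

# Sketch: degree, cardinality, and key check on an illustrative EMP relation.
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Analyst"},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Analyst"},
]

degree = len(EMP[0])        # number of attributes
cardinality = len(EMP)      # number of tuples

def is_key(relation, attrs):
    """True if the attribute set uniquely identifies every tuple of the relation."""
    seen = {tuple(t[a] for a in attrs) for t in relation}
    return len(seen) == len(relation)

print("degree =", degree, "cardinality =", cardinality)
print("ENO is a key:", is_key(EMP, ["ENO"]))      # True
print("TITLE is a key:", is_key(EMP, ["TITLE"]))  # False: two analysts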
❖ Normalization
➢ The aim of normalization is “to eliminate various anomalies (or undesirable aspects) of a relation in order to obtain better relations”.
➢ The following four problems might exist in a relation scheme:
1. Repetition Anomaly:
▪ Certain information may be repeated unnecessarily.
▪ For example, we have to repeat the project details (PNAME, BUDGET) for every employee in the EMP relation.
▪ This repetition causes redundancy, a waste of storage, and complicated updates.
2. Update Anomaly:
▪ As a consequence of the repetition of data, performing updates may be troublesome.
▪ For example, if the budget of a project changes, multiple tuples have to be updated to reflect this change.
3. Insertion Anomaly:
▪ It may not be possible to add a new tuple to the EMP1 relation.
▪ For example, when a new project is added and no employees are assigned to it yet, we cannot add the project data to the EMP1 relation, as this would cause part of the primary key to have a NULL value (no employee number), which is not valid.
4. Deletion Anomaly:
▪ This is the converse of the insertion anomaly.
▪ For example, if a project is terminated and its details are deleted, the details of all employees assigned to that project are deleted too.
Functional Dependency (FD):
Let X and Y be two attributes, or two sets of attributes, in a relation R. If for each value of X in R there is only one associated value of Y, we say that “X functionally determines Y” or that “Y is functionally dependent on X”. Notationally, this is written X → Y. By default, the primary key of a relation functionally determines all the non-key attributes of that relation.
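As a small illustration, a dependency X → Y can be checked mechanically: it holds if no two tuples agree on X while disagreeing on Y. The sample relation below is an assumption for demonstration only.

# Sketch: check whether a functional dependency X -> Y holds in a relation
# represented as a list of dicts.
def fd_holds(relation, X, Y):
    mapping = {}
    for t in relation:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if x_val in mapping and mapping[x_val] != y_val:
            return False          # the same X value is associated with two Y values
        mapping[x_val] = y_val
    return True

EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Analyst"},
]
print(fd_holds(EMP, ["ENO"], ["ENAME", "TITLE"]))   # True: ENO -> ENAME, TITLE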
➢ Data manipulation languages developed for the relational model (query languages)
fall into two fundamental groups: relational algebra languages and relational calculus
languages.
A. Relational Algebra
➢ Relational algebra consists of a set of operators that operate on relations. Each operator
takes one or two relations as operands and produces a relation as a result, which, in turn,
may be an operand to another operator.
➢ There are five fundamental relational algebra operators: selection, projection, union, set
difference, and Cartesian product. The first two of these operators are unary operators,
and the last three are binary operators.
➢ Additional operators can be defined in terms of these fundamental operators: intersection, θ-join, natural join, semi-join, and division (a small executable sketch of the operators follows the join examples below).
1. Selection
▪ Produces a horizontal subset of a relation (the tuples that satisfy a given predicate).
▪ General form: σF(R), where F is a predicate over the attributes of R.
2. Projection
▪ Produces a vertical slice of a relation (a subset of its attributes).
▪ General form: ΠA(R), where A is a subset of the attributes of R.
▪ Note: projection can generate duplicate tuples. Commercial systems (and SQL) allow this and provide both projection with duplicate elimination and projection without duplicate elimination.
3. Union
▪ General form: R ∪ S.
▪ Two relations R and S are union compatible if and only if they are of the same degree and the i-th attribute of each relation is defined over the same domain.
4. Set Difference
▪ General form: R − S, with R and S union compatible.
5. Cartesian (Cross) Product
▪ Given relations R of degree k1 with n1 tuples and S of degree k2 with n2 tuples,
▪ the Cartesian (cross) product R × S is a relation of degree (k1 + k2) and consists of all (n1 * n2) tuples, where each tuple is a concatenation of one tuple of R with one tuple of S. Example: EMP × PAY
6. Intersection
▪ General form: R ∩ S, with R and S union compatible (expressible as R − (R − S)).
7. θ-Join
▪ General form: R ⋈F S, where F is a predicate over attributes of R and S.
▪ A derivative of the Cartesian product: R ⋈F S = σF(R × S).
8. Division
▪ General form: R ÷ S, defined when the attributes of S are a subset of the attributes of R; the result consists of the tuples over the remaining attributes of R that are associated in R with every tuple of S.
Types of Join:
1. Outer join:
▪ Ensures that tuples from one or both relations that do not satisfy the join condition still appear in the final result, with the other relation's attribute values set to NULL.
▪ Variants: left outer join, right outer join, full outer join.
▪ Example: EMP left outer join with ASG on ENO.
2. Semi-join:
▪ R ⋉F S keeps the tuples of R that participate in the join with S; it can be expressed as the projection of R ⋈F S onto the attributes of R.
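The following minimal sketch implements the operators described above on relations modeled as lists of dictionaries, so the definitions can be experimented with directly; the relation contents and attribute names are assumptions for illustration.

# Sketch of the relational algebra operators on relations represented as lists of dicts.
def select(R, pred):                 # selection: horizontal subset
    return [t for t in R if pred(t)]

def project(R, attrs):               # projection with duplicate elimination
    seen, out = set(), []
    for t in R:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out

def union(R, S):                     # R and S must be union compatible
    return R + [t for t in S if t not in R]

def difference(R, S):
    return [t for t in R if t not in S]

def product(R, S):                   # Cartesian product (attribute names assumed disjoint)
    return [{**r, **s} for r in R for s in S]

def theta_join(R, S, pred):          # theta-join = selection over the Cartesian product
    return [t for t in product(R, S) if pred(t)]

def semi_join(R, S, pred):           # tuples of R that join with at least one tuple of S
    return [r for r in R if any(pred({**r, **s}) for s in S)]

# Illustrative relations; PAY's title attribute is renamed TITLE_P to keep names disjoint.
EMP = [{"ENO": "E1", "TITLE": "Analyst"}, {"ENO": "E2", "TITLE": "Engineer"}]
PAY = [{"TITLE_P": "Analyst", "SAL": 34000}, {"TITLE_P": "Engineer", "SAL": 48000}]

print(theta_join(EMP, PAY, lambda t: t["TITLE"] == t["TITLE_P"]))
print(semi_join(EMP, PAY, lambda t: t["TITLE"] == t["TITLE_P"] and t["SAL"] > 40000))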
B. Relational Calculus
A. Level of sharing
▪ Refers to sharing of both data and the applications. There are three possibilities:
- First, no sharing: each application and its data execute at one site, and there is no communication with any other program or access to any data file at other sites.
- Second, data sharing: all the programs are replicated at all the sites, but data files are not. Accordingly, user requests are handled at the site where they originate and the necessary data files are moved around the network.
- Finally, data-plus-program sharing: both data and programs may be shared, meaning that a program at a given site can request a service from another program at a second site, which, in turn, may have to access a data file located at a third site.
B. Access Pattern Behavior
▪ The access patterns of user requests may be static, so that they do not change over time, or dynamic.
C. Level of Knowledge about Access Pattern Behavior
▪ One possibility, of course, is that the designers do not have any information about how users will access the database.
▪ Another alternative is that the designers have complete information, where the access patterns can reasonably be predicted, or partial information, where there are deviations from the predictions.
➢ Two major strategies that have been identified for designing distributed databases are
the top-down approach and the bottom-up approach.
Top-down: mostly in designing systems from scratch - mostly in homogeneous systems
Bottom-up: when the databases already exist at a number of sites
➢ The relations in a database schema are usually decomposed into smaller fragments. To understand this process, we need to answer the following questions:
1. Why fragment at all?
2. How should we fragment?
3. How much should we fragment?
4. Is there any way to test the correctness of decomposition?
5. How should we allocate?
6. What is the necessary information for fragmentation and allocation?
① Why fragment at all? Can't we just distribute relations?
③ Degree of Fragmentation
➢ The extent to which the database should be fragmented is an important decision that
affects the performance of query execution.
➢ The degree of fragmentation goes from (one extreme) that is not to fragment at all, to
(the other extreme), to fragment to the level of individual tuples (in the case of horizontal
fragmentation) or to the level of individual attributes (in the case of vertical
fragmentation).
➢ There are three correctness rules that must hold together for a fragmentation, to ensure that the database does not undergo any semantic change during or after fragmentation: completeness, reconstruction, and disjointness.
➢ When the database is fragmented properly, we have to decide about the allocation of
the fragments to various sites on the network.
➢ The replication alternatives of allocation are:
1. Non-replicated
◦ Partitioned: each fragment resides at only one site.
2. Replicated (the reasons for replication are reliability and efficiency of read-only queries)
◦ Fully replicated: each fragment at each site.
◦ Partially replicated: each fragment at some of the sites.
Comparison of Replication Alternatives
❖ The various types of fragmentation
Example:
➢ The PROJ relation is horizontally fragmented into PROJ1 and PROJ2 as described below:
➢ PROJ1: projects with budgets less than $200,000
➢ PROJ2: projects with budgets greater than or equal to $200,000
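A minimal sketch of this fragmentation follows; the $200,000 threshold comes from the description above, while the PROJ tuples and attribute names are assumptions.

# Sketch: horizontal fragmentation of PROJ by budget.
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
]

PROJ1 = [t for t in PROJ if t["BUDGET"] < 200000]    # sigma BUDGET < 200000 (PROJ)
PROJ2 = [t for t in PROJ if t["BUDGET"] >= 200000]   # sigma BUDGET >= 200000 (PROJ)

# Completeness + reconstruction check: PROJ1 union PROJ2 gives back PROJ.
assert sorted(map(str, PROJ1 + PROJ2)) == sorted(map(str, PROJ))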
Example:
The relation PROJ is fragmented into horizontal fragments PROJ1 and PROJ2. The fragments are defined as follows:
Example:
The relation PROJ can also be fragmented based on the project location. The resulting fragments are:
Minterm Predicates
➢ A horizontal fragment Ri of relation R consists of all the tuples of R that satisfy a minterm
predicate mi.
➢ Hence, given a set of minterm predicates M, there are as many horizontal fragments of
relation R as there are minterm predicates.
➢ This set of horizontal fragments is also commonly referred to as the set of minterm
fragments.
➢ Minterm predicates are built from a set Pr of simple predicates: each minterm predicate is a conjunction of the simple predicates in Pr, each taken either in its natural or in its negated form. Two important properties of the set of simple predicates Pr are its completeness and its minimality.
➢ Pr should be complete: every tuple of each minterm fragment defined by Pr should have the same access probability for each application.
➢ Pr should be minimal: each simple predicate in Pr should be relevant in determining a fragmentation, i.e., for the fragments fi and fj it distinguishes, there should be at least one application that accesses fi and fj differently.
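As a sketch, the minterm predicates over a set Pr of simple predicates can be enumerated as every conjunction in which each simple predicate appears either as-is or negated; the simple predicates and the PROJ tuples used here are illustrative assumptions.

# Sketch: enumerate minterm predicates (and their fragments) from simple predicates.
from itertools import product

simple_predicates = {
    "p1": lambda t: t["BUDGET"] < 200000,
    "p2": lambda t: t["LOC"] == "Montreal",
}

def minterm_fragments(relation, predicates):
    """One fragment per minterm; empty minterm fragments are dropped."""
    fragments = {}
    names = list(predicates)
    for signs in product([True, False], repeat=len(names)):
        label = " AND ".join(n if s else f"NOT {n}" for n, s in zip(names, signs))
        frag = [t for t in relation
                if all(predicates[n](t) == s for n, s in zip(names, signs))]
        if frag:
            fragments[label] = frag
    return fragments

PROJ = [
    {"PNO": "P1", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P3", "BUDGET": 250000, "LOC": "New York"},
]
for label, frag in minterm_fragments(PROJ, simple_predicates).items():
    print(label, "->", [t["PNO"] for t in frag])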
Example:
▪ Consider link L1: owner(L1) = PAY and member(L1) = EMP.
▪ Then we can group engineers into two groups according to their salary: those making less than or equal to $30,000, and those making more than $30,000. The two fragments EMP1 and EMP2 are defined as follows:
The result of this fragmentation:
③ Hybrid Fragmentation
▪ Nested fragmentation with different types.
Query Processing
➢ Query processing is a critical performance issue, so it has received (and continues to receive) considerable attention in both centralized and distributed DBMSs.
➢ However, the query processing problem is much more difficult in distributed environments, because a larger number of parameters affect the performance of distributed queries.
➢ Distributed relations are divided and distributed as fragments. Distributed database design is of major importance for query processing, since the definition of fragments is based on the objective of increasing reference locality, and sometimes parallel execution for the most important queries.
➢ The role of a distributed query processor is to map a high-level query on a distributed database (a set of global relations) expressed in relational calculus into a sequence of database operators (of relational algebra) on relation fragments.
➢ Several important functions characterize this mapping:
1. The calculus query must be decomposed into a sequence of relational operators called an algebraic query.
2. The data accessed by the query must be localized so that the operators on relations are translated to bear on local data (fragments).
3. The algebraic query on fragments must be extended with communication operators and optimized with respect to a cost function to be minimized. This cost function typically refers to computing resources such as disk I/Os, CPUs, and communication networks.
➢ The total cost of strategy A can be derived as follows:
1. Produce ASG' by selecting ASG requires (10+10) * tupacc = 20
2. Transfer ASG' to the sites of EMP requires (10+10) * tuptrans = 200
3. Produce EMP' by joining ASG' and EMP requires (10+10) * tupacc * 2 = 40
4. Transfer EMP' to the result site requires (10+10) * tuptrans = 200
5. The total cost = 20 + 200 + 40 + 200 = 460
▪ The join of ASG' and EMP (step 3) exploits the index on ENO of EMP. Thus, EMP is accessed only once for each tuple of ASG'.
➢ The total cost of strategy B can be derived as follows:
1. Transfer EMP to site 5 requires 400 * tuptrans = 4000
2. Transfer ASG to site 5 requires 1000 * tuptrans = 10000
3. Produce ASG' by selecting ASG requires 1000 * tupacc = 1000
4. Join EMP and ASG' requires 400 * 20 * tupacc = 8000
5. The total cost = 4000 + 10000 + 1000 + 8000 = 23000
▪ We assume that the access methods to relations EMP and ASG based on attributes RESP and ENO are lost because of data transfer.
▪ We assume that the join of EMP and ASG' in step 4 is done by the default nested-loop algorithm (it performs the Cartesian product of the two input relations).
▪ Strategy A is better by a factor of 50, which is quite significant. It also provides better distribution of work among sites.
The difference between the costs of the two strategies would be even higher with slower communication and/or a higher degree of fragmentation.
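The arithmetic above can be reproduced with a short sketch; tupacc and tuptrans are the per-tuple access and transfer unit costs, set here to 1 and 10, the values implied by the totals above.

# Sketch: recompute the costs of strategies A and B.
tupacc, tuptrans = 1, 10      # cost of one tuple access / one tuple transfer

# Strategy A: select ASG at its sites, ship ASG' to the EMP sites, join there, ship result.
cost_A = ((10 + 10) * tupacc          # produce ASG' by selecting ASG
          + (10 + 10) * tuptrans      # transfer ASG' to the sites of EMP
          + (10 + 10) * tupacc * 2    # join ASG' with EMP using the index on ENO
          + (10 + 10) * tuptrans)     # transfer EMP' to the result site

# Strategy B: ship EMP and ASG to site 5, then select and join there (nested loops).
cost_B = (400 * tuptrans              # transfer EMP to site 5
          + 1000 * tuptrans           # transfer ASG to site 5
          + 1000 * tupacc             # produce ASG' by selecting ASG
          + 400 * 20 * tupacc)        # nested-loop join of EMP and ASG'

print(cost_A, cost_B, cost_B / cost_A)   # 460 23000 50.0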
➢ Query optimization aims at choosing the “best” point in the solution space of all possible execution strategies.
➢ The first method for query optimization is to search the solution space, predict the cost of each strategy, and select the strategy with minimum cost.
➢ The problem is that the solution space can be large: there may be many equivalent strategies, even with a small number of relations.
➢ The problem becomes worse as the number of relations or fragments increases (becomes greater than 5 or 6).
➢ Having a high optimization cost is not necessarily bad, particularly if query optimization is done once for many subsequent executions of the query.
➢ To avoid the high cost of exhaustive search, query optimizers:
1. Try to find a very good solution, not necessarily the best one, while avoiding the high cost of optimization in terms of memory and time consumption.
2. Use heuristics, whose effect is to restrict the solution space so that only a few strategies are considered. An important heuristic in distributed systems is to replace join operators by combinations of semi-joins to minimize data communication.
❖ Optimization Timing
➢ A query may be optimized at different times relative to the actual time of query execution.
➢ Optimization can be done statically, before executing the query, or dynamically, as the query is executed.
➢ Static query optimization is done at query compilation time. Dynamic query optimization proceeds at query execution time.
➢ At any point of execution, the choice of the best next operator can be based on accurate knowledge of the results of the operators executed previously.
➢ Database statistics are not needed to estimate the size of intermediate results. However, they may still be useful in choosing the first operators.
➢ The main advantage of dynamic optimization over static query optimization is that the actual sizes of intermediate relations are available to the query processor, thereby minimizing the probability of a bad choice. The main shortcoming is that query optimization, an expensive task, must be repeated for each execution of the query.
➢ Hybrid query optimization attempts to provide the advantages of static query optimization while avoiding the issues generated by inaccurate estimates.
➢ The approach is basically static, but dynamic query optimization may take place at run time when a high difference between the predicted and actual sizes of intermediate relations is detected.
❖ Layers of Query Processing
➢ The problem of query processing can itself be decomposed into several sub-problems;
corresponding to various layers:
➢ A generic layering scheme for query processing is shown where each layer solves a well-
defined sub-problem.
① Query Decomposition
➢ The first layer decomposes the calculus query into an algebraic query on global relations.
➢ The information needed for this transformation is found in the global conceptual schema describing the global relations. Information about data distribution is not used at this layer.
➢ Query decomposition can be viewed as four successive steps: normalization, analysis, simplification, and restructuring.
② Data Localization
➢ The input to the second layer is an algebraic query on global relations. The main role of the second layer is to localize the query's data using the data distribution information in the fragment schema.
➢ Generating a query on fragments is done in two steps:
1. The query is mapped into a fragment query by substituting each relation by its reconstruction program (also called a materialization program).
2. The fragment query is simplified and restructured to produce another “good” query. Simplification and restructuring may be done according to the same rules used in the decomposition layer.
Note:
▪ ORs are mapped into unions.
▪ ANDs are mapped into joins or selections.
▪ The qualification in conjunctive normal form is:
▪ The qualification in disjunctive normal form is:
▪ In the latter form, treating the two conjunctions independently may lead to redundant work if common subexpressions are not eliminated.
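The specific qualification of the example is not reproduced above; in general, a qualification in conjunctive normal form (CNF) is a conjunction of disjunctions of simple predicates, while in disjunctive normal form (DNF) it is a disjunction of conjunctions:

(p_{11} \lor p_{12} \lor \cdots \lor p_{1n_1}) \land \cdots \land (p_{m1} \lor \cdots \lor p_{mn_m}) \quad \text{(CNF)}
(p_{11} \land p_{12} \land \cdots \land p_{1n_1}) \lor \cdots \lor (p_{m1} \land \cdots \land p_{mn_m}) \quad \text{(DNF)}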
② Analysis
➢ Query analysis enables rejection of normalized queries for which further processing is
either impossible or unnecessary.
➢ The main reasons for rejection are that the query is type incorrect or semantically
incorrect.
➢ When one of these cases is detected, the query is simply returned to the user with an
explanation. Otherwise, query processing is continued.
➢ To detect that the query is semantically incorrect, a query graph and a join graph are used.
➢ In a query graph, one node indicates the result relation, and any other node indicates an operand relation. An edge between two nodes, neither of which is the result, represents a join, whereas an edge whose destination node is the result represents a projection.
➢ In the join graph, only the joins are considered. It is useful in the query optimization phase.
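A minimal sketch of the semantic-correctness check for conjunctive queries without disjunction or negation: build the join graph and reject the query if the graph is not connected. The relation names and join edges below are illustrative assumptions.

# Sketch: a query is flagged as semantically incorrect if its join graph is not connected.
from collections import defaultdict

def is_connected(nodes, edges):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == set(nodes)

relations = {"EMP", "ASG", "PROJ"}
join_edges = [("EMP", "ASG"), ("ASG", "PROJ")]   # e.g. EMP.ENO = ASG.ENO, ASG.PNO = PROJ.PNO
print(is_connected(relations, join_edges))       # True: semantically correct
print(is_connected(relations, [("EMP", "ASG")])) # False: PROJ is disconnected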
Example:
➢ The last step of query decomposition is rewriting the query in relational algebra. Here
we represent the relational algebra query graphically by an operator tree.
➢ An operator tree is a tree in which a leaf node is a relation stored in the database, and
a non-leaf node is an intermediate relation produced by a relational algebra operator.
The sequence of operations is directed from the leaves to the root, which represents
the answer to the query.
Example:
The query “Find the names of employees other than J. Doe who worked on the CAD/CAM project for either one or two years”, whose SQL expression is:
➢ By applying transformation rules, many different trees may be found equivalent to the
one produced by the method described above.
➢ We now present the six most useful equivalence rules, which concern the basic
relational algebra operators:
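The rule list itself is not reproduced above; the equivalences usually presented at this point are the standard ones below (the exact formulation in the original figure may differ).

R \times S \equiv S \times R, \qquad R \Join S \equiv S \Join R \quad \text{(commutativity of binary operators)}
(R \times S) \times T \equiv R \times (S \times T), \qquad (R \Join S) \Join T \equiv R \Join (S \Join T) \quad \text{(associativity)}
\Pi_{A'}(\Pi_{A''}(R)) \equiv \Pi_{A'}(R) \text{ if } A' \subseteq A'', \qquad \sigma_{p_1}(\sigma_{p_2}(R)) \equiv \sigma_{p_1 \land p_2}(R) \quad \text{(idempotence of unary operators)}
\Pi_A(\sigma_p(R)) \equiv \sigma_p(\Pi_A(R)) \text{ if } \mathrm{attr}(p) \subseteq A \quad \text{(commuting selection with projection)}
\sigma_p(R \Join S) \equiv \sigma_p(R) \Join S \text{ if } \mathrm{attr}(p) \subseteq \mathrm{attr}(R) \quad \text{(commuting selection with binary operators)}
\Pi_A(R \times S) \equiv \Pi_{A \cap \mathrm{attr}(R)}(R) \times \Pi_{A \cap \mathrm{attr}(S)}(S) \quad \text{(commuting projection with binary operators)}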
The application of these six rules enables the generation of many equivalent trees. For instance, the tree in Fig [4] is equivalent to the one in Fig [3].
The restructuring of the tree in Fig [3] leads to the tree in Fig [5]. The resulting tree is good in the sense that repeated access to the same relation (as in Fig [3]) is avoided and that the most selective operations are done first.
Fig [4]: Rewritten Operator Tree
➢ Reduction for vertical fragmentation: find useless (but not empty) intermediate relations. For a relation R vertically fragmented into projections over subsets of its attributes, a projection applied to a fragment is useless if its projection attributes are not among the fragment's attributes.
Example: Applying the reduction rule, we get the reduced query in Fig [8].
Fig [8]: Reduction for Vertical Fragmentation
▪ By commuting the projection with the join (i.e., projecting on ENO, ENAME), we can see that the projection on EMP2 is useless because ENAME is not in EMP2. Therefore, the projection needs to be applied only to EMP1.
Query Optimization
① The Search Space
➢ The search space is the set of alternative execution plans that represent the input query. These plans are equivalent, in the sense that they yield the same result, but they differ in the execution order of operations, in the way these operations are implemented, and consequently in their performance.
➢ The search space is obtained by applying the transformation rules discussed before.
➢ The search space is characterized by alternative executions; the focus is on join trees.
➢ For N relations, there are O(N!) equivalent join trees that can be obtained by applying commutativity and associativity rules.
② The Cost Model
➢ The cost model predicts the cost of a given execution plan. To be accurate, the cost model must have good knowledge about the distributed execution environment.
③ The Search Strategy
➢ The search strategy explores the search space and selects the best plan, using the cost model.
An important question is how to “move” in the search space. There are two options:
① Deterministic
▪ Start from base relations and build plans by adding one relation at each step.
▪ Dynamic programming: breadth-first.
▪ Greedy: depth-first.
② Randomized
▪ Search for optimalities around a particular starting point.
▪ Trade optimization time for execution time.
▪ Better when there are more than 10 relations.
▪ Iterative improvement.
Example:
Consider the following query:
▪ The figure represents three equivalent join trees for that query, which are obtained by exploiting the associativity of binary operators.
▪ Each of these join trees can be assigned a cost based on the estimated cost of each operator.
▪ Join tree (c), which starts with a Cartesian product, may have a much higher cost than the other join trees.
Equivalent Join Trees
➢ Investigating a large search space may make the optimization time prohibitive, sometimes much more expensive than the actual execution time.
➢ Therefore, query optimizers typically restrict the size of the search space they consider:
➢ The first restriction is to use heuristics. The most common heuristic is to perform selection and projection when accessing base relations. Another common heuristic is to avoid Cartesian products that are not required by the query.
➢ Another important restriction is with respect to the shape of the join tree. Two kinds of join trees are usually distinguished: linear versus bushy trees.
➢ A linear tree is a tree such that at least one operand of each operator node is a base relation. By considering only linear trees, the size of the search space is reduced to O(2^N).
➢ A bushy tree is more general and may have operators with no base relations as operands (both operands are intermediate relations). However, in a distributed environment, bushy trees are useful in exhibiting parallelism.
Consider the following query expressed in relational algebra: PROJ ⋈PNO ASG ⋈ENO EMP
To implement this query there are many alternatives:
1. EMP → site 2; site 2 computes EMP' = EMP ⋈ ASG; EMP' → site 3; site 3 computes EMP' ⋈ PROJ.
2. ASG → site 1; site 1 computes EMP' = EMP ⋈ ASG; EMP' → site 3; site 3 computes EMP' ⋈ PROJ.
3. ASG → site 3; site 3 computes ASG' = ASG ⋈ PROJ; ASG' → site 1; site 1 computes ASG' ⋈ EMP.
4. PROJ → site 2; site 2 computes PROJ' = PROJ ⋈ ASG; PROJ' → site 1; site 1 computes PROJ' ⋈ EMP.
5. EMP → site 2; PROJ → site 2; site 2 computes EMP ⋈ PROJ ⋈ ASG.
➢ To select one of these programs, the following sizes must be known or predicted: size(EMP), size(ASG), size(PROJ), size(EMP ⋈ ASG), and size(ASG ⋈ PROJ).
➢ Furthermore, if it is the response time that is being considered, the optimization must take into account the fact that transfers can be done in parallel with strategy 5.
➢ An alternative to enumerating all the solutions is to use heuristics that consider only the sizes of the operand relations, by assuming, for example, that the cardinality of the resulting join is the product of the operand cardinalities. In this case, relations are ordered by increasing size, and the order of execution is given by this ordering and the join graph. For instance, the order (EMP, ASG, PROJ) could use strategy 1, while the order (PROJ, ASG, EMP) could use strategy 4.
▪ Example: Assume that relations EMP, ASG, and PROJ of the query “Names of employees working on the CAD/CAM project” are stored as follows, where relation EMP is fragmented.
▪ There are several possible strategies, including the following:
1. Execute the entire query (EMP ⋈ ASG ⋈ PROJ) by moving EMP1 and ASG to site 2.
2. Execute (EMP ⋈ ASG) ⋈ PROJ by moving (EMP1 ⋈ ASG) and ASG to site 2, and so on.
▪ The choice between the possible strategies requires an estimate of the size of the intermediate results.
▪ For example, if size(EMP1 ⋈ ASG) > size(EMP1), strategy 1 is preferred to strategy 2. Therefore, an estimate of the size of joins is required.
Example:
Let us consider the query PROJ ⋈ ASG, where PROJ and ASG are fragmented. Assume that the allocation of fragments and their sizes are as follows:
▪ With a point-to-point network, the best strategy is to send each PROJi to site 3, which requires a transfer of 3000 kbytes, versus 6000 kbytes if ASG is sent to sites 1, 2, and 4.
▪ However, with a broadcast network, the best strategy is to send ASG (in a single transfer) to sites 1, 2, and 4, which incurs a transfer of 2000 kbytes. The latter strategy is faster and also minimizes response time, because the joins can be done in parallel.
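The comparison above can be reproduced with the transfer volumes implied by the text: ASG is 2000 kbytes and the PROJ fragments total 3000 kbytes. The split of PROJ into three 1000-kbyte fragments and the site assignments are assumptions, since the original allocation table is not reproduced.

# Sketch: total data transferred under the two strategies discussed above.
proj_sites = {1: 1000, 2: 1000, 4: 1000}   # assumed PROJ fragment sizes (kbytes) per site
asg_kb = 2000                              # assumed size of ASG (kbytes), stored at site 3

ship_proj_to_site3 = sum(proj_sites.values())        # point-to-point: ship PROJi to site 3 -> 3000
ship_asg_point_to_point = asg_kb * len(proj_sites)   # point-to-point: ship ASG to each site -> 6000
ship_asg_broadcast = asg_kb                          # broadcast: one transfer of ASG -> 2000

print(ship_proj_to_site3, ship_asg_point_to_point, ship_asg_broadcast)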
A Transaction Model
❖ Definition of a Transaction
Example:
➢ Let us consider a simplified version of a typical reservation application, where a travel agent enters the flight number, the date, and a customer name, and asks for a reservation. The transaction to perform this function can be implemented as follows, where database accesses are specified in embedded SQL notation:
➢ Let us assume that there is a FLIGHT relation that records the data about each flight, a CUST relation for the customers who book flights, and an FC relation indicating which customers are on what flights. Let us also assume that the relation definitions are as follows (where the underlined attributes constitute the keys):
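Since the embedded SQL listing is not reproduced above, the following is a hedged Python/DB-API sketch of the same reservation logic; the attribute names (FNO, DATE, STSOLD, CNAME, SPECIAL) and the sqlite3 connection are assumptions, and the original formulation uses embedded SQL rather than Python.

# Sketch of the reservation transaction using Python's DB-API (sqlite3).
import sqlite3

def reservation(conn, flight_no, date, customer_name):
    cur = conn.cursor()
    try:
        # Increment the number of seats sold on the requested flight ...
        cur.execute(
            "UPDATE FLIGHT SET STSOLD = STSOLD + 1 WHERE FNO = ? AND DATE = ?",
            (flight_no, date))
        # ... and record that the customer is booked on that flight.
        cur.execute(
            "INSERT INTO FC (FNO, DATE, CNAME, SPECIAL) VALUES (?, ?, ?, NULL)",
            (flight_no, date, customer_name))
        conn.commit()            # successful termination: the transaction commits
    except sqlite3.Error:
        conn.rollback()          # unsuccessful termination: the transaction aborts
        raise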
Termination Conditions of Transactions:
▪ A transaction always terminates, even when there are failures. If the transaction can complete its task successfully, we say that the transaction commits. If a transaction stops without completing its task, we say that it aborts.
Example:
① Atomicity ② Consistency
❖ Types of transactions
A. Online (short-life) transactions:
• Very short execution time (seconds).
• Access to a small portion of the database.
B. Batch (long-life) transactions:
• Longer execution time (minutes or hours).
• Access to a large portion of the database.
Transactions can also be classified by the structure of their read and write actions:
A. Two-step: all the read actions are performed before any write action.
B. Restricted: a data item has to be read before it can be written.
C. Action model: consists of the restricted class with the further restriction that each (read, write) pair be executed together.
Consider the following example. The following is how transaction T would be structured according to the previous transaction models (see the sketch below):
▪ Two-step
▪ Restricted
▪ Action
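Since the concrete definitions of T are not reproduced above, the sketch below uses a hypothetical transaction that reads two items x and y and writes both, and checks which structural class a given operation sequence belongs to.

# Sketch: classify an operation sequence (list of ("R"|"W", item) pairs) by the
# structural transaction models described above. The sample sequences are assumptions.
def is_two_step(ops):
    """All reads precede all writes."""
    first_write = next((i for i, (k, _) in enumerate(ops) if k == "W"), len(ops))
    return all(k == "W" for k, _ in ops[first_write:])

def is_restricted(ops):
    """Every item is read before it is written."""
    read = set()
    for k, item in ops:
        if k == "R":
            read.add(item)
        elif item not in read:
            return False
    return True

def is_action(ops):
    """Restricted, and each read is immediately followed by the write of the same item."""
    if not is_restricted(ops):
        return False
    writes = {item for k, item in ops if k == "W"}
    for i, (k, item) in enumerate(ops):
        if k == "R" and item in writes:
            if i + 1 >= len(ops) or ops[i + 1] != ("W", item):
                return False
    return True

T_two_step = [("R", "x"), ("R", "y"), ("W", "x"), ("W", "y")]
T_action   = [("R", "x"), ("W", "x"), ("R", "y"), ("W", "y")]
print(is_two_step(T_two_step), is_restricted(T_two_step), is_action(T_two_step))  # True True False
print(is_two_step(T_action), is_restricted(T_action), is_action(T_action))        # False True True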
❖ Schedule Definition
▪ A schedule is a prefix of a complete schedule, such that only some of the operations and only some of the ordering relationships are included.
❖ Serial History
❖ Serializable History
▪ Transactions execute concurrently, but the net effect of the resulting history upon the
database is equivalent to some serial history.
▪ Equivalent with respect to what?
▪ Conflict equivalence: the relative order of execution of the conflicting operations belonging to unaborted transactions is the same in the two histories.
▪ Conflicting operations: two incompatible operations (e.g., Read and Write) conflict if they both access the same data item.
▪ The incompatible operations of each transaction are assumed to conflict; their execution order is not changed.
▪ If two operations from two different transactions conflict, the corresponding transactions are also said to conflict.
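A history can be tested for conflict serializability with a precedence (serialization) graph: add an edge Ti → Tj whenever an operation of Ti conflicts with and precedes an operation of Tj; the history is serializable if and only if the graph is acyclic. A minimal sketch follows; the sample history is an assumption.

# Sketch: conflict-serializability test via a precedence graph.
# A history is a list of (transaction, operation, item) with operation in {"R", "W"}.
def precedence_edges(history):
    edges = set()
    for i, (ti, oi, xi) in enumerate(history):
        for tj, oj, xj in history[i + 1:]:
            if ti != tj and xi == xj and "W" in (oi, oj):
                edges.add((ti, tj))       # ti's op conflicts with and precedes tj's op
    return edges

def has_cycle(nodes, edges):
    succ = {n: {b for a, b in edges if a == n} for n in nodes}
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}
    def visit(n):
        color[n] = GREY
        for m in succ[n]:
            if color[m] == GREY or (color[m] == WHITE and visit(m)):
                return True
        color[n] = BLACK
        return False
    return any(color[n] == WHITE and visit(n) for n in nodes)

H = [("T1", "R", "x"), ("T2", "W", "x"), ("T2", "R", "y"), ("T1", "W", "y")]
edges = precedence_edges(H)
print(edges)                                                  # {('T1','T2'), ('T2','T1')}
print("serializable:", not has_cycle({"T1", "T2"}, edges))    # False: cycle, not serializable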
❖ Serializability in Distributed DBMS
❖ Strict 2PL
❖ Centralized 2PL
❖ Distributed 2PL
▪ Transaction execution model: each transaction is divided into subtransactions, each of which executes at one site; Tij denotes the subtransaction of transaction Ti that executes at site j.
▪ In the optimistic (validation-based) approach, transactions run independently at each site until they reach the end of their read phases.
▪ All subtransactions are assigned a timestamp at the end of their read phase.
▪ A validation test is performed during the validation phase; if one subtransaction fails the test, all of them are rejected.
❖ Deadlock
▪ A transaction is deadlocked if it is blocked and will remain blocked until there is intervention.
▪ Locking-based CC algorithms may cause deadlocks.
▪ TO-based algorithms that involve waiting may cause deadlocks.
▪ Wait-for graph (WFG): if transaction Ti waits for another transaction Tj to release a lock on an entity, then Ti → Tj in the WFG.
❖ Local versus Global WFG
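The sketch below builds a global WFG as the union of local WFGs and checks it for a cycle; a deadlock may be invisible in every local graph and appear only globally. The sample wait-for edges are assumptions.

# Sketch: global deadlock detection as a cycle check on the union of local WFGs.
def find_cycle(edges):
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
        succ.setdefault(b, set())
    state = {n: 0 for n in succ}              # 0 = unvisited, 1 = in progress, 2 = done
    def visit(n):
        state[n] = 1
        for m in succ[n]:
            if state[m] == 1 or (state[m] == 0 and visit(m)):
                return True
        state[n] = 2
        return False
    return any(state[n] == 0 and visit(n) for n in succ)

# Local WFGs: each alone is acyclic, but their union contains T1 -> T2 -> T1.
local_wfg_site1 = {("T1", "T2")}              # at site 1, T1 waits for T2
local_wfg_site2 = {("T2", "T1")}              # at site 2, T2 waits for T1
global_wfg = local_wfg_site1 | local_wfg_site2

print("site 1 deadlock:", find_cycle(local_wfg_site1))   # False
print("site 2 deadlock:", find_cycle(local_wfg_site2))   # False
print("global deadlock:", find_cycle(global_wfg))        # True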
❖ Deadlock Management
▪ Ignore: Let the application programmer deal with it, or restart the system
▪ Prevention: Guaranteeing that deadlocks can never occur in the first place. Check
transaction when it is initiated. Requires no run time support.
▪ Avoidance: Detecting potential deadlocks in advance and taking action to ensure that deadlock will not occur. Requires run time support.
▪ Detection and Recovery: Allowing deadlocks to form and then finding and breaking them.
As in the avoidance scheme, this requires run time support.
❖ Deadlock Prevention
▪ All resources which may be needed by a transaction must be predeclared.
- The system must guarantee that none of the resources will be needed by an ongoing transaction.
- Resources must only be reserved, but not necessarily allocated a priori.
- Unsuitability of the scheme in a database environment.
- Suitable for systems that have no provisions for undoing processes.
▪ Evaluation:
- Reduced concurrency due to preallocation.
- Evaluating whether an allocation is safe leads to added overhead.
- Difficult to determine (partial order).
- No transaction rollback or restart is involved.