0% found this document useful (0 votes)
4 views

Distributed Databases and Client-Server Architectures

The document discusses distributed databases, their concepts, advantages, and types, including homogeneous and heterogeneous systems. It covers data fragmentation, replication, allocation, and concurrency control techniques necessary for managing simultaneous operations without conflicts. Additionally, it addresses the challenges of distributed transactions and the importance of maintaining data integrity and availability in a distributed environment.

Uploaded by

Mohamed Amin
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

Distributed Databases and Client-Server Architectures

The document discusses distributed databases, their concepts, advantages, and types, including homogeneous and heterogeneous systems. It covers data fragmentation, replication, allocation, and concurrency control techniques necessary for managing simultaneous operations without conflicts. Additionally, it addresses the challenges of distributed transactions and the importance of maintaining data integrity and availability in a distributed environment.

Uploaded by

Mohamed Amin
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 60

Distributed Databases

and Concurrency Control


Chapter Outline
1. Distributed Database Concepts
2. Data Fragmentation, Replication and Allocation
3. Types of Distributed Database Systems
4. Concurrency Control and Recovery
5. 3-Tier Client-Server Architecture
Distributed Database Concepts
 A transaction can be executed by multiple
networked computers in a unified manner.
 A distributed database (DDB) processes Unit of
execution (a transaction) in a distributed manner.
A distributed database (DDB) can be defined as
 A distributed database (DDB) is a collection of
multiple logically related database distributed over
a computer network, and a distributed database
management system as a software system that
manages a distributed database while making the
distribution transparent to the user.
London New York
Payroll app Payroll app

EMP
London
New York

Internet
Hong Kong
Payroll app Problem:
NY and HK payroll
apps run very slowly!
Hong Kong
London New York
Payroll app Payroll app

London
Emp NY
London
New York Emp

Internet
Hong Kong
Payroll app
Much better!!

Hong Kong

HK
Emp
London New York
Payroll app Payroll app
Annual
Bonus app

London
Emp NY
London
New York Emp

Internet
Hong Kong Distribution provides
Payroll app
opportunities for
parallel execution
Hong Kong

HK
Emp
Distributed Database System
 Advantages
 Management of distributed data with different
levels of transparency:

This refers to the physical placement of data (files,
relations, etc.) which is not known to the user
(distribution transparency).
Distributed Database System
 Advantages (transparency, contd.)
 The EMPLOYEE, PROJECT, and WORKS_ON
tables may be fragmented horizontally and stored
with possible replication as shown below.
Distributed Database System
 Advantages (transparency, contd.)
 Distribution and Network transparency:

Users do not have to worry about operational details
of the network.
 There is Location transparency, which refers to freedom of
issuing command from any location without affecting its
working.
 Then there is Naming transparency, which allows access
to any names object (files, relations, etc.) from any
location.
Distributed Database System
 Advantages (transparency, contd.)
 Replication transparency:

It allows to store copies of a data at multiple sites as
shown in the previous diagram.

This is done to minimize access time to the required
data.
 Fragmentation transparency:

Allows to fragment a relation horizontally (create a
subset of tuples of a relation) or vertically (create a
subset of columns of a relation).
Distributed Database System
 Other Advantages
 Increased reliability and availability:

Reliability refers to system live time, that is, system
is running efficiently most of the time. Availability is
the probability that the system is continuously
available (usable or accessible) during a time
interval.

A distributed database system has multiple nodes
(computers) and if one fails then others are
available to do the job.
Distributed Database System
 Other Advantages (contd.)
 Improved performance:

A distributed DBMS fragments the database to keep
data closer to where it is needed most.

This reduces data management (access and
modification) time significantly.
 Easier expansion (scalability):

Allows new nodes (computers) to be added anytime
without changing the entire configuration.
Distributed Data Storage

 Assume relational data model


 Replication

System maintains multiple copies of data, stored in different
sites, for faster retrieval and fault tolerance.
 Fragmentation

Relation is partitioned into several fragments stored in distinct
sites
 Replication and fragmentation can be combined

Relation is partitioned into several fragments: system
maintains several identical replicas of each such fragment.
Data Fragmentation, Replication and
Allocation
 Data Replication
 Database is replicated to all sites.
 In full replication the entire database is replicated and in
partial replication some selected part is replicated to some
of the sites.
 Data replication is achieved through a replication schema.
 Data Distribution (Data Allocation)
 This is relevant only in the case of partial replication or
partition.
 The selected portion of the database is distributed to the
database sites.
Data Replication

 A relation or fragment of a relation is


replicated if it is stored redundantly in two or
more sites.
 Full replication of a relation is the case where
the relation is stored at all sites.
 Fully redundant databases are those in which
every site contains a copy of the entire
database.
Data Replication (Cont.)

 Advantages of Replication
 Availability: failure of site containing relation r does not result in

unavailability of r is replicas exist.


 Parallelism: queries on r may be processed by several nodes in

parallel.
 Reduced data transfer: relation r is available locally at each site

containing a replica of r.
 Disadvantages of Replication

Increased cost of updates: each replica of relation r must be
updated.
 Increased complexity of concurrency control: concurrent updates to
distinct replicas may lead to inconsistent data unless special
concurrency control mechanisms are implemented.

One solution: choose one copy as primary copy and apply
concurrency control operations on primary copy
Types of Distributed Database Systems
 Homogeneous

All sites of the database
system have identical Window
setup, i.e., same database Site 5 Unix
system software. Oracle Site 1
Oracle

The underlying operating Window
system may be different. Site 4 Communications

For example, all sites run network
Oracle or DB2, or Sybase
or some other database Oracle
system.
Site 3 Site 2

The underlying operating Linux Oracle Linux Oracle
systems can be a mixture
of Linux, Window, Unix,
etc.
Types of Distributed Database Systems
 Heterogeneous
 Federated: Each site may run different database system but the

data access is managed through a single conceptual schema.



This implies that the degree of local autonomy is minimum. Each site
must adhere to a centralized access policy. There may be a global
schema.
 Multidatabase: There is no one conceptual global schema. For
data access a schema is constructed dynamically as needed by
the application software. Unix Relational
Object
Oriented Site 5 Unix
Site 1
Hierarchical
Window
Site 4 Communications
network

Network
Object DBMS
Oriented Site 3 Site 2 Relational
Linux Linux
Homogeneous Distributed Databases

 In a homogeneous distributed database


 All sites have identical software

 Are aware of each other and agree to cooperate in processing user

requests.
 Each site surrenders part of its autonomy in terms of right to change

schemas or software
 Appears to user as a single system

 In a heterogeneous distributed database


 Different sites may use different schemas and software


Difference in schema is a major problem for query processing

Difference in software is a major problem for transaction processing
 Sites may not be aware of each other and may provide only

limited facilities for cooperation in transaction processing


Types of Distributed Database Systems
 Federated Database Management Systems
Issues
 Differences in data models:

Relational, Objected oriented, hierarchical, network,
etc.
 Differences in constraints:

Each site may have their own data accessing and
processing constraints.
 Differences in query language:

Some site may use SQL, some may use SQL-89,
some may use SQL-99, and so on.
Client-Server Database Architecture
 It consists of clients running client software, a set
of servers which provide all database
functionalities and a reliable communication
infrastructure.
Server 1 Client 1

Client 2

Server 2 Client 3

Server n Client n
Client-Server Database Architecture
 Clients reach server for desired service, but
server does reach clients.
 The server software is responsible for local data
management at a site, much like centralized
DBMS software.
 The client software is responsible for most of the
distribution function.
 The communication software manages
communication among clients and servers.
Concurrency Control Techniques
What is Concurrency Control?

 Concurrency Control in Database Management


System is a procedure of managing simultaneous
operations without conflicting with each other.
 It ensures that Database transactions are performed
concurrently and accurately to produce correct
results without violating data integrity of the
respective Database.
 Concurrent access is quite easy if all users are just
reading data. There is no way they can interfere with
one another. Though for any practical Database, it
would have a mix of READ and WRITE operations
and hence the concurrency is a challenge.
Potential problems of Concurrency

 Lost Updates occur when multiple transactions select


the same row and update the row based on the value
selected
 Uncommitted dependency issues occur when the
second transaction selects a row which is updated by
another transaction (dirty read)
 Non-Repeatable Read occurs when a second
transaction is trying to access the same row several
times and reads different data each time.
Temporary Update Problem:
 Temporary update or dirty read
problem occurs when one
transaction updates an item and
fails. But the updated item is
used by another transaction
before the item is changed or
reverted back to its last value.

 In this example, if transaction 1


fails for some reason then X will
revert back to its previous value.
But transaction 2 has already
read the incorrect value of X.
Lost Update Problem:
 In the lost update problem,
update done to a data item
by a transaction is lost as it is
overwritten by the update
done by another transaction.

 In this example, transaction 1


changes the value of X but it
gets overwritten by the
update done by transaction 2
on X. Therefore, the update
done by transaction 1 is lost.
Unrepeatable Read Problem:
 The unrepeatable problem occurs
when two or more read operations of
the same transaction read different
values of the same variable.

 In this example, once transaction 2


reads the variable X, a write
operation in transaction 1 changes
the value of the variable X. Thus,
when another read operation is
performed by transaction 2, it reads
the new value of X which was
updated by transaction 1.
Concurrency Control and Recovery
 Distributed Databases encounter a number of
concurrency control and recovery problems which
are not present in centralized databases. Some
of them are listed below.
 Dealing with multiple copies of data items
 Failure of individual sites
 Communication link failure
 Distributed commit
 Distributed deadlock
Concurrency Control and Recovery
 Details
 Dealing with multiple copies of data items:

The concurrency control must maintain global
consistency. Likewise the recovery mechanism
must recover all copies and maintain consistency
after recovery.
 Failure of individual sites:

Database availability must not be affected due to
the failure of one or two sites and the recovery
scheme must recover them before they are
available for use.
Concurrency Control and Recovery
 Details (contd.)
 Communication link failure:

This failure may create network partition which would affect
database availability even though all database sites may be
running.
 Distributed commit:

A transaction may be fragmented and they may be executed
by a number of sites. This require a two or three-phase
commit approach for transaction commit.
 Distributed deadlock:

Since transactions are processed at multiple sites, two or
more sites may get involved in deadlock. This must be
resolved in a distributed manner.
System Failure Modes
 Failures unique to distributed systems:

Failure of a site.

Loss of massages

Handled by network transmission control protocols such as
TCP-IP

Failure of a communication link

Handled by network protocols, by routing messages via
alternative links

Network partition

A network is said to be partitioned when it has been split into
two or more subsystems that lack any connection between them

Note: a subsystem may consist of a single node
 Network partitioning and site failures are generally indistinguishable.
What is a Transaction?
 A set of steps completed by a DBMS to accomplish a
single user task.
 Must be either entirely completed or aborted
 No intermediate states are acceptable
Distributed Transactions
 Transaction may access data at several sites.
 Each site has a local transaction manager responsible for:
 Maintaining a log for recovery purposes

 Participating in coordinating the concurrent execution of the transactions

executing at that site.


 Each site has a transaction coordinator, which is responsible for:
 Starting the execution of transactions that originate at the site.

 Distributing subtransactions at appropriate sites for execution.

 Coordinating the termination of each transaction that originates at the

site, which may result in the transaction being committed at all sites or
aborted at all sites.
Transaction System Architecture
Database Concurrency Control
 1 Purpose of Concurrency Control
 To enforce Isolation (through mutual exclusion) among
conflicting transactions.
 To preserve database consistency through consistency
preserving execution of transactions.
 To resolve read-write and write-write conflicts.

 Example:
 In concurrent execution environment if T1 conflicts with T2
over a data item A, then the existing concurrency control
decides if T1 or T2 should get the A and if the other
transaction is rolled-back or waits.
Concurrency Control Protocols

Different concurrency control protocols offer


different benefits between the amount of
concurrency they allow and the amount of overhead
that they impose.
Following are the Concurrency Control techniques
in DBMS:
Lock-Based Protocols

Two Phase Locking Protocol

Timestamp-Based Protocols

Validation-Based Protocols
Lock-based Protocols

 Lock Based Protocols in DBMS is a mechanism in which


a transaction cannot Read or Write the data until it
acquires an appropriate lock. Lock based protocols help
to eliminate the concurrency problem in DBMS for
simultaneous transactions by locking or isolating a
particular transaction to a single user.
 A lock is a data variable which is associated with a data
item. This lock signifies that operations that can be
performed on the data item. Locks in DBMS help
synchronize access to the database items by concurrent
transactions.
 All lock requests are made to the concurrency-control
manager. Transactions proceed only once the lock
request is granted.
1. Shared Lock (S):
 A shared lock is also called a Read-only lock. With
the shared lock, the data item can be shared
between transactions. This is because you will never
have permission to update data on the data item.
 For example, consider a case where two
transactions are reading the account balance of a
person. The database will let them read by placing a
shared lock. However, if another transaction wants
to update that account’s balance, shared lock
prevent it until the reading process is over.
2. Exclusive Lock (X):
 With the Exclusive Lock, a data item can be read as well as
written. This is exclusive and can’t be held concurrently on the
same data item. X-lock is requested using lock-x instruction.
 Transactions may unlock the data item after finishing the ‘write’
operation.
 For example, when a transaction needs to update the account
balance of a person. You can allows this transaction by placing X
lock on it.
 Therefore, when the second transaction wants to read or write,
exclusive lock prevent this operation.
Database Concurrency Control
Two-Phase Locking Techniques
 Locking is an operation which secures

(a) permission to Read

(b) permission to Write a data item for a transaction.
 Example:

Lock (X). Data item X is locked in behalf of the requesting
transaction.
 Unlocking is an operation which removes these permissions
from the data item.
 Example:

Unlock (X): Data item X is made available to all other
transactions.
 Lock and Unlock are Atomic operations.
Database Concurrency Control
Two-Phase Locking Techniques: Essential components

Two locks modes:

(a) shared (read) (b) exclusive (write).

Shared mode: shared lock (X)

More than one transaction can apply share lock on X for
reading its value but no write lock can be applied on X by any
other transaction.

Exclusive mode: Write lock (X)

Only one write lock on X can exist at any time and no shared
lock can be applied by any other transaction on X.

Conflict matrix Read Write
Read

Y N
Write

N N
Database Concurrency Control
Two-Phase Locking Techniques: Essential
components
 Lock Manager:

Managing locks on data items.
 Lock table:

Lock manager uses it to store the identify of
transaction locking a data item, the data item, lock
mode and pointer to the next data item locked. One
simple way to implement a lock table is through
linked list.
Transaction ID Data item id lock mode Ptr to next data item
T1 X1 Read Next
Database Concurrency Control
Two-Phase Locking Techniques: Essential
components
 Database requires that all transactions should be
well-formed. A transaction is well-formed if:

It must lock the data item before it reads or writes to
it.

It must not lock an already locked data items and it
must not try to unlock a free data item.
Database Concurrency Control
Dealing with Deadlock and Starvation
 Deadlock
T’1 T’2
read_lock (Y); T1 and T2 did follow two-phase
read_item (Y); policy but they are deadlock
read_lock (X);
read_item (Y);
write_lock (X);
(waits for X) write_lock (Y);
(waits for Y)

 Deadlock (T’1 and T’2)


Database Concurrency Control
Dealing with Deadlock and Starvation
 Deadlock prevention

 A transaction locks all data items it refers to before


it begins execution.
 This way of locking prevents deadlock since a
transaction never waits for a data item.
 The conservative two-phase locking uses this
approach.
Dealing with Deadlock and Starvation
 Starvation is the situation when a transaction needs to wait
for an indefinite period to acquire a lock.
 Following are the reasons for Starvation:
 When waiting scheme for locked items is not properly

managed
 In the case of resource leak

 The same transaction is selected as a victim repeatedly

 Deadlock
 Deadlock refers to a specific situation where two or more
processes are waiting for each other to release a resource or
more than two processes are waiting for the resource in a
circular chain.
Database Concurrency Control
Dealing with Deadlock and Starvation
 Deadlock avoidance

 There are many variations of two-phase locking algorithm.


 Some avoid deadlock by not letting the cycle to complete.
 That is as soon as the algorithm discovers that blocking a
transaction is likely to create a cycle, it rolls back the
transaction.
 Wound-Wait and Wait-Die algorithms use timestamps to
avoid deadlocks by rolling-back victim.
Database Concurrency Control
Dealing with Deadlock and Starvation
 Starvation

 Starvation occurs when a particular transaction consistently


waits or restarted and never gets a chance to proceed
further.
 In a deadlock resolution it is possible that the same
transaction may consistently be selected as victim and
rolled-back.
 This limitation is inherent in all priority based scheduling
mechanisms.
 In Wound-Wait scheme a younger transaction may always
be wounded (aborted) by a long running older transaction
which may create starvation.
Commit Protocols
 Commit protocols are used to ensure atomicity across
sites

a transaction which executes at multiple sites must either
be committed at all the sites, or aborted at all the sites.

not acceptable to have a transaction committed at one
site and aborted at another
 The two-phase commit (2PC) protocol is widely used
 The three-phase commit (3PC) protocol is more
complicated and more expensive, but avoids some
drawbacks of two-phase commit protocol.
Distributed commit problem
.
Transaction T

Action: Action: Action:


a1,a2 a3 a4,a5

Commit must be atomic


Distributed commit problem
 Commit must be atomic

 Solution: Two-phase commit (2PC)


 Centralized 2PC
 Distributed 2PC
 Linear 2PC
 Many other variants…
Terminology
 Resource Managers (RMs)
 Usually databases
 Participants
 RMs that did work on behalf of transaction
 Coordinator
 Component that runs two-phase commit on behalf
of transaction
Two – Phase Commit
REQUEST-TO-PREPARE

PREPARED*
Coordinator

Participant
COMMIT*

DONE
REQUEST-TO-PREPARE

NO
Coordinator

Participant
ABORT

DONE
TWO-PHASE COMMIT (2PC) - OK
TWO-PHASE COMMIT (2PC) - ABORT

‘G
l ob
al
A bo
rt’

You might also like