Distributed Databases and Client-Server Architectures
[Figure: Centralized case. A single EMP relation is stored in London, and the payroll apps in London, New York, and Hong Kong all access it over the Internet. Problem: the NY and HK payroll apps run very slowly!]
[Figure: Distributed case. EMP is split into London Emp, NY Emp, and HK Emp, each stored at the site whose payroll app uses it. Much better!]
[Figure: With an additional Annual Bonus app that uses the London, NY, and HK Emp fragments, distribution provides opportunities for parallel execution.]
Distributed Database System
Advantages
Management of distributed data with different
levels of transparency:
This refers to hiding the physical placement of data (files,
relations, etc.) from the user
(distribution transparency).
Distributed Database System
Advantages (transparency, contd.)
The EMPLOYEE, PROJECT, and WORKS_ON
tables may be fragmented horizontally and stored
with possible replication as shown below.
Distributed Database System
Advantages (transparency, contd.)
Distribution and Network transparency:
Users do not have to worry about the operational details
of the network.
Location transparency refers to the freedom to issue a
command from any location without affecting how it
works.
Naming transparency allows access to any named object
(files, relations, etc.) from any location.
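A minimal Python sketch of naming and location transparency, assuming a global catalog (the hypothetical CATALOG dictionary) that maps each object name to the site storing it. The sites are simulated with in-memory dictionaries and the rows are made-up sample data. The caller only ever uses the global name; the system resolves where the data lives.

```python
# Hypothetical global catalog: logical object name -> site that stores it.
CATALOG = {
    "EMPLOYEE": "london",
    "PROJECT": "new_york",
}

# Simulated sites; in a real system each would be a remote DBMS server.
SITES = {
    "london": {"EMPLOYEE": [("Smith", 5), ("Wong", 4)]},
    "new_york": {"PROJECT": [("ProductX", 1)]},
}


def fetch(name):
    """Return the rows of a named object; the caller never names a site."""
    site = CATALOG[name]      # the system, not the user, resolves the location
    return SITES[site][name]
```

Moving EMPLOYEE to another site would only change its CATALOG entry; code that calls fetch("EMPLOYEE") is unaffected.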
Distributed Database System
Advantages (transparency, contd.)
Replication transparency:
Copies of data may be stored at multiple sites, as
shown in the previous diagram, without users being
aware of the copies.
This is done to minimize access time to the required
data.
Fragmentation transparency:
A relation may be fragmented horizontally (each fragment
holds a subset of the relation's tuples) or vertically (each
fragment holds a subset of its columns), without users
being aware of the fragments.
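The two kinds of fragmentation can be sketched in Python on a few made-up EMPLOYEE tuples (ssn, name, dno, salary); the fragmentation predicates and column choices are illustrative assumptions. The assertions check that the relation can be reconstructed: by union for horizontal fragments, and by a join on the key for vertical fragments, which is why each vertical fragment keeps the key column.

```python
# Illustrative EMPLOYEE rows: (ssn, name, dno, salary). Values are made up.
EMPLOYEE = [
    (1, "Smith", 5, 30000),
    (2, "Wong", 5, 40000),
    (3, "Zelaya", 4, 25000),
]


def horizontal_fragment(rows, predicate):
    """Select the subset of tuples that satisfy a predicate."""
    return [r for r in rows if predicate(r)]


def vertical_fragment(rows, columns):
    """Project a subset of columns; column 0 (the key) is always kept."""
    cols = [0] + [c for c in columns if c != 0]
    return [tuple(r[c] for c in cols) for r in rows]


# Horizontal: one fragment per department; their union restores EMPLOYEE.
emp_d5 = horizontal_fragment(EMPLOYEE, lambda r: r[2] == 5)
emp_d4 = horizontal_fragment(EMPLOYEE, lambda r: r[2] == 4)
assert sorted(emp_d5 + emp_d4) == sorted(EMPLOYEE)

# Vertical: split the columns, keeping the key in both fragments.
emp_names = vertical_fragment(EMPLOYEE, [1, 2])   # (ssn, name, dno)
emp_pay = vertical_fragment(EMPLOYEE, [3])        # (ssn, salary)

# Joining the vertical fragments on the key restores EMPLOYEE.
pay_by_key = {ssn: salary for ssn, salary in emp_pay}
rejoined = [(ssn, name, dno, pay_by_key[ssn]) for ssn, name, dno in emp_names]
assert rejoined == EMPLOYEE
```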
Distributed Database System
Other Advantages
Increased reliability and availability:
Reliability refers to system uptime, that is, the
probability that the system is running (not down) at a
given point in time. Availability is the probability that
the system is continuously available (usable or
accessible) during a time interval.
A distributed database system has multiple nodes
(computers); if one fails, the others are
available to do the job.
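As a rough illustration of why multiple nodes help, assume (as a simplification) that each site holding a copy of the data is up independently with probability p. The data is then reachable unless all n copies are down at once, i.e., with probability 1 - (1 - p)^n. The function name below is a hypothetical helper.

```python
def availability(p_up, n_copies):
    """Probability that at least one of n independently failing copies is up."""
    if not 0.0 <= p_up <= 1.0 or n_copies < 1:
        raise ValueError("need 0 <= p_up <= 1 and n_copies >= 1")
    return 1.0 - (1.0 - p_up) ** n_copies
```

With p = 0.9, one copy gives 0.9 while three copies give about 0.999. Real failures are often correlated (shared network links, power), so actual availability can be lower than this estimate.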
Distributed Database System
Other Advantages (contd.)
Improved performance:
A distributed DBMS fragments the database to keep
data closer to where it is needed most.
This reduces data management (access and
modification) time significantly.
Easier expansion (scalability):
New nodes (computers) can be added at any time
without changing the entire configuration.
Distributed Data Storage
Advantages of Replication
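Availability: failure of a site containing relation r does not result in
unavailability of r if replicas exist.
Parallelism: queries on r may be processed by several nodes in
parallel.
Reduced data transfer: relation r is available locally at each site
containing a replica of r.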
Disadvantages of Replication
Increased cost of updates: each replica of relation r must be
updated.
Increased complexity of concurrency control: concurrent updates to
distinct replicas may lead to inconsistent data unless special
concurrency control mechanisms are implemented.
One solution: choose one copy as the primary copy and apply
concurrency control operations to the primary copy.
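A minimal sketch of the primary-copy idea in Python, for a single replicated relation stored as a key-value map. The class name, the choice of the first site as primary, and synchronous propagation inside the lock are assumptions for illustration. Concurrency control (here a single lock) is applied only at the primary copy, yet every write must still update all replicas, which reflects the increased cost of updates.

```python
import threading


class PrimaryCopyReplicas:
    """Sketch of primary-copy replication for one relation r (key -> value)."""

    def __init__(self, sites):
        self.primary = sites[0]                 # assumption: first site is the primary copy
        self.copies = {s: {} for s in sites}    # site -> local replica of r
        self._lock = threading.Lock()           # concurrency control at the primary only

    def write(self, key, value):
        with self._lock:                        # updates are serialized at the primary
            self.copies[self.primary][key] = value
            for site, copy in self.copies.items():
                if site != self.primary:        # every other replica must also be updated
                    copy[key] = value

    def read(self, site, key):
        """Reads are served from the local replica at the given site."""
        return self.copies[site].get(key)
```

For example, with r = PrimaryCopyReplicas(["london", "new_york", "hong_kong"]), after r.write("emp42", 50000) the call r.read("hong_kong", "emp42") returns 50000 from the local copy.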
Types of Distributed Database Systems
Homogeneous
All sites of the database system have an identical setup,
i.e., the same database system software.
For example, all sites run Oracle, or DB2, or Sybase,
or some other database system.
The underlying operating systems may differ: they can be
a mixture of Linux, Windows, Unix, etc.
[Figure: Five sites connected by a communications network, all running Oracle: Site 1 on Unix, Sites 2 and 3 on Linux, Sites 4 and 5 on Windows.]
Types of Distributed Database Systems
Heterogeneous
Federated: Each site may run a different database system, but the …
[Figure: Heterogeneous sites connected by a network, e.g., Site 3 (Linux, object-oriented DBMS) and Site 2 (Linux, relational DBMS).]
Homogeneous Distributed Databases
… requests.
Each site surrenders part of its autonomy in terms of the right to change
schemas or software.
Appears to the user as a single system.
Heterogeneous Distributed Databases
Differences in schema are a major problem for query processing.
Differences in software are a major problem for transaction processing.
Sites may not be aware of each other and may provide only …
[Figure: Clients (Client 1 … Client n) communicating with servers (Server 1 … Server n).]
Client-Server Database Architecture
Clients reach the server for the desired service, but the
server does not reach the clients.
The server software is responsible for local data
management at a site, much like centralized
DBMS software.
The client software is responsible for most of the
distribution function.
The communication software manages
communication among clients and servers.
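To illustrate this division of labor, here is a Python sketch in which each hypothetical server evaluates a selection on its local data only, while the client distributes the request to all servers in parallel and merges (unions) the partial results. Direct function calls stand in for the communication software, and the server names and rows are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical servers, each managing only its own local data.
SERVERS = {
    "london": [("Smith", "London"), ("Brown", "London")],
    "new_york": [("Wong", "New York")],
    "hong_kong": [("Lee", "Hong Kong")],
}


def server_select(site, predicate):
    """Server-side work: evaluate a selection against local data, like a centralized DBMS."""
    return [row for row in SERVERS[site] if predicate(row)]


def client_query(predicate):
    """Client-side distribution: send the subquery to every server in parallel,
    then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda site: server_select(site, predicate), SERVERS)
        return [row for part in parts for row in part]
```

For example, client_query(lambda row: row[0] == "Wong") returns [("Wong", "New York")]; only the client knows that the answer had to be assembled from several servers.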
Concurrency Control Techniques
What is Concurrency Control?
… site, which may result in the transaction being committed at all sites or
aborted at all sites.
Transaction System Architecture
Database Concurrency Control
1 Purpose of Concurrency Control
To enforce Isolation (through mutual exclusion) among
conflicting transactions.
To preserve database consistency through consistency
preserving execution of transactions.
To resolve read-write and write-write conflicts.
Example:
In a concurrent execution environment, if T1 conflicts with T2
over a data item A, the concurrency control mechanism
decides whether T1 or T2 should get A and whether the other
transaction is rolled back or waits.
Concurrency Control Protocols
Timestamp-Based Protocols
Validation-Based Protocols
Lock-based Protocols
Lock compatibility matrix (Y = compatible, N = conflict):
            Read   Write
Read         Y      N
Write        N      N
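The matrix translates directly into a lookup. A minimal Python sketch, assuming only the two lock modes above; can_grant is a hypothetical helper name.

```python
# Lock compatibility matrix: COMPATIBLE[held_mode][requested_mode]
COMPATIBLE = {
    "read": {"read": True, "write": False},
    "write": {"read": False, "write": False},
}


def can_grant(requested, held_modes):
    """Grant a lock only if it is compatible with every lock
    that other transactions currently hold on the same item."""
    return all(COMPATIBLE[held][requested] for held in held_modes)
```

A read request on an item that others hold read locks on is granted, but a write request on the same item is not.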
Database Concurrency Control
Two-Phase Locking Techniques: Essential
components
Lock Manager:
Managing locks on data items.
Lock table:
The lock manager uses it to store the identity of the
transaction locking a data item, the data item, the lock
mode, and a pointer to the next data item locked. One
simple way to implement a lock table is through a
linked list.
Transaction ID | Data item ID | Lock mode | Ptr to next data item
T1             | X1           | Read      | Next
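A sketch of the linked-list lock table in Python. As a simplification, this version keeps all lock records in one global list (the next pointer links to the next record, not to per-transaction chains), and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LockEntry:
    """One lock table record: transaction ID, data item ID, lock mode, next pointer."""
    txn_id: str
    item_id: str
    mode: str                            # "read" or "write"
    next: Optional["LockEntry"] = None


class LockTable:
    """Lock table kept as a singly linked list of LockEntry records."""

    def __init__(self):
        self.head = None

    def add(self, txn_id, item_id, mode):
        # Prepend a new record: O(1) insertion at the head of the list.
        self.head = LockEntry(txn_id, item_id, mode, self.head)

    def holders(self, item_id):
        """Walk the list and collect (txn_id, mode) for every lock on item_id."""
        found, node = [], self.head
        while node is not None:
            if node.item_id == item_id:
                found.append((node.txn_id, node.mode))
            node = node.next
        return found

    def remove(self, txn_id, item_id):
        """Unlink the record for (txn_id, item_id); return False if absent."""
        prev, node = None, self.head
        while node is not None:
            if node.txn_id == txn_id and node.item_id == item_id:
                if prev is None:
                    self.head = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False
```

Lookups walk the whole list, which is simple but linear in the number of locks; a real lock manager would typically index records by data item.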
Database Concurrency Control
Two-Phase Locking Techniques: Essential
components
The database requires that all transactions be
well-formed. A transaction is well-formed if:
It locks a data item before it reads or writes
it.
It does not lock an already locked data item and it
does not try to unlock a free data item.
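These rules can be checked mechanically over one transaction's sequence of operations. This Python sketch assumes a simplified, hypothetical encoding in which each operation is an (action, item) pair and read and write locks are not distinguished; it tracks only the transaction's own locks.

```python
def is_well_formed(ops):
    """Check one transaction's (action, item) sequence against the well-formedness rules.

    action is one of "lock", "unlock", "read", "write".
    """
    locked = set()                       # items this transaction currently holds locks on
    for action, item in ops:
        if action == "lock":
            if item in locked:           # must not lock an already locked item
                return False
            locked.add(item)
        elif action == "unlock":
            if item not in locked:       # must not unlock a free item
                return False
            locked.remove(item)
        elif action in ("read", "write"):
            if item not in locked:       # must lock before reading or writing
                return False
        else:
            raise ValueError(f"unknown action: {action}")
    return True
```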
Database Concurrency Control
Dealing with Deadlock and Starvation
Deadlock
T’1 and T’2 both follow the two-phase locking policy, but they are deadlocked:
T’1                    T’2
read_lock(Y);
read_item(Y);
                       read_lock(X);
                       read_item(X);
write_lock(X);
(waits for X)
                       write_lock(Y);
                       (waits for Y)
Deadlock
Deadlock refers to a situation in which two or more
transactions (processes) are each waiting for another to release a
resource, so that the waits form a circular chain and none of them
can proceed.
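A common way to detect such a circular chain is to build a wait-for graph, with an edge from Ti to Tj when Ti waits for a lock held by Tj, and look for a cycle. This detection technique is not described on these slides; the Python sketch below uses a depth-first search over a hypothetical dictionary encoding of the graph.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph {txn: set of txns it waits for} has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on current DFS path, finished
    color = {}

    def visit(t):
        color[t] = GRAY
        for u in wait_for.get(t, ()):
            c = color.get(u, WHITE)
            if c == GRAY:                 # back edge: a circular wait
                return True
            if c == WHITE and visit(u):
                return True
        color[t] = BLACK                  # fully explored, not part of a cycle
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t) for t in list(wait_for))
```

For the schedule above, T’1 waits for T’2 (on X) and T’2 waits for T’1 (on Y), so has_deadlock({"T'1": {"T'2"}, "T'2": {"T'1"}}) returns True.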
Database Concurrency Control
Dealing with Deadlock and Starvation
Deadlock avoidance
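[Figure: TWO-PHASE COMMIT (2PC) - OK. Coordinator and Participant exchange messages: PREPARED*, COMMIT*, DONE.]
[Figure: TWO-PHASE COMMIT (2PC) - ABORT. Coordinator sends REQUEST-TO-PREPARE; Participant replies NO; Coordinator sends ABORT ('Global Abort'); Participant replies DONE.]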
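The message flow in the two figures can be simulated in a few lines of Python. This is a single-process sketch under stated assumptions: participants are in-memory objects, a hypothetical will_prepare flag stands in for whether a site can prepare, and logging, timeouts, and site failures are omitted. The coordinator commits only if every participant replies PREPARED; a single NO leads to a global abort.

```python
from enum import Enum


class Vote(Enum):
    PREPARED = "PREPARED"
    NO = "NO"


class Participant:
    """Hypothetical participant site in a distributed transaction."""

    def __init__(self, name, will_prepare=True):
        self.name = name
        self.will_prepare = will_prepare  # simulates whether the local work can be prepared
        self.state = "ACTIVE"

    def request_to_prepare(self):
        # Phase 1: vote. A real site would force a log record before replying PREPARED.
        if self.will_prepare:
            self.state = "PREPARED"
            return Vote.PREPARED
        self.state = "ABORTED"            # a NO vote lets the site abort unilaterally
        return Vote.NO

    def decide(self, decision):
        # Phase 2: apply the coordinator's global decision, then acknowledge.
        self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"
        return "DONE"


def two_phase_commit(participants):
    """Coordinator: COMMIT only if every participant votes PREPARED, else ABORT."""
    votes = [p.request_to_prepare() for p in participants]      # phase 1
    decision = "COMMIT" if all(v is Vote.PREPARED for v in votes) else "ABORT"
    acks = [p.decide(decision) for p in participants]           # phase 2
    assert all(ack == "DONE" for ack in acks)
    return decision
```

With two willing participants the result is "COMMIT" and both end in state COMMITTED; if either is created with will_prepare=False, the result is "ABORT" and both end ABORTED, so the transaction is never committed at some sites and aborted at others.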