Distributed Database Management Systems
When an organization is geographically dispersed, it may choose to store its
databases on a central database server or to distribute them to local servers
(or a combination of both). A distributed database is a single logical
database that is spread physically across computers in multiple locations
that are connected by a data communications network.
The distributed database is still centrally administered as a corporate
resource while providing local flexibility and customization. The network
must allow the users to share the data; thus, a user (or program) at location
A must be able to access (and perhaps update) data at location B. The sites
of a distributed system may be spread over a large area (e.g., a country or the
world) or over a small area (e.g., a building or campus). The computers may
range from PCs to large-scale servers or even supercomputers. A distributed
database requires multiple instances of a database management system (or
several DBMSs), one running at each remote site.
Evolution of DDBMS
Distributed database management system (DDBMS)
Interconnected computer systems
Data and processing functions reside on multiple sites
1970s: Centralized DBMS
1980s: Decentralized management structure common
1990s: New forces
Internet and the World Wide Web used for data access and distribution
DDBMS Advantages
Data located near site with greatest demand
Faster data access
Faster data processing
Growth facilitation
Improved communications
Reduced operating costs
User-friendly interface
Less danger of single-point failure
Processor independence
DDBMS Disadvantages
Complexity of management and control
Security
Increased storage requirements
Greater difficulty in managing the data environment
Increased training costs
Distributed Processing
Shares the database’s logical processing among physically independent,
networked sites
Distributed Database
Stores a logically related database across physically independent sites
Distributed Database vs. Distributed Processing
Distributed processing
Does not require distributed database
May be based on a single database on a single computer
Copies or parts of the database processing functions must be distributed to
all data storage sites
Distributed database
Requires distributed processing
Both
Require a network to connect components
Functions of DDBMS
Application/end user interface
Transformation to determine request components
Query optimization to find the best access strategy
Mapping to determine the data location
I/O interface to read or write data
Formatting to prepare the data for presentation
Security to provide data privacy
Backup and recovery
DB Administration
Concurrency Control
Transaction Management
Centralized Database
Figure 10.3
Fully Distributed Database Management System
Figure 10.4
DDBMS Components
Computer workstations
Network hardware and software components
Communications media
Transaction processor (TP)
Also called application processor (AP) or transaction manager (TM)
Data processor (DP)
Also called data manager (DM)
Distributed Database Components
Transaction processor (TP), Data processor (DP)
DDBMS Protocols
Interface with the network to transport data and commands between DPs and TPs
Synchronize data received from DPs and route it to the appropriate TPs
Ensure common database functions
Security
Concurrency control
Backup and recovery
Levels of Data and Process Distribution
Database systems can be classified based on process distribution and data
distribution
Single-Site Processing, Single-Site Data (SPSD)
All processing on a single CPU or host computer
All data are stored on the host computer's disk
DBMS located on the host computer
Typical of mainframe and minicomputer DBMSs
Typical of the first generation of single-user microcomputer databases
Single-Site Processing, Single-Site Data (cont.)
Figure 10.6
Multiple-Site Processing, Single-Site Data (MPSD)
• Requires network file server
• Applications accessed through LAN
• Variation known as client/server architecture
Figure 10.7
Multiple-Site Processing, Multiple-Site Data (MPMD)
Fully distributed DDBMS with support for multiple DPs and TPs at multiple sites
Homogeneous
Integrates one type of centralized DBMS over the network
Heterogeneous
Integrates different types of centralized DBMSs over a network
Homogeneous Distributed Database Scenario
Heterogeneous Distributed Database Scenario
Distributed DB Transparency
Allows end users to feel like the database's only user
Hides complexities of distributed database
Transparency features
Distribution
Transaction
Failure
Performance
Heterogeneity
Distribution Transparency
Allows management of a physically dispersed database as though it were
centralized
Three levels
Fragmentation transparency
Location transparency
Local mapping transparency
Table 10.2
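The three levels differ in how much the person writing a query must know about where the data lives. The following is a minimal Python sketch, not taken from the slides: it assumes a hypothetical EMPLOYEE table split into fragments E1 and E2 stored at sites called site_A and site_B. All table, fragment, site, and function names are illustrative.

```python
# Hypothetical layout: the EMPLOYEE table is split into two fragments,
# each stored at a different site. All names here are illustrative.
SITES = {
    "site_A": {"E1": [{"name": "Ana", "dept": "Sales"}]},
    "site_B": {"E2": [{"name": "Bo", "dept": "IT"},
                      {"name": "Cy", "dept": "Sales"}]},
}
# Catalog (data dictionary): table -> fragments, fragment -> site.
FRAGMENTS = {"EMPLOYEE": ["E1", "E2"]}
LOCATION = {"E1": "site_A", "E2": "site_B"}


def query_local_mapping(fragment, site, dept):
    """Local mapping transparency: caller names the fragment AND its site."""
    return [r["name"] for r in SITES[site][fragment] if r["dept"] == dept]


def query_location(fragment, dept):
    """Location transparency: caller names the fragment; the site is looked up."""
    return query_local_mapping(fragment, LOCATION[fragment], dept)


def query_fragmentation(table, dept):
    """Fragmentation transparency: caller names only the table."""
    names = []
    for fragment in FRAGMENTS[table]:
        names.extend(query_location(fragment, dept))
    return names


if __name__ == "__main__":
    print(query_fragmentation("EMPLOYEE", "Sales"))
```

With fragmentation transparency the caller writes the same query it would write against a centralized database; the lower levels push fragment names, and then site names, back onto the caller.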
Transaction Transparency
Ensures that transactions maintain the database's integrity and consistency
A transaction is completed only if all involved database sites complete their
part of the transaction
Management mechanisms
Remote request
Remote transaction
Distributed transaction
Distributed request
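The all-or-nothing rule above can be sketched as a coordinator that first asks every involved site to stage its part and makes the changes permanent only if every site agrees. This resembles a two-phase commit, which this section does not name, so treat it as one possible approach under stated assumptions. The Site class, its methods, and run_distributed_transaction are hypothetical names, and each site is assumed to appear at most once per transaction.

```python
class Site:
    """Hypothetical database site holding a simple key/value store."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # False simulates a local failure
        self.data = {}
        self.pending = None

    def prepare(self, updates):
        # Stage the updates without making them visible yet.
        if not self.can_prepare:
            return False
        self.pending = dict(updates)
        return True

    def commit(self):
        # Make the staged updates permanent.
        self.data.update(self.pending or {})
        self.pending = None

    def rollback(self):
        # Discard the staged updates.
        self.pending = None


def run_distributed_transaction(work):
    """work: list of (site, updates) pairs. Commit everywhere or nowhere."""
    prepared = []
    for site, updates in work:
        if site.prepare(updates):
            prepared.append(site)
        else:
            # One site could not do its part: undo the sites already staged.
            for staged in prepared:
                staged.rollback()
            return False
    for site in prepared:
        site.commit()
    return True
```

A real DDBMS would also have to handle failures that occur between the two loops (for example, by logging decisions); that is left out of this sketch.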
Remote Request
Figure 10.10
Remote Transaction
Distributed Transaction
Figure 10.12
Distributed Requests
Figure 10.13
Distributed Requests (cont.)
Figure 10.14
Synchronous and Asynchronous Distributed Databases
Data replication
A popular option for data distribution as well as for fault tolerance of a
database is to store a separate copy of the database at each of two or more
sites. Replication may use either synchronous or asynchronous distributed
database technologies, although asynchronous technologies are more typical
in a replicated environment. If a copy is stored at every site, we have the case
of full replication, which may be impractical except for relatively small
databases. However, as disk storage and network technology costs have
decreased, full data replication, or mirror images, has become more
common, especially for “always on” services, such as electronic commerce
and search engines.
There are five advantages to data replication:
1. Reliability: If one of the sites containing the relation (or database) fails, a
copy can always be found at another site without network traffic delays.
2. Fast response: Each site that has a full copy can process queries locally.
3. Possible avoidance of complicated distributed transaction integrity
routines: Replicated databases are usually refreshed at scheduled intervals, so
most forms of replication are used when some relaxing of synchronization across
database copies is acceptable.
4. Node decoupling: Each transaction may proceed without coordination across
the network. If nodes are down, busy, or disconnected (e.g., in the case of
mobile personal computers), a transaction is handled when the user desires. In
place of real-time synchronization of updates, a behind-the-scenes process
coordinates all data copies.
5. Reduced network traffic at prime time: Data updates often happen during
prime business hours, when network traffic is highest and the demands for
rapid response greatest. Replication, with delayed updating of copies of data,
moves the network traffic for sending updates to other nodes to non-prime-time
hours.
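Node decoupling and delayed updating can be illustrated with a small Python sketch. It assumes a hypothetical Node class in which a write finishes locally at once and is queued for each peer, and a separate sync step (the "behind-the-scenes process") delivers queued changes later to peers that are reachable. The class and method names are illustrative, and conflict resolution between competing writes is deliberately left out.

```python
from collections import deque


class Node:
    """Hypothetical replica that accepts writes locally and syncs later."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.data = {}
        self.outbox = {}  # peer name -> queue of (key, value) awaiting delivery

    def write(self, key, value, peers):
        # The local transaction finishes at once; no network coordination.
        self.data[key] = value
        for peer in peers:
            self.outbox.setdefault(peer.name, deque()).append((key, value))

    def sync(self, peers):
        """Background step, e.g. scheduled for non-prime-time hours."""
        delivered = 0
        for peer in peers:
            queue = self.outbox.get(peer.name)
            if not queue or not peer.online:
                continue  # keep the updates queued for a later run
            while queue:
                key, value = queue.popleft()
                peer.data[key] = value
                delivered += 1
        return delivered
```

Usage: call write whenever the user makes a change, and call sync on a schedule. A disconnected peer simply receives its queued updates on the first sync after it comes back online.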
Snapshot replication
Different schemes exist for updating data copies. Snapshot replication works
as follows, assuming that multiple sites are updating the same data. First,
updates from all replicated sites are periodically collected at a master, or
primary, site, where all the updates are made to form a consolidated record of
all changes. With some distributed DBMSs, this list of changes is collected in
a snapshot log, which is a table of row identifiers for the records to go into
the snapshot. Then a read-only snapshot of the replicated portion of the
database is taken at the master site. Finally, the snapshot is sent to each
site where there is a copy.
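The collect, consolidate, snapshot, and distribute steps can be sketched in Python as follows. This is a simplified illustration under assumptions that are not in the text: the master and the replicas are plain dictionaries, the whole master copy stands in for "the replicated portion", and when two sites change the same row the site that sorts later by name wins. The function name and parameters are hypothetical.

```python
from types import MappingProxyType


def snapshot_replication(master, site_updates, replicas):
    """Run one refresh cycle and return the snapshot log.

    master:       dict of row_id -> row (the primary copy)
    site_updates: dict of site name -> list of (row_id, row) changes
    replicas:     dict of site name -> that site's current copy
    """
    snapshot_log = set()  # row identifiers of the changed records
    # 1. Collect the updates from every site and apply them at the master.
    #    Assumption: sites are processed in name order, so the later site wins.
    for site in sorted(site_updates):
        for row_id, row in site_updates[site]:
            master[row_id] = row
            snapshot_log.add(row_id)
    # 2. Take a read-only snapshot at the master. The copy is shallow, so
    #    rows should themselves be immutable values for this to hold.
    snapshot = MappingProxyType(dict(master))
    # 3. Send the snapshot to each site that holds a copy.
    for name in list(replicas):
        replicas[name] = snapshot
    return sorted(snapshot_log)
```

Because every site receives the same read-only view, the replicas cannot drift apart between refresh cycles; local changes must travel through the master on the next cycle.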
Thank you!