LECTURE 9 & 10
DISTRIBUTED DATABASE MANAGEMENT MUHAMMAD HAMIZ MOHD RADZI
SYSTEMS
COURSE OUTLINE
Evolution of DDBMS
Advantages and Disadvantages of DDBMS
Distributed Processing and Distributed Database
Characteristics of DDBMS
DDBMS Components
Level of Data and Process Distribution
Distributed Database Transparency
Distribution Transparency
Distributed Database Design
MUHAMMAD HAMIZ MOHD RADZI
OBJECTIVES
At the end of this lesson, you should be able to:
Describe centralize DBMS with its features and problem
Describe the evolution of DDBMS
Explain the advantages and disadvantages of DDBMS
Explain the distributed processing and distributed database
Explain the characteristics of DDBMS
Describe the components of DDBMS
Explain SPSD, MPSD, MPMD
MUHAMMAD HAMIZ MOHD RADZI
Explain distribution, transaction, failure and performance transparency
Explain fragmentation, location and local mapping transparency
Describe the horizontal, vertical and mixed fragmentation of database design
Explain fully, partially and un-replicated replication of database design
Explain the data allocation of database design
MUHAMMAD HAMIZ MOHD RADZI
INTRODUCTION
Previously, database is known as one large single logical data.
Application
program 1
(with data
semantics)
DBMS
description
Application
program 2 manipulation
(with data database
semantics)
control
Application
program 3
(with data
semantics) MUHAMMAD HAMIZ MOHD RADZI
Database is located at the server and processing is split between server and client
to lessen data traffic on the network.
However, there a few limitations to the design of a database that makes DBMS is
not desirable.
Hence, concept of DDBMS is came into the context.
DDBMS means the database are fragmented into few parts and are allocated to
few places.
Even though the database are allocated into few places, the DDBMS treats the
database as single logical data.
MUHAMMAD HAMIZ MOHD RADZI
EVOLUTION OF DDBMS
During the 1970s, corporations implemented centralized database management
systems to meet their structured information needs.
Structured information is usually presented as regularly issued formal reports in a
standard format.
Thus, structured information needs are well served by centralized systems.
Required that corporate data be stored in a single central site and data access
provided through dumb terminals
MUHAMMAD HAMIZ MOHD RADZI
CENTRALIZED DATABASE
A database system which resides at one of the nodes of a network of
computers.
Site 1
Site 2
Site 5
Communication
Network
Site 4 Site 3
MUHAMMAD HAMIZ MOHD RADZI
PROBLEMS WITH CENTRALIZED DB
Performance degradation as number of remote sites grew
High cost to maintain large centralized DBs
Reliability problems with one, central site
The site with the database can become a bottleneck.
Data availability is not efficient
Possible availability problem: if the site with the database goes down, there can be
no data access.
MUHAMMAD HAMIZ MOHD RADZI
DDB CONCEPT
Instead of having one, centralized database, we are going to spread the data out
among various cities on the distributed network, each of which has its own
computer and data storage facilities.
All of this distributed data is still considered to be a single logical database.
Location transparency - The user just issues the query, and the result is returned.
It is not necessary to know where on the network the data being sought is located.
MUHAMMAD HAMIZ MOHD RADZI
DDBMS
A distributed database (DDB) is a collection of multiple, logically interrelated
databases distributed over a computer network.
A distributed database management system (DDBMS) is the software that
manages the DDB and provides an access mechanism that makes this distribution
transparent to the users.
MUHAMMAD HAMIZ MOHD RADZI
FACTORS THAT AIDED DDBMS TO COPE WITH
TECHNOLOGICAL ADVANCEMENT
Acceptance of Internet as a platform for business
Mobile wireless revolution
Usage of application as a service
Focus on mobile business intelligence
MUHAMMAD HAMIZ MOHD RADZI
DISTRIBUTING THE DATA
Headquartered in NY, a company’s database
consists of 6 large tables: A, B, C, D, E, F.
With a centralized database, all 6 tables
would be located in NY.
The company has major sites in Los Angeles,
Memphis, New York, Paris, and Tokyo.
The first and simplest idea in distributing the
data would be to disperse the six tables
among the five sites, perhaps based on
frequency of use of each table.
MUHAMMAD HAMIZ MOHD RADZI
DISTRIBUTING THE DATA
Tables A and B are kept at New York
Table C is moved to Memphis
Tables D and E are moved to Tokyo
Table F is moved to Paris.
MUHAMMAD HAMIZ MOHD RADZI
DISTRIBUTING THE DATA
Paris employees can now access Table F without incurring
telecommunications costs associated with accessing Table F in NY.
Local autonomy - Paris employees, e.g., can take responsibility for Table F -
- its security, backup and recovery, and concurrency control.
MUHAMMAD HAMIZ MOHD RADZI
DISTRIBUTING THE DATA: PROBLEMS
When the database was centralized at New York, a query issued
at any of the sites that required a join of two or more of the
tables could be handled in the standard way by the computer at
New York.
The result would then be sent to the site that issued the query.
In the dispersed approach, a join might require tables located at
different sites!
MUHAMMAD HAMIZ MOHD RADZI
ADVANTAGES AND DISADVANTAGES OF DDBMS
Advantages Disadvantages
• Data are located near • Complexity of management
greatest demand site and control
• Faster data access and • Technological difficulty
processing • Security
• Growth facilitation • Lack of standards
• Improved communications • Increased storage and
• Reduced operating costs infrastructure requirements
• User-friendly interface • Increased training cost
• Less danger of a single- • Costs incurred due to the
point failure requirement of duplicated
• Processor independence infrastructure
MUHAMMAD HAMIZ MOHD RADZI
DISTRIBUTED PROCESSING AND DISTRIBUTED
DATABASES
Distributed processing – a database’s logical processing is shared among two or
more physically independent sites that are connected through a network
One computer performs I/O, data selection and validation while second
computer creates reports
Uses a single-site database but the processing chores are shared among several
sites
Distributed database – stores a logically related database over two or more
physically independent sites. The sites are connected via a network
Database is composed of database fragments which are located at different
sites and may also be replicated among various sites
MUHAMMAD HAMIZ MOHD RADZI
DISTRIBUTED PROCESSING ENVIRONMENT
MUHAMMAD HAMIZ MOHD RADZI
DISTRIBUTED DATABASE ENVIRONMENT
MUHAMMAD HAMIZ MOHD RADZI
Distributed processing does not require a distributed database, but a
distributed database requires distributed processing (each database
fragment is managed by its own local database process).
Distributed processing may be based on a single database located on a
single computer. For the management of distributed data to occur, copies or
parts of the database processing functions must be distributed to all data
storage sites.
Both distributed processing and distributed databases require a network to
connect all components.
MUHAMMAD HAMIZ MOHD RADZI
CHARACTERISTICS OF DDBMS
A DBMS must have at least the following functions to be classified as distributed:
Application interface to interact with the end user, application programs, and other DBMSs
within the distributed database.
Validation to analyze data requests for syntax correctness.
Transformation to decompose complex requests into atomic data request components.
Query optimization to find the best access strategy. (Which database fragments must be
accessed by the query, and how must data updates, if any, be synchronized?)
Mapping to determine the data location of local and remote fragments.
I/O interface to read or write data from or to permanent local storage.
MUHAMMAD HAMIZ MOHD RADZI
Formatting to prepare the data for presentation to the end user or to an
application program.
Security to provide data privacy at both local and remote databases.
Backup and recovery to ensure the availability and recoverability of the database
in case of a failure.
DB administration features for the database administrator.
Concurrency control to manage simultaneous data access and to ensure data
consistency across database fragments in the DDBMS.
Transaction management to ensure that the data moves from one consistent state to
another.
This activity includes the synchronization of local and remote transactions as well as
transactions across multiple distributed segments.
MUHAMMAD HAMIZ MOHD RADZI
Must perform all the functions of a centralized DBMS
Must handle all necessary functions imposed by the distribution of data and
processing
Must perform these additional functions transparently to the end user
MUHAMMAD HAMIZ MOHD RADZI
DDBMS COMPONENTS
MUHAMMAD HAMIZ MOHD RADZI
Computer workstations or remote devices (sites or nodes) that form the network
system.
Network hardware and software components that reside in each workstation or
device. The network components allow all sites to interact and exchange data.
Communications media that carry the data from one node to another. The DDBMS
must be communications media-independent; that is, it must be able to support
several types of communications media.
The transaction processor (TP), which is the software component found in each
computer or device that requests data. The transaction processor receives and
processes the application’s data requests (remote and local). The TP is also known
as the application processor (AP) or the transaction manager (TM).
The data processor (DP), which is the software component residing on each
computer or device that stores and retrieves data located at the site. The DP is also
known as the data manager (DM). A data processor may even be a centralized
DBMS.
MUHAMMAD HAMIZ MOHD RADZI
MUHAMMAD HAMIZ MOHD RADZI
FUNCTIONS OF DISTRIBUTED DBMS
Receives the request of an application
Validates analyzes, and decomposes the request
Maps the request
Decomposes request into several I/O operations
Searches and validates data
Ensures consistency, security, and integrity
Validates data for specific conditions
Presents data in required format
MUHAMMAD HAMIZ MOHD RADZI
LEVELS OF DATA & PROCESS DISTRIBUTION
Current database systems can be classified on the basis of how process distribution
and data distribution are supported.
For example, a DBMS may store data in a single site (centralized DB) or in multiple
sites (distributed DB) and may support data processing at a single site or at
multiple sites.
MUHAMMAD HAMIZ MOHD RADZI
SINGLE-SITE PROCESSING, SINGLE-SITE DATA
(SPSD)
Processing is done on a single host computer
Data stored on host computer’s local disk
Processing restricted on end user’s side
DBMS is accessed by dumb terminals
MUHAMMAD HAMIZ MOHD RADZI
FIGURE 12.6 - SINGLE-SITE PROCESSING, SINGLE-
SITE DATA (CENTRALIZED)
MUHAMMAD HAMIZ MOHD RADZI
MULTIPLE-SITE PROCESSING, SINGLE-SITE DATA
(MPSD)
Multiple processes run on different computers sharing a single data repository
Require network file server running conventional applications
Accessed through LAN
Client/server architecture
Reduces network traffic
Processing is distributed
Supports data at multiple sites
MUHAMMAD HAMIZ MOHD RADZI
MULTIPLE-SITE PROCESSING, SINGLE-SITE DATA
MUHAMMAD HAMIZ MOHD RADZI
MULTIPLE-SITE PROCESSING, SINGLE-SITE DATA
(MPSD)
Fully distributed database management system
Support multiple data processors and transaction processors at multiple sites
Classification of DDBMS depending on the level of support for various types of
databases
Homogeneous: Integrate multiple instances of same DBMS over a network
Heterogeneous: Integrate different types of DBMSs
Fully heterogeneous: Support different DBMSs, each supporting different data
model
MUHAMMAD HAMIZ MOHD RADZI
RESTRICTIONS OF DDBMS
Remote access is provided on a read-only basis
Restrictions on the number of remote tables that may be accessed in a single
transaction
Restrictions on the number of distinct databases that may be accessed
Restrictions on the database model that may be accessed
MUHAMMAD HAMIZ MOHD RADZI
DISTRIBUTED DATABASE TRANSPARENCY
Distribution Transaction Failure
transparency transparency transparency
Performance Heterogeneity
transparency transparency
MUHAMMAD HAMIZ MOHD RADZI
Distribution transparency – user does not know where data is located and if
replicated or partitioned
Transaction transparency – transaction can update at several network sites to
ensure data integrity
Failure transparency – system continues to operate in the event of a node failure
(other nodes pick up lost functionality)
Performance transparency – allows system to perform as if it were a centralized
DBMS. No performance degradation due to use of a network or platform
differences
Heterogeneity transparency – allows the integration of several different local
DBMSs under a common schema
MUHAMMAD HAMIZ MOHD RADZI
DISTRIBUTION TRANSPARENCY
Allows management of a physically dispersed database as though it were a centralized
database
Supported by a distributed data dictionary (DDD) which contains the description of the entire
database as seen by the DBA
The DDD is itself distributed and replicated at the network nodes
Three levels of distribution transparency are recognized:
1. Fragmentation transparency – user does not need to know if a database is partitioned;
fragment names and/or fragment locations are not needed
2. Location transparency – fragment name, but not location, is required
3. Local mapping transparency – user must specify fragment name and location
MUHAMMAD HAMIZ MOHD RADZI
MUHAMMAD HAMIZ MOHD RADZI
The EMPLOYEE table is divided among three locations (no replication)
Suppose an employee wants to find all employees with a birthdate prior to jan
1, 1940
Fragmentation transparency-
SELECT * FROM EMPLOYEE WHERE EMP_DOB < ’01-JAN-1940’;
Location transparency-
SELECT * FROM E1 WHERE EMP_DOB < ’01-JAN-1940’ UNION SELECT * FROM E2 …
UNION SELECT * FROM E3…;
Local Mapping Transparency
SELECT * FROM E1 NODE NY WHERE EMP_DOB < ’01-JAN-1940’ UNION SELECT * FROM
E2 NODE ATL … UNION SELECT * FROM E3 NODE MIA…;
MUHAMMAD HAMIZ MOHD RADZI
DISTRIBUTED DATABASE DESIGN
Data fragmentation:
How to partition the database into fragments
Data replication:
Which fragments to replicate
Data allocation:
Where to locate those fragments and replicas
MUHAMMAD HAMIZ MOHD RADZI
DATA FRAGMENTATION
Breaks single object into two or more segments or fragments
Each fragment can be stored at any site over a computer network
Information about data fragmentation is stored in the distributed data
catalog (DDC), from which it is accessed by the TP to process user
requests
MUHAMMAD HAMIZ MOHD RADZI
DATA FRAGMENTATION STRATEGIES / TYPES
Horizontal fragmentation:
Division of a relation into subsets (fragments) of tuples (rows)
Same Schema
Vertical fragmentation:
Division of a relation into attribute (column) subsets
Each fragment must contain the primary key
Mixed fragmentation:
Combination of horizontal and vertical strategies MUHAMMAD HAMIZ MOHD RADZI
MUHAMMAD HAMIZ MOHD RADZI
HORIZONTAL FRAGMENTATION EXAMPLE
MUHAMMAD HAMIZ MOHD RADZI
VERTICALLY FRAGMENTED TABLE CONTENTS
MUHAMMAD HAMIZ MOHD RADZI
A SAMPLE CUSTOMER TABLE
MUHAMMAD HAMIZ MOHD RADZI
HORIZONTAL FRAGMENTATION OF THE CUSTOMER
TABLE BY STATE
MUHAMMAD HAMIZ MOHD RADZI
VERTICALLY FRAGMENTED TABLE CONTENTS
Two separate areas in the company use different fields of the table in
the daily activities – the SERVICE dept and the COLLECTIONS dept
MUHAMMAD HAMIZ MOHD RADZI
MIXED FRAGMENTATION OF THE CUSTOMER TABLE
The table is divided horizontally by the three states and within
each state there is a vertical fragmentation by department
MUHAMMAD HAMIZ MOHD RADZI
TABLE CONTENTS AFTER THE MIXED FRAGMENTATION PROCESS
MUHAMMAD HAMIZ MOHD RADZI
CORRECTNESS RULES
Completeness
Any data item must be covered by at least one fragment
Reconstruction
There exist relation operators (select, project, join…) to reconstruct the original table from
fragments
Horizontal fragmentation - Union
Vertical fragmentation - Join
Hybrid fragmentation - combination of relational operators
Disjointness
Remove duplicate data
Horizontal fragmentation - a row cannot appear in more than one fragment.
Vertical fragmentation – non-key attributes cannot appear in more than one fragment.
MUHAMMAD HAMIZ MOHD RADZI
DATA REPLICATION
Storage of data copies at multiple sites served by a computer network
Fragment copies can be stored at several sites to serve specific information
requirements
Can enhance data availability and response time
Can help to reduce communication and total query costs
Imposes additional processing overhead
Which copy do you read when submitting a query
All copies must be updated when a write occurs
MUHAMMAD HAMIZ MOHD RADZI
DATA REPLICATION
MUHAMMAD HAMIZ MOHD RADZI
REPLICATION SCENARIOS
Fully replicated database:
Stores multiple copies of each database fragment at multiple sites
Can be impractical due to amount of overhead
Partially replicated database:
Stores multiple copies of some database fragments at multiple sites
Most DDBMSs are able to handle the partially replicated database well
Un-replicated database:
Stores each database fragment at a single site
No duplicate database fragments
Database size, usage frequency and costs (performance, overhead, management)
influence the decision to replicate
MUHAMMAD HAMIZ MOHD RADZI
DATA ALLOCATION
Deciding where to locate data: Data distribution over a computer network is achieved
through data partition, data replication, or a combination of both
Allocation strategies:
Centralized data allocation
Entire database is stored at one site
Partitioned data allocation
Database is divided into several disjointed parts (fragments) and stored at several
sites
Replicated data allocation
Copies of one or more database fragments are stored at several sites MUHAMMAD HAMIZ MOHD RADZI
REFERENCES
Database Systems: A Practical Approach to Design, Implementation, and Management,
Thomas Connolly and Carolyn Begg, 5th Edition, 2010, Pearson.
Fundamental of Database Management Systems, Mark L. G., 2nd Edition, 2012, John
Wiley.
MUHAMMAD HAMIZ MOHD RADZI