Chapter 10 - Distributed Databases

The document discusses distributed databases, which allow transactions to be executed across multiple networked computers while maintaining distribution transparency. It outlines the advantages such as increased reliability, performance, and scalability, as well as disadvantages including complexity and cost. Additionally, it covers concepts like data replication, fragmentation, and various types of distributed database systems, along with concurrency control and client-server architecture.

Exit Exam Tutorial

Part 2: Fundamental Database Management Systems


Episode 10: Distributed Databases
2.10 Distributed Databases
2.10.1 Distributed Databases Concepts
- A transaction can be executed by multiple networked
computers in a unified manner.
- A distributed database (DDB) processes a unit of execution (a
transaction) in a distributed manner.
- A distributed database (DDB) can be defined as:
▪ A collection of multiple logically interrelated databases
distributed over a computer network, and a distributed
database management system (DDBMS) as a software system that
manages a distributed database while making the
distribution transparent to the user.
▪ The physical placement of data (files, relations, etc.)
is not known to the user (distribution transparency).
2.10.2 Advantages of Distributed Databases
i. Distribution and network transparency: Users do not have
to worry about the operational details of the network.
- Location transparency: the command used to perform a task
is independent of the location of the data and of the site
from which the command is issued.
- Naming transparency: once a name is associated with an
object (file, relation, etc.), that object can be accessed
unambiguously from any location.
ii. Replication transparency: Copies of the same data may be
stored at multiple sites without the user being aware of them.
- This is done to minimize access time to the required data.
iii. Fragmentation transparency: A relation can be fragmented
horizontally (into a subset of its rows) or vertically (into a
subset of its columns) without the user being aware of it.
iv. Increased reliability and availability: Reliability refers to
system uptime, that is, the probability that the system is
running (not down) at a given point in time.
- Availability is the probability that the system is
continuously available (usable or accessible) during a time
interval.
v. Improved performance: A distributed DBMS fragments the
database to keep data closer to where it is needed most.
- This can significantly reduce data management (access and
modification) time.
vi. Easier expansion (scalability): New nodes (computers) can
be added at any time without changing the entire
configuration.
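Items iv and vi can be made concrete with a small calculation. The sketch below is a simplified model that does not come from the notes: it assumes every site is up independently with the same probability p and asks how likely it is that at least one copy of a replicated data item is reachable, which is 1 - (1 - p)^n for n copies. The function name `item_availability` is hypothetical.

```python
def item_availability(p_site_up: float, n_copies: int) -> float:
    """Probability that at least one of n_copies is reachable.

    Assumes (hypothetically) that every site is up independently
    with the same probability p_site_up.
    """
    if not 0.0 <= p_site_up <= 1.0:
        raise ValueError("p_site_up must be between 0 and 1")
    if n_copies < 1:
        raise ValueError("a data item needs at least one copy")
    # The item is unavailable only when every site holding a copy is down.
    p_all_down = (1.0 - p_site_up) ** n_copies
    return 1.0 - p_all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} copies: {item_availability(0.95, n):.6f}")
```

With p = 0.95, one copy gives 0.95, two copies give 0.9975 and three give 0.999875. Real site failures are rarely independent (one failed network link can cut off several sites at once), so this is an optimistic estimate rather than a guarantee.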
2.10.3 Disadvantages of Distributed Databases
i. Complexity: Data replication, failure recovery, network
management, etc. make the system more complex than a
centralized DBMS.
ii. Cost: Since a DDBMS needs more staff and more hardware,
maintaining and running the system can be more expensive
than running a centralized system.
iii. Problem of connecting dissimilar machines: Additional
layers of operating system software are needed to translate
and coordinate the flow of data between machines.
iv. Data integrity and security problems: Because data
maintained by a distributed system can be accessed at many
locations in the network, controlling the integrity of the
database can be difficult.
2.10.4 Data Replication and Fragmentation: Distributed Data
Storage
i. Data Replication
- The system maintains several identical copies of a relation
and stores each copy at a different site.
- In general, this improves the performance of read operations
and increases the availability of data to read-only
transactions.
- However, update transactions incur greater overhead, because
every copy must be updated to keep the copies consistent.
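One common way to realize the trade-off just described is read-one/write-all (ROWA) replication; the notes do not name a specific scheme, so treat this as an illustrative choice. In the minimal sketch below, `Site` and `ReplicatedRelation` are hypothetical names, sites are in-memory objects, and failures are simulated with an `up` flag. A read is satisfied by the first reachable copy, while a write must reach every copy, which is why updates cost more.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A database site holding one full copy of a replicated relation."""
    name: str
    up: bool = True
    rows: dict = field(default_factory=dict)  # primary key -> row


class ReplicatedRelation:
    """Read-one / write-all replication across several sites (hypothetical helper)."""

    def __init__(self, sites):
        self.sites = list(sites)
        self.site_accesses = 0  # counts every site contacted, to compare costs

    def read(self, key):
        # A read needs only one reachable copy, so try the sites in order.
        for site in self.sites:
            self.site_accesses += 1
            if site.up:
                return site.rows.get(key)
        raise RuntimeError("no copy of the relation is reachable")

    def write(self, key, row):
        # An update must be applied to every copy, or the copies would diverge.
        if not all(site.up for site in self.sites):
            raise RuntimeError("write-all needs every site holding a copy to be up")
        for site in self.sites:
            self.site_accesses += 1
            site.rows[key] = dict(row)  # each site stores its own copy


if __name__ == "__main__":
    sites = [Site("A"), Site("B"), Site("C")]
    employee = ReplicatedRelation(sites)
    employee.write(1, {"Name": "Smith", "Dno": 5})  # contacts all 3 sites
    sites[0].up = False                             # site A fails
    print(employee.read(1))                         # still served by site B
    print(employee.site_accesses)                   # 3 for the write + 2 for the read
```

Strict write-all blocks every update while any copy is down; that is the availability price of keeping all replicas identical.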
ii. Data Fragmentation
- Split a relation into logically related and correct parts
(fragments).
- The main reasons for fragmenting a relation are:
• Efficiency: data that is not needed by the local applications
is not stored.
• Parallelism: a transaction can be divided into several
subqueries that operate on fragments, which increases the
degree of concurrency.
- However, reconstructing the whole relation requires
accessing data from all sites containing part of the relation.
A relation can be fragmented in two ways:
▪ Horizontal fragmentation: A horizontal fragment is a subset of
the rows of a relation, namely those rows that satisfy a
selection condition.
- Consider the Employee relation with the selection condition
(DNO = 5).
- All rows that satisfy this condition form a subset, which is
a horizontal fragment of the Employee relation.
- A selection condition may be composed of several
conditions connected by AND or OR.
▪ Vertical fragmentation: A vertical fragment is a subset of a
relation created by keeping a subset of its columns.
- Thus a vertical fragment of a relation contains the values of
the selected columns only.
- Consider the Employee relation. A vertical fragment of it can
be created by keeping the values of Name, B-date, Sex, and
Address.
- Because there is no condition for creating a vertical
fragment, each fragment must include the primary key
attribute of the parent relation Employee.
- In this way all vertical fragments of a relation are
connected, and the relation can be reconstructed by joining
the fragments on the primary key.
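Both kinds of fragmentation map onto familiar relational operations: a horizontal fragment is a selection, and a vertical fragment is a projection that keeps the primary key. The minimal sketch below models a relation as a Python list of dicts. The sample rows are made up, and treating Ssn as the primary key of Employee is an assumption (the notes do not name the key). The hypothetical `join_on_key` helper shows why keeping the key in every vertical fragment matters: it is what lets the fragments be joined back into the original relation.

```python
# Illustrative rows only; Ssn is assumed to be the primary key of Employee.
EMPLOYEE = [
    {"Ssn": "111", "Name": "Smith", "Bdate": "1965-01-09", "Sex": "M", "Address": "Houston", "Dno": 5},
    {"Ssn": "222", "Name": "Wong", "Bdate": "1955-12-08", "Sex": "M", "Address": "Houston", "Dno": 5},
    {"Ssn": "333", "Name": "Zelaya", "Bdate": "1968-01-19", "Sex": "F", "Address": "Spring", "Dno": 4},
]


def horizontal_fragment(relation, condition):
    """Selection: keep only the rows that satisfy the condition."""
    return [row for row in relation if condition(row)]


def vertical_fragment(relation, attributes, key="Ssn"):
    """Projection onto the chosen attributes, always keeping the primary key."""
    columns = [key] + [a for a in attributes if a != key]
    return [{c: row[c] for c in columns} for row in relation]


def join_on_key(left, right, key="Ssn"):
    """Reconstruct a relation by joining two vertical fragments on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


if __name__ == "__main__":
    dept5 = horizontal_fragment(EMPLOYEE, lambda r: r["Dno"] == 5)
    # A condition composed of two conditions connected by OR.
    dept4_or_female = horizontal_fragment(EMPLOYEE, lambda r: r["Dno"] == 4 or r["Sex"] == "F")
    personal = vertical_fragment(EMPLOYEE, ["Name", "Bdate", "Sex", "Address"])
    work = vertical_fragment(EMPLOYEE, ["Dno"])
    print([r["Name"] for r in dept5])            # ['Smith', 'Wong']
    print([r["Name"] for r in dept4_or_female])  # ['Zelaya']
    print(join_on_key(personal, work) == EMPLOYEE)  # True
```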
▪ Representation
▪ There are three rules that must be followed during
fragmentation:
- Completeness: if a relation r is decomposed into fragments
r1, r2, …, rn, each data item that can be found in r must
appear in at least one fragment.
- Reconstruction: it must be possible to define a relational
operation that reconstructs the relation r from its
fragments.
- Disjointness: if a data item di appears in fragment ri, then it
should not appear in any other fragment. Vertical
fragmentation is the exception: the primary key attributes
must be repeated in every fragment to allow reconstruction.
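The three rules can be checked mechanically. The hedged sketch below does so for horizontal fragments, where each row is treated as a data item and the reconstruction operation is a union. `check_fragmentation` and `union_all` are hypothetical helper names, rows are compared by value, and the sample data is made up. For vertical fragments, the repeated primary key would have to be excluded from the disjointness test.

```python
def _as_row_set(rows):
    """Turn rows (dicts) into a set of hashable tuples for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


def union_all(fragments):
    """Reconstruction operation for horizontal fragments: the union of all rows."""
    return [row for fragment in fragments for row in fragment]


def check_fragmentation(relation, fragments, reconstruct):
    """Report whether fragments of `relation` satisfy the three fragmentation rules."""
    original = _as_row_set(relation)
    fragment_sets = [_as_row_set(f) for f in fragments]
    covered = set().union(*fragment_sets)
    return {
        # Completeness: every row of r appears in at least one fragment.
        "completeness": original <= covered,
        # Reconstruction: the operation rebuilds exactly r.
        "reconstruction": _as_row_set(reconstruct(fragments)) == original,
        # Disjointness: no row appears in more than one fragment.
        "disjointness": sum(len(s) for s in fragment_sets) == len(covered),
    }


if __name__ == "__main__":
    employee = [
        {"Ssn": "111", "Name": "Smith", "Dno": 5},
        {"Ssn": "222", "Name": "Wong", "Dno": 5},
        {"Ssn": "333", "Name": "Zelaya", "Dno": 4},
    ]
    dept5 = [r for r in employee if r["Dno"] == 5]
    others = [r for r in employee if r["Dno"] != 5]
    print(check_fragmentation(employee, [dept5, others], union_all))    # all True
    print(check_fragmentation(employee, [dept5, employee], union_all))  # disjointness False
    print(check_fragmentation(employee, [dept5], union_all))            # completeness and reconstruction False
```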
2.10.5 Types of Distributed Database Systems
Homogeneous:
- All sites of the database system have an identical setup, i.e.,
the same database system software.
- The system may have little or no local autonomy.
- The underlying operating systems can be a mixture of Linux,
Windows, Unix, etc.
Heterogeneous:
- At least one of the databases must be from a different vendor.
There are two variants:
▪ Federated: Each site may run a different database system, but
data access is managed through a single conceptual
schema.
- This implies that the degree of local autonomy is minimal:
each site must adhere to a centralized access policy.
- There may be a global schema.
▪ Multi-database: There is no single conceptual global schema.
- For data access, a schema is constructed dynamically as
needed by the application software.
2.10.6 Concurrency Control and Recovery in Distributed
Databases
- Distributed databases encounter a number of concurrency
control and recovery problems that are not present in
centralized databases.
▪ Dealing with multiple copies of data items:
- The concurrency control mechanism must maintain global
consistency.
- Likewise, the recovery mechanism must recover all copies
and maintain consistency after recovery.
▪ Failure of individual sites: Database availability must not be
affected by the failure of one or two sites, and the
recovery scheme must recover failed sites before they are
made available for use again.
▪ Communication link failure: This failure may create a
network partition, which would affect database availability
even though all database sites may be running.
▪ Distributed commit: A transaction may be split into parts
that are executed by a number of sites.
- This requires a two-phase or three-phase commit protocol
for transaction commit.
▪ Distributed deadlock: Since transactions are processed at
multiple sites, two or more sites may become involved in a
deadlock.
- This must be resolved in a distributed manner.
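The two-phase commit approach mentioned under distributed commit can be sketched as a coordinator that first collects a vote from every participating site (phase 1) and then sends one global decision to all of them (phase 2). This is a deliberately minimal, single-process sketch: `Participant` and `two_phase_commit` are hypothetical names, there is no logging, messaging or time-out handling, and the three-phase variant is not shown.

```python
class Participant:
    """A site that executed part of a distributed transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates whether the local work succeeded
        self.state = "active"

    def prepare(self):
        """Phase 1: vote YES (True) or NO (False).

        A real site would force its log to stable storage before voting YES;
        this sketch only records the state change.
        """
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision):
        """Phase 2: apply the coordinator's global decision."""
        self.state = decision


def two_phase_commit(participants):
    """Coordinator: the transaction commits only if every site votes YES."""
    votes = [site.prepare() for site in participants]  # phase 1: voting
    decision = "committed" if all(votes) else "aborted"
    for site in participants:                           # phase 2: decision
        site.finish(decision)
    return decision


if __name__ == "__main__":
    sites = [Participant("site1"), Participant("site2"), Participant("site3", can_commit=False)]
    print(two_phase_commit(sites))   # aborted: one site voted NO
    print([s.state for s in sites])  # every site ends in the same state
```

The property the protocol buys is atomicity across sites: every participant ends in the same state. What the sketch leaves out is the hard case of a coordinator failing after sites have voted YES; such sites can block, which is the motivation for three-phase commit.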
2.10.6.1 Distributed Concurrency Control Protocols
i. Primary site technique: A single site is designated as the
primary site, which serves as the coordinator for transaction
management.
Transaction management:
- Concurrency control and commit are managed by this site.
- In two-phase locking, this site manages the locking and
releasing of data items. If all transactions follow the
two-phase policy at all sites, serializability is guaranteed.
Advantages:
- It is an extension of centralized two-phase locking, so
implementation and management are simple.
- Data items are locked only at one site, but they can be
accessed at any site.
Disadvantages:
- All transaction management activities go to the primary site,
which is likely to overload it.
- If the primary site fails, the entire system is inaccessible.
- To aid recovery, a backup site is designated, which behaves
as a shadow of the primary site.
- If the primary site fails, the backup site can act as the
primary site.
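Under the primary site technique, every lock and unlock request, from whichever site, goes to one lock table. The sketch below is a minimal, hypothetical `PrimarySiteLockManager` with exclusive locks only; it also enforces the two-phase rule that a transaction may not acquire a lock after it has released one. Wait queues, shared locks, deadlock handling and the backup site are left out.

```python
class PrimarySiteLockManager:
    """The single lock table kept at the primary site (hypothetical sketch)."""

    def __init__(self):
        self.owner = {}         # data item -> transaction holding its exclusive lock
        self.shrinking = set()  # transactions that have released at least one lock

    def lock(self, txn, item):
        """Return True if txn now holds the lock, False if it must wait."""
        if txn in self.shrinking:
            raise RuntimeError(f"{txn} violates two-phase locking: lock after unlock")
        holder = self.owner.get(item)
        if holder is None or holder == txn:
            self.owner[item] = txn
            return True
        return False  # the requesting site must wait and retry (or abort)

    def unlock(self, txn, item):
        if self.owner.get(item) != txn:
            raise RuntimeError(f"{txn} does not hold a lock on {item}")
        del self.owner[item]
        self.shrinking.add(txn)  # txn has entered its shrinking phase


if __name__ == "__main__":
    primary = PrimarySiteLockManager()
    print(primary.lock("T1", "X"))  # request from site 2 -> True
    print(primary.lock("T2", "X"))  # request from site 3 -> False, T2 waits
    primary.unlock("T1", "X")
    print(primary.lock("T2", "X"))  # True
```

Because the whole table lives at one site, the overload and single-point-of-failure problems listed under the disadvantages follow directly from this design.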
ii. Primary copy technique: In this approach, instead of a single
primary site, a primary copy is designated for each data item
(or partition), and the primary copies are stored at different sites.
- To lock a data item, only the primary copy of that data item is
locked.
Advantages:
- Since primary copies are distributed across various sites, a
single site is not overloaded with locking and unlocking
requests.
Disadvantages:
- Identifying the primary copy of an item is complex. A distributed
directory must be maintained, possibly at all sites.
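The primary copy technique keeps the same locking logic but splits the lock table by data item: a directory records which site holds each item's primary copy, and each lock request is routed there. In the sketch below, `PrimaryCopyLocking`, the directory contents and the site names are all hypothetical, and the directory is a plain dict, whereas the notes point out that a real system must keep a distributed directory, possibly at every site.

```python
from collections import Counter


class PrimaryCopyLocking:
    """Exclusive locks held at the site storing each item's primary copy (sketch)."""

    def __init__(self, directory):
        self.directory = dict(directory)  # data item -> site holding its primary copy
        self.lock_tables = {site: {} for site in set(self.directory.values())}
        self.requests_per_site = Counter()  # shows how the locking load is spread

    def lock(self, txn, item):
        """Route the request to the primary copy's site; True if granted."""
        site = self.directory[item]
        self.requests_per_site[site] += 1
        table = self.lock_tables[site]
        if table.get(item) in (None, txn):
            table[item] = txn
            return True
        return False

    def unlock(self, txn, item):
        """Release the lock if txn holds it."""
        table = self.lock_tables[self.directory[item]]
        if table.get(item) == txn:
            del table[item]


if __name__ == "__main__":
    locks = PrimaryCopyLocking({"X": "site1", "Y": "site2", "Z": "site3"})
    print(locks.lock("T1", "X"), locks.lock("T2", "Y"), locks.lock("T3", "X"))  # True True False
    print(dict(locks.requests_per_site))  # {'site1': 2, 'site2': 1}
```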
RECOVERY FROM COORDINATOR FAILURE
- In both approaches, the coordinator site or copy may become
unavailable.
- This requires the selection of a new coordinator.
- Primary site approach with no backup site: Abort and
restart all active transactions at all sites, elect a new
coordinator, and resume transaction processing.
- Primary site approach with backup site: Suspend all active
transactions, designate the backup site as the new primary
site, and identify a new backup site.
▪ The new primary site receives all the transaction management
information it needs to resume processing.
- Primary and backup sites both fail, or there is no backup site:
Use an election process to select a new coordinator site.
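The recovery cases above amount to a small decision procedure. The sketch below encodes them. `recover_coordinator` and `elect` are hypothetical names, sites are identified by integers, and the election rule (the live site with the highest ID wins, the outcome of the bully algorithm) is an assumption, since the notes do not specify which election process to use. Suspending, aborting and restarting transactions are only reported as action strings, not performed.

```python
def elect(alive):
    """Assumed election rule: the live site with the highest ID becomes coordinator."""
    if not alive:
        raise RuntimeError("no live site is available to coordinate")
    return max(alive)


def recover_coordinator(primary, backup, alive):
    """Return (new_primary, new_backup, action) after checking the coordinator.

    primary: current primary site ID
    backup:  current backup site ID, or None if the system has no backup site
    alive:   set of site IDs that are currently reachable
    """
    if primary in alive:
        return primary, backup, "no recovery needed"
    if backup is None:
        # Primary site approach with no backup site.
        return elect(alive), None, "abort and restart all active transactions"
    if backup in alive:
        # Primary site approach with backup site: promote the backup.
        remaining = alive - {backup}
        new_backup = elect(remaining) if remaining else None
        return backup, new_backup, "suspend active transactions, then resume"
    # Primary and backup sites both failed: elect a new coordinator (and backup).
    new_primary = elect(alive)
    remaining = alive - {new_primary}
    new_backup = elect(remaining) if remaining else None
    return new_primary, new_backup, "elect a new coordinator site"


if __name__ == "__main__":
    print(recover_coordinator(1, 2, {2, 3, 4}))  # backup 2 promoted, 4 becomes backup
    print(recover_coordinator(1, None, {2, 3}))  # 3 elected, no backup kept
    print(recover_coordinator(1, 2, {3, 5}))     # 5 elected, 3 becomes backup
```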
iii. Concurrency control based on voting:
- There is no primary copy or coordinator.
- A lock request is sent to all sites that hold a copy of the
data item.
- If a majority of the sites grant the lock, the requesting
transaction gets the data item.
- The locking decision (granted or denied) is sent to all these
sites.
- To avoid an unacceptably long wait, a time-out period is
defined. If the requesting transaction does not receive a
majority of votes within this period, it is aborted.
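The voting scheme can be simulated without real networking. In the sketch below, each `ReplicaSite` (a hypothetical name) either grants or denies the lock, and its vote arrives after a simulated delay; votes that arrive after the time-out are ignored. The lock is granted only with a strict majority of all copies; otherwise the request is aborted. A real implementation would also send the outcome to every site and release any locks granted by a minority.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSite:
    """A site holding a copy of the data item (simulated)."""
    name: str
    grants: bool   # would this site grant the lock request?
    delay: float   # simulated seconds before its vote reaches the requester


def request_lock_by_voting(sites, timeout):
    """Return ("granted" | "aborted", number of YES votes received in time)."""
    # Votes arriving after the time-out are treated as if never received.
    votes_in_time = [site for site in sites if site.delay <= timeout]
    yes_votes = sum(1 for site in votes_in_time if site.grants)
    # A strict majority of ALL copies is required, not just of those that replied.
    if yes_votes > len(sites) // 2:
        return "granted", yes_votes
    return "aborted", yes_votes


if __name__ == "__main__":
    copies = [ReplicaSite("A", True, 0.1), ReplicaSite("B", True, 0.3), ReplicaSite("C", False, 0.2)]
    print(request_lock_by_voting(copies, timeout=1.0))  # ('granted', 2)
    copies[1].delay = 5.0                                # B's vote now arrives too late
    print(request_lock_by_voting(copies, timeout=1.0))  # ('aborted', 1)
```

Requiring a majority of all copies, rather than of the replies, is what guarantees that two conflicting transactions cannot both win: any two majorities of the same set of copies overlap in at least one site.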
2.10.7 Client-Server Distributed Database Architecture
- It consists of clients running client software, a set of servers
that provide all database functionality, and a reliable
communication infrastructure.
- Many Web applications use an architecture called the
three-tier architecture, which adds an intermediate layer
between the client and the database server.
- This intermediate layer is called the application server or
the Web server.
- This server plays an intermediary role by storing business
rules (procedures or constraints) that are used to access data
from the database server.
- It can also improve database security by checking a client's
credentials before forwarding a request to the database
server.
- The intermediate server accepts requests from the client,
processes each request and sends database commands to the
database server, and then acts as a conduit for passing
(partially) processed data from the database server to the
clients.
- Clients contact servers for the desired service, but servers
do not initiate contact with clients.
- The server software is responsible for local data
management at a site, much like centralized DBMS
software.
- The client software is responsible for most of the
distribution functions.
- The communication software manages communication
among clients and servers.
- An SQL query is processed as follows:
▪ The client parses the user query and decomposes it into a
number of independent sub-queries. Each sub-query is sent to
the appropriate site for execution.
▪ Each server processes its sub-query and sends the result to
the client.
▪ The client combines the results of the sub-queries and
produces the final result.
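The three query-processing steps can be sketched as a scatter-gather over threads. The assumptions here are all hypothetical: the Employee relation is horizontally fragmented across two in-memory "servers" in `SITES`, the client receives an already-parsed condition (Salary greater than a threshold) instead of parsing SQL, and the sub-queries are sent in parallel with Python's standard `concurrent.futures.ThreadPoolExecutor`. Because the fragments are horizontal, combining the partial results is a union.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of Employee, one per server site.
SITES = {
    "site1": [{"Name": "Smith", "Dno": 5, "Salary": 30000},
              {"Name": "Wong", "Dno": 5, "Salary": 40000}],
    "site2": [{"Name": "Zelaya", "Dno": 4, "Salary": 25000},
              {"Name": "Wallace", "Dno": 4, "Salary": 43000}],
}


def server_execute(site, min_salary):
    """Server side: evaluate the sub-query against the local fragment only."""
    return [row["Name"] for row in SITES[site] if row["Salary"] > min_salary]


def client_query(min_salary):
    """Client side: send one sub-query per site in parallel, then combine results."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(server_execute, site, min_salary) for site in SITES]
        partial_results = [future.result() for future in futures]
    # Combine step: for horizontal fragments the final answer is the union.
    return sorted(name for part in partial_results for name in part)


if __name__ == "__main__":
    print(client_query(35000))  # ['Wallace', 'Wong']
```

Sending the sub-queries concurrently lets the servers work in parallel, so the client waits roughly as long as the slowest server rather than the sum of all of them.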
Special thanks to the publishers and authors of the references listed below.
TOPICS AND THE CONCEPTS:
Distributed Databases Concepts
Replication
Fragmentation
Federated Distributed Databases
Multi-database Distributed Databases
Homogeneous Distributed Database
Heterogeneous Distributed Database
Client-server Distributed Database Architectures

REFERENCES:
Fundamentals of Database Systems (6th Edition) by Ramez Elmasri, Shamkant B. Navathe
Database Systems: A Practical Approach to Design, Implementation, and Management (6th Edition) by Thomas Connolly, Carolyn Begg

PRESENTED BY:
Mohammed Nebil

HISTORY OF THE PROGRAMMING:


Boyce Codd

SPECIAL THANKS:
Digital Library of Educations
Federal Democratic Republic of Ethiopia, Ministry of Education
Ethiopian Education Short Note
