ADBMS Notes 3


Module:3

Q.1 Define distributed database.


Distributed Database: A distributed database is a collection of multiple interconnected
databases, which are spread physically across various locations that communicate via a
computer network.

Q.2 State the advantages of distributed databases over centralized databases.


Following are the advantages of distributed databases over centralized databases.

1. Modular Development − If the system needs to be expanded to new locations or new

units, in centralized database systems, the action requires substantial efforts and

disruption in the existing functioning. However, in distributed databases, the work

simply requires adding new computers and local data to the new site and finally

connecting them to the distributed system, with no interruption in current functions.

2. More Reliable − In case of database failures, the total system of centralized databases

comes to a halt. However, in distributed systems, when a component fails, the

functioning of the system continues, possibly at reduced performance. Hence a DDBMS

is more reliable.

3. Better Response − If data is distributed in an efficient manner, then user requests can

be met from local data itself, thus providing faster response. On the other hand, in

centralized systems, all queries have to pass through the central computer for

processing, which increases the response time.

4. Lower Communication Cost − In distributed database systems, if data is located locally

where it is mostly used, then the communication costs for data manipulation can be

minimized. This is not feasible in centralized systems.

Q.3 List the features of distributed databases and brief them.


A distributed database is a collection of multiple interconnected databases, which are spread

physically across various locations that communicate via a computer network.

Features
● Databases in the collection are logically interrelated with each other. Often they

represent a single logical database.

● Data is physically stored across multiple sites. Data in each site can be managed

by a DBMS independent of the other sites.

● The processors in the sites are connected via a network. They do not have any

multiprocessor configuration.

● A distributed database is not a loosely connected file system.

● A distributed database incorporates transaction processing, but it is not

synonymous with a transaction processing system.

Q.4 Define Replication.

Replication involves using specialized software that looks for changes in the distributed

database. Once the changes have been identified, the replication process makes all the

databases look the same. The replication process can be complex and time-consuming,

depending on the size and number of the distributed databases, and it can also consume

significant computing resources.

• If the distributed database is (partially or fully) replicated, it is necessary to implement

protocols that ensure the consistency of the replicas, i.e. copies of the same data item have the

same value.

• These protocols can be eager in that they force the updates to be applied to all the replicas

before the transaction completes, or they may be lazy so that the transaction updates one

copy (called the master) from which updates are propagated to the others after the transaction

completes.
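
The difference between eager and lazy propagation can be sketched as follows. This is a minimal in-memory model, not a real replication protocol: the class `ReplicatedItem` and its method names are hypothetical, and failures and concurrent writers are ignored.

```python
class ReplicatedItem:
    """One data item with a master copy and named replica copies (hypothetical model)."""

    def __init__(self, replica_names, value=None):
        self.master = value
        self.replicas = {name: value for name in replica_names}
        self.pending = []  # updates applied to the master but not yet propagated

    def write_eager(self, value):
        # Eager: every copy is updated before the transaction counts as complete.
        self.master = value
        for name in self.replicas:
            self.replicas[name] = value

    def write_lazy(self, value):
        # Lazy: only the master copy is updated inside the transaction.
        self.master = value
        self.pending.append(value)

    def propagate(self):
        # Runs after the transaction completes and pushes queued updates in order.
        for value in self.pending:
            for name in self.replicas:
                self.replicas[name] = value
        self.pending.clear()

    def all_copies_equal(self):
        # True when every replica holds the same value as the master.
        return all(v == self.master for v in self.replicas.values())
```

After `write_lazy` the replicas are briefly out of date until `propagate` runs, which is the price lazy protocols pay for shorter transactions.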
Q.5 Write short notes on distributed concurrency control.

Distributed Concurrency Control

• Concurrency control involves the synchronization of access to the distributed database, such

that the integrity of the database is maintained. It is, without any doubt, one of the most

extensively studied problems in the DDBS field.

• The concurrency control problem in a distributed context is somewhat different from that in a

centralized framework. One not only has to worry about the integrity of a single database, but

also about the consistency of multiple copies of the database. The condition that requires all

values of multiple copies of every data item to converge to the same value is called mutual

consistency.

• The two general classes of solutions are pessimistic, synchronizing the execution

of the user request before the execution starts, and optimistic, executing requests and then

checking if the execution has compromised the consistency of the database.

• Two fundamental primitives that can be used with both approaches are locking, which is

based on the mutual exclusion of access to data items, and time-stamping, where transaction

executions are ordered based on timestamps.

• There are variations of these schemes as well as hybrid algorithms that attempt to combine

the two basic mechanisms.
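
The two primitives can be illustrated with a small single-process sketch. The class names `LockTable` and `TimestampOrdering` are hypothetical; the lock table supports exclusive locks only, and the timestamp class follows the basic timestamp-ordering rules (an operation is rejected when it arrives "too late" relative to a younger transaction). A distributed implementation would also need messaging between sites, which is omitted here.

```python
class LockTable:
    """Exclusive locks only: at most one transaction may hold a data item."""

    def __init__(self):
        self.owner = {}  # data item -> id of the transaction holding the lock

    def acquire(self, txn, item):
        holder = self.owner.get(item)
        if holder is None or holder == txn:
            self.owner[item] = txn
            return True
        return False  # conflict: the caller has to wait or abort

    def release_all(self, txn):
        # Release every lock held by txn (for example at commit or abort).
        for item in [i for i, t in self.owner.items() if t == txn]:
            del self.owner[item]


class TimestampOrdering:
    """Basic timestamp ordering: operations must respect transaction timestamps."""

    def __init__(self):
        self.read_ts = {}   # item -> largest timestamp that has read it
        self.write_ts = {}  # item -> largest timestamp that has written it

    def read(self, ts, item):
        if ts < self.write_ts.get(item, 0):
            return False  # a younger transaction already wrote the item
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False  # a younger transaction already read or wrote the item
        self.write_ts[item] = ts
        return True
```

With locking, a conflicting transaction is blocked; with time-stamping, a transaction whose operation is rejected is typically restarted with a new timestamp.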

Q.6 Explain the design issues of distributed databases.

The following are the design issues related to distributed databases

1. Distributed Database Design

• One of the main questions being addressed is how the database and the applications that run

against it should be placed across the sites.


• There are two basic alternatives to placing data: partitioned (or non-replicated) and replicated.

• In the partitioned scheme the database is divided into a number of disjoint partitions each of

which is placed at a different site. Replicated designs can be either fully replicated (also called

fully duplicated) where the entire database is stored at each site, or partially replicated (or

partially duplicated) where each partition of the database is stored at more than one site, but

not at all the sites.

• The two fundamental design issues are fragmentation, the separation of the database into

partitions called fragments, and distribution, the optimum distribution of fragments. The

research in this area mostly involves mathematical programming in order to minimize the

combined cost of storing the database, processing transactions against it, and message

communication among sites.
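
Fragmentation and placement can be sketched as below. This is an illustration only: `fragment_by` and `allocate` are hypothetical helper names, the fragments are built by grouping rows on one attribute, and the round-robin placement stands in for the cost-based optimization mentioned above rather than implementing it.

```python
def fragment_by(rows, key):
    """Split rows into disjoint fragments, one per distinct value of `key`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments


def allocate(fragments, sites, copies=1):
    """Place each fragment on `copies` sites, round-robin.

    copies == 1           -> partitioned (non-replicated)
    1 < copies < #sites   -> partially replicated
    copies == len(sites)  -> fully replicated
    """
    if not 1 <= copies <= len(sites):
        raise ValueError("copies must be between 1 and the number of sites")
    placement = {}
    for i, name in enumerate(sorted(fragments)):
        placement[name] = [sites[(i + k) % len(sites)] for k in range(copies)]
    return placement
```

For example, fragmenting customer rows on a `city` attribute and calling `allocate` with `copies=2` on three sites stores each fragment at two sites but not at all of them, which is the partially replicated case.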

2. Distributed Directory Management

• A directory contains information (such as descriptions and locations) about data items in the

database. Problems related to directory management are similar in nature to the database

placement problem discussed in the preceding section.

• A directory may be global to the entire DDBS or local to each site; it can be centralized at one

site or distributed over several sites; there can be a single copy or multiple copies.

3. Distributed Query Processing

• Query processing deals with designing algorithms that analyze queries and convert them

into a series of data manipulation operations. The problem is how to decide on a strategy for

executing each query over the network in the most cost-effective way, however cost is defined.

• The factors to be considered are the distribution of data, communication cost, and lack of

sufficient locally-available information. The objective is to exploit the inherent

parallelism to improve the performance of executing the transaction, subject to the

above-mentioned constraints.
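
As a rough illustration of choosing an execution strategy, the sketch below picks the site at which to run a join so that the least data crosses the network. It assumes that cost is measured only in bytes shipped, that each operand relation lives at exactly one site, and that the result size is known; the function name `cheapest_join_site` and its parameters are hypothetical.

```python
def cheapest_join_site(sizes, result_site, result_size):
    """Return (site, bytes_shipped) for the cheapest place to execute a join.

    sizes: site -> size in bytes of the operand relation stored at that site
    result_site: site where the user needs the answer
    result_size: estimated size in bytes of the join result
    """
    best = None
    for site in sorted(set(sizes) | {result_site}):
        # Every operand stored elsewhere has to be shipped to the chosen site.
        shipped = sum(size for s, size in sizes.items() if s != site)
        # The result has to travel too, unless the join runs where it is needed.
        if site != result_site:
            shipped += result_size
        if best is None or shipped < best[1]:
            best = (site, shipped)
    return best
```

With a large relation at one site and a small one at another, the cheapest choice is usually to ship the small relation to the large one, even if the result then has to be forwarded to a third site.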


4. Distributed Concurrency Control

• Concurrency control involves the synchronization of access to the distributed database, such

that the integrity of the database is maintained. It is, without any doubt, one of the most

extensively studied problems in the DDBS field.

• The concurrency control problem in a distributed context is somewhat different from that in a

centralized framework. One not only has to worry about the integrity of a single database, but

also about the consistency of multiple copies of the database. The condition that requires all

values of multiple copies of every data item to converge to the same value is called mutual

consistency.

• The two general classes of solutions are pessimistic, synchronizing the execution

of the user request before the execution starts, and optimistic, executing requests and then

checking if the execution has compromised the consistency of the database.

• Two fundamental primitives that can be used with both approaches are locking, which is

based on the mutual exclusion of access to data items, and time-stamping, where transaction

executions are ordered based on timestamps.

• There are variations of these schemes as well as hybrid algorithms that attempt to combine

the two basic mechanisms.
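
The optimistic class can be illustrated with a version-check sketch: a transaction runs without synchronization, then validates at commit time that nothing it read has changed. This is one possible validation scheme under simplifying assumptions (a single in-memory store, version counters per item); the class `VersionedStore` is hypothetical.

```python
class VersionedStore:
    """Each item carries a version number that grows with every committed write."""

    def __init__(self):
        self.data = {}  # item -> (value, version)

    def read(self, item):
        # Items that were never written are reported as (None, 0).
        return self.data.get(item, (None, 0))

    def commit(self, read_versions, writes):
        """Validate, then apply.

        read_versions: item -> version the transaction saw when it read the item
        writes: item -> new value the transaction wants to install
        """
        # Validation phase: reject if any item read has been changed since.
        for item, version in read_versions.items():
            if self.read(item)[1] != version:
                return False
        # Write phase: install new values and bump their versions.
        for item, value in writes.items():
            self.data[item] = (value, self.read(item)[1] + 1)
        return True
```

Two transactions that both read version 0 of an item and both try to update it will see the first commit succeed and the second fail validation.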

5. Distributed Deadlock Management

• The deadlock problem in DDBSs is similar in nature to that encountered in operating

systems.

• The competition among users for access to a set of resources (data, in this case) can result in

a deadlock if the synchronization mechanism is based on locking. The well-known alternatives

of prevention, avoidance, and detection/recovery also apply to DDBSs.
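
Detection is commonly explained with a wait-for graph, in which an edge from T1 to T2 means T1 is waiting for a lock held by T2; a cycle indicates a deadlock. The sketch below assumes, purely for illustration, that each site reports its local wait-for graph and a single detector merges them; the function names are hypothetical.

```python
def merge_graphs(local_graphs):
    """Combine per-site wait-for graphs into one global graph."""
    merged = {}
    for graph in local_graphs:
        for txn, waits_on in graph.items():
            merged.setdefault(txn, set()).update(waits_on)
    return merged


def find_deadlock(wait_for):
    """Return one cycle of transactions as a list, or None if there is no cycle.

    wait_for: transaction -> set of transactions it is waiting on
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for other in sorted(wait_for.get(txn, ())):
            state = color.get(other, WHITE)
            if state == GREY:
                # Reached a transaction already on the path: that is a cycle.
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(wait_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None
```

A deadlock that spans sites may be invisible in every local graph and only show up after merging, which is what makes the distributed case harder than the centralized one.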

6. Reliability of Distributed DBMS


• It is important that mechanisms be provided to ensure the consistency of the database as

well as to detect failures and recover from them. The implication for DDBSs is that when a

failure occurs and various sites become either inoperable or inaccessible, the databases at the

operational sites remain consistent and up to date.

• Furthermore, when the computer system or network recovers from the failure, the DDBSs

should be able to recover and bring the databases at the failed sites up-to-date. This may be

especially difficult in the case of network partitioning, where the sites are divided into two or

more groups with no communication among them.
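
Network partitioning can be pictured as the set of sites falling apart into groups that can no longer reach one another. The sketch below computes those groups from the links that are still working. It is a simplified model with hypothetical names (`partition_groups`, `sites`, `links`), and it says nothing about how a real DDBMS would learn which links are up.

```python
def partition_groups(sites, links):
    """Group sites that can still reach each other over working links.

    sites: iterable of site names
    links: iterable of (site_a, site_b) pairs that are currently operational
    """
    neighbours = {s: set() for s in sites}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first search collects everything reachable from `start`.
        group, stack = [], [start]
        seen.add(start)
        while stack:
            current = stack.pop()
            group.append(current)
            for n in neighbours[current]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        groups.append(sorted(group))
    return groups
```

More than one group in the result means the network is partitioned, and updates made in one group cannot reach copies held in another until communication is restored.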

7. Replication

• If the distributed database is (partially or fully) replicated, it is necessary to implement

protocols that ensure the consistency of the replicas, i.e. copies of the same data item have the

same value.

• These protocols can be eager in that they force the updates to be applied to all the replicas

before the transaction completes, or they may be lazy so that the transaction updates one copy

(called the master) from which updates are propagated to the others after the transaction

completes.

Q.7 What is a homogeneous database?

Homogeneous Distributed Databases

In a homogeneous distributed database, all the sites use identical DBMS and operating

systems. Its properties are −

● The sites use very similar software.

● The sites use identical DBMS or DBMS from the same vendor.

● Each site is aware of all other sites and cooperates with other sites to process user requests.

● The database is accessed through a single interface as if it were a single database.


Q.8 What is a heterogeneous database?

In a heterogeneous distributed database, different sites have different operating systems,

DBMS products and data models. Its properties are −

● Different sites use dissimilar schemas and software.

● The system may be composed of a variety of DBMSs like relational, network, hierarchical or

object oriented.

● Query processing is complex due to dissimilar schemas.

● Transaction processing is complex due to dissimilar software.

● A site may not be aware of other sites and so there is limited co-operation in processing user

requests.

Q.9 Discuss the types of homogeneous and heterogeneous databases.

Types of Homogeneous Distributed Database

There are two types of homogeneous distributed database −

● Autonomous − Each database is independent and functions on its own. They are integrated

by a controlling application and use message passing to share data updates.

● Non-autonomous − Data is distributed across the homogeneous nodes and a central or

master DBMS coordinates data updates across the sites.

Types of Heterogeneous Distributed Databases

● Federated − Heterogeneous database systems are independent in nature and integrated

together so that they function as a single database system.

● Un-federated − Database systems employ a central coordinating module through which the

databases are accessed.


Q.10 Define Autonomy.

Autonomy − It indicates the distribution of control of the database system and the degree to

which each constituent DBMS can operate independently.

Q.11 Define Distribution.

Distribution − It states the physical distribution of data across the different sites.

Q.12 Draw and explain the client-server architecture.

Client - Server Architecture for DDBMS

This is a two-level architecture where the functionality is divided into servers and clients. The

server functions primarily encompass data management, query processing, optimization and

transaction management. Client functions mainly include the user interface. However, clients also

perform some functions such as consistency checking and transaction management.

The two different client - server architectures are −

● Single Server Multiple Client

● Multiple Server Multiple Client (shown in the following diagram)
