0% found this document useful (0 votes)
3 views

8 Distributed Databases

A distributed system is a software application that coordinates multiple processes across a network using protocols for communication. Distributed databases, which can be homogeneous or heterogeneous, improve reliability, availability, and performance by storing data across different locations and managing it through a Distributed Database Management System. Key concepts include data fragmentation, replication, query processing, and concurrency control to ensure efficient operation and recovery in the event of failures.

Uploaded by

canolowana6
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

8 Distributed Databases

A distributed system is a software application that coordinates multiple processes across a network using protocols for communication. Distributed databases, which can be homogeneous or heterogeneous, improve reliability, availability, and performance by storing data across different locations and managing it through a Distributed Database Management System. Key concepts include data fragmentation, replication, query processing, and concurrency control to ensure efficient operation and recovery in the event of failures.

Uploaded by

canolowana6
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 13

DISTRIBUTED SYSTEMS

What is a distributed system? It's one of those things that's hard to define without first defining many other things.
Here is a "cascading" definition of a distributed system:

A program
is the code you write.

A process
is what you get when you run it.

A message
is used to communicate between processes.

A packet
is a fragment of a message that might travel on a wire.

A protocol
is a formal description of message formats and the rules that two processes must follow in order to
exchange those messages.

A network
is the infrastructure that links computers, workstations, terminals, servers, etc. It consists of routers which
are connected by communication links.

A component
can be a process or any piece of hardware required to run a process, support communications between
processes, store data, etc.

A distributed system
is an application that executes a collection of protocols to coordinate the actions of multiple processes
on a network, such that all components cooperate together to perform a single or small set of related tasks.

A distributed system is a software system in which components located on networked


computers communicate and coordinate their actions by passing messages.

Motivation, Definition and Challenges of Distributed Systems

Why build Distributed Systems

 Fault-Tolerant: It can recover from component failures without performing incorrect actions.
 Highly Available: It can restore operations, permitting it to resume providing services even when
some components have failed.
 Recoverable: Failed components can restart themselves and rejoin the system, after the cause of
failure has been repaired.
 Consistent: The system can coordinate actions by multiple components often in the presence of
concurrency and failure. This underlies the ability of a distributed system to act like a non-
distributed system.
 Scalable: It can operate correctly even as some aspect of the system is scaled to a larger size. For
example, we might increase the size of the network on which the system is running. This increases
the frequency of network outages and could degrade a "non-scalable" system. Similarly, we might
increase the number of users or servers, or overall load on the system. In a scalable system, this
should not have a significant effect.
 Predictable Performance: The ability to provide desired responsiveness in a timely manner.
 Secure: The system authenticates access to data and services

DISTRIBUTED DATABASES

What are distributed databases?

 Distributed database is a system in which storage devices are not connected to a common
processing unit.
 Database is controlled by Distributed Database Management System and data may be
stored at the same location or spread over the interconnected network. It is a loosely
coupled system.
 Shared nothing architecture is used in distributed databases.
 The above diagram is a typical example of distributed database system, in which
communication channel is used to communicate with the different locations and every
system has its own memory and database.

Goals of Distributed Database system.

The concept of distributed database was built with a goal to improve:

Reliability: In distributed database system, if one system fails down or stops working for
some time another system can complete the task.
Availability: In distributed database system reliability can be achieved even if sever fails
down. Another system is available to serve the client request.
Performance: Performance can be achieved by distributing database over different
locations. So the databases are available to every location which is easy to maintain.

Types of distributed databases.

The two types of distributed systems are as follows:

1. Homogeneous distributed databases system:


 Homogeneous distributed database system is a network of two or more databases (With
same type of DBMS software) which can be stored on one or more machines.
 So, in this system data can be accessed and modified simultaneously on several databases
in the network. Homogeneous distributed system is easy to handle.
Example: Consider that we have three departments using Oracle-9i for DBMS. If some
changes are made in one department then, it would update the other department also.

2. Heterogeneous distributed database system.


 Heterogeneous distributed database system is a network of two or more databases with
different types of DBMS software, which can be stored on one or more machines.
 In this system data can be accessible to several databases in the network with the help of
generic connectivity (ODBC and JDBC).
Example: In the following diagram, different DBMS software are accessible to each
other using ODBC and JDBC.

The basic types of distributed DBMS are as follows:

1. Client-server architecture of Distributed system.

 A client server architecture has a number of clients and a few servers connected in a
network.
 A client sends a query to one of the servers. The earliest available server solves it and
replies.
 A Client-server architecture is simple to implement and execute due to centralized server
system.

2. Collaborating server architecture.

 Collaborating server architecture is designed to run a single query on multiple servers.


 Servers break single query into multiple small queries and the result is sent to the client.
 Collaborating server architecture has a collection of database servers. Each server is
capable for executing the current transactions across the databases.

3. Middleware architecture.

 Middleware architectures are designed in such a way that single query is executed on
multiple servers.
 This system needs only one server which is capable of managing queries and transactions
from multiple servers.
 Middleware architecture uses local servers to handle local queries and transactions.
 The software are used for execution of queries and transactions across one or more
independent database servers, this type of software is called as middleware.

What is fragmentation?

 The process of dividing the database into a smaller multiple parts is called
as fragmentation.
 These fragments may be stored at different locations.
 The data fragmentation process should be carried out in such a way that the
reconstruction of original database from the fragments is possible.

Types of data Fragmentation

There are three types of data fragmentation:


1. Horizontal data fragmentation

Horizontal fragmentation divides a relation(table) horizontally into the group of rows to


create subsets of tables.

Example:
Account (Acc_No, Balance, Branch_Name, Type).
In this example if values are inserted in table Branch_Name as Pune, Baroda, Delhi.

The query can be written as:


SELECT*FROM ACCOUNT WHERE Branch_Name= “Baroda”

Types of horizontal data fragmentation are as follows:

1) Primary horizontal fragmentation


Primary horizontal fragmentation is the process of fragmenting a single table, row wise
using a set of conditions.

Example:

Acc_No Balance Branch_Name

A_101 5000 Pune

A_102 10,000 Baroda

A_103 25,000 Delhi

For the above table we can define any simple condition like, Branch_Name= 'Pune',
Branch_Name= 'Delhi', Balance < 50,000

Fragmentation1:
SELECT * FROM Account WHERE Branch_Name= 'Pune' AND Balance < 50,000

Fragmentation2:
SELECT * FROM Account WHERE Branch_Name= 'Delhi' AND Balance < 50,000

2) Derived horizontal fragmentation


Fragmentation derived from the primary relation is called as derived horizontal
fragmentation.

Example: Refer the example of primary fragmentation given above.

The following fragmentation are derived from primary fragmentation.

Fragmentation1:
SELECT * FROM Account WHERE Branch_Name= 'Baroda' AND Balance < 50,000

Fragmentation2:
SELECT * FROM Account WHERE Branch_Name= 'Delhi' AND Balance < 50,000

3) Complete horizontal fragmentation


 The complete horizontal fragmentation generates a set of horizontal fragmentation, which
includes every table of original relation.
 Completeness is required for reconstruction of relation so that every table belongs to at
least one of the partitions.
4) Disjoint horizontal fragmentation
The disjoint horizontal fragmentation generates a set of horizontal fragmentation in which
no two fragments have common tables. That means every table of relation belongs to only
one fragment.

5) Reconstruction of horizontal fragmentation


Reconstruction of horizontal fragmentation can be performed using UNION operation on
fragments.

2. Vertical Fragmentation

Vertical fragmentation divides a relation(table) vertically into groups of columns to create


subsets of tables.

Example:

Acc_No Balance Branch_Name

A_101 5000 Pune

A_102 10,000 Baroda

A_103 25,000 Delhi

Fragmentation1:
SELECT * FROM Acc_NO

Fragmentation2:
SELECT * FROM Balance

Complete vertical fragmentation


 The complete vertical fragmentation generates a set of vertical fragments, which can
include all the attributes of original relation.
 Reconstruction of vertical fragmentation is performed by using Full Outer Join operation
on fragments.

3) Hybrid Fragmentation

 Hybrid fragmentation can be achieved by performing horizontal and vertical partition


together.
 Mixed fragmentation is group of rows and columns in relation.

Example: Consider the following table which consists of employee information.


Emp_ID Emp_Nam Emp_Addres Emp_Age Emp_Salary
e s

101 Surendra Baroda 25 15000

102 Jaya Pune 37 12000

103 Jayesh Pune 47 10000

Fragmentation1:
SELECT * FROM Emp_Name WHERE Emp_Age < 40

Fragmentation2:
SELECT * FROM Emp_Id WHERE Emp_Address= 'Pune' AND Salary < 14000

Reconstruction of Hybrid Fragmentation


The original relation in hybrid fragmentation is reconstructed by performing UNION and
FULL OUTER JOIN.

What is data replication?

Data replication is the process in which the data is copied at multiple locations (Different
computers or servers) to improve the availability of data.

Goals of data replication

Data replication is done with an aim to:


 Increase the availability of data.
 Speed up the query evaluation.

Types of data replication

There are two types of data replication:

1. Synchronous Replication:
In synchronous replication, the replica will be modified immediately after some changes are
made in the relation table. So there is no difference between original data and replica.

2. Asynchronous replication:
In asynchronous replication, the replica will be modified after commit is fired on to the
database.

Replication Schemes

The three replication schemes are as follows:

1. Full Replication
In full replication scheme, the database is available to almost every location or user in
communication network.

Advantages of full replication


 High availability of data, as database is available to almost every location.
 Faster execution of queries.
Disadvantages of full replication
 Concurrency control is difficult to achieve in full replication.
 Update operation is slower.

2. No Replication

No replication means, each fragment is stored exactly at one location.

Advantages of no replication
 Concurrency can be minimized.
 Easy recovery of data.
Disadvantages of no replication
 Poor availability of data.
 Slows down the query execution process, as multiple clients are accessing the same server.

3. Partial replication

Partial replication means only some fragments are replicated from the database.

Advantages of partial replication


The number of replicas created for fragments depend upon the importance of data in that
fragment.

Distributed databases - Query processing and Optimization

DDBMS processes and optimizes a query in terms of communication cost of processing a


distributed query and other parameters.

Various factors which are considered while processing a query are as follows:

Costs of Data transfer

 This is a very important factor while processing queries. The intermediate data is
transferred to other location for data processing and the final result will be sent to the
location where the actual query is processing.
 The cost of data increases if the locations are connected via high performance
communicating channel.
 The DDBMS query optimization algorithms are used to minimize the cost of data
transfer.
Semi-join based query optimization

 Semi-join is used to reduce the number of relations in a table before transferring it to


another location.
 Only joining columns are transferred in this method.
 This method reduces the cost of data transfer.

Cost based query optimization

 Query optimization involves many operations like, selection, projection, aggregation.


 Cost of communication is considered in query optimization.
 In centralized database system, the information of relations at remote location is obtained
from the server system catalogs.
 The data (query) which is manipulated at local location is considered as a sub query to
other global locations. This process estimates the total cost which is needed to compute
the intermediate relations.

Distributed Transactions

 A Distributed Databases Management System should be able to survive in a system failure


without losing any data in the database.
 This property is provided in transaction processing.
 The local transaction works only on own location(Local Location) where it is considered as
a global transaction for other locations.
 Transactions are assigned to transaction monitor which works as a supervisor.
 A distributed transaction process is designed to distribute data over many locations and
transactions are carried out successfully or terminated successfully.
 Transaction Processing is very useful for concurrent execution and recovery of data.

What is recovery in distributed databases?

Recovery is the most complicated process in distributed databases. Recovery of a failed


system in the communication network is very difficult.

For example:
Consider that, location A sends message to location B and expects response from B but B is
unable to receive it. There are several problems for this situation which are as follows.

 Message was failed due to failure in the network.


 Location B sent message but not delivered to location A.
 Location B crashed down.
 So it is actually very difficult to find the cause of failure in a large communication network.
 Distributed commit in the network is also a serious problem which can affect the recovery
in a distributed databases.

Two-phase commit protocol in Distributed databases

 Two-phase protocol is a type of atomic commitment protocol. This is a distributed


algorithm which can coordinate all the processes that participate in the database and
decide to commit or terminate the transactions. The protocol is based on commit and
terminate action.
 The two-phase protocol ensures that all participant which are accessing the database
server can receive and implement the same action (Commit or terminate), in case of local
network failure.
 Two-phase commit protocol provides automatic recovery mechanism in case of a system
failure.
 The location at which original transaction takes place is called as coordinator and where
the sub process takes place is called as Cohort.

Commit request:
In commit phase the coordinator attempts to prepare all cohorts and take necessary steps
to commit or terminate the transactions.

Commit phase:
The commit phase is based on voting of cohorts and the coordinator decides to commit or
terminate the transaction.

Concurrency problems in distributed databases.

Some problems which occur while accessing the database are as follows:

1. Failure at local locations


When system recovers from failure the database is out dated compared to other locations.
So it is necessary to update the database.

2. Failure at communication location


System should have a ability to manage temporary failure in a communicating network in
distributed databases. In this case, partition occurs which can limit the communication
between two locations.

3. Dealing with multiple copies of data


It is very important to maintain multiple copies of distributed data at different locations.

4. Distributed commit
While committing a transaction which is accessing databases stored on multiple locations, if
failure occurs on some location during the commit process then this problem is called as
distributed commit.

5. Distributed deadlock
Deadlock can occur at several locations due to recovery problem and concurrency problem
(multiple locations are accessing same system in the communication network).

Concurrency Controls in distributed databases

There are three different ways of making distinguish copy of data by applying:

1) Lock based protocol


A lock is applied to avoid concurrency problem between two transaction in such a way that
the lock is applied on one transaction and other transaction can access it only when the lock
is released. The lock is applied on write or read operations. It is an important method to
avoid deadlock.

2) Shared lock system (Read lock)


The transaction can activate shared lock on data to read its content. The lock is shared in
such a way that any other transaction can activate the shared lock on the same data for
reading purpose.

3) Exclusive lock
The transaction can activate exclusive lock on a data to read and write operation. In this
system, no other transaction can activate any kind of lock on that same data.

You might also like