SAD Unit 4 Distributed Database Design

System Analysis &
Design
Unit 4- Distributed
Database Design
Insert Image
1. DistributedDatabase Design
1.1 Data Fragmentation
1.2 Data Replication
1.3 Data Allocation
Topics covered
Distributed Database Design
While distributing database at different geographical sites,following
three issues must be considered while designing distributed
database.
1. How to partition database into fragments? –Data Fragmentation

2. Which fragments to replicate? –Data Replication
3. Where to locate those fragments ? –Data Allocation
3
1. Data Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The
subsets of the table are called fragments.
• These fragments may be stored at different locations.
• The data fragmentation process should be carrried out in such a way that
the reconstruction of original database from the fragments is possible.
Data Fragmentation can be of three types:

1) Horizontal data fragmentation
2) Vertical Fragmentation
3) Hybrid Fragmentation
4
1. Data Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The
subsets of the table are called fragments.
• These fragments may be stored at different locations.
• The data fragmentation process should be carrried out in such a way that
the reconstruction of original database from the fragments is possible.
Data Fragmentation can be of three types:

1) Horizontal data fragmentation
2) Vertical Fragmentation
3) Hybrid Fragmentation
5
Horizontal Fragmentation
Itis a horizontal subset of a relation(a table) which contain those of tuples which satisfy selection
conditions.
Specified in the SELECT operation of the relational algebra on single or multiple attributes
Consider the Employee relation with selection condition (Branch = Delhi). All tuples or rows satisfying
this condition will create a subset which will be a horizontal fragment of Employee relation.
Fragment 1= (Branch=Delhi) Employee
Fragment 2= (Branch=Gurugram) Employee
Fragment 3= (Branch=Faridabad) Employee
Reconstruction of original relation(table)

To reconstruct the relation R from a complete horizontal fragmentation, we need to apply the UNION
operation to the fragments.
Original Employee is constructed via:
•Employee  (Fragment1) U (Fragment2) U (Fragment3)
6
Horizontal Fragmentation Example
Consider PROJ table containing all projects which is horizontally fragmented
into PROJ1 and PROJ2 according to following conditions.
PROJ1 = (Budget<200000) Proj PROJ
Sigma returns rows or tuples
projcts with budgets less than $200,000
PROJ2 = (Budget>=200000) Proj

projects with budgets greater than or
equal to $200,000
7
Vertical Fragmentation
Each site may not need all the attributes(columns) of a relation(a table) . Thus we use Vertical fragmentation which
divides a relation “vertically” by columns.
It is a subset of a relation which is created by a subset of columns.
Consider the Employee relation containing name,address,gender,salary,deptid
Eid Name Address Gender Salary Deptid

A vertical fragments of can be created by keeping the values as shown below
Eid Name Address Gender Salary Eid Deptid

Fragment1 Fragment2
PROJECT operation of the relational algebra is used
Fragment 1= Eid,Name,Address,Gender,Salary (Employee)

Fragment 2= Eid,deptid(Employee)
Reconstruction:
Because there is no condition for creating a vertical fragment, each fragment must include the primary key
attribute of the parent relation Employee. In this way all vertical fragments of a relation are connected by
join operation ⋈. So,Employee= Fragment1 ⋈ Fragment2 8

.
Vertical Fragmentation Example
Consider the following Student table which is vertically fragmented
Rollno Name Age Address Language
1 Rahi 20 Faridabad English
2 Abhishek 19 Delhi Hindi
3 Govind 20 Gurugram Hindi
4 Shivangi 20 Faridabad English
Above student table is divided into two vertical fragments
Stud_info = Rollno,Name,Age,Address (Student)
Stud_Language = Rollno,Language (Student)
Note:Remember to keep primary key of original table in all fragments so that they can be reconstructed.
Stud_info Stud_Language
Rollno Name Age Address Rollno Language
1 Rahi 20 Faridabad 1 English
2 Abhishek 19 Delhi 2 Hindi
3 Govind 20 Gurugram 3 Hindi
4 Shivangi 20 Faridabad 4 English
9
Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques are used.
 This is the most flexible fragmentation technique since it generates fragments with minimal extraneous
information.
 Reconstruction of the original table is often an expensive task.
 Orignial relation is reconstructed by the combination of JOIN and Union operations.
10
Let us consider Employee table
Eid Name Age Address Salary
101 Abhishek 71 Gurugram 100000
102 Bhawna 60 Delhi 250000
103 Satyendra 59 Delhi 120000
104 Dhruv 41 Palwal 400000
105 Madhu 69 Faridabad 350000
So,it can be divided into fragments using both horizontal and vertical
Firstly we fragment horizontally on basis of age
Employee_senior = (age>=60) Employee ,it give details of senior citizens
Employee_junior= (age<60) Employee ,it give details of non seniors
11
Employee_senior
101 Abhishek 71 Gurugram 100000
102 Bhawna 60 Delhi 250000
105 Madhu 69 Faridabad 350000
Employee_junior
103 Satyendra 59 Delhi 120000
104 Dhruv 41 Palwal 400000
Now we apply vertical fragmentation on these tables

Employee_senior_info= Eid,Name,Age,Address(Employee_Senior)
Employee_senior_salary= Eid,Salary(Employee_Senior)
Employee_junior_info= Eid,Name,Age,Address(Employee_junior)
Employee_junior_salary= Eid,Salary(Employee_junior)
12
Employee_Senior_info Employee_Senior_salary
Eid Name Age Address Eid Salary
101 Abhishek 71 Gurugram 101 100000
102 Bhawna 60 Delhi 102 250000
105 Madhu 69 Faridabad 105 350000
Employee_junior_info Employee_junior_salary
Eid Name Age Address Eid Salary

103 Satyendra 59 Delhi 103 120000
104 Dhruv 41 Palwal 104 400000
So, as shown above we get four fragments by applying hybrid fragmentation in following way
Employee_senior_info= Eid,Name,Age,Address( (age>=60) Employee)

Employee_senior_salary= Eid,Salary( (age>=60) Employee)
Employee_junior_info= Eid,Name,Age,Address( (age<60) Employee)

Employee_junior_salary= Eid,Salary( (age<60) Employee) 13
Advantages of Fragmentation
• Efficiency
Data is stored close to where it is most frequently used.
In addition, data that is ,not needed by' local applications is not stored.
• Parallelism
With fragments as the unit of distribution, a transaction can be divided into several sub queries that operate on
fragments. This should increase the degree of concurrency, or parallelism, in the system, thereby allowing
transactions to execute in parallel.
• Data management is easy as fragments are smaller compare to the complete database.
• Increased availability of data to the users/queries that are local to the site in which the data stored.
• Data that are not required by local applications are not stored locally. It leads to reduced data transfer between
sites, and increased security.
• A database is fragmented and fragments are made available at different sites. It reduces disk space at each site
considerably.
Disadvantages of Fragmentation
• Performance
The performance of global application that requires data from several fragments located
at different sites may be slower.
• Integrity
Integrity control may be more difficult as data is fragmented are fragmented and located
at different sites.
• Problems in defining best fragmentations
Different application accessing same data may have different access view
requirements,defining best fragmentation is tedious task.
2. Data Replication
• Data replication is the process in which the data is copied at multiple locations
(Different computers or servers) to improve the availability of data.
• Copy of relation is also termed as replica of relation.
Data replication is done with an aim to:
• Increase the availability of data
• Speed up the query evaluation
• There are two types of data replication:
1. Synchronous Replication:
In synchronous replication, the replica will be modified immediately after some
changes are made in the relation table. So there is no difference between original
data and replica.
2. Asynchronous replication:
In asynchronous replication, the replica will be modified after commit is fired on to the
database.
Fragment1
Fragment2 copy
Fragment3 copy
Fragment1 copy
Fragment4 copy
Replication Schemes Fragment2
Fragment3 copy
Fragment4 copy Site1
There are three replication schemes as follows:
1. Full Replication
In full replication scheme, the copy of database is available to almost
every location or user in communication network.
Advantages of Full Replication Site2 Site3
• High Availability of Data.
• Improves the performance for retrieval of global queries as the result
can be obtained locally from any of the local site.
• Faster execution of Queries.
Site4
Disadvantages of full replication Fragment1 copy
Fragment2 copy
• Concurrency control is difficult to achieve in full replication. Fragment3
Fragment1 copy
• Update operation is slower. Fragment2 copy Fragment4 copy
Fragment3 copy
Fragment4
Fragment1
Replication Schemes
Fragment2 Site1
2. No Replication
No replication means, each fragment is stored exactly at one location.
Advantages of no replication
• Concurrency can be minimized.
Site2 Site3
• Easy recovery of data.
Disadvantages of no replication
• Poor availability of data.
• Slows down the query execution process, as multiple clients are
accessing the same server. Site4
Fragment3
Fragment4
Fragment1
Fragment2 copy
Fragment3 copy
Fragment1 copy
Fragment4 copy
Replication Schemes Fragment2
Fragment3 copy
Site1
3. Partial replication
Partial replication means only some fragments are replicated from the
database.
Site2 Site3
Advantages of partial replication
The number of replicas created for fragments depend upon the importance
of data in that fragment.
Site4
Fragment3
Fragment4 copy
Fragment4
Comparison of various replication schemes
Here, partitioning
Means no replication.
Advantages of Replication
• Availability
If one of the sites containing relation R fails, then the relation R can be obtained from another site.
Thus, queries (involving relation R) can be continued to be processed in spite of the failure of one site.
• Parallelism
The sites containing relation R can process queries (involving relation R) in parallel .
This leads to faster query execution.
• Less Data Movement over Network

The more replicas of, a relation are there, the greater are the chances that the required data is found
where the transaction is executing. Hence, data replication reduces movement of data among sites and
increases speed of processing.
Disadvantages of Replication
• Increased overhead on update.
When an updation is required, a database system must ensure that all replicas are updated.
• Require more disk space

Storing replicas of same data at different sites consumes more disk space.
• Expensive
Concurrency control and recovery techniques will be more advanced and hence more expensive. Software
overhead, and management associated with synchronizing transactions and their components
vs. fault-tolerance benefits that are associated with replicated data will add to cost.
• Data transmission costs:

The amount of data replicated will have an impact on on the data transmission costs. Replicating large
amounts of data requires a window of time and higher network bandwidth that could affect other
applications.
• Usage frequency:
The frequency of data usage determines how frequently the data needs to be updated. Frequently used
data needs to be updated more often, for example, than large data sets that are used only every quarter.

Data Allocation
Data allocation is a process of deciding where to store the data. It also involves a decision as to which data is
stored at what location. Data allocation strategies can be centralised, partitioned or replicated.
Data Allocation Strategy Description

Centralised The entire database is stored at one site.
No distribution occurs.
Partitioned The database is divided into several
fragments that are stored at several sites.
Replicated Copies of one or more database fragments
are stored at several sites.
Example
Consider an example of fragmenting and distributing the company data-base .Suppose that
the company has three computer sites— one for each current department. Sites 2 and 3 are
for departments 5 and 4, respectively. At each of these sites, we expect frequent access to
the EMPLOYEE and PROJECT information for the employees who work in that
department and the proj-ects controlled by that department. Further, we assume that these
sites mainly access the Name, Ssn, Salary, dept_id and Super_ssn attributes
of EMPLOYEE. Site 1 is used by company headquarters and accesses all employee and
project information regularly, in addition to keeping track of DEPENDENT information
for insurance purposes.
Consider the above and design distributed database.

Employee
Example
SSn Name Salary Superssn Dept_id did
Department 12 Shivam 34000 16 4 101
Did Dname 13 Sahil 70000 14 5 103
4 IT 14 Rahi 32000 14 5 104
5 Accounts 15 Teena 40000 14 5 102

16 Sangeeta 60000 16 4 105
Works_on Projects Dependent

Proj_ Proj_name Dnum
SSn Proj_no did Depenedent Dependent Phoneno
no
id name
12 1 4 1 Food Website 4
101 Rakesh 798919
13 2 5
2 Payroll 5 103 Ajay 753653
14 4 5
Management
104 Sameer 431434
15 2 5
3 ERP 5 105 Vaishali 562562
16 3 4
4 Insurance 4 106 Shiksha 375735
Dept_4
Steps of designing distributed database
Did Dname
4 IT
1. Data Fragmentation Dept_5
Fragment the given data as told in question Did Dname
Hybrid fragmentation of Employee table,horizontal
fragmentation of other tables except Dependents table. 5 Accounts
(Write equation using sigma or project symbols by yourself)
SSn Name Salary Superssn Dept_id

13 Sahil 70000 14 5
Emp_5
14 Rahi 32000 14 5
15 Teena 40000 14 5
SSn Name Salary Superssn Dept_id

12 Shivam 34000 16 4
Emp_4
16 Sangeeta 60000 16 4
Example
Workson_4
Emp_dependents Dependents
Depenedent Dependent Phoneno SSn Proj_no did
SSn did
id name 12 1 4
12 101
101 Rakesh 798919 16 3 4
13 103
103 Ajay 753653
14 104 104 Sameer 431434
15 102 105 Vaishali 562562
16 105 106 Shiksha 375735
Workson_5
Projects_5 Projects_4
Proj_ Proj_name Dnum Proj_ Proj_name Dnum SSn Proj_no did
no no 13 2 5
2 Payroll 5 2 Payroll 5 14 4 5
Management Management
15 2 5
3 ERP 5 3 ERP 5
Example
2. Data Allocation : Here fragments are distributed to various sites Emp_5,

Workson_5
Projects_5
Dept_5
Site 1 Site2
Emp_dependents,
Dependents
Site 3
Emp_4,
Workson_4
Projects_4
Dept_4
Example
3. Data Replication : Here fragments are copied to various other sites.In this example we used partical
replication to copy site 2 and site 3 fragments on only site 1 which is headquarter site.
Emp_5,
Workson_5
Projects_5
Dept_5
Site 1 Site2
Emp_dependents,
Dependents
Emp_4,
Workson_4 Site 3
Projects_4
Dept_4
Emp_4,
Emp_5, Workson_4
Workson_5 Projects_4
Projects_5 Dept_4
Dept_5
T
H
A
N
THANK
K YOU
Y
O
U
Email
Website
ENJOY LEARNING!!

SAD Unit 4 Distributed Database Design

Uploaded by

Copyright:

Available Formats

SAD Unit 4 Distributed Database Design

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

SAD Unit 4 Distributed Database Design

Uploaded by

Copyright:

Available Formats

System Analysis &

1.1 Data Fragmentation

1.2 Data Replication

1.3 Data Allocation

1. How to partition database into fragments? –Data Fragmentation

Data Fragmentation can be of three types:

Data Fragmentation can be of three types:

Reconstruction of original relation(table)

PROJ2 = (Budget>=200000) Proj

Eid Name Address Gender Salary Deptid

Eid Name Address Gender Salary Eid Deptid

Fragment 1= Eid,Name,Address,Gender,Salary (Employee)

join operation ⋈. So,Employee= Fragment1 ⋈ Fragment2 8

105 Madhu 69 Faridabad 350000

Now we apply vertical fragmentation on these tables

Eid Name Age Address Eid Salary

Employee_senior_info= Eid,Name,Age,Address( (age>=60) Employee)

Employee_junior_info= Eid,Name,Age,Address( (age<60) Employee)

• Less Data Movement over Network

• Require more disk space

• Data transmission costs:

Data Allocation Strategy Description

Consider the above and design distributed database.

Did Dname 13 Sahil 70000 14 5 103

4 IT 14 Rahi 32000 14 5 104

5 Accounts 15 Teena 40000 14 5 102

Works_on Projects Dependent

SSn Name Salary Superssn Dept_id

SSn Name Salary Superssn Dept_id

2. Data Allocation : Here fragments are distributed to various sites Emp_5,

You might also like