SAD Unit 4 Distributed Database Design
SAD Unit 4 Distributed Database Design
SAD Unit 4 Distributed Database Design
Design
Unit 4- Distributed
Database Design
Insert Image
1. DistributedDatabase Design
Topics covered
Distributed Database Design
While distributing database at different geographical sites,following
three issues must be considered while designing distributed
database.
3
1. Data Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The
subsets of the table are called fragments.
• These fragments may be stored at different locations.
• The data fragmentation process should be carrried out in such a way that
the reconstruction of original database from the fragments is possible.
4
1. Data Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The
subsets of the table are called fragments.
• These fragments may be stored at different locations.
• The data fragmentation process should be carrried out in such a way that
the reconstruction of original database from the fragments is possible.
5
Horizontal Fragmentation
Itis a horizontal subset of a relation(a table) which contain those of tuples which satisfy selection
conditions.
Specified in the SELECT operation of the relational algebra on single or multiple attributes
Consider the Employee relation with selection condition (Branch = Delhi). All tuples or rows satisfying
this condition will create a subset which will be a horizontal fragment of Employee relation.
Fragment 1= (Branch=Delhi) Employee
Fragment 2= (Branch=Gurugram) Employee
Fragment 3= (Branch=Faridabad) Employee
7
Vertical Fragmentation
Each site may not need all the attributes(columns) of a relation(a table) . Thus we use Vertical fragmentation which
divides a relation “vertically” by columns.
It is a subset of a relation which is created by a subset of columns.
Consider the Employee relation containing name,address,gender,salary,deptid
Reconstruction:
Because there is no condition for creating a vertical fragment, each fragment must include the primary key
attribute of the parent relation Employee. In this way all vertical fragments of a relation are connected by
Stud_info Stud_Language
Rollno Name Age Address Rollno Language
1 Rahi 20 Faridabad 1 English
2 Abhishek 19 Delhi 2 Hindi
3 Govind 20 Gurugram 3 Hindi
4 Shivangi 20 Faridabad 4 English
9
Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques are used.
This is the most flexible fragmentation technique since it generates fragments with minimal extraneous
information.
Reconstruction of the original table is often an expensive task.
Orignial relation is reconstructed by the combination of JOIN and Union operations.
10
Hybrid Fragmentation
Let us consider Employee table
Eid Name Age Address Salary
101 Abhishek 71 Gurugram 100000
102 Bhawna 60 Delhi 250000
103 Satyendra 59 Delhi 120000
104 Dhruv 41 Palwal 400000
So,it can be divided into fragments using both horizontal and vertical
Firstly we fragment horizontally on basis of age
Employee_senior = (age>=60) Employee ,it give details of senior citizens
Employee_junior= (age<60) Employee ,it give details of non seniors
11
Hybrid Fragmentation
Employee_senior
Eid Name Age Address Salary
101 Abhishek 71 Gurugram 100000
102 Bhawna 60 Delhi 250000
105 Madhu 69 Faridabad 350000
Employee_junior
Eid Name Age Address Salary
103 Satyendra 59 Delhi 120000
104 Dhruv 41 Palwal 400000
Employee_junior_info Employee_junior_salary
So, as shown above we get four fragments by applying hybrid fragmentation in following way
• Efficiency
Data is stored close to where it is most frequently used.
In addition, data that is ,not needed by' local applications is not stored.
• Parallelism
With fragments as the unit of distribution, a transaction can be divided into several sub queries that operate on
fragments. This should increase the degree of concurrency, or parallelism, in the system, thereby allowing
transactions to execute in parallel.
• Data management is easy as fragments are smaller compare to the complete database.
• Increased availability of data to the users/queries that are local to the site in which the data stored.
• Data that are not required by local applications are not stored locally. It leads to reduced data transfer between
sites, and increased security.
• A database is fragmented and fragments are made available at different sites. It reduces disk space at each site
considerably.
Disadvantages of Fragmentation
• Performance
The performance of global application that requires data from several fragments located
at different sites may be slower.
• Integrity
Integrity control may be more difficult as data is fragmented are fragmented and located
at different sites.
• Problems in defining best fragmentations
Different application accessing same data may have different access view
requirements,defining best fragmentation is tedious task.
2. Data Replication
• Data replication is the process in which the data is copied at multiple locations
(Different computers or servers) to improve the availability of data.
• Copy of relation is also termed as replica of relation.
Data replication is done with an aim to:
• Increase the availability of data
• Speed up the query evaluation
• There are two types of data replication:
1. Synchronous Replication:
In synchronous replication, the replica will be modified immediately after some
changes are made in the relation table. So there is no difference between original
data and replica.
2. Asynchronous replication:
In asynchronous replication, the replica will be modified after commit is fired on to the
database.
Fragment1
Fragment2 copy
Fragment3 copy
Fragment1 copy
Fragment4 copy
Replication Schemes Fragment2
Fragment3 copy
Fragment4 copy Site1
There are three replication schemes as follows:
1. Full Replication
In full replication scheme, the copy of database is available to almost
every location or user in communication network.
Advantages of Full Replication Site2 Site3
• High Availability of Data.
• Improves the performance for retrieval of global queries as the result
can be obtained locally from any of the local site.
• Faster execution of Queries.
Site4
Disadvantages of full replication Fragment1 copy
Fragment2 copy
• Concurrency control is difficult to achieve in full replication. Fragment3
Fragment1 copy
• Update operation is slower. Fragment2 copy Fragment4 copy
Fragment3 copy
Fragment4
Fragment1
Replication Schemes
Fragment2 Site1
2. No Replication
No replication means, each fragment is stored exactly at one location.
Advantages of no replication
• Concurrency can be minimized.
Site2 Site3
• Easy recovery of data.
Disadvantages of no replication
• Poor availability of data.
• Slows down the query execution process, as multiple clients are
accessing the same server. Site4
Fragment3
Fragment4
Fragment1
Fragment2 copy
Fragment3 copy
Fragment1 copy
Fragment4 copy
Replication Schemes Fragment2
Fragment3 copy
Site1
3. Partial replication
Partial replication means only some fragments are replicated from the
database.
Site2 Site3
Advantages of partial replication
The number of replicas created for fragments depend upon the importance
of data in that fragment.
Site4
Fragment3
Fragment4 copy
Fragment4
Comparison of various replication schemes
Here, partitioning
Means no replication.
Advantages of Replication
• Availability
If one of the sites containing relation R fails, then the relation R can be obtained from another site.
Thus, queries (involving relation R) can be continued to be processed in spite of the failure of one site.
• Parallelism
The sites containing relation R can process queries (involving relation R) in parallel .
This leads to faster query execution.
• Expensive
Concurrency control and recovery techniques will be more advanced and hence more expensive. Software
overhead, and management associated with synchronizing transactions and their components
vs. fault-tolerance benefits that are associated with replicated data will add to cost.
• Usage frequency:
The frequency of data usage determines how frequently the data needs to be updated. Frequently used
data needs to be updated more often, for example, than large data sets that are used only every quarter.
Data Allocation
Data allocation is a process of deciding where to store the data. It also involves a decision as to which data is
stored at what location. Data allocation strategies can be centralised, partitioned or replicated.
Consider an example of fragmenting and distributing the company data-base .Suppose that
the company has three computer sites— one for each current department. Sites 2 and 3 are
for departments 5 and 4, respectively. At each of these sites, we expect frequent access to
the EMPLOYEE and PROJECT information for the employees who work in that
department and the proj-ects controlled by that department. Further, we assume that these
sites mainly access the Name, Ssn, Salary, dept_id and Super_ssn attributes
of EMPLOYEE. Site 1 is used by company headquarters and accesses all employee and
project information regularly, in addition to keeping track of DEPENDENT information
for insurance purposes.
Site 1 Site2
Emp_dependents,
Dependents
Site 3
Emp_4,
Workson_4
Projects_4
Dept_4
Example
3. Data Replication : Here fragments are copied to various other sites.In this example we used partical
replication to copy site 2 and site 3 fragments on only site 1 which is headquarter site.
Emp_5,
Workson_5
Projects_5
Dept_5
Site 1 Site2
Emp_dependents,
Dependents
Emp_4,
Workson_4 Site 3
Projects_4
Dept_4
Emp_4,
Emp_5, Workson_4
Workson_5 Projects_4
Projects_5 Dept_4
Dept_5
T
H
A
N
THANK
K YOU
Y
O
U
Email
Website
ENJOY LEARNING!!