Unit 2-PartII Distributed Database Design
Unit 2-PartII Distributed Database Design
User
s
Concept Conceptu
al
ual view
Schema
Intern Internal
al view
Sche
Distributed ma © M. T. Özsu & P.
Ch.3/3
Generic DBMS Architecture
ts
Interface
Semantic
Processor
Optimizer se
Recovery
Handler
Manager
Monitor
Process
Executi
Runtim
Suppor
Global
Global
Query
Query
User
Data
Local
Local
on
or
USER
t
System
response
s
GCS
Local Local
User Multi- User
Reque DBMS Reque
st Global LayerGlobal Global st
Subreque Subreque Subreque
st st st
• Bottom-up
● when the databases already exist at a number of sites
● Suitable for applications where database already exists
● Starting point is individual conceptual schemas
● Exists primarily in the context of heterogeneous database.
Objectives
User
Conceptual Input View
View Design
Design
Integration
Access
GCS Information ES’s
Distributio
n User
Design Input
LCS’s
Physical
Design
LIS’s
Distributed © M. T. Özsu & P.
Ch.3/18
Top down Design Process
● Suitable for applications where database needs to be
build from scratch
❷ How to fragment?
❺ How to allocate?
❻ Information requirements?
PROJ1 PROJ2
PROJ1 PROJ2
PNO BUDGE PNO PNAME LOC
T
P 1500 P Instrumentati Montre
P
1 1350
00 P
1 Database
on New
al
P
2 2500
00 P
2 CAD/CAM
Develop. New
York
P
3 3100
00 P
3 Maintenan Pari
York
P
4 5000
00 P
4 CAD/CAM
ce Bosto
s
5 00 5 n
finite number of
alternatives
tuples relation
or s
attribut
es
Possible Possible
REALITY Realisti
applicatio applicatio
c
Distributed
n © M. T. Özsu & P.
n
Ch.3/32
Information Requirements
• Four categories:
● Database information
● Application information
● Communication network information
● Computer system information
• I) Database Information.
- The database information concerns the global conceptual
schema. In this context it is important to note how the database
relations are connected to one another, especially with joins.
• Database Information
● relationship
SKILL
TITLE, SAL
L1
EMP PROJ
ENO, ENAME, TITLE PNO, PNAME, BUDGET,
LOC
ASG
ENO, PNO, RESP,
DUR
● cardinality of each relation: card(R)
• Thumb rule:
- most active 20% of user queries account for 80% of the total
data accesses.
- This “80/20 rule” may be used as a guideline in carrying out this
analysis
Preliminaries :
Pr should be complete
Pr should be minimal :
• Example :
● Assume PROJ[PNO,PNAME,BUDGET,LOC] has two applications
defined on it.
● Find the budgets of projects at each location. (1)
● Find projects with budgets less than $200000. (2)
Pr ={LOC=“Montreal”,LOC=“New York”,LOC=“Paris”,
BUDGET≤200000,BUDGET>200000}
which is complete.
• Distributed
These minterm predicates determine
© M. T. Özsu & P. the fragments that are used
Ch.3/54
Cont…
• reducing the number of minterm predicates that need to be
considered in fragmentation.
PAY1 PAY2
TITLE SAL TITLE SAL
Mech. 270 Elect. 400
Eng.
Programm 00
240 Eng.
Syst. 00
340
er 00 Anal. 00
PROJ1 PROJ2
R = ∪∀Ri ∈FR Ri
• Disjointness
● Minterm predicates that form the basis of fragmentation should be
mutually exclusive.
L1
EMP PROJ
ENO, ENAME, TITLE PNO, PNAME, BUDGET, LOC
L2 L3
ASG
ENO, RESP,
PNO, DUR
Ri = R ⋉Si, 1≤i≤w
where w is the maximum number of fragments that will be defined
on R and
Si = σFi (S)
where Fi is the formula according to which the primary horizontal
fragment Si is defined.
EMP1 EMP2
ENO ENAME TITLE ENO ENAME TITLE
E A. Mech. E J. Elect.
3
E Lee
J. Eng.
Programm 1
E DoeM. Eng.
Syst.
4
E Miller
R. er
Mech. 2
E Smith
B. Anal.
Syst.
7 Davis Eng. 5
E Casey
L. Anal.
Elect.
6
E ChuJ. Eng.
Syst.
Distributed © M. T. Özsu & P.
8 Jones Anal.
Ch.3/66
Cont…
• To carryout a derived horizontal fragmentation , 3
inputs are needed.
⎧ 1 if attribute A is referenced by
use(qi,Aj) ⎨ j
⎩ 0 query qi
= otherwise
Distributed © M. T. Özsu & P.
Ch.3/74
VF – Attribute Usage Matrix
Consider the following 4 queries for relation PROJ
q 1: SELECT BUDGET q2: SELECT PNAME,BUDGET
FROM PROJ FROM PROJ
WHERE PNO=Value
q 3: SELECT PNAME q4: SELECT SUM(BUDGET)
FROM PROJ FROM PROJ
WHERE LOC=Value WHERE LOC=Value
Let A1= PNO, A2= PNAME, A3= BUDGET, A4= LOC.
A1 A2 A3 A4
Attribute Usage matrixq 1 0 1 0
use(qi, Aj )= 1
q 0 1 1 0
q2 0 1 0 1
3
q 0 0 1 1
Distributed 4 © M. T. Özsu & P.
Ch.3/75
VF – Affinity Measure aff(Ai,Aj)
• The frequency measure can be included in the definition of the
attribute affinity measure aff (Ai,Aj ).
The attribute affinity measure between two attributes Ai and Aj of a
relation R[A1, A2, …, An] with respect to the set of applications Q =
(q1, q2, …, qq) is defined as follows :
application qk at site Sl and accl(qk) is the application access frequencies at different sites.
The result of this computation is an nn matrix, each element of which is one of the measures
defined above. We call this matrix the attribute affinity matrix (AA).
Distributed © M. T. Özsu & P.
Ch.3/76
Attribute Affinity Matrix
AM
=
∑∑ (affinity of Ai and Aj with their
i j neighbors)
where
n
bond(Ax,Ay) aff(Az,Ax)aff(Az,
= z Ay)
∑ =1
Thus cont(A2,A3,A4)=1780
Ordering (0-3-1) :
cont(A0,A3,A1) = 2bond(A0 , A3)+2bond(A3 , A1)–2bond(A0 , A1)
= 2* 0 + 2* 4410 – 2*0 = 8820
Ordering (1-3-2) :
cont(A1,A3,A2) = 2bond(A1 , A3)+2bond(A3 , A2)–2bond(A1,A2)
= 2* 4410 + 2* 890 – 2*225 = 10150
Ordering (2-3-4) :
cont (A2,A3,A4) = 1780
Distributed © M. T. Özsu & P.
Ch.3/93
BEA – Example
• Therefore, the CA matrix has the form A1 A3 A2
4 4
5 5 0
8
0 5 0
4 5
5 3 5
7
0 3 5
• When A is placed, the final form of the CA matrix (after row
4
organization) is A1 A3 A2 A4
A1 4 4
5 5 0 0
A3 4 5
5 3 5 3
A2 8 7
0 5 0 5
A4 7 7
Distributed © M. T. Özsu & P.
0 3 5 8
Ch.3/94
VF – Partitioning ALgorithm
• The objective of the splitting activity is to find sets
of attributes that are accessed solely, or for the
most part, by distinct sets of applications.
• For example, if it is possible to identify two
attributes,A1 andA2,which are accessed only by
application q1, and attributes A3 and A4, which are
accessed by, say, two applications q2 and q3, it
would be quite straightforward to decide on the
fragments.
• The task lies in finding an algorithmic method of
identifying these groups.
Distributed © M. T. Özsu & P.
Ch.3/95
Cont…
How can you divide a set of clustered attributes {A1, A2,
…, An} into two (or more) sets {A1, A2, …, Ai} and {Ai, …,
An} such that there are no (or minimal) applications that
access both (or more than one) of the sets.
A1 A2 A3 … Ai Ai+1 . .Am
A1 .
A2
TA
..
.
Ai
Ai+1
BA
..
.
Am
CTQ*CBQ−CO
Q2
R1 R2
VF VF VF VF VF
Decision Variable
1 if fragment Fi is stored at
xij
0 site Sj
=
otherwise
Distributed © M. T. Özsu & P.
Ch.3/117
Allocation Model
• Total Cost
∑ query processing
cost +
all
queries
∑ ∑ cost of storing a fragment at
a site
all all
• Storage Costsites fragments
(of fragment Fj at Sk)
∑ ∑ update message
cost +
all all
sites ∑
fragments ∑ acknowledgment
cost
all all
sites fragments
● Retrieval Cost
∑ mi
n
all
sites
(cost of retrieval
command +
all cost of sending back the
fragments result)