0% found this document useful (0 votes)
65 views

Chapter 3 Distributed Database Design

Uploaded by

sehar john0344
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
65 views

Chapter 3 Distributed Database Design

Uploaded by

sehar john0344
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 34
Chapter 3 Distributed Database Design Table of Contents Alternative Design Strategies Distribution Design Issues . . ® Fragmentation . Allocation 1. Alternative Design Strategies # Two major strategies ¥ Top-down approaches ¥ Bottom-up approaches 1.1 Top-Down Design Process Requrersent asia (Sie Renews co i Yen Se rs (teotniconcepia Schema] [Acme |“ saroaen ons J (Geen Concaer seroma ) Physal Deson (rrr Sener) seus Details of Design Process Requirement Analysi v Defines the environment of the system. ¥ Elicits both the data and processing needs of all potential DB users © System Requirements ¥ Whete the final system is expected to stand? ¥ Performance, Reliability, Availability, Economics, Flexibility Details of Design Process (Cont'd) ® Conceptual Design ¥ Determines entity types and relationships among these entities ¥ Entity analysis: determines the entities, attributes, and relationships ¥ Functional analysis: — determines the fandamental functions with which the modeled enterprise is involved ¥ The process is identical to the centralized database design Details of Design Process (Cont'd) iew Design ¥ Defines the interfaces for end users ¥ The conceptual schema can be interpreted as being an integration of user views. © Distribution Design ¥ Designs the local conceptual schema by distributing the entities over the sites of the distributed system ¥ Consists of two steps : fragmentation and allocation 1.2 Bottom-Up Design Process © Top-Down Approach: Suitable when a system is being designed from scratch © Bottom-Up Approach Suitable when many DBs exist, and the design task involves integrating them into-one DB ~> The bottom-up design process consists of integrating local schemas into the global. conceptual schema, > Schema Translation & Schema Integrating > In the context of Heterogeneous Database ! 2. Distribution Design Issues © Why fragment at all? © How should we fragment? © How much should we fragment? Is there any way to test the correctness of decomposition? © How should we allocate? © What is necessary information for fragmentation and allocation? 2.1 Reasons for Fragmentation Arelation is not an appropriate unit of distribution. ¥ Application views are usually subsets of relations. ¥ Unnecessarily high volume of remote data access or unnecessary replication ¥ Not support intra-query concurrency > decompose a relation into fragments © Disadvantages of fragmentation ¥ applications defined on more than one fragments performance degradation by snion or join v¥ semantic data control integrity checking is very difficult 5 SSSUEBE OE cops 10 2.2 Fragmentation Alternative ® Horizontal Fragmentation or Vertical Fragmentation es JINAME in R Instumentaton Dotabase Develo, ok 2NO. JINAME BUDGET tor. B in anos Maintenance span ‘0000 New vor Pans Example of Horizontal Partitioning Capers h, BUDGET WO. 130000 aL 138000 2 0000 ay 310000 Mt NAME Instrumentation Detabase Develop canjcam Maintenance Example of Vertical Pastitioning 2.3 Degree of Fragmentation © Notto fragment at all: relation © Fragment to the level of individual tuples or Fragment to the level of individual attributes. © Suitable level of fragmentation? ¥ Such a level can only be defined with respect to the applications that run on the database. > According to the valne of applicatian-specific parameters, individual fragments cem be ielentified. Capers 2.4 Correctness Rules of Fragmentation © Completeness ¥ If arelation instance R is decomposed into fragments R,, Ry, .. .. R,, each data item can be found in R can also be found in one or more of R,'s. © Reconstruction ¥ If arelation instance R is decomposed into fragments R,, R,... «Rye it should be possible to define a relational operator Vsuch that R=YR, WR, € Fy © Disjointess ¥ If arelation instance R is decomposed into fragments R,,R),.. « R,.and data item d,is in R,, it is not any other fragment R, (£#/), cupers ut 2.5 Allocation Alternatives © Comparison of Replication Alternatives Fall Replication [Partial Replication | Partitioning ‘Query cae Easy Difficult Difficult Directory Easy or Management | _ponexisient panel ee "Consureney Goatrol Reliability “Very high igh Tow Moderate Difficult Essy Reality Possible application] Realistic _| Possible application Capers 2.6 Information Requirements © Database Information © Application Information © Communication Network Information © Computer System Information 3. Fragmentation © Design of Horizontal Fragmentation © Design of Vertical Fragmentation © Design of Hybrid Fragmentation 3.1 Horizontal Fragmentation © Primary horizontal fragmentation © Derived horizontal fragmentation © Information requirements of horizontal fragmentation ¥ Database Information ¥ Application Information Database Information © Coneemis the global conceptual schema ¥ How the DB relations are connected to one another, especially with joins? ¥ Expression of relationships among relations using liaks © Example : dain nail 2 (oonea weeps 7 mes | 1 ZR AUNE TMF] OWA, BIGOT] 6 FO TuD, WES, OU Capers 19 10 Application Information © Predicates used in user queries v The most active 20% of user queries account for 80% of the total data access. © Simple Predicate Vy Ay O Value, 8 € {=< 4.5,>,2} ¥ Pr, : set of all simple predicates defined on relation R, ® Minterm Predicates ¥ m,: the conjunction of simple predicates ¥M, : the set of minterm predicates for relation R, M,= {m,|m,-Ap,'}.1¢é¢m,1¢/ 30000 py TITLE ="Mechéng” 2 m,:+(TITLE ="“Elect.Eng”) « SAL s 30000 Pa: TITLE = "Programmer" 2m, : “(TITLE = “Elect.Eng”) x SAL > 30000 py: SAL # 30000 im, : TITLE = "Programmer" 4 SAL = 30000 P, : SAL > 30000 my: TITLE = “Programmer” SAL > 30000 Application Information (71l 4) © Quantitative Information v Minter selectivity: sei(i,) Number of tuples of relations that would be accessed by a user query specified according to a given minterm ¥ Access frequency: acely,) Frequency with which user application access data Primary Horizontal Fragmentation © Definition A ftagmentation generated by a selection operation on the owner relation of a database schema ¥ Given relation R,, its horizontal fragments are R, - 0-(R,), 1 ££, F, : the selection formula (m,) © Example: Sample Relation J 12 Example: Minterm Fragments Name Instrumentation Mantenance Simple Predicate2| BI} 1% © Completeness ¥Pr°l simple predicate °| 8 484! fragment Oli HH 2 fragment] RESIS APSE SSH t501 SS # SS, Pris complet ¥ Example = Pr = (LOC = "Montreal", LOC = “New York", LOC = “Paris” } — Pr = Pr U (BUDGET s 2a9000, BUDGET > 200000 } © Minimality ¥ PrOfl °l 2H fragment FI} Fit F.2 2BS OP, Soe stg Sez ¥ Example = PrT= Pr’ U { INAME'= “Instrumentation” } at 13 Algerithm COM Hn input: RL: relation; Pr: set of simple precicates output : 61: set af simple predicates decane: F set of minterm fragments begin find @p, © Pr such that p partons R aooording to Rte (the mineerm fragment accorcirg top.) begin find ap, « Pr such that p, partons some fof Pr’ Sctording to Pe Pre prug, Pra Prop, FoFul eng-begin ntl Bis complete end {COM_MIN} P_Rufe : fundamental rule of completeness and minimality, which states, that a fragment is partitioned “into at Least two parts which are accessed differently by ai least one applisation.” cops 28 Aigrithm PHORIZONTAL Input: R,: relation; Pr, set of spl predicates utput: Meet oF minterm Fragments begin Pe = COM _ MINER, Pr) etermine the set M, of minterm precicatas determine the set Lf implications among p< Pr? far eoen m, = ca fim is contradictory according to I then M= Hm, eo end far end {PHORIZONTAL} Example: Py: att = valued 1 att ~ valuel) ~ (att att ~ valued) = (att ~ valued) Mm: (att = valued) a (att = valuez) contradictory by E mt (att = valuel) A —fatt = value2) tg: afatt'= valve) (att = value2) my {att = valved) » fat = value) contradictary by T ‘Copers- 27 14 Example © S.J: subject of primary horizontal fragmentation ® Assumption for S wv There is only | application that accesses S, ¥ That application checks the salary information. ¥ Queries for S are issued at two sites. # Simple Predicates py: SAL ¢ 30000 P) : SAL > 30000 =P, = {0y, Pz) = complete aod minimal by COM_MIN Example (Cont'd) © Minterm Predicates 1 (SAL = 30000) A (SAL » 30000) im, : (SAL-= 30000) 1 « (SAL ~ 30000) {SAL £30000) A (SAL » 30000) SAL = 30000) n «(SAL » 30000) > m, and m, is contradictory: M = {m,, M3} Therefore, we define two fragments, SAL ‘0000 3000 15 Example (Cont'd) © Assumption for J ¥ There are 2 applications that access J ~ The first is issued at three sites and finds the names and budgets of projects given their number. ¥ The second is issued at two sites and has to do with the management of the projects. © Simple Predicates Pp, = LOC = "New York” BUDGET > 200000 = Pr {D4, Pz, By, Pa, is complete and minimal : COM_MIN Tapers Example (Cont'd) © Minterm Predicates m, : LOC = “Montreal” 4 (BUDGET = 20000) m, : LOC = "Montreal" , (BUDGET » 20000) img : LOC = "New York" « (BUDGET < 20000) ma, : LOC = "New York” 4 (BUDGET » 20000) m, : LOC = "Paris" n (BUDGET = 20000) m, : LOC = "Paris" A (BUDGET » 20000) Therefore, we define six fragments, F, = Quy Jor Jay Jar Joy Je} according to M. 16 Derived Horizontal Fragmentation ead ¥ Defined on a member relation of a link according toa selection operation on its owner. ¥ Given a link L, ownenL)~ S & member(L) = R -R,-R semi joinS, 1i£w # of fragments that will be defined on R },= a primary horizontal fragment for S © Example L, : owner(L,) = S and member(L,)=E E, : E semi_join §,, where $, = Gea, « scocol) E+E semi_join $,, where S; = Osa, - gaol) Potential Complication e FMS When there are more than two Links into a relation, there is more than ene possible horizontal fragmentation of the relation, Two criteria ¥ Fragmentation used in more applicstioas Y¥ Fragmentation with better join characteristics Recall the advantages of the fragmentation ¥ Performing. qpery on smaller relations 1 Perfonning joins ina distibured fashion Simple Graph A gph with only one link coming in or going out of a fragment ¥ Effects of storage and join pecformance! 17 Example : Fragmentation of G Assumption There are two applications which access G. wv One finds the names of engineers who work at certain places. sf The other accesses the project that employees work om and how long they will work oa those projects. © The first fragmentation according to J,, Jp. J3 ¥ G, = Gsemi join Jy, Where J, ! Og —-ryitmar() JG, = Gsemi_join J, where 3; > O.o¢ onan tore) 4G, = Gsemi join Js, where; + o.oc --peed) © The second fragmentation according to E,. Ey ¥ G, = Gsemi_join E, ¥G, = Gsemi_joinE, The final choice af the fragmemation scheme may be a decision problem adivessed dia mg allocarion. opens 34 Checking for Correctness © Completeness ¥ PHF: A set of complete and mivimal predicates, Pr’ ¥ DHF: Ensures referential integrity © Reconstruction ¥ for a relation R with fragments F; R= UR, WR € Fa © Disjointness PHF: Mintenm predicates determining the fingmentation are nautually exclusive ¥ DHF: Disjointuess can be guaranteed if the join graph is simple; otherwise investigate actual tuple values Capers 18 3.2 Vertical Fragmentation © Definition Partitions R to fiagments R,, R,, .... R,, each which contains a subset of R’s attribute as welll as the primary key of R @ Inherently more complicated than horizontal partitioning ¥ Total number of alternatives ¥ Obtaining optimal solution is very difficult ¥ Resort to heuristics Two Types of Heuristics © Grouping ¥ Starts by assigning each attribute to one fragment ¥ Joins some of fiagments until some etitesia is satisfied. © Splitting ¥ Starts with a relation and decides on beneficial partitioning ¥ Top-down desien methodology dll *{ @ Note = Replication of the key in the fragments = Therefore, splng 1 considered only for shoe crates that do sot participate ithe rier es 19 Information Requirements of Vertical Fragmentation © What needs to be detesmined about applications? ¥ Affinity of tributes: How closely related the atributes ace? Y Antribote usage valve: vee{qy Ay) = 1 oF 0 © Example » SELECT BUDGET FROM 1 WHERE JNO = Vaue; SELECT JRE, BUDGET FROM J; STLECT JMAME FROM } WHERE LOC = Value; SELECT SUM(QUDGET) FROM J WHERE LOC = Va A= 20 ainisy Matric ‘a= JANE. Attribute Affinity © Attribute Affinity aflfA,.A,)= > Yreila, dace ia.) ref(g,) : # of accesses to attributes (14, A.) for each execution of application q, at site S, ace,(q,): the application access frequency measure 20 Example — Assume that ref(q,) - I for all q, and S, — Application frequencies e0,(q,)= 15 ace(g)= 20 ace g,) = 10 aee,(ql- 5 aces DB acey(qy)- 0 neey(qyI= 25 ace 25 aees( ash acc (ad= 0 ace(qy= = ace, (q,) +ace, (q,)+ ace ,tq,) = 45 A attribute affinity matrix (AA) opr 9 Clustering Algorithm oe ‘ Find some means of grouping the attributes of a relation based on the oxcboteeinty valoes in AA. Net comribution tothe global affinity measure of pacing atirte A between A, an ~cont(Ay Ay, A} = bond, A,) + bond(, A) ~bond(A, A) where bond(A,.A, par. Laff (eo) ® Example cot Ay, Ay. Ay) = bond(A,, Ay) + bondtA,, Ay) —bord{A,, Ag bondl,. A, bondi. Capers 21 Algerithm CLUSTERING input: 4 : attribute afinty matric output: CA: custered afeity matric ies remember that AK ip enn — mutes} a index = 35 while index = n do {choose the “best” location for attnibaite RA jae, > begin Tor | mn 1 to index — 1 1 calulate conta, Rye ent for CAICUIADE CONE Aye 2+ Anton Arior sx) (DOUNICERY GONG. F Joc = placement given by maximum cont value for jem next ae By 1 0 order the rows according ta the relative ovdering af columns rd, (CLUSTERING) boudl Ay. A;)+ bond A;, A,) —bond(Ay, An) bandia,..a,)=0 A)—45 4545-0453 45430-4410 ‘gomt(Acy Ay, Ai) = 4410 Ordering(1-3-2) cont{A, Ay. ;) = bond(A,, A,)+ bord(A,, A.) —bond(A, A) bond(A,, A;)—bondA;, A;)~ 4410 bondlA., As) = 890, boudlA,. Ad} conti. Ay A. capers 22 comt{As, Ay. A.) = bondlA., As)-+ bond(A;.A,)—bond(A, A) bond(A,, A,)= $50 bomi(Ay. Ag)— bonds. A,)~ 0 conta, Aad so forth ...: The resulting Clustered affinity Mawix (CA) A a 3 15 78 capers 46 Partitioning Algorithm © The upper left-hand comer of CA: TA © The lower right-hand comer of CA: BA {A |use(.A)= 1} 1g, AQig) STAY 19, AQKg,) S BA} Q-1TQUBQ} 23 > 2 ref ,(q,)ace ,(q,) dtOes, 2 > ref (q, Jace ,(q,) cBO ze ref (q, ace (q.) cog = 2 2 ref ,(q, hace ,(g,) To find the point x such that z is maximized : z=CTQ - CBQ—COQ? opens Algorithm PARTTFION input = CA: clustered affinity matrix -R: relation output =F: set of fragments, begn 4 determine the wali For the Fest colurnn {the subscripts in the cast equations indicate the salt point } galcdate CTQ, 1, BQ,~ 11 COR, Dest = CTR,» BQ,“ (C0, a for i from n~ 2 to 1 by— do ‘aleuate CTQ, C8, COQ, 2 Cr CB, CoQ? sf (> best) then ‘255190 Dest to 2 and record the shift positon: and endfor call SHIFT(CA) ‘until ne more SHIFT is possible seconstruct the matriy according to the shit postion R= Nyt) UK {K 6 the set of primary key attributes af R } Capers a7 Checking for Correctness © Completeness A=TAUBA @ Reconstruction R-JOIN,R, VR € Fy, ® Disjointness snot important as horizontal fragmentation die to the seplication of peimary key 3.3 Hybrid Fragmentation © The levels of nesting in most practical applications do not exceed 2, Capers 25 Correctness of Hybrid Fragmentation Reconstruction Starts at the leaves of the partitioning tree and moves, jpward by performing joins and unions © Completeness ¥ Fragmentation is complete ifthe intermediate and leat fragments are complete. @ Disjoinmess ¥ Fragmentation is disjoint ifthe intermediate and leat fragments are disjoint, 4. Allocation © Definition The allocation problem involves finding the optima distribution of relations (fragments) to sites. © Measures of optimality v¥ Minimal cost —cost of storing, querying, updating, and data communication ¥ Performance —to minimize the response time and —to maximize the system throughput at each site Capers 26 4.1 Some example of data placement and allocation (Example 1) Single-relation case Table Table? pase [ome [paves] oice#] [page | pave 2| pave 3 apfaolanfas| [am|aolan a] an | 2s) | ep) (La) | ts) | Gea eofeaanlaa] fenlenlae ts) | pd | Aw | a0 20] 2s) | ao Query: ( (112, 2, 7% pI ae, @ Distributed placement — sited: page(l, 4). site 2: pages(2, 3) = site L: page(i, 2), site 2: pages(3, 4) = site L: paget, 3), site 2 pages(?, 4) (Example 2) Multiple-relation case @olenlen[ealan fen doo} To.) [dd [toy [ibd [te @oTanl ica Tas [ms [is oo fee Tao [oe [ ms) [a Oej4 =n (Ry) JOIN gy -catt Fea fR) © Distributed placement = site L: pageli.2), site 2: pagest3, 4) = site L: paget!.3), site? pagest2. 4) uepers 4.2 A practical combinatorial optimization approach to the file allocation preblem © Assumption ¥ Most files are not fragmented. ¥ Tis unlikely that we will wy to exploit parsllelisea in ows fle allocation. o Each computing facilities have tight limits on their local mass stomge capacity. ¥ Starage is considered to be a constant on the optimization, rather thaaas a cost The transaction traffic is known in advance, ¥ Roads and updates have equal casts ¥ Remote accesses all have the same unit cos. of No redundancy is permitted and fragmentation of file is forbidden, capers st Notation and Constraints Nnodes, indexed by j. capacity M files, indexed by i, size — T wansactions, indexed by &, frequency rom node j = f, ny, accesses fiom transaction é to file x, decision variable 1 —file is allocated to node j, 0— otherwise yx sbvilisisM ‘The goal of FAP - maximize(Z,, X, V,) where V,, = E, fy; (1j,) © cost of local retrievals 28 Algorithm FAP |. Caleulate HC) = J" | Vy= mV), 175 2, Anoptimal set of sy is given by xy = 1 for some j € 50) and xy~ O otherwise Lf this solutions feasible (i.e, mests the constrains), it is our answer: goto step7 Otherwise, identify all nodes which cause the constraints to be broken . For every such aver- subscribed node, solve the corresponding inepsack problem, thereby eliminating a node and the files allocated to that node from further consideration, ‘Consider Ha) for any nodes, which remain. If these are such nodes go to step 2 ‘Otherwise, we have finished opens 58 Example Allocate 8 files among five sites, each with 20 MB disk. Access rates of transactions to files (n,,) — Fle (sel bes} rarsactons [Tao] 7a) Fae] 4) [5] 67 Oe) 1” [0 30 7 5 ‘spies 7 29 The frequency of transactions in sites (f,) Sites z 2 2 4 w 2 | z Transactions ops se Ey 7a a3 | 105 a m5 173 0 rose [260 [1015 ‘Lo 70 = 130 | 382 | 495 red | a1 | 30 Ji) are the yellow elements for each i ‘If we assign x) ~ 1 for these and 0 for the ether entries we have our first solution, Site | hns been allocated §5Mbytes of files. This és not a feasible solution, ‘Site L has been allocated too umueh. ‘The maximum value(V,) we can get from storing my files.on site | is obtained by storing files 1.2, and 8 there. ‘Our new V; table is obtained by eliminating row ! above and column 1. 2. and 8. ‘The new I) are the underlined entries (all allocated to site 3) apts 0 30 ‘Assign x, = 1 to these, x, = 0 to the remainder of the entizs, 3°, Site 3 has been allocated 47 Mstes, 4, Site 3 has been overiondes, ‘The suaxinwa value we can get flom storing flee on site 3 is ebtained by storing files + and 5 there, 6. Ouraew V, table is obtained by eliminating zow'3 aad colurau 4 and 5 fromthe seduced table. New V, Fie (sizerMoytes] ce] 6.0 vai] a fo 95 | ase [30 490 [150 [90 The new j(/) are underlined (all alloested to site 4) 2° Assiga 1 to xy for these entries. 0 forthe rest 3°. Site thas been allocated 29 Mbytes 4”. Site 4 has been overloaded. Store file 3 at site 4 6°". Our new V, table is obtained by eliminating row 2, coluaan 4 from the table above Without spelling out the details, itis clear that the remaining 2 files, 6 and T,are allocated to site Total space used hbytes) 19 So eur sobution is fi 15 18 ul 31 4.3 Database Allocation Problem © DAP is different from FAP ¥ The relationship between fragments should be taken into account ¥ The relationship between the sllocation and query processing should be properly modeled, ¥ FAP do not take into consideration the cost of integrity enforcement. v The cost of enforcing concurrency control mechanisms should be considered. © There are no general heuristic models that take as input a set of fragments and produce a near-optimal allocation subject to the types of constraints discussed here. © We present a relatively general model and then discuss a number of possible heuristics that might be employed to solve it, pers 32 Information Requirements © Database information ~ the selectivity ofa fragment F, with respect to query q: sel(F,) ~ the size ofa fragment F,: size(F,) = cardi) - lengitiF, © Application information = # of tad (write) accesses from, to F, : RR,(UR,) — UM with uy (J 600), RM with ry (1 of 0}, and © with (7) ~ for each query. a maxiniam allowable sesponse time is defined © Site information — for each sit, is storage and processing capacity is defined ~ wait cost of storing daiaat site S,: USC, ~ the cost of processing one unit of workat site S,: LPC, © Network information ~ the communication cost per fame between S, and 5, 3, ~ the size (in bytes) of one frame ; size capers ot Allocation Model © Objectives ‘Mininsize( Total Cost) subject to respense-ime'steage processing conus 1 AEE, is stored at S,. and x, = @ otherwise © Total Cost roc = Forc,+ STC, STCjp: the cost of fragment Fa ste S, STC, USC,» sizalF,) “a QP; : query processing cost of application 4, QPC,~ processing cost (PC;) + transmission cas TC) Capers 33 PC, =access cast (AC,)+ integrity enforcement cast (IE, }+CC cast (CC,) ac, (ily URy + 8p BRyVe Ry LPC, ZR Br oper Solution Methods © The formulation of DAP is NP-complete. © Thus, one has to look for heuristic methods that yield suboptimal solutions. © Heuristic methods — kuapenci: problem solution = braneland-bonad — network flow algorithm: = There 1s nor enough dara to determme how close the results are to the oprtmal. Capers sr

You might also like