Distributed Databases: by Chien-Pin Hsu CS157B Section 1 Nov 11, 2004
Distributed Databases: by Chien-Pin Hsu CS157B Section 1 Nov 11, 2004
A distributed system connects three databases: hq, mfg, and sales An application can simultaneously access or modify the data in several databases in a single distributed environment.
In a heterogeneous distributed database system, at least one of the databases uses different schemas and software.
A database system having different schema may cause a major problem for query processing. A database system having different software may cause a major problem for transaction processing.
Fragmentation
Relation is partitioned into several fragments stored in distinct sites
Advantages of Replication
Availability: failure of site containing relation r does not result in unavailability of r is replicas exist. Parallelism: queries on r may be processed by several nodes in parallel. Reduced data transfer: relation r is available locally at each site containing a replica of r.
Disadvantages of Replication
Increased cost of updates: each replica of relation r must be updated. Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented.
One solution: choose one copy as primary copy and apply concurrency control operations on primary copy.
Fragmentation
Data can be distributed by storing individual tables at different sites Data can also be distributed by decomposing a table and storing portions at different sites called Fragmentation Fragmentation can be horizontal or vertical
Horizontal Fragmentation
Each fragment, Ti , of table T contains a subset of the rows Each tuple of T is assigned to one or more fragments. Horizontal fragmentation is lossless
A bank account schema has a relation Account-schema = (branch-name, account-number, balance). It fragments the relation by location and stores each fragment locally: rows with branch-name = `Hillside` are stored in the Hillside in a fragment
Vertical Fragmentation
Each fragment, Ti, of T contains a subset of the columns, each column is in at least one fragment, and each fragment includes the key: Ti = attr_listi (T) T = T1 T2 .. Tn
All schemas must contain a common candidate key (or superkey) to ensure lossless join property. A special attribute, the tuple-id attribute may be added to each schema to serve as a candidate key.
A employee-info schema has a relation employee-info schema = (designation, name, Employee-id, salary). It fragments the relation to put information in two tables for security concern.
Commit Protocols
Commit protocols are used to ensure atomicity across sites
Atomicity states that database modifications must follow an all or nothing rule. a transaction which executes at multiple sites must either be committed at all the sites, or aborted at all the sites.