TDD: Research Topics in Distributed Databases


Distributed Databases
- Distributed databases
- Distributed query processing: joins and non-join queries
- Updating distributed data


Distributed databases
- Data is stored at several sites (nodes), distributed geographically or administratively across multiple systems
  - Data centers
- Each site runs an independent DBMS
- What do we get?
  - Increased availability and reliability
  - Increased parallelism
- Complications
  - Catalog management: distributed data independence and distributed transaction atomicity
  - Query processing and optimization: replication and fragmentation
  - Increased update costs; concurrency control: locking, deadlock, commit protocol, recovery

Architectures
- Homogeneous vs. heterogeneous systems
  - Homogeneous: identical DBMSs, aware of each other, cooperating
  - Heterogeneous: different schemas/DBMSs
- Multidatabase system: a uniform logical view of the data (common schema)
  - Difficult, yet common: such systems are typically developed gradually
[Figure: multidatabase architecture -- a query against the global schema is answered via the local schemas of independent DBMSs connected by a network]
Architectures

- Client-server: client (user interface, front end) and server (DBMS)
  - The client ships a query to a server (query shipping)
  - All query processing happens at the server

[Figure: client-server architecture -- clients ship queries to servers, which return answers]

Architectures
- Collaborating server: a query can span several servers

[Figure: collaborating-server architecture -- several servers cooperate to answer a query]

- Middleware: a coordinator manages queries and transactions across servers

Warehouse architecture
[Figure: warehouse architecture -- client applications query the data warehouse; an integrator collects changes from a monitor/wrapper at each source (RDB, OODB, XML)]

Monitor/wrapper
- A monitor/wrapper for each data source
- Translation: translate an information source into a common integrating model
- Change detection: detect changes to the underlying data source and propagate them to the integrator (a polling sketch follows after this list)
  - Active databases (triggers: event, condition, action)
  - Logged sources: inspecting logs
  - Periodic polling, periodic dumps/snapshots
- Data cleaning:
  - Detect erroneous/incomplete information to ensure validity
  - Back flushing: return cleaned data to the source
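For the periodic-polling option, a minimal snapshot-diff sketch in Python; fetch_snapshot and notify_integrator are hypothetical hooks standing in for the source dump and the integrator interface.

    import time

    def diff_snapshots(old, new):
        # Snapshots are dicts keyed by record id; compare them key by key.
        inserted = {k: new[k] for k in new.keys() - old.keys()}
        deleted = {k: old[k] for k in old.keys() - new.keys()}
        updated = {k: new[k] for k in old.keys() & new.keys() if old[k] != new[k]}
        return inserted, deleted, updated

    def poll(fetch_snapshot, notify_integrator, interval=60):
        # Periodically dump the source and propagate only the delta.
        old = fetch_snapshot()
        while True:
            time.sleep(interval)
            new = fetch_snapshot()
            inserted, deleted, updated = diff_snapshots(old, new)
            if inserted or deleted or updated:
                notify_integrator(inserted, deleted, updated)
            old = new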

Integrator
- Receives change notifications from the monitors/wrappers and reflects the changes in the data warehouse
- Typically a rule-based engine:
  - Merging information (data fusion)
  - Handling references
- Data cleaning:
  - Removing redundancies and inconsistencies
  - Inserting default values
  - Blocking sources

When to use a data warehouse
- Problem: potential inconsistencies with the sources
- Commonly used for relatively static data:
  - When clients require a specific, predictable portion of the available information
  - When clients require high query performance, but not necessarily the most recent state of the information
  - When clients want summarized/aggregated information, such as historical information
- Examples:
  - Scientific data
  - Historical enterprise data
  - Caching frequently requested information

Data warehouse vs. materialized views
- A materialized view is over an individual structured database, while a warehouse is over a collection of heterogeneous, distributed data sources
- A materialized view typically has the same form as in the underlying database, while a warehouse stores highly integrated and summarized data
- Materialized view modifications occur within the same transaction that updates the underlying database, while a warehouse may have to deal with independent sources:
  - Sources simply report changes
  - Sources may not have locking capability
  - The integrator is loosely coupled with the sources

Mediated system architecture

- Virtual approach: data is not stored in the middle tier

[Figure: mediated system architecture -- client applications query a mediator, which accesses the sources (RDB, OODB, XML) through wrappers]

Lazy vs. eager approaches
- Lazy approach (mediated systems):
  - Accept a query, determine the appropriate set of data sources, and generate sub-queries for each data source
  - Obtain results from the data sources, perform translation, filtering and composition, and return the final answer
- Eager approach (warehouses):
  - Information from each source that may be of interest is extracted in advance, translated, filtered, merged with relevant sources, and stored in a repository
  - A query is evaluated directly against the repository, without accessing the original information sources

Data warehouse vs. mediated systems
- Efficiency
  - Response time: at the warehouse, queries can be answered efficiently without accessing the original data sources. Advantageous when data sources are slow, expensive or periodically unavailable, or when translation, filtering and merging require significant processing
  - Space: warehousing consumes extra storage space
- Extensibility: warehouse
- Consistency with the sources: warehouse data may become out of date
- Applicability:
  - Warehouses: for high query performance and static data
  - Mediated systems: for information that changes rapidly

Distributed data storage -- replication
- Fragments of a relation are replicated at several sites: e.g., R is fragmented into R1, R2, R3
- Why?
  - Increased availability/reliability: if one site fails
  - Increased parallelism: faster query evaluation
  - But: increased overhead on updates, to keep the replicas consistent
- Dynamic issues: synchronous vs. asynchronous replication
- Primary copy: e.g.,
  - Bank: an account at the site at which it was opened
  - Airline: a flight at the site from which it originates

[Figure: fragments R1, R2, R3 at Site 1; a replica of R2 at Site 2]

Distributed data storage -- fragmentation
- A relation R may be fragmented or partitioned
  - Horizontal
  - Vertical: lossless join
- Question: how to reconstruct the original R? (a sketch follows after the example)
    eid | name  | city
    ----+-------+-----
    001 | joe   | NYC
    002 | mary  | NYC
    003 | grace | EDI
- Fragmentation is determined by local ownership

[Figure: horizontal fragments by city -- the NYC tuples at one site and the EDI tuples at another, each behind a local schema under the global schema]
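To make the reconstruction question concrete, a minimal sketch in Python over the example table above; the fragments and the choice of eid as the key are assumptions for illustration.

    # Horizontal fragmentation by city: reconstruct R by union.
    nyc = [("001", "joe", "NYC"), ("002", "mary", "NYC")]
    edi = [("003", "grace", "EDI")]
    r_from_horizontal = nyc + edi  # R = R_NYC UNION R_EDI

    # Vertical fragmentation: each fragment repeats the key eid,
    # so R is reconstructed by a lossless natural join on eid.
    names = {"001": "joe", "002": "mary", "003": "grace"}  # pi_{eid,name}(R)
    cities = {"001": "NYC", "002": "NYC", "003": "EDI"}    # pi_{eid,city}(R)
    r_from_vertical = [(eid, names[eid], cities[eid]) for eid in names]

    assert sorted(r_from_horizontal) == sorted(r_from_vertical)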

Transparency, independence
- Distributed data transparency (independence):
  - Location (name) transparency
  - Fragmentation transparency
  - Replication transparency (catalog management)
- Transaction atomicity: across several sites
  - All changes persist if the transaction commits
  - None persists if the transaction aborts
- Data independence and transaction atomicity are not currently supported: users have to be aware of where the data is located

Distributed query processing: joins and non-join queries


Distributed query processing and optimization
- New challenges:
  - Data transmission costs (network)
  - Parallelism
- Choice of replicas: pick the one with the lowest transmission cost
- Fragmentation: reconstruct the original relation
- Query decomposition: query rewriting/unfolding, depending on how the data is fragmented/replicated

[Figure: query decomposition -- a query against the global schema is rewritten into sub-queries, one per local schema, and shipped over the network to the local DBMSs]

Non-join queries
- Schema: account(acc-num, branch-name, balance)
- Query Q: select * from account where branch-name = 'EDI'
- Storage: database DB is horizontally fragmented on branch-name (NYC, Philly, EDI, ...), denoted DB1, ..., DBn, with DB = DB1 ∪ ... ∪ DBn
- Processing:
  - Rewrite Q into Q(DB1) ∪ ... ∪ Q(DBn)
  - Q(DBi) is empty unless DBi is the EDI fragment, so only Q(DB1) remains, where DB1 is the EDI branch
  - Q'(DB1) = Q(DB1), where Q' is: select * from account
    (on the EDI fragment the selection is trivially satisfied, so the filter can be dropped; a sketch follows below)
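A minimal sketch of this rewriting in Python; the fragment-to-branch mapping is a hypothetical stand-in for the catalog.

    # Each fragment stores one branch; only fragments that can satisfy
    # the selection receive a sub-query, and on those the filter is
    # redundant and can be dropped.
    fragments = {"DB1": "EDI", "DB2": "NYC", "DB3": "Philly"}

    def decompose(branch):
        return {frag: "select * from account"  # filter dropped
                for frag, b in fragments.items()
                if b == branch}                # prune non-matching fragments

    print(decompose("EDI"))  # {'DB1': 'select * from account'}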


Simple join processing -- data shipping
- R1 ⋈ R2 ⋈ R3, where Ri is at site Si and S0 is the site where the query is issued
- Option 1: send copies of R1, R2, R3 to S0 and compute the joins at S0
- Option 2:
  - Send R1 to S2, compute temp1 ← R1 ⋈ R2 at S2
  - Send temp1 to S3, compute temp2 ← temp1 ⋈ R3 at S3
  - Send temp2 to S0
- Decision on strategies (compared in the sketch below):
  - The volume of data shipped
  - The cost of transmitting a block
  - Relative speed of processing at each site
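A back-of-the-envelope comparison of the two options in Python, counting only blocks shipped over the network; all sizes are hypothetical.

    size = {"R1": 100, "R2": 500, "R3": 800}  # relation sizes, in blocks
    temp1, temp2 = 50, 20                     # estimated join-result sizes

    # Option 1: ship all three relations to the query site S0.
    option1 = size["R1"] + size["R2"] + size["R3"]

    # Option 2: ship R1 to S2, temp1 to S3, and temp2 to S0.
    option2 = size["R1"] + temp1 + temp2

    print(option1, option2)  # 1400 vs. 170: option 2 wins when joins are selective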

Semijoin -- reduce communication costs
- R1 ⋈ R2, where Ri is at site Si
- Compute temp1 ← π_{R1∩R2}(R1) at site 1
  - Projection onto the join attributes only; assume R1 is the smaller relation
- Ship temp1 to site 2, instead of the entire relation R1
- Compute temp2 ← R2 ⋉ temp1 at S2
- Ship temp2 to site 1
- Compute the result R1 ⋈ temp2 at S1
- Effectiveness (see the sketch below):
  - Pays off if a sufficiently small fraction of R2 contributes to the join
  - The additional computation cost may exceed the savings in communication costs
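A minimal semijoin sketch in Python over lists of dicts; the relations and the join attribute k are hypothetical.

    r1 = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]  # at site 1
    r2 = [{"k": 2, "b": "p"}, {"k": 3, "b": "q"}]  # at site 2

    # Site 1: project R1 onto the join attribute and ship only that.
    temp1 = {t["k"] for t in r1}

    # Site 2: keep the R2 tuples that match temp1 and ship them back.
    temp2 = [t for t in r2 if t["k"] in temp1]

    # Site 1: finish the join locally.
    result = [{**s, **t} for s in r1 for t in temp2 if s["k"] == t["k"]]
    print(result)  # [{'k': 2, 'a': 'y', 'b': 'p'}]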

Bloomjoin -- reduce communication costs
- R1 ⋈ R2, where Ri is at site Si
- Compute a bit vector of size k by hashing π_{R1∩R2}(R1)
  - A bit is set to 1 if some tuple hashes to it
  - Smaller than the projection (constant size)
- Ship the vector to site 2, instead of the entire relation R1
- Hash R2 using the same hash function
- Ship to site 1 only those tuples of R2 that hash to a 1 bit: temp1
- Compute the result R1 ⋈ temp1 at S1
- Effectiveness (see the sketch below):
  - Lower communication cost: a bit vector vs. a projection
  - But hashing may reduce R2 less than the projection would (hash collisions admit false positives)
- Question: set difference?
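A minimal Bloomjoin sketch in Python with one hash function and a k-bit vector (real Bloom filters use several hash functions); data and sizes are hypothetical.

    K = 16                     # bit-vector size, fixed regardless of |R1|

    def h(v):                  # shared hash function, reduced to K bits
        return hash(v) % K

    r1 = [{"k": 1}, {"k": 2}]  # at site 1
    r2 = [{"k": 2}, {"k": 3}]  # at site 2

    # Site 1: hash R1's join-attribute values into the vector; ship K bits.
    bits = [0] * K
    for t in r1:
        bits[h(t["k"])] = 1

    # Site 2: ship back only the tuples hashing to a set bit
    # (collisions give false positives, never false negatives).
    temp1 = [t for t in r2 if bits[h(t["k"])]]

    # Site 1: the final local join filters out any false positives.
    result = [{**s, **t} for s in r1 for t in temp1 if s["k"] == t["k"]]
    print(result)  # [{'k': 2}]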

Exploring parallelism
- Consider R1 ⋈ R2 ⋈ R3 ⋈ R4, where Ri is at site Si
- temp1 ← R1 ⋈ R2, by shipping R1 to site 2
- temp2 ← R3 ⋈ R4, by shipping R3 to site 4
  (the two joins run in parallel; see the sketch below)
- result ← temp1 ⋈ temp2 -- pipelining
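A minimal sketch of the independent parallelism in the plan above, with threads standing in for sites 2 and 4; the relations and the join key k are hypothetical.

    from concurrent.futures import ThreadPoolExecutor

    def join(r, s):
        # Natural join on the shared attribute "k".
        return [{**a, **b} for a in r for b in s if a["k"] == b["k"]]

    r1 = [{"k": 1, "a": 1}]; r2 = [{"k": 1, "b": 2}]
    r3 = [{"k": 1, "c": 3}]; r4 = [{"k": 1, "d": 4}]

    with ThreadPoolExecutor() as pool:
        f1 = pool.submit(join, r1, r2)  # "site 2" computes temp1
        f2 = pool.submit(join, r3, r4)  # "site 4" computes temp2
        temp1, temp2 = f1.result(), f2.result()

    # The final join; a real engine could pipeline it as tuples arrive.
    print(join(temp1, temp2))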

- Question: R1 ⋈ R2 ⋈ R3, using
  - Pipelined parallelism
  - Semijoin
  - Both

Distributed query optimization
- Cost factors:
  - The volume of data shipped
  - The cost of transmitting a block
  - Relative speed of processing at each site
  - Site selection: replication
- Two-step optimization:
  - At compile time, generate a query plan along the same lines as a centralized DBMS
  - Every time before the query is executed, transform the plan and carry out site selection (determine where the operators are to be executed) -- dynamic, but only site selection

Practice: validation of functional dependencies
- A functional dependency (FD) defined on schema R: X → Y
- For any instance D of R, D satisfies the FD if for any pair of tuples t and t', t[X] = t'[X] implies t[Y] = t'[Y]
- Violations of the FD in D:
  { t | there exists t' in D such that t[X] = t'[X] but t[Y] ≠ t'[Y] }
- Now suppose that D is fragmented (horizontally or vertically) and distributed
- Develop an algorithm that, given the fragmented and distributed D and an FD, computes all the violations of the FD in D
  - Minimize data shipment (semijoin, bloomjoin); a sketch follows below
- Question: what can we do if we are given a set of FDs to validate?
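A minimal Python sketch of the violation check itself, as a coordinator might run it; for horizontal fragments, each site would ship only π_{X∪Y}(Di) (or a Bloom vector over X) rather than whole fragments, which this sketch does not show. The data is hypothetical.

    from collections import defaultdict

    def violations(d, x, y):
        # Group the Y-values seen for each X-value; an X-group with two
        # distinct Y-values makes all of its tuples violations.
        groups = defaultdict(set)
        for t in d:
            groups[tuple(t[a] for a in x)].add(tuple(t[a] for a in y))
        bad = {k for k, ys in groups.items() if len(ys) > 1}
        return [t for t in d if tuple(t[a] for a in x) in bad]

    d = [{"eid": 1, "city": "NYC"},
         {"eid": 1, "city": "EDI"},
         {"eid": 2, "city": "NYC"}]
    print(violations(d, x=["eid"], y=["city"]))  # the two eid = 1 tuples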

Updating distributed data


Updating distributed data
- Fragmentation: an update may go across several sites
  - Local transactions
  - Global transactions
- Replication: multiple copies of the same data -- consistency

[Figure: an update against the global schema is translated into updates on local schemas and propagated to the replicas over the network]

System structure
- Local transaction manager: handles local transactions and the local parts of global transactions
  - Maintains a log for recovery
  - Concurrency control for the transactions at its site
- Transaction coordinator (not present in a centralized DBMS):
  - Starts the execution of a transaction
  - Breaks a transaction into sub-transactions and distributes them to the appropriate sites
  - Coordinates the termination of the transaction:
    - Commit at all sites, or
    - Abort at all sites

Two-Phase Commit protocol (2PC): Phase 1
- Transaction T; the transaction coordinator is at site C, with participating sites P1, P2, P3
- (1) C adds <prepare T> to its log and sends <prepare T> to all participants
- (2) Each participant forces <ready T> (or <no T>) to its log and replies <ready T> (or <abort T>)
- C decides <commit T> if all responses are <ready T>, and <abort T> if one of the responses is <abort T>

Two-Phase Commit protocol (2PC): Phase 2
- Transaction T; the transaction coordinator is at site C
- (3) C forces <commit T> to its log and sends <commit T> to all participants
- (4) Each participant adds <commit T> to its log and replies <ack T>
- Once all participants have acknowledged, C adds <complete T> to its log
- Similarly for abort

Comments on 2PC
- Two rounds of communication, both initiated at the coordinator:
  - Voting
  - Terminating
- Any site can decide to abort a transaction
- Every message reflects a decision by the sender; the decision is recorded in the log, together with the transaction id, so that it survives any failure
  - The log at each participating site also records the id of the coordinator
  - The log at the coordinator also records the ids of the participating sites
(a message-flow sketch follows below)
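A minimal, failure-free sketch of the 2PC message flow in Python, with in-process objects standing in for sites and plain lists for logs; real 2PC must force-write log records and handle timeouts and recovery, which this omits.

    class Participant:
        def __init__(self, name, vote_yes=True):
            self.name, self.vote_yes, self.log = name, vote_yes, []

        def prepare(self, t):
            # Phase 1: record <ready T> or <no T>, then vote.
            self.log.append(("ready" if self.vote_yes else "no", t))
            return self.vote_yes

        def decide(self, t, commit):
            # Phase 2: record the coordinator's decision, then acknowledge.
            self.log.append(("commit" if commit else "abort", t))
            return ("ack", t)

    def coordinator(t, participants, log):
        log.append(("prepare", t))
        votes = [p.prepare(t) for p in participants]        # round 1: voting
        commit = all(votes)
        log.append(("commit" if commit else "abort", t))    # the decision point
        acks = [p.decide(t, commit) for p in participants]  # round 2: terminating
        if len(acks) == len(participants):
            log.append(("complete", t))
        return commit

    ps = [Participant("P1"), Participant("P2"), Participant("P3")]
    print(coordinator("T", ps, []))  # True: every participant voted <ready T>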


Concurrency control
- Single lock manager, at a chosen site:
  - Simple implementation and deadlock handling
  - Bottleneck
  - Vulnerability: if that site fails
- Distributed lock manager: each site has one; to update a data item D at site j:
  - Send the lock request to the lock manager at site j
  - The request is either granted or delayed
  - Deadlock handling is hard
- Major complication: replicated data


Replication
- Synchronous replication: all copies must be updated before the transaction commits
  - Data distribution is transparent; copies stay consistent
  - Expensive
- Asynchronous replication: copies are updated periodically
  - Allows a modifying transaction to commit before all copies have been changed
  - Users must be aware of data distribution; consistency issues
  - Cheaper: current products follow this approach
    - Peer-to-peer replication (multiple master copies)
    - Primary site replication (only the primary is updateable)


Synchronous replication
- Majority approach -- voting: data item D is replicated at n sites
  - A lock request is sent to more than one-half of the sites
  - D is locked if the majority vote yes; the write is applied to at least n/2 + 1 copies
  - Each copy maintains a version number
- Expensive:
  - 2(n/2 + 1) messages per lock request
  - Read: at least n/2 + 1 copies, to make sure a current copy is seen
  - Deadlock handling is more complicated: deadlocks can arise even when only one data item is being locked


Synchronous replication (cont.)
- Majority approach -- voting
- Biased protocol: read-any, write-all
  - Shared lock (read): simply request a lock on D at any one site that contains a copy of D
  - Exclusive lock (write): request locks at all sites that contain a copy
  - Less overhead on reads, expensive on writes
  - A commonly used approach to synchronous replication

Synchronous replication -- exercise
- A distributed system uses the majority (voting) approach to update data replicas. Suppose that a data item D is replicated at 4 different sites: S1, S2, S3, S4. What should be done if a site S wants to
  - Write D?
  - Read D?

Synchronous replication -- answer
- A distributed system uses the majority (voting) approach to update data replicas. Suppose that a data item D is replicated at 4 different sites: S1, S2, S3, S4. What should be done if a site S wants to
  - Write D
    - The site S sends a lock request to any 3 of S1, S2, S3, S4
    - The write is performed if the lock is granted by the lock managers of all 3 sites; otherwise it is delayed until the lock can be granted
  - Read D
    - The site S reads copies of D from at least 3 sites
    - It picks the copy with the highest version number
(a quorum sketch follows below)
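A minimal sketch of this answer in Python, with n = 4 replicas and a quorum of 3; replicas are plain dicts carrying a version number, and locking and failures are omitted.

    import random

    replicas = [{"version": 0, "value": None} for _ in range(4)]
    QUORUM = 3  # more than half of the 4 sites

    def write(value):
        sites = random.sample(replicas, QUORUM)       # lock any 3 sites
        version = max(r["version"] for r in sites) + 1
        for r in sites:                               # update all 3 copies
            r.update(version=version, value=value)

    def read():
        sites = random.sample(replicas, QUORUM)       # read any 3 copies
        return max(sites, key=lambda r: r["version"])["value"]

    write("x"); write("y")
    print(read())  # always 'y': any two quorums of 3 out of 4 overlap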

Asynchronous replication
- Primary site: choose exactly one copy, residing at a primary site
  - A lock request is sent to the primary site
  - Replicas at other sites may lag behind; they are secondary to the primary copy
- Simple to implement
- Main issues:
  - D becomes inaccessible if the primary site fails
  - Propagating changes from the primary site to the others

Asynchronous replication (cont.)
- Primary site
- Peer-to-peer: more than one of the copies can be a master
  - Changes to a master copy must be propagated to the others
  - Conflicting changes to two copies have to be resolved
  - Best used when conflicts do not arise, e.g.,
    - Each master site owns a distinct fragment
    - Updating rights are held by one master at a time

Distributed deadlock detection
- Recall the wait-for graph:
  - Nodes: transactions
  - Edge T1 → T2 if T1 requests a resource held by T2
- Local wait-for graph:
  - Nodes: all transactions that are local to the site or that hold/request a data item local to the site
  - Edge T1 → T2 if T1 (at site 1) requests a resource held by T2 (at site 2)
- Global wait-for graph: the union of the local wait-for graphs
  - Deadlock iff it contains a cycle (detection sketch below)


[Figure: local wait-for graphs at Site 1 and Site 2 over transactions T1..T5, and the global wait-for graph formed by their union]
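A minimal Python sketch of cycle detection over the union of local wait-for graphs, as a deadlock-detection coordinator might run it; the edge lists are hypothetical.

    def has_cycle(edges):
        # Depth-first search; a back edge to a gray node closes a cycle.
        graph = {}
        for u, v in edges:
            graph.setdefault(u, []).append(v)
            graph.setdefault(v, [])
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {u: WHITE for u in graph}

        def dfs(u):
            color[u] = GRAY
            for v in graph[u]:
                if color[v] == GRAY or (color[v] == WHITE and dfs(v)):
                    return True
            color[u] = BLACK
            return False

        return any(color[u] == WHITE and dfs(u) for u in graph)

    site1 = [("T1", "T2"), ("T2", "T3")]  # local wait-for edges at Site 1
    site2 = [("T3", "T1")]                # local wait-for edges at Site 2
    print(has_cycle(site1 + site2))       # True: T1 -> T2 -> T3 -> T1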

False cycles
- Due to communication delay:
  - Site 1: the local wait-for graph has T1 → T2
  - T2 releases the resource: deletion of the edge T1 → T2
  - Site 2: T2 requests the resource again: addition of the edge T2 → T1
  - A false cycle arises if "insert T2 → T1" reaches the coordinator before "remove T1 → T2"
- Centralized deadlock detection: a deadlock-detection coordinator
  - Constructs/maintains the global wait-for graph
  - Detects cycles
  - If it finds a cycle, chooses a victim to be rolled back
- Distributed deadlock manager? More expensive

[Figure: T1 → T2 at Site 1 and T2 → T1 at Site 2 combine into a false cycle T1 → T2 → T1 in the global graph]

Summary and review
- Homogeneous vs. heterogeneous systems
- Replication and fragmentation; pros and cons of replication
- How to reconstruct a fragmented relation (vertical, horizontal)?
- Simple join (data shipping), semijoin, bloomjoin
  - Set difference? Intersection? Aggregation?
- Transaction manager and coordinator: responsibilities?
- Describe 2PC. Recovery: coordinator and participating sites
- Replication:
  - Majority, read-any write-all
  - Primary site, peer-to-peer
- Local wait-for graph, global wait-for graph, deadlock detection, deadlock handling
