TDD: Research Topics in Distributed Databases


Distributed Databases
- Distributed databases
- Distributed query processing: joins and non-join queries
- Updating distributed data


Distributed databases
- Data is stored at several sites (nodes), distributed geographically or administratively across multiple systems
  - Data centers
- Each site runs an independent DBMS
- What do we get?
  - Increased availability and reliability
  - Increased parallelism
- Complications
  - Catalog management: distributed data independence and distributed transaction atomicity
  - Query processing and optimization: replication and fragmentation
  - Increased update costs; concurrency control: locking, deadlock, commit protocol, recovery

Architectures
- Homogeneous vs. heterogeneous systems
  - Homogeneous: identical DBMSs, aware of each other, cooperating
  - Heterogeneous: different schemas/DBMSs
- Multidatabase system: a uniform logical view of the data (common schema)
  - Difficult, yet common: such systems are typically developed gradually
[Figure: multidatabase architecture -- a query against the global schema is answered via the local schemas of independent DBMSs connected by a network]
Architectures

- Client-server: client (user interface, front end) and server (DBMS)
  - The client ships a query to a server (query shipping)
  - All query processing happens at the server

[Figure: client-server architecture -- clients ship queries to servers, which return answers]

Architectures
- Collaborating server: a query can span several servers

[Figure: collaborating-server architecture -- several servers cooperate to answer a query]

- Middleware: a coordinator manages queries and transactions across servers

Warehouse architecture
[Figure: warehouse architecture -- client applications query the data warehouse; an integrator collects changes from a monitor/wrapper at each source (RDB, OODB, XML)]

Monitor/wrapper
- A monitor/wrapper for each data source
- Translation: translate an information source into a common integrating model
- Change detection: detect changes to the underlying data source and propagate them to the integrator (a polling sketch follows after this list)
  - Active databases (triggers: event, condition, action)
  - Logged sources: inspecting logs
  - Periodic polling, periodic dumps/snapshots
- Data cleaning:
  - Detect erroneous/incomplete information to ensure validity
  - Back flushing: return cleaned data to the source
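For the periodic-polling option, a minimal snapshot-diff sketch in Python; fetch_snapshot and notify_integrator are hypothetical hooks standing in for the source dump and the integrator interface.

    import time

    def diff_snapshots(old, new):
        # Snapshots are dicts keyed by record id; compare them key by key.
        inserted = {k: new[k] for k in new.keys() - old.keys()}
        deleted = {k: old[k] for k in old.keys() - new.keys()}
        updated = {k: new[k] for k in old.keys() & new.keys() if old[k] != new[k]}
        return inserted, deleted, updated

    def poll(fetch_snapshot, notify_integrator, interval=60):
        # Periodically dump the source and propagate only the delta.
        old = fetch_snapshot()
        while True:
            time.sleep(interval)
            new = fetch_snapshot()
            inserted, deleted, updated = diff_snapshots(old, new)
            if inserted or deleted or updated:
                notify_integrator(inserted, deleted, updated)
            old = new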

Integrator
- Receives change notifications from the monitors/wrappers and reflects the changes in the data warehouse
- Typically a rule-based engine:
  - Merging information (data fusion)
  - Handling references
- Data cleaning:
  - Removing redundancies and inconsistencies
  - Inserting default values
  - Blocking sources

When to use a data warehouse
- Problem: potential inconsistencies with the sources
- Commonly used for relatively static data:
  - When clients require a specific, predictable portion of the available information
  - When clients require high query performance, but not necessarily the most recent state of the information
  - When clients want summarized/aggregated information, such as historical information
- Examples:
  - Scientific data
  - Historical enterprise data
  - Caching frequently requested information

Data warehouse vs. materialized views
- A materialized view is over an individual structured database, while a warehouse is over a collection of heterogeneous, distributed data sources
- A materialized view typically has the same form as in the underlying database, while a warehouse stores highly integrated and summarized data
- Materialized view modifications occur within the same transaction that updates the underlying database, while a warehouse may have to deal with independent sources:
  - Sources simply report changes
  - Sources may not have locking capability
  - The integrator is loosely coupled with the sources

Mediated system architecture

- Virtual approach: data is not stored in the middle tier

[Figure: mediated system architecture -- client applications query a mediator, which accesses the sources (RDB, OODB, XML) through wrappers]

Lazy vs. eager approaches
- Lazy approach (mediated systems):
  - Accept a query, determine the appropriate set of data sources, and generate sub-queries for each data source
  - Obtain results from the data sources, perform translation, filtering and composition, and return the final answer
- Eager approach (warehouses):
  - Information from each source that may be of interest is extracted in advance, translated, filtered, merged with relevant sources, and stored in a repository
  - A query is evaluated directly against the repository, without accessing the original information sources

Data warehouse vs. mediated systems
- Efficiency
  - Response time: at the warehouse, queries can be answered efficiently without accessing the original data sources. Advantageous when data sources are slow, expensive or periodically unavailable, or when translation, filtering and merging require significant processing
  - Space: warehousing consumes extra storage space
- Extensibility: warehouse
- Consistency with the sources: warehouse data may become out of date
- Applicability:
  - Warehouses: for high query performance and static data
  - Mediated systems: for information that changes rapidly

Distributed data storage -- replication
- Fragments of a relation are replicated at several sites: e.g., R is fragmented into R1, R2, R3
- Why?
  - Increased availability/reliability: if one site fails
  - Increased parallelism: faster query evaluation
  - But: increased overhead on updates, to keep the replicas consistent
- Dynamic issues: synchronous vs. asynchronous replication
- Primary copy: e.g.,
  - Bank: an account at the site at which it was opened
  - Airline: a flight at the site from which it originates

[Figure: fragments R1, R2, R3 at Site 1; a replica of R2 at Site 2]

Distributed data storage -- fragmentation
- A relation R may be fragmented or partitioned
  - Horizontal
  - Vertical: lossless join
- Question: how to reconstruct the original R? (a sketch follows after the example)
    eid | name  | city
    ----+-------+-----
    001 | joe   | NYC
    002 | mary  | NYC
    003 | grace | EDI
- Fragmentation is determined by local ownership

[Figure: horizontal fragments by city -- the NYC tuples at one site and the EDI tuples at another, each behind a local schema under the global schema]
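To make the reconstruction question concrete, a minimal sketch in Python over the example table above; the fragments and the choice of eid as the key are assumptions for illustration.

    # Horizontal fragmentation by city: reconstruct R by union.
    nyc = [("001", "joe", "NYC"), ("002", "mary", "NYC")]
    edi = [("003", "grace", "EDI")]
    r_from_horizontal = nyc + edi  # R = R_NYC UNION R_EDI

    # Vertical fragmentation: each fragment repeats the key eid,
    # so R is reconstructed by a lossless natural join on eid.
    names = {"001": "joe", "002": "mary", "003": "grace"}  # pi_{eid,name}(R)
    cities = {"001": "NYC", "002": "NYC", "003": "EDI"}    # pi_{eid,city}(R)
    r_from_vertical = [(eid, names[eid], cities[eid]) for eid in names]

    assert sorted(r_from_horizontal) == sorted(r_from_vertical)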

Transparency, independence
- Distributed data transparency (independence):
  - Location (name) transparency
  - Fragmentation transparency
  - Replication transparency (catalog management)
- Transaction atomicity: across several sites
  - All changes persist if the transaction commits
  - None persists if the transaction aborts
- Data independence and transaction atomicity are not currently supported: users have to be aware of where the data is located

Distributed query processing: joins and non-join queries


Distributed query processing and optimization
- New challenges:
  - Data transmission costs (network)
  - Parallelism
- Choice of replicas: pick the one with the lowest transmission cost
- Fragmentation: reconstruct the original relation
- Query decomposition: query rewriting/unfolding, depending on how the data is fragmented/replicated

[Figure: query decomposition -- a query against the global schema is rewritten into sub-queries, one per local schema, and shipped over the network to the local DBMSs]

Non-join queries
- Schema: account(acc-num, branch-name, balance)
- Query Q: select * from account where branch-name = 'EDI'
- Storage: database DB is horizontally fragmented on branch-name (NYC, Philly, EDI, ...), denoted DB1, ..., DBn, with DB = DB1 ∪ ... ∪ DBn
- Processing:
  - Rewrite Q into Q(DB1) ∪ ... ∪ Q(DBn)
  - Q(DBi) is empty unless DBi is the EDI fragment, so only Q(DB1) remains, where DB1 is the EDI branch
  - Q'(DB1) = Q(DB1), where Q' is: select * from account
    (on the EDI fragment the selection is trivially satisfied, so the filter can be dropped; a sketch follows below)
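A minimal sketch of this rewriting in Python; the fragment-to-branch mapping is a hypothetical stand-in for the catalog.

    # Each fragment stores one branch; only fragments that can satisfy
    # the selection receive a sub-query, and on those the filter is
    # redundant and can be dropped.
    fragments = {"DB1": "EDI", "DB2": "NYC", "DB3": "Philly"}

    def decompose(branch):
        return {frag: "select * from account"  # filter dropped
                for frag, b in fragments.items()
                if b == branch}                # prune non-matching fragments

    print(decompose("EDI"))  # {'DB1': 'select * from account'}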


Simple join processing -- data shipping
- R1 ⋈ R2 ⋈ R3, where Ri is at site Si and S0 is the site where the query is issued
- Option 1: send copies of R1, R2, R3 to S0 and compute the joins at S0
- Option 2:
  - Send R1 to S2, compute temp1 ← R1 ⋈ R2 at S2
  - Send temp1 to S3, compute temp2 ← temp1 ⋈ R3 at S3
  - Send temp2 to S0
- Decision on strategies (compared in the sketch below):
  - The volume of data shipped
  - The cost of transmitting a block
  - Relative speed of processing at each site
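A back-of-the-envelope comparison of the two options in Python, counting only blocks shipped over the network; all sizes are hypothetical.

    size = {"R1": 100, "R2": 500, "R3": 800}  # relation sizes, in blocks
    temp1, temp2 = 50, 20                     # estimated join-result sizes

    # Option 1: ship all three relations to the query site S0.
    option1 = size["R1"] + size["R2"] + size["R3"]

    # Option 2: ship R1 to S2, temp1 to S3, and temp2 to S0.
    option2 = size["R1"] + temp1 + temp2

    print(option1, option2)  # 1400 vs. 170: option 2 wins when joins are selective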

Semijoin -- reduce communication costs
- R1 ⋈ R2, where Ri is at site Si
- Compute temp1 ← π_{R1∩R2}(R1) at site 1
  - Projection onto the join attributes only; assume R1 is the smaller relation
- Ship temp1 to site 2, instead of the entire relation R1
- Compute temp2 ← R2 ⋉ temp1 at S2
- Ship temp2 to site 1
- Compute the result R1 ⋈ temp2 at S1
- Effectiveness (see the sketch below):
  - Pays off if a sufficiently small fraction of R2 contributes to the join
  - The additional computation cost may exceed the savings in communication costs
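A minimal semijoin sketch in Python over lists of dicts; the relations and the join attribute k are hypothetical.

    r1 = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]  # at site 1
    r2 = [{"k": 2, "b": "p"}, {"k": 3, "b": "q"}]  # at site 2

    # Site 1: project R1 onto the join attribute and ship only that.
    temp1 = {t["k"] for t in r1}

    # Site 2: keep the R2 tuples that match temp1 and ship them back.
    temp2 = [t for t in r2 if t["k"] in temp1]

    # Site 1: finish the join locally.
    result = [{**s, **t} for s in r1 for t in temp2 if s["k"] == t["k"]]
    print(result)  # [{'k': 2, 'a': 'y', 'b': 'p'}]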

Bloomjoin -- reduce communication costs
- R1 ⋈ R2, where Ri is at site Si
- Compute a bit vector of size k by hashing π_{R1∩R2}(R1)
  - A bit is set to 1 if some tuple hashes to it
  - Smaller than the projection (constant size)
- Ship the vector to site 2, instead of the entire relation R1
- Hash R2 using the same hash function
- Ship to site 1 only those tuples of R2 that hash to a 1 bit: temp1
- Compute the result R1 ⋈ temp1 at S1
- Effectiveness (see the sketch below):
  - Lower communication cost: a bit vector vs. a projection
  - But hashing may reduce R2 less than the projection would (hash collisions admit false positives)
- Question: set difference?
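A minimal Bloomjoin sketch in Python with one hash function and a k-bit vector (real Bloom filters use several hash functions); data and sizes are hypothetical.

    K = 16                     # bit-vector size, fixed regardless of |R1|

    def h(v):                  # shared hash function, reduced to K bits
        return hash(v) % K

    r1 = [{"k": 1}, {"k": 2}]  # at site 1
    r2 = [{"k": 2}, {"k": 3}]  # at site 2

    # Site 1: hash R1's join-attribute values into the vector; ship K bits.
    bits = [0] * K
    for t in r1:
        bits[h(t["k"])] = 1

    # Site 2: ship back only the tuples hashing to a set bit
    # (collisions give false positives, never false negatives).
    temp1 = [t for t in r2 if bits[h(t["k"])]]

    # Site 1: the final local join filters out any false positives.
    result = [{**s, **t} for s in r1 for t in temp1 if s["k"] == t["k"]]
    print(result)  # [{'k': 2}]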

Exploring parallelism
- Consider R1 ⋈ R2 ⋈ R3 ⋈ R4, where Ri is at site Si
- temp1 ← R1 ⋈ R2, by shipping R1 to site 2
- temp2 ← R3 ⋈ R4, by shipping R3 to site 4
  (the two joins run in parallel; see the sketch below)
- result ← temp1 ⋈ temp2 -- pipelining
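A minimal sketch of the independent parallelism in the plan above, with threads standing in for sites 2 and 4; the relations and the join key k are hypothetical.

    from concurrent.futures import ThreadPoolExecutor

    def join(r, s):
        # Natural join on the shared attribute "k".
        return [{**a, **b} for a in r for b in s if a["k"] == b["k"]]

    r1 = [{"k": 1, "a": 1}]; r2 = [{"k": 1, "b": 2}]
    r3 = [{"k": 1, "c": 3}]; r4 = [{"k": 1, "d": 4}]

    with ThreadPoolExecutor() as pool:
        f1 = pool.submit(join, r1, r2)  # "site 2" computes temp1
        f2 = pool.submit(join, r3, r4)  # "site 4" computes temp2
        temp1, temp2 = f1.result(), f2.result()

    # The final join; a real engine could pipeline it as tuples arrive.
    print(join(temp1, temp2))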

- Question: R1 ⋈ R2 ⋈ R3, using
  - Pipelined parallelism
  - Semijoin
  - Both

Distributed query optimization
- Cost factors:
  - The volume of data shipped
  - The cost of transmitting a block
  - Relative speed of processing at each site
  - Site selection: replication
- Two-step optimization:
  - At compile time, generate a query plan along the same lines as a centralized DBMS
  - Every time before the query is executed, transform the plan and carry out site selection (determine where the operators are to be executed) -- dynamic, but only site selection

Practice: validation of functional dependencies
- A functional dependency (FD) defined on schema R: X → Y
- For any instance D of R, D satisfies the FD if for any pair of tuples t and t', t[X] = t'[X] implies t[Y] = t'[Y]
- Violations of the FD in D:
  { t | there exists t' in D such that t[X] = t'[X] but t[Y] ≠ t'[Y] }
- Now suppose that D is fragmented (horizontally or vertically) and distributed
- Develop an algorithm that, given the fragmented and distributed D and an FD, computes all the violations of the FD in D
  - Minimize data shipment (semijoin, bloomjoin); a sketch follows below
- Question: what can we do if we are given a set of FDs to validate?
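A minimal Python sketch of the violation check itself, as a coordinator might run it; for horizontal fragments, each site would ship only π_{X∪Y}(Di) (or a Bloom vector over X) rather than whole fragments, which this sketch does not show. The data is hypothetical.

    from collections import defaultdict

    def violations(d, x, y):
        # Group the Y-values seen for each X-value; an X-group with two
        # distinct Y-values makes all of its tuples violations.
        groups = defaultdict(set)
        for t in d:
            groups[tuple(t[a] for a in x)].add(tuple(t[a] for a in y))
        bad = {k for k, ys in groups.items() if len(ys) > 1}
        return [t for t in d if tuple(t[a] for a in x) in bad]

    d = [{"eid": 1, "city": "NYC"},
         {"eid": 1, "city": "EDI"},
         {"eid": 2, "city": "NYC"}]
    print(violations(d, x=["eid"], y=["city"]))  # the two eid = 1 tuples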

Updating distributed data


Updating distributed data
- Fragmentation: an update may go across several sites
  - Local transactions
  - Global transactions
- Replication: multiple copies of the same data -- consistency

[Figure: an update against the global schema is translated into updates on local schemas and propagated to the replicas over the network]

System structure
- Local transaction manager: handles local transactions and the local parts of global transactions
  - Maintains a log for recovery
  - Concurrency control for the transactions at its site
- Transaction coordinator (not present in a centralized DBMS):
  - Starts the execution of a transaction
  - Breaks a transaction into sub-transactions and distributes them to the appropriate sites
  - Coordinates the termination of the transaction:
    - Commit at all sites, or
    - Abort at all sites

Two-Phase Commit protocol (2PC): Phase 1
- Transaction T; the transaction coordinator is at site C, with participating sites P1, P2, P3
- (1) C adds <prepare T> to its log and sends <prepare T> to all participants
- (2) Each participant forces <ready T> (or <no T>) to its log and replies <ready T> (or <abort T>)
- C decides <commit T> if all responses are <ready T>, and <abort T> if one of the responses is <abort T>

Two-Phase Commit protocol (2PC): Phase 2
- Transaction T; the transaction coordinator is at site C
- (3) C forces <commit T> to its log and sends <commit T> to all participants
- (4) Each participant adds <commit T> to its log and replies <ack T>
- Once all participants have acknowledged, C adds <complete T> to its log
- Similarly for abort

Comments on 2PC
- Two rounds of communication, both initiated at the coordinator:
  - Voting
  - Terminating
- Any site can decide to abort a transaction
- Every message reflects a decision by the sender; the decision is recorded in the log, together with the transaction id, so that it survives any failure
  - The log at each participating site also records the id of the coordinator
  - The log at the coordinator also records the ids of the participating sites
(a message-flow sketch follows below)
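A minimal, failure-free sketch of the 2PC message flow in Python, with in-process objects standing in for sites and plain lists for logs; real 2PC must force-write log records and handle timeouts and recovery, which this omits.

    class Participant:
        def __init__(self, name, vote_yes=True):
            self.name, self.vote_yes, self.log = name, vote_yes, []

        def prepare(self, t):
            # Phase 1: record <ready T> or <no T>, then vote.
            self.log.append(("ready" if self.vote_yes else "no", t))
            return self.vote_yes

        def decide(self, t, commit):
            # Phase 2: record the coordinator's decision, then acknowledge.
            self.log.append(("commit" if commit else "abort", t))
            return ("ack", t)

    def coordinator(t, participants, log):
        log.append(("prepare", t))
        votes = [p.prepare(t) for p in participants]        # round 1: voting
        commit = all(votes)
        log.append(("commit" if commit else "abort", t))    # the decision point
        acks = [p.decide(t, commit) for p in participants]  # round 2: terminating
        if len(acks) == len(participants):
            log.append(("complete", t))
        return commit

    ps = [Participant("P1"), Participant("P2"), Participant("P3")]
    print(coordinator("T", ps, []))  # True: every participant voted <ready T>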


Concurrency control
- Single lock manager, at a chosen site:
  - Simple implementation and deadlock handling
  - Bottleneck
  - Vulnerability: if that site fails
- Distributed lock manager: each site has one; to update a data item D at site j:
  - Send the lock request to the lock manager at site j
  - The request is either granted or delayed
  - Deadlock handling is hard
- Major complication: replicated data


Replication
- Synchronous replication: all copies must be updated before the transaction commits
  - Data distribution is transparent; copies stay consistent
  - Expensive
- Asynchronous replication: copies are updated periodically
  - Allows a modifying transaction to commit before all copies have been changed
  - Users must be aware of data distribution; consistency issues
  - Cheaper: current products follow this approach
    - Peer-to-peer replication (multiple master copies)
    - Primary site replication (only the primary is updateable)


Synchronous replication
- Majority approach -- voting: data item D is replicated at n sites
  - A lock request is sent to more than one-half of the sites
  - D is locked if the majority vote yes; the write is applied to at least n/2 + 1 copies
  - Each copy maintains a version number
- Expensive:
  - 2(n/2 + 1) messages per lock request
  - Read: at least n/2 + 1 copies, to make sure a current copy is seen
  - Deadlock handling is more complicated: deadlocks can arise even when only one data item is being locked


Synchronous replication (cont.)
- Majority approach -- voting
- Biased protocol: read-any, write-all
  - Shared lock (read): simply request a lock on D at any one site that contains a copy of D
  - Exclusive lock (write): request locks at all sites that contain a copy
  - Less overhead on reads, expensive on writes
  - A commonly used approach to synchronous replication

Synchronous replication -- exercise
- A distributed system uses the majority (voting) approach to update data replicas. Suppose that a data item D is replicated at 4 different sites: S1, S2, S3, S4. What should be done if a site S wants to
  - Write D?
  - Read D?

Synchronous replication -- answer
- A distributed system uses the majority (voting) approach to update data replicas. Suppose that a data item D is replicated at 4 different sites: S1, S2, S3, S4. What should be done if a site S wants to
  - Write D
    - The site S sends a lock request to any 3 of S1, S2, S3, S4
    - The write is performed if the lock is granted by the lock managers of all 3 sites; otherwise it is delayed until the lock can be granted
  - Read D
    - The site S reads copies of D from at least 3 sites
    - It picks the copy with the highest version number
(a quorum sketch follows below)
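A minimal sketch of this answer in Python, with n = 4 replicas and a quorum of 3; replicas are plain dicts carrying a version number, and locking and failures are omitted.

    import random

    replicas = [{"version": 0, "value": None} for _ in range(4)]
    QUORUM = 3  # more than half of the 4 sites

    def write(value):
        sites = random.sample(replicas, QUORUM)       # lock any 3 sites
        version = max(r["version"] for r in sites) + 1
        for r in sites:                               # update all 3 copies
            r.update(version=version, value=value)

    def read():
        sites = random.sample(replicas, QUORUM)       # read any 3 copies
        return max(sites, key=lambda r: r["version"])["value"]

    write("x"); write("y")
    print(read())  # always 'y': any two quorums of 3 out of 4 overlap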

Asynchronous replication
- Primary site: choose exactly one copy, residing at a primary site
  - A lock request is sent to the primary site
  - Replicas at other sites may lag behind; they are secondary to the primary copy
- Simple to implement
- Main issues:
  - D becomes inaccessible if the primary site fails
  - Propagating changes from the primary site to the others

Asynchronous replication (cont.)
- Primary site
- Peer-to-peer: more than one of the copies can be a master
  - Changes to a master copy must be propagated to the others
  - Conflicting changes to two copies have to be resolved
  - Best used when conflicts do not arise, e.g.,
    - Each master site owns a distinct fragment
    - Updating rights are held by one master at a time

Distributed deadlock detection
- Recall the wait-for graph:
  - Nodes: transactions
  - Edge T1 → T2 if T1 requests a resource held by T2
- Local wait-for graph:
  - Nodes: all transactions that are local to the site or that hold/request a data item local to the site
  - Edge T1 → T2 if T1 (at site 1) requests a resource held by T2 (at site 2)
- Global wait-for graph: the union of the local wait-for graphs
  - Deadlock iff it contains a cycle (detection sketch below)


[Figure: local wait-for graphs at Site 1 and Site 2 over transactions T1..T5, and the global wait-for graph formed by their union]
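A minimal Python sketch of cycle detection over the union of local wait-for graphs, as a deadlock-detection coordinator might run it; the edge lists are hypothetical.

    def has_cycle(edges):
        # Depth-first search; a back edge to a gray node closes a cycle.
        graph = {}
        for u, v in edges:
            graph.setdefault(u, []).append(v)
            graph.setdefault(v, [])
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {u: WHITE for u in graph}

        def dfs(u):
            color[u] = GRAY
            for v in graph[u]:
                if color[v] == GRAY or (color[v] == WHITE and dfs(v)):
                    return True
            color[u] = BLACK
            return False

        return any(color[u] == WHITE and dfs(u) for u in graph)

    site1 = [("T1", "T2"), ("T2", "T3")]  # local wait-for edges at Site 1
    site2 = [("T3", "T1")]                # local wait-for edges at Site 2
    print(has_cycle(site1 + site2))       # True: T1 -> T2 -> T3 -> T1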

False cycles
- Due to communication delay:
  - Site 1: the local wait-for graph has T1 → T2
  - T2 releases the resource: deletion of the edge T1 → T2
  - Site 2: T2 requests the resource again: addition of the edge T2 → T1
  - A false cycle arises if "insert T2 → T1" reaches the coordinator before "remove T1 → T2"
- Centralized deadlock detection: a deadlock-detection coordinator
  - Constructs/maintains the global wait-for graph
  - Detects cycles
  - If it finds a cycle, chooses a victim to be rolled back
- Distributed deadlock manager? More expensive

[Figure: T1 → T2 at Site 1 and T2 → T1 at Site 2 combine into a false cycle T1 → T2 → T1 in the global graph]

Summary and review
- Homogeneous vs. heterogeneous systems
- Replication and fragmentation; pros and cons of replication
- How to reconstruct a fragmented relation (vertical, horizontal)?
- Simple join (data shipping), semijoin, bloomjoin
  - Set difference? Intersection? Aggregation?
- Transaction manager and coordinator: responsibilities?
- Describe 2PC. Recovery: coordinator and participating sites
- Replication:
  - Majority, read-any write-all
  - Primary site, peer-to-peer
- Local wait-for graph, global wait-for graph, deadlock detection, deadlock handling
