The document discusses Parallel Database Systems, emphasizing the benefits of parallelism in processing large databases and transactions. It covers various types of parallelism, including I/O, inter-query, and intra-query parallelism, along with partitioning techniques and their implications on performance. Additionally, it highlights the importance of cache coherency and presents algorithms for parallel operations such as sorting and joining in a distributed environment.

Parallel Database Systems

Samaresh Mishra,
School of Computer Engineering,
KIIT University, Bhubaneswar
Parallel Database Systems
 Introduction
 I/O Parallelism
 Inter-query Parallelism
 Intra-query Parallelism
 Intra-operation Parallelism
 Inter-operation Parallelism
 Design of Parallel Systems

Parallel Database System 2


Why Parallel Systems?
 It improves processing and I/O speed by using multiple CPUs and disks in parallel.
 To process applications that must query extremely large databases (on the order of terabytes, 10^12 bytes).
 To process a large number of transactions per second.
 In parallel processing, many operations are performed simultaneously, as opposed to serial processing.
Parallel Machines
 A coarse-grain parallel machine consists of a small number of powerful processors.
 A massively parallel (fine-grain) machine uses thousands of smaller processors.
 Two main measures of the performance of
parallel database system:
– Throughput
– Response time.



Parallel Machines…Example
 Objectivity/DB
 IBM DB2
 Oracle Parallel Server Management
 The HP Parallel Database Clusters (PDC) for
Linux are multi-node shared storage clusters,
specifically designed, tested and optimized for
Oracle 10g grid environments and Oracle
Real Application Clusters (RAC) databases.



Why parallelism!!!
 Parallel machines are becoming quite
common and affordable
o Prices of microprocessors, memory and
disks have dropped sharply
 Databases are growing increasingly large
o Large volumes of transaction data are
collected and stored for later analysis.
o Multimedia objects like images are
increasingly stored in databases



Why parallelism…
 Large-scale parallel database systems
increasingly used for:
o Storing large volumes of data
o Processing time-consuming decision-support
queries
o Providing high throughput for transaction
processing



Parallelism in Databases
 Data can be partitioned across multiple
disks for parallel I/O.
 Individual relational operations (e.g., sort,
join, aggregation) can be executed in
parallel
o data can be partitioned and each processor
can work independently on its own partition.



Parallelism in Databases
 Queries are expressed in high level
language (SQL, translated to relational
algebra)
o makes parallelization easier.
 Different queries can be run in parallel
with each other. Concurrency control
takes care of conflicts.
 Thus, databases naturally lend
themselves to parallelism.



I/O Parallelism
 I/O parallelism refers to reducing the time required to retrieve relations from disk by partitioning the relations across multiple disks.
 Horizontal partitioning (the most common partitioning technique):
– tuples of a relation are divided among many disks such that each tuple resides on one disk.


Partitioning techniques
Assume: number of disks = n
 Round-robin:
o Send the ith tuple inserted into the relation to disk i mod n.
o Ensures an even distribution of tuples across disks.
 Hash partitioning:
o Choose one or more attributes as the partitioning
attributes.
o Choose hash function h with range 0…n – 1.
o Let i denote result of hash function h applied to the
partitioning attribute value of a tuple. Send this tuple
to disk i.
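As a toy sketch (the disk count and helper names are illustrative, not from the slides), the two schemes above can be written as:

```python
# Sketch: distributing tuples across n disks with round-robin
# and hash partitioning (N_DISKS is an assumed value).

N_DISKS = 4

def round_robin_disk(insert_position, n=N_DISKS):
    """The i-th tuple inserted goes to disk i mod n."""
    return insert_position % n

def hash_partition_disk(partition_attr_value, n=N_DISKS):
    """Hash the partitioning attribute into the range 0..n-1."""
    return hash(partition_attr_value) % n

# Round-robin spreads tuples evenly regardless of their values:
assert [round_robin_disk(i) for i in range(8)] == [0, 1, 2, 3, 0, 1, 2, 3]

# Hash partitioning sends every tuple with the same partitioning
# attribute value to the same disk:
assert hash_partition_disk("acct_42") == hash_partition_disk("acct_42")
```

Note the trade-off visible even in this sketch: round-robin ignores tuple values entirely, while hash partitioning clusters tuples by value.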
Partitioning techniques
 Range partitioning:
o Choose an attribute A as the partitioning attribute.
o Choose a partitioning vector [v0, v1, ..., vn-2].
o Consider a tuple t such that t[A] = x:
 If x < v0, then t goes to disk D0.
 If x ≥ vn-2, then t goes to disk Dn-1.
 In general, if vi ≤ x < vi+1, then t goes to disk Di+1.
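The three rules above amount to a single binary search over the partitioning vector; this sketch uses an illustrative vector and Python's standard `bisect` module:

```python
import bisect

# Sketch: locating the disk for a tuple under range partitioning.
# The partitioning vector [v0, ..., vn-2] splits the attribute
# domain into n ranges for n disks (values here are made up).
vector = [10, 20, 30]   # n = 4 disks

def range_disk(x, v=vector):
    # bisect_right returns 0 if x < v0, len(v) if x >= v(n-2),
    # and i+1 when vi <= x < v(i+1), matching the rules above.
    return bisect.bisect_right(v, x)

assert range_disk(5) == 0    # x < v0        -> D0
assert range_disk(10) == 1   # v0 <= x < v1  -> D1
assert range_disk(99) == 3   # x >= v(n-2)   -> D(n-1)
```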



Comparison of Partitioning Techniques:
 Once partitioned, reading and/or writing an entire relation is much faster with I/O parallelism than without it.
 Access to data can be classified into:
1) Scanning the entire relation
2) Point queries (seek tuples that have a specific value)
3) Range queries (seek tuples whose values lie in a specified range)


Comparison of Partitioning Techniques:
Round-Robin:
 Ideally suited for sequential reads of the entire relation.
 Both point and range queries are complicated to process, since each of the n disks has to be accessed.


Comparison of Partitioning Techniques:
Hash-Partitioning:
• Best suited for Point Queries based on the
partitioning attributes.
• Also suitable for sequential scans of entire relations.
• Not suitable for point queries based on non-
partitioning attributes.
• Not suitable for range queries.



Comparison of Partitioning Techniques:
Range-Partitioning:
 Well suited for both Range and Point
Queries based on the partitioning
attribute.
 For Point queries: consult the partitioning
vector to locate the disk.
 For range queries: consult the partitioning
vector to find the range of disks.



Comparison of Partitioning Techniques:
Range-Partitioning:
Advantages
• If there are few tuples in the queried range, the query is typically sent to one disk, and the other disks can be assigned to other queries => higher throughput while maintaining good response time.
Disadvantages
• If there are many tuples in the queried range, many disks have to be accessed => I/O bottleneck.
• Results in execution skew.
Comparison of Partitioning Techniques:
 If a relation contains only a few tuples that fit in a single disk block, the relation should be assigned to a single disk.
 Large relations are preferably partitioned across all the available disks.
 If a relation consists of m disk blocks and there are n disks available, the relation should be allocated to min(m, n) disks.


Skew
 The distribution of tuples may be skewed:
"A higher percentage of tuples may reside in some partitions and fewer tuples in others."
 Attribute-value skew
– Some values appear in the partitioning attributes of many tuples; all the tuples with the same value for the partitioning attribute end up in the same partition.
– Can occur with range partitioning and hash partitioning.
Skew
 Partition skew (load imbalance)
– With range partitioning, a badly chosen partitioning vector may assign too many tuples to some partitions and too few to others.
– Less likely with hash partitioning if a good hash function is chosen.


Skew
 Skew becomes an increasing problem with a
higher degree of parallelism.
Example:
1,000 tuples divided into 10 parts (100 tuples per part).
If one part holds 200 tuples (skewed), then the speed-up from accessing the partitions in parallel is 1000/200 = 5 instead of 10.

• If the partitioning attribute is a key, a good partitioning vector can be constructed by sorting:
• Let n denote the number of partitions to be constructed. Read the tuples in sorted order, putting 1/n of the tuples into each partition.
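The construction above can be sketched as follows (toy data; the function name is illustrative):

```python
# Sketch: constructing a balanced range-partitioning vector when the
# partitioning attribute is a key, by sorting and taking every
# (size/n)-th value as a cut point.

def partition_vector(key_values, n):
    s = sorted(key_values)
    step = len(s) // n
    # n-1 cut points [v0, ..., v(n-2)]; each partition then
    # receives roughly 1/n of the tuples
    return [s[(i + 1) * step] for i in range(n - 1)]

keys = list(range(100, 0, -1))          # 100 distinct key values
vec = partition_vector(keys, 4)
assert vec == [26, 51, 76]              # 4 partitions of ~25 keys each
```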
Inter-query Parallelism:
 Different queries or transactions execute in
parallel with one another.
 Increases transaction throughput.
 The response time of an individual transaction is no faster.

 Easiest form of parallelism to support,


particularly in a shared-memory parallel
database, because even sequential database
systems support concurrent processing.



Inter-query Parallelism:
 More complicated to implement on
shared-disk or shared-nothing
architectures
– Locking and logging must be coordinated by
passing messages between processors.
– Must ensure that two processors do not
update the same data independently at the
same time.
– Cache-coherency has to be maintained —
reads and writes of data in buffer must find
latest version of data.
Cache Coherency Protocol for
Shared-disk
• Example of a cache coherency protocol for
shared disk systems:
– Before reading/writing to a page, the page must be locked in
shared/exclusive mode, as appropriate.
– On locking a page, the page must be read from disk
– Before unlocking a page, the page must be written to disk if it
was modified.
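The three protocol rules above can be illustrated with a minimal simulation (class and method names are invented for illustration, not a real DBMS API):

```python
# Minimal sketch of the shared-disk protocol: a page is read from the
# shared disk when locked, and written back (if dirty) before unlock.
class NodeCache:
    def __init__(self, shared_disk):
        self.disk = shared_disk   # shared disk: page id -> contents
        self.buffer = {}          # this node's local buffer
        self.dirty = set()

    def lock(self, page_id, mode):
        # on locking (shared or exclusive), read the page from disk
        self.buffer[page_id] = self.disk[page_id]

    def write(self, page_id, data):
        self.buffer[page_id] = data
        self.dirty.add(page_id)

    def unlock(self, page_id):
        # before unlocking, write the page to disk if it was modified
        if page_id in self.dirty:
            self.disk[page_id] = self.buffer[page_id]
            self.dirty.discard(page_id)

disk = {"p1": "old"}
a, b = NodeCache(disk), NodeCache(disk)
a.lock("p1", "exclusive"); a.write("p1", "new"); a.unlock("p1")
b.lock("p1", "shared")
assert b.buffer["p1"] == "new"   # b sees a's update via the disk
```

The read-on-lock and flush-on-unlock rules are what guarantee the second node observes the latest version; the protocols with fewer disk I/Os mentioned below relax exactly these two steps.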

• More complex protocols with fewer disk reads/writes exist.
• Oracle and Oracle Rdb are shared-disk parallel database systems supporting inter-query parallelism.


Cache Coherency Protocol for
Shared-nothing
 Cache coherency protocols for shared-
nothing systems are similar.
– Each database page is assigned a home processor. Requests to fetch the page or write it to disk are sent to the home processor, since other processors cannot directly access its disk.


Intra-query Parallelism:
• Execution of a single query in parallel on
multiple processors/disks; important for
speeding up long-running queries.
• Two complementary forms of intra-query
parallelism (can be used simultaneously):
1. Intra-operation Parallelism – parallelize the
execution of each individual operation in the query.
2. Inter-operation Parallelism – execute the different
operations in a query expression in parallel.

The first form scales better with increasing parallelism, because the number of tuples processed by each operation is typically larger than the number of operations in a query.
Parallel Processing of Relational
Operations:
• Our discussion of parallel algorithms assumes:
1) read-only queries
2) shared-nothing architecture
3) n processors, P0, ..., Pn-1, and n disks D0, ..., Dn-1, where
disk Di is associated with processor Pi.

• If a processor has multiple disks, they can simply simulate a single disk Di.
• Shared-nothing architectures can be efficiently
simulated on shared-memory and shared-disk
systems.
– Algorithms for shared-nothing systems can thus be
run on shared-memory and shared-disk systems.
– However, some optimizations may be possible.
Parallel Sort:
Sorting relation r which resides on n disks
D0,D1,…,Dn-1.

 If r is range-partitioned on the sorting attribute, then do the following:
I. Sort the partitions separately.
II. Then concatenate the results to get the full
sorted relation r.



Parallel Sort:
Sorting relation r which resides on n disks
D0,D1,…,Dn-1.

 If r is partitioned in any other way, then adopt one of the following approaches:
1) Range-partitioning sort
2) Parallel external merge-sort



Parallel Sort:
1) Range-Partitioning Sort
 First range-partition the relation (not necessarily on the same set of processors or disks),
 Then sort each partition separately.
2) Parallel External Sort-Merge
 Each processor Pi locally sorts the data on disk Di, and
 Then the system merges the sorted runs on each processor to get the final sorted output.
1) Range-Partitioning Sort
 Choose processors P0, ..., Pm, where m ≤ n − 1, to do the sorting.
 Create a range-partition vector with m entries on the sorting attributes (i = 0 to m).
1. Redistribute the relation using range partitioning:
 All tuples that lie in the ith range are sent to processor Pi.
 Pi stores the tuples it receives temporarily on disk Di.
 This step involves I/O and communication overhead.
1) Range-Partitioning Sort CONT…
 Each processor Pi sorts its partition of the relation locally.
– Each processor executes the same operation (sort) in parallel with the other processors, without any interaction with them (data parallelism).
 The final merge operation is trivial:
– Range partitioning ensures that, for 1 ≤ i < j ≤ m, the key values in processor Pi are all less than the key values in Pj.
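The whole procedure can be simulated on lists standing in for disks (a sketch, not a parallel implementation — on a real system the local sorts run concurrently):

```python
import bisect

# Sketch of a range-partitioning sort on simulated "disks":
# redistribute tuples by range, sort each partition locally,
# then concatenation of the partitions is the trivial final merge.
def range_partition_sort(disks, vector):
    m = len(vector) + 1
    partitions = [[] for _ in range(m)]
    for disk in disks:                      # step 1: redistribute
        for t in disk:
            partitions[bisect.bisect_right(vector, t)].append(t)
    for p in partitions:                    # step 2: local sorts
        p.sort()                            # (parallel on a real system)
    out = []                                # step 3: concatenate
    for p in partitions:
        out.extend(p)
    return out

disks = [[9, 3, 27], [14, 1, 30], [22, 8, 5]]
assert range_partition_sort(disks, [10, 20]) == [1, 3, 5, 8, 9, 14, 22, 27, 30]
```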



2) Parallel External Sort-Merge
 Assume the relation has already been
partitioned among disks D0, ..., Dn-1 (in
whatever manner).

I. Each processor Pi locally sorts the data on disk Di.
II. The sorted runs on each processor are then
merged to get the final sorted output.



2) Parallel External Sort-Merge

 Parallelize the merging of sorted runs as follows:
a) The sorted partitions at each processor Pi
are range-partitioned across the
processors P0, ..., Pm-1.
b) Each processor Pi performs a merge on
the streams as they are received, to get a
single sorted run.
c) The sorted runs on processors P0,..., Pm-1
are concatenated to get the final result.
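Steps a)–c) can be simulated sequentially as a sketch (real systems run the per-processor merges concurrently); here `heapq.merge` stands in for each processor's merge of its incoming sorted streams:

```python
import heapq

# Sketch of parallel external sort-merge on simulated disks: each
# "processor" sorts its local run, the sorted runs are range-
# partitioned into per-processor streams, and each processor merges
# the streams it receives before the final concatenation.
def parallel_sort_merge(disks, vector):
    runs = [sorted(d) for d in disks]                  # step I: local sorts
    m = len(vector) + 1
    streams = [[[] for _ in runs] for _ in range(m)]   # streams[i][j]: part
    for j, run in enumerate(runs):                     # of run j sent to Pi
        for t in run:
            i = sum(v <= t for v in vector)
            streams[i][j].append(t)                    # stays sorted
    out = []
    for i in range(m):                                 # each Pi merges its
        out.extend(heapq.merge(*streams[i]))           # incoming streams
    return out                                         # concatenation

disks = [[9, 3, 27], [14, 1, 30], [22, 8, 5]]
assert parallel_sort_merge(disks, [10, 20]) == [1, 3, 5, 8, 9, 14, 22, 27, 30]
```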
CONT…
• The above sequence of actions results in execution skew:
– At first every processor sends all blocks of partition 0 to P0, then every processor sends all blocks of partition 1 to P1, and so on.
– Sending happens in parallel, but receiving becomes sequential: first P0 receives, then P1 receives, and so on.
• To avoid this problem, each processor sends the first block of every partition, then the second block of every partition, and so on.
– Receiving then happens in parallel.


Parallel Join:
 The join operation requires that the system test
pairs of tuples to see if they satisfy the join
condition, and if they do, the pair is added to
the join output.
 Parallel join algorithms work as follows:
i. Split the pairs of tuples to be tested over several processors.
ii. Each processor then computes part of the join locally.
iii. In a final step, the results from all processors are collected together to produce the final result.


1. Partitioned Join:
 Works for equi-joins and natural joins.
 Partition the two input relations across the processors, and compute the join locally at each processor.
 Let r and s be the input relations, and suppose we want to compute r ⋈r.A=s.B s.
 r and s are each partitioned into n partitions, denoted r0, r1, ..., rn-1 and s0, s1, ..., sn-1.
 Either range partitioning or hash partitioning can be used.
 r and s must be partitioned on their join attributes r.A and s.B, using the same range-partitioning vector or hash function.
 Partitions ri and si are sent to processor Pi.
 Each processor Pi locally computes ri ⋈ri.A=si.B si. Any of the standard join methods can be used.
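A sequential sketch of the scheme (hash partitioning with a nested-loop local join; the tuple layout and processor count are illustrative):

```python
# Sketch of a partitioned equi-join: hash-partition r and s on the
# join attributes with the same function, then join locally per
# processor. Tuples are (join_key, payload) pairs.
def partitioned_join(r, s, n=3):
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for t in r:                              # same hash function for
        r_parts[hash(t[0]) % n].append(t)    # both relations, so
    for t in s:                              # matching tuples meet at
        s_parts[hash(t[0]) % n].append(t)    # the same processor
    out = []
    for ri, si in zip(r_parts, s_parts):     # local join at each Pi
        out.extend((a, b) for a in ri for b in si if a[0] == b[0])
    return out

r = [(1, "x"), (2, "y")]
s = [(2, "u"), (3, "v")]
assert partitioned_join(r, s) == [((2, "y"), (2, "u"))]
```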





Partitioned Join (Cont.)

• We can optimize the join algorithm used locally at each processor to reduce I/O by buffering some of the tuples in memory, instead of writing them to disk.
• Skew presents a special problem when range
partitioning is used.
• With a good hash function, hash partitioning is
likely to have a smaller skew, except when
there are many tuples with the same values for
the join attribute.



2. Fragment-and-Replicate Join:
• Partitioning not possible for some join conditions
– e.g., non-equijoin conditions, such as r.A > s.B.
• For joins where partitioning is not applicable, parallelization can be accomplished by the fragment-and-replicate technique
– Depicted on next slide
 Asymmetric fragment-and-replicate:
1. One of the relations, say r, is partitioned; any
partitioning technique can be used.
2. The other relation, s, is replicated across all the
processors.
3. Processor Pi then locally computes the join of ri with
all of s using any join technique.
Depiction of Fragment-and-Replicate Joins

[Figure: asymmetric and general fragment-and-replicate joins]
Fragment-and-Replicate Join
• General case: reduces the sizes of the relations at each processor.
 r is partitioned into n partitions: r0, r1, ..., rn-1; and s is partitioned into m partitions: s0, s1, ..., sm-1.
 Any partitioning technique may be used.
 There must be at least m * n processors.
 Label the processors P0,0, P0,1, ..., P0,m-1, P1,0, ..., Pn-1,m-1.
 Pi,j computes the join of ri with sj. To do so, ri is replicated to Pi,0, Pi,1, ..., Pi,m-1, while sj is replicated to P0,j, P1,j, ..., Pn-1,j.
 Any join technique can be used at each processor Pi,j.
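The replication pattern can be made concrete by computing, for each fragment, the set of processors Pi,j that receive it (a sketch with illustrative names):

```python
# Sketch: destinations of each fragment in the general
# fragment-and-replicate scheme, with n fragments of r, m fragments
# of s, and an n x m grid of processors identified by (i, j).
def replication_plan(n, m):
    r_dest = {i: [(i, j) for j in range(m)] for i in range(n)}  # ri -> row i
    s_dest = {j: [(i, j) for i in range(n)] for j in range(m)}  # sj -> column j
    return r_dest, s_dest

r_dest, s_dest = replication_plan(2, 3)
assert r_dest[0] == [(0, 0), (0, 1), (0, 2)]
assert s_dest[1] == [(0, 1), (1, 1)]
# every processor (i, j) receives exactly ri and sj, so every
# r-tuple is eventually tested against every s-tuple
```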
Fragment-and-Replicate Join (Cont.)
• Both versions of fragment-and-replicate work with any
join condition, since every tuple in r can be tested with
every tuple in s.
• Usually has a higher cost than partitioning, since one of
the relations (for asymmetric fragment-and-replicate) or
both relations (for general fragment-and-replicate) have
to be replicated.
• Sometimes asymmetric fragment-and-replicate is
preferable even though partitioning could be used.
– E.g., say s is small and r is large, and already partitioned. It may
be cheaper to replicate s across all processors, rather than
repartition r and s on the join attributes.



3. Partitioned Parallel Hash-Join
Parallelizing partitioned hash join:
• Assume s is smaller than r (|s| << |r|), and therefore s is chosen as the build relation. There are n processors: P0, P1, ..., Pn-1.
• A hash function h1 takes the join attribute value of each
tuple in s and maps this tuple to one of the n processors.
• Each processor Pi reads the tuples of s that are on its
disk Di, and sends each tuple to the appropriate
processor based on hash function h1. Let si denote the
tuples of relation s that are sent to processor Pi.
• As tuples of relation s are received at the destination
processors, they are partitioned further using another
hash function, h2, which is used to compute the hash-
join locally. (Cont.)



• Once the tuples of s have been distributed, the larger relation r is redistributed across the n processors using the hash function h1.
– Let ri denote the tuples of relation r that are sent to processor
Pi.
• As the r tuples are received at the destination
processors, they are repartitioned using the function h2
– (just as the probe relation is partitioned in the sequential hash-
join algorithm).
• Each processor Pi executes the build and probe phases of the hash-join algorithm on the local partitions ri and si of r and s to produce a partition of the final result of the hash-join.
• Note: Hash-join optimizations can be applied to the
parallel case
– e.g., the hybrid hash-join algorithm can be used to cache some
of the incoming tuples in memory and avoid the cost of writing
them and reading them back in.
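A sequential sketch of the two-level hashing (h1 routes tuples to processors; the local dictionary plays the role of h2; all names are illustrative):

```python
from collections import defaultdict

# Sketch of the partitioned parallel hash-join: h1 routes tuples to
# processors; the per-processor dict stands in for the h2-based
# local hash table. Tuples are (join_key, payload) pairs.
def parallel_hash_join(r, s, n=2):
    h1 = lambda k: hash(k) % n
    s_at = [[] for _ in range(n)]            # redistribute build relation s
    for t in s:
        s_at[h1(t[0])].append(t)
    out = []
    for i in range(n):                       # at each Pi: build, then probe
        table = defaultdict(list)
        for t in s_at[i]:
            table[t[0]].append(t)            # local build (h2 = dict hashing)
        for t in r:
            if h1(t[0]) == i:                # ri: r tuples routed to Pi
                out.extend((t, b) for b in table[t[0]])
    return out

r = [(1, "r1"), (2, "r2")]
s = [(2, "s1")]
assert parallel_hash_join(r, s) == [((2, "r2"), (2, "s1"))]
```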



Parallel Nested-Loop Join
• Assume that
– relation s is much smaller than relation r and that r is stored by
partitioning.
– there is an index on a join attribute of relation r at each of the
partitions of relation r.
• Use asymmetric fragment-and-replicate, with relation s
being replicated, and using the existing partitioning of
relation r.
• Each processor Pj where a partition of relation s is stored
reads the tuples of relation s stored in Dj, and replicates
the tuples to every other processor Pi.
– At the end of this phase, relation s is replicated at all sites that
store tuples of relation r.
• Each processor Pi performs an indexed nested-loop join
of relation s with the ith partition of relation r.



Other Relational Operations
Selection: σθ(r)
Case 1:
• If θ is of the form ai = v, where ai is an attribute and v a value, and r is partitioned on ai:
– then the selection is performed at a single processor.
Case 2:
• If θ is of the form l ≤ ai ≤ u (i.e., θ is a range selection) and the relation has been range-partitioned on ai:
– then the selection is performed at each processor whose partition overlaps the specified range of values.
• In all other cases:
– The selection is performed in parallel at all the processors.
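Case 2 can be sketched with a binary search over the range-partitioning vector (illustrative values):

```python
import bisect

# Sketch: for a range selection l <= ai <= u on a relation
# range-partitioned by vector v, only processors whose partitions
# overlap [l, u] need to run the selection.
def processors_for_range(l, u, v):
    first = bisect.bisect_right(v, l)      # partition holding l
    last = bisect.bisect_right(v, u)       # partition holding u
    return list(range(first, last + 1))

v = [10, 20, 30]                            # 4 partitions
assert processors_for_range(12, 27, v) == [1, 2]
assert processors_for_range(5, 5, v) == [0]
```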



• Duplicate elimination
– Perform by using either of the parallel sort techniques
• eliminate duplicates as soon as they are found during
sorting.
– Can also partition the tuples (using either range- or
hash- partitioning) and perform duplicate elimination
locally at each processor.

• Projection
– Projection without duplicate elimination can be
performed as tuples are read in from disk in parallel.
– If duplicate elimination is required, any of the above
duplicate elimination techniques can be used.
Grouping/Aggregation
• Partition the relation on the grouping attributes and then
compute the aggregate values locally at each processor.
• Can reduce cost of transferring tuples during partitioning
by partly computing aggregate values before partitioning.
• Consider the sum aggregation operation:
– Perform aggregation operation at each processor Pi on those
tuples stored on disk Di
• results in tuples with partial sums at each processor.
– Result of the local aggregation is partitioned on the grouping
attributes, and the aggregation performed again at each
processor Pi to get the final result.
• Fewer tuples need to be sent to other processors during
partitioning.
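The two-phase sum aggregation can be sketched as follows (illustrative names; the phases that would run in parallel are simulated sequentially):

```python
from collections import Counter

# Sketch of parallel sum aggregation with local pre-aggregation:
# each "processor" sums its local tuples per group, then the partial
# sums are repartitioned by group and combined. Tuples are
# (group, value) pairs.
def parallel_group_sum(disks, n=2):
    partials = [Counter() for _ in disks]
    for p, disk in zip(partials, disks):       # local aggregation at Pi
        for group, value in disk:
            p[group] += value
    final = [Counter() for _ in range(n)]      # repartition partial sums
    for p in partials:                         # on the grouping attribute
        for group, subtotal in p.items():
            final[hash(group) % n][group] += subtotal
    return dict(sum(final, Counter()))

disks = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4)]]
assert parallel_group_sum(disks) == {"a": 4, "b": 6}
```

Only one partial sum per group per processor crosses the network, instead of one tuple per row — which is the cost saving the slide describes.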
Cost of Parallel Evaluation of Operations
• If there is no skew in the partitioning and no overhead due to parallel evaluation, the expected time taken is 1/n of the sequential time (i.e., a speed-up of n).
• If skew and overheads are also to be taken into
account, the time taken by a parallel operation
can be estimated as
Tpart + Tasm + max (T0, T1, …, Tn-1)
– Tpart is the time for partitioning the relations
– Tasm is the time for assembling the results
– Ti is the time taken for the operation at processor Pi
• this needs to be estimated taking into account the skew, and
the time wasted in contentions.
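A toy computation with made-up times shows how the skewed processor dominates the estimate:

```python
# Illustration of the cost estimate above (all times hypothetical):
# total = Tpart + Tasm + max(T0, ..., Tn-1), so one slow (skewed)
# processor dominates the parallel phase.
t_part, t_asm = 2.0, 1.0
t_i = [5.0, 5.0, 5.0, 9.0]            # P3 is skewed
total = t_part + t_asm + max(t_i)
assert total == 12.0                  # not 2 + 1 + 5 = 8
```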



Inter-operator Parallelism



Pipelined parallelism
Consider a join of four relations:
– r1 ⋈ r2 ⋈ r3 ⋈ r4
• Set up a pipeline that computes the three joins in parallel:
– Let P1 be assigned the computation of temp1 = r1 ⋈ r2,
– P2 the computation of temp2 = temp1 ⋈ r3,
– and P3 the computation of temp2 ⋈ r4.
• Each of these operations can execute in parallel, sending
result tuples it computes to the next operation even as it
is computing further results
– Provided a pipelineable join evaluation algorithm (e.g. indexed
nested loops join) is used



Factors Limiting Utility of Pipeline
Parallelism
• Pipeline parallelism is useful since it avoids
writing intermediate results to disk
• Useful with small number of processors, but
does not scale up well with more processors.
One reason is that pipeline chains do not attain
sufficient length.
• Cannot pipeline operators which do not produce
output until all inputs have been accessed
(e.g. aggregate and sort)
• Little speedup is obtained for the frequent cases
of skew in which one operator's execution
cost is much higher than the others.
Independent Parallelism
• Consider a join of four relations:
r1 ⋈ r2 ⋈ r3 ⋈ r4
– Let P1 be assigned the computation of temp1 = r1 ⋈ r2,
– P2 the computation of temp2 = r3 ⋈ r4,
– and P3 the computation of temp1 ⋈ temp2.
– P1 and P2 can work independently in parallel
– P3 has to wait for input from P1 and P2
• Can pipeline output of P1 and P2 to P3, combining independent
parallelism and pipelined parallelism
• Does not provide a high degree of parallelism
– useful with a lower degree of parallelism.
– less useful in a highly parallel system,



Any Questions!!!

