Introduction
I/O Parallelism
Interquery Parallelism
Intraquery Parallelism
Intraoperation Parallelism
Interoperation Parallelism
Design of Parallel Systems
Introduction
Parallelism in Databases
I/O Parallelism
I/O Parallelism (Cont.)
Range partitioning:
Choose an attribute as the partitioning attribute.
A partitioning vector [v0, v1, ..., vn-2] is chosen.
Let v be the partitioning attribute value of a tuple. Tuples such that
vi <= v < vi+1 go to disk i + 1. Tuples with v < v0 go to disk 0, and
tuples with v >= vn-2 go to disk n - 1.
For example, range partitioning with three disks numbered 0, 1, and 2 may assign
tuples with values less than 5 to disk 0, values between 5 and 40 to disk 1, and
values greater than 40 to disk 2.
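The routing rule above can be sketched in Python; `range_partition` is a hypothetical helper (not from the slides), and `bisect_right` implements the convention that tuples with vi <= v < vi+1 go to disk i + 1:

```python
import bisect

def range_partition(value, vector):
    """Disk number for a tuple whose partitioning attribute is `value`.

    bisect_right counts the vector entries <= value, so v < v0 maps
    to disk 0 and v >= v_{n-2} maps to disk n - 1.
    """
    return bisect.bisect_right(vector, value)

vector = [5, 40]                      # three disks: 0, 1, 2
print(range_partition(3, vector))     # disk 0
print(range_partition(20, vector))    # disk 1
print(range_partition(41, vector))    # disk 2
```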
Comparison of Partitioning Techniques
Comparison of Partitioning Techniques (Cont.)
Round robin:
Advantages
Best suited for sequential scan of entire relation on each query.
All disks have almost an equal number of tuples; retrieval work is
thus well balanced between disks.
Disadvantages
Range queries are difficult to process
No clustering -- tuples are scattered across all disks
Comparison of Partitioning Techniques (Cont.)
Hash partitioning:
Good for sequential access
Assuming the hash function is good and the partitioning attributes form a
key, tuples will be equally distributed between disks
Retrieval work is then well balanced between disks.
Good for point queries on partitioning attribute
Can lookup single disk, leaving others available for answering other
queries.
Index on partitioning attribute can be local to disk, making lookup
and update more efficient
No clustering, so difficult to answer range queries
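A short simulation of the balance claim (a sketch using Python's built-in `hash`, which maps small non-negative integers to themselves in CPython): 1,000 distinct key values spread exactly 100 per disk over 10 disks, so scans parallelize well and a point query touches one disk.

```python
from collections import Counter

def hash_partition(key, n_disks):
    """Disk for a tuple whose partitioning-attribute value is `key`."""
    return hash(key) % n_disks

# 1,000 distinct key values (the attribute forms a key, so no repeats)
counts = Counter(hash_partition(k, 10) for k in range(1000))
print(sorted(counts.values()))        # perfectly balanced here
```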
Comparison of Partitioning Techniques (Cont.)
Range partitioning:
Provides data clustering by partitioning attribute value.
Good for sequential access
Good for point queries on partitioning attribute: only one disk
needs to be accessed.
For range queries on partitioning attribute, one to a few disks
may need to be accessed
Remaining disks are available for other queries.
Good if result tuples are from one to a few blocks.
If many blocks are to be fetched, they are still fetched from one
to a few disks, and potential parallelism in disk access is wasted
Partitioning a Relation across Disks
If a relation contains only a few tuples which will fit into a single
disk block, then assign the relation to a single disk.
Large relations are preferably partitioned across all the available
disks.
If a relation consists of m disk blocks and there are n disks
available in the system, then the relation should be allocated
min(m,n) disks.
Handling of Skew
The distribution of tuples to disks may be skewed — that is, some disks
have many tuples, while others may have fewer tuples.
Types of skew:
Attribute-value skew.
Some values appear in the partitioning attributes of many tuples; all the
tuples with the same value for the partitioning attribute end up in the
same partition.
Can occur with range-partitioning and hash-partitioning.
For example, if 1,000 tuples are divided across 10 disks, each disk ideally
holds 100 tuples; with skew, some disks hold far more than 100.
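The same hash-partitioning simulation makes attribute-value skew visible (a sketch with hypothetical data): if 700 of 1,000 tuples share one partitioning value, they all land in the same partition, however good the hash function is.

```python
from collections import Counter

def hash_partition(key, n_disks):
    return hash(key) % n_disks

# 700 of 1,000 tuples share the partitioning value 42
keys = [42] * 700 + list(range(300))
counts = Counter(hash_partition(k, 10) for k in keys)
print(max(counts.values()))           # one disk gets >= 700 tuples
```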
Handling Skew in Range-Partitioning
Handling Skew using Histograms
Balanced partitioning vector can be constructed from histogram in a
relatively straightforward fashion
Assume uniform distribution within each range of the histogram
Histogram can be constructed by scanning relation, or sampling
(blocks containing) tuples of the relation
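A minimal sketch of that construction (the `partition_vector` helper is hypothetical): given histogram buckets `(lo, hi, count)`, it interpolates cut points so each range holds roughly `total / n_parts` tuples, assuming a uniform distribution inside each bucket.

```python
def partition_vector(hist, n_parts):
    """Build a balanced range-partitioning vector from a histogram.

    hist: list of (lo, hi, count) buckets sorted by range.
    Returns the n_parts - 1 cut points.
    """
    total = sum(c for _, _, c in hist)
    target = total / n_parts
    cuts, acc, want = [], 0.0, total / n_parts
    for lo, hi, count in hist:
        # place every cut that falls inside this bucket, interpolating
        # under the uniform-within-bucket assumption
        while count and acc + count >= want and len(cuts) < n_parts - 1:
            frac = (want - acc) / count
            cuts.append(lo + frac * (hi - lo))
            want += target
        acc += count
    return cuts

# skewed histogram: 300 tuples in [0, 10), only 100 in [10, 20)
print(partition_vector([(0, 10, 300), (10, 20, 100)], 4))
```

The cut points crowd into the dense bucket, which is exactly how a histogram counters skew.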
Handling Skew Using Virtual Processor Partitioning
Interquery Parallelism
Queries/transactions execute in parallel with one another.
Increases transaction throughput; used primarily to scale up a
transaction processing system to support a larger number of
transactions per second.
Easiest form of parallelism to support, particularly in a shared-memory
parallel database, because even sequential database systems support
concurrent processing.
More complicated to implement on shared-disk or shared-nothing
architectures
Locking and logging must be coordinated by passing messages
between processors.
Data in a local buffer may have been updated at another processor.
Cache-coherency has to be maintained — reads and writes of data
in buffer must find latest version of data.
Cache Coherency Protocol
Intraquery Parallelism
Parallel Processing of Relational Operations
Parallel Sort
Range-Partitioning Sort
Choose processors P0, ..., Pm, where m <= n - 1, to do the sorting.
Create range-partition vector with m entries, on the sorting attributes
Redistribute the relation using range partitioning
all tuples that lie in the ith range are sent to processor Pi
Pi stores the tuples it received temporarily on disk Di.
This step requires I/O and communication overhead.
Each processor Pi sorts its partition of the relation locally.
Each processor executes the same operation (sort) in parallel with the other
processors, without any interaction with them (data parallelism).
Final merge operation is trivial: range-partitioning ensures that, for
1 <= i < j <= m, the key values in processor Pi are all less than the key
values in Pj.
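The three steps can be simulated sequentially in Python (a sketch; each inner list plays the role of one processor's partition, and concatenation is the trivial final merge):

```python
import bisect

def parallel_range_sort(tuples, vector):
    """Range-partitioning sort: redistribute, sort locally, concatenate."""
    parts = [[] for _ in range(len(vector) + 1)]
    for t in tuples:                              # redistribution step
        parts[bisect.bisect_right(vector, t)].append(t)
    for p in parts:                               # local sort per processor
        p.sort()
    result = []
    for p in parts:                               # trivial final merge
        result += p
    return result

data = [17, 3, 99, 42, 8, 55, 1]
print(parallel_range_sort(data, [10, 50]))        # fully sorted
```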
Parallel Sort (Cont.)
Parallel Join
Partitioned Join
For equi-joins and natural joins, it is possible to partition the two input
relations across the processors, and compute the join locally at each
processor.
Let r and s be the input relations, and suppose we want to compute the join
of r and s with the condition r.A = s.B.
r and s each are partitioned into n partitions, denoted r0, r1, ..., rn-1 and
s0, s1, ..., sn-1.
Can use either range partitioning or hash partitioning.
r and s must be partitioned on their join attributes (r.A and s.B), using
the same range-partitioning vector or hash function.
Partitions ri and si are sent to processor Pi.
Each processor Pi locally computes the join of ri and si on the condition
ri.A = si.B. Any of the standard join methods can be used.
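A sequential simulation of the scheme (a sketch; tuples are `(join_attribute, payload)` pairs, and each iteration of the outer loop plays the role of one processor joining its partition pair locally):

```python
from collections import defaultdict

def partitioned_join(r, s, n):
    """Partitioned equi-join: hash-partition both inputs on the join
    attribute with the same hash function, then join each pair locally."""
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for a, x in r:
        r_parts[hash(a) % n].append((a, x))
    for b, y in s:
        s_parts[hash(b) % n].append((b, y))
    out = []
    for i in range(n):                    # processor Pi joins ri with si
        for a, x in r_parts[i]:
            for b, y in s_parts[i]:
                if a == b:
                    out.append((a, x, y))
    return out

r = [(1, 'a'), (2, 'b'), (1, 'c')]
s = [(1, 'x'), (3, 'y')]
print(sorted(partitioned_join(r, s, 4)))  # [(1, 'a', 'x'), (1, 'c', 'x')]
```

This works for equi-joins because matching tuples always hash to the same partition; any local join method could replace the inner nested loop.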
Partitioned Join (Cont.)
Fragment-and-Replicate Join
Depiction of Fragment-and-Replicate Joins
a. Asymmetric Fragment and Replicate    b. Fragment and Replicate
Fragment-and-Replicate Join (Cont.)
Partitioned Parallel Hash-Join
Partitioned Parallel Hash-Join (Cont.)
Parallel Nested-Loop Join
Assume that
relation s is much smaller than relation r and that r is stored by
partitioning.
there is an index on a join attribute of relation r at each of the
partitions of relation r.
Use asymmetric fragment-and-replicate, with relation s being
replicated, and using the existing partitioning of relation r.
Each processor Pj where a partition of relation s is stored reads
the tuples of relation s stored in Dj, and replicates the tuples to
every other processor Pi.
At the end of this phase, relation s is replicated at all sites that store
tuples of relation r.
Each processor Pi performs an indexed nested-loop join of
relation s with the ith partition of relation r.
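A sequential sketch of the two phases (the helper is hypothetical; each partition of r carries a local index, built here as a plain dict, and `s` stands for the replicated small relation):

```python
def parallel_inl_join(r_partitions, s):
    """Asymmetric fragment-and-replicate with indexed nested loops:
    s is replicated to every processor; each processor probes its
    local index on r's join attribute with the s tuples."""
    out = []
    for part in r_partitions:             # one iteration per processor Pi
        index = {}                        # local index on the join attribute
        for a, x in part:
            index.setdefault(a, []).append(x)
        for b, y in s:                    # replicated copy of s
            for x in index.get(b, []):
                out.append((b, x, y))
    return out

r_partitions = [[(1, 'a'), (2, 'b')], [(3, 'c')]]
s = [(1, 'x'), (3, 'y')]
print(parallel_inl_join(r_partitions, s))  # [(1, 'a', 'x'), (3, 'c', 'y')]
```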
Other Relational Operations
Selection: select tuples of r satisfying a predicate P
If P is of the form ai = v, where ai is an attribute and v a value:
If r is partitioned on ai, the selection is performed at a single
processor.
If P is of the form l <= ai <= u (i.e., P is a range selection) and the
relation has been range-partitioned on ai:
Selection is performed at each processor whose partition overlaps
with the specified range of values.
In all other cases: the selection is performed in parallel at all the
processors.
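The routing decision can be sketched as follows (hypothetical helper, reusing the [5, 40] partitioning vector from the range-partitioning example): a point query maps to one disk, a range query to the disks whose partitions overlap the range, and an arbitrary predicate to all disks.

```python
import bisect

def disks_for_selection(vector, n_disks, lo=None, hi=None):
    """Disks a selection must touch under range partitioning.

    Point query ai = v: pass lo=hi=v. Range query l <= ai <= u:
    pass lo=l, hi=u. Unknown predicate: pass neither (all disks).
    """
    first = 0 if lo is None else bisect.bisect_right(vector, lo)
    last = n_disks - 1 if hi is None else bisect.bisect_right(vector, hi)
    return list(range(first, last + 1))

print(disks_for_selection([5, 40], 3, lo=7, hi=7))    # [1]: one disk
print(disks_for_selection([5, 40], 3, lo=3, hi=41))   # [0, 1, 2]
print(disks_for_selection([5, 40], 3))                # all disks
```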
Other Relational Operations (Cont.)
Duplicate elimination
Perform by using either of the parallel sort techniques
eliminate duplicates as soon as they are found during sorting.
Can also partition the tuples (using either range- or hash-partitioning)
and perform duplicate elimination locally at each processor.
Projection
Projection without duplicate elimination can be performed as tuples
are read in from disk in parallel.
If duplicate elimination is required, any of the above duplicate
elimination techniques can be used.
Grouping/Aggregation
Cost of Parallel Evaluation of Operations
Interoperator Parallelism
Pipelined parallelism
Consider a join of four relations
r1 ⋈ r2 ⋈ r3 ⋈ r4
Set up a pipeline that computes the three joins in parallel
Let P1 be assigned the computation of
temp1 = r1 ⋈ r2
And P2 be assigned the computation of temp2 = temp1 ⋈ r3
And P3 be assigned the computation of temp2 ⋈ r4
Each of these operations can execute in parallel, sending result
tuples it computes to the next operation even as it is computing
further results
Provided a pipelineable join evaluation algorithm (e.g. indexed
nested loops join) is used
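Python generators give a compact model of such a pipeline (a sketch with hypothetical toy data; the dicts stand in for indexes on the inner relations): each stage yields result tuples as soon as they are produced, so the next join starts consuming before the previous one finishes.

```python
def pipelined_join(left, right_index):
    """Pipelineable indexed nested-loop join: consume lazily, emit eagerly."""
    for key, payload in left:
        for match in right_index.get(key, []):
            yield (key, payload, match)

r1 = [(1, 'a'), (2, 'b')]
idx_r2 = {1: ['x'], 2: ['y']}
idx_r3 = {1: ['p']}

stage1 = pipelined_join(r1, idx_r2)                  # r1 join r2
stage2 = pipelined_join(                             # (r1 join r2) join r3
    ((k, (p, m)) for k, p, m in stage1), idx_r3)
print(list(stage2))                     # [(1, ('a', 'x'), 'p')]
```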
Factors Limiting Utility of Pipeline Parallelism
Independent Parallelism
Independent parallelism
Consider a join of four relations
r1 ⋈ r2 ⋈ r3 ⋈ r4
Let P1 be assigned the computation of
temp1 = r1 ⋈ r2
And P2 be assigned the computation of temp2 = r3 ⋈ r4
And P3 be assigned the computation of temp1 ⋈ temp2
P1 and P2 can work independently in parallel
P3 has to wait for input from P1 and P2
– Can pipeline output of P1 and P2 to P3, combining
independent parallelism and pipelined parallelism
Does not provide a high degree of parallelism
Useful with a lower degree of parallelism
Less useful in a highly parallel system
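A thread-based sketch of the plan above (hypothetical toy relations; `join` is a naive equi-join on the first field of each tuple): P1 and P2 run independently, and P3 blocks until both intermediate results exist, optionally combined with pipelining in a real system.

```python
from concurrent.futures import ThreadPoolExecutor

def join(r, s):
    """Naive equi-join on the first field of each (key, payload) tuple."""
    return [(a, x, y) for a, x in r for b, y in s if a == b]

r1, r2 = [(1, 'a')], [(1, 'b')]
r3, r4 = [(1, 'c')], [(1, 'd')]

with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(join, r1, r2)            # processor P1: r1 join r2
    f2 = pool.submit(join, r3, r4)            # processor P2: r3 join r4
    temp1, temp2 = f1.result(), f2.result()   # P3 waits for both

# P3: temp1 join temp2 (re-pack the 3-tuples as (key, payload) pairs)
final = join([(a, (x, y)) for a, x, y in temp1],
             [(b, (u, v)) for b, u, v in temp2])
print(final)                                  # [(1, ('a', 'b'), ('c', 'd'))]
```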
Query Optimization
Query Optimization (Cont.)
The number of parallel evaluation plans from which to choose is much
larger than the number of sequential evaluation plans.
Therefore heuristics are needed during optimization.
Two alternative heuristics for choosing parallel plans:
Use no pipelined or inter-operation parallelism; just parallelize every
operation across all processors.
Finding best plan is now much easier --- use standard optimization
technique, but with new cost model
The Volcano parallel database popularized the exchange-operator model
– exchange operator is introduced into query plans to partition and
distribute tuples
– each operation works independently on local data on each processor, in
parallel with other copies of the operation
First choose most efficient sequential plan and then choose how best to
parallelize the operations in that plan.
Can explore pipelined parallelism as an option
Choosing a good physical organization (partitioning technique) is important
to speed up queries.
Design of Parallel Systems
Design of Parallel Systems (Cont.)
End of Chapter