2 Parallel Databases
Round robin:
Advantages
Best suited for sequential read of entire relation for
each query.
All disks have almost an equal number of tuples;
retrieval work is thus well balanced between disks.
Disadvantages
Point and range queries are difficult to process
No clustering -- tuples are scattered across all disks
Each of n disks must be used for search.
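The trade-off above can be seen in a minimal sketch (an illustrative example, not from the source): round-robin partitioning sends the i-th tuple to disk i mod n, so disk loads differ by at most one tuple, but any point query must probe every disk.

```python
def round_robin_partition(tuples, n_disks):
    """Assign the i-th tuple to disk (i mod n_disks)."""
    disks = [[] for _ in range(n_disks)]
    for i, t in enumerate(tuples):
        disks[i % n_disks].append(t)
    return disks

disks = round_robin_partition(list(range(10)), 3)
# Loads are 4, 3, 3: a full sequential scan is well balanced,
# but a point query for any single value must search all 3 disks.
```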
Comparison of Partitioning Techniques
Hash partitioning:
Good for sequential access
Assuming hash function is good, and partitioning
attributes form a key, tuples will be equally distributed
between disks
Retrieval work is then well balanced between disks.
Good for point queries on partitioning attribute
Can look up a single disk, leaving others available for
answering other queries (e.g., a point lookup by phone number).
Index on partitioning attribute can be local to disk,
making lookup and update more efficient
Not suitable for point queries on nonpartitioning
attributes
No clustering, so difficult to answer range queries
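A minimal sketch of the idea (an assumed example, not from the source): hash partitioning routes each tuple by hashing its partitioning attribute, so a point query on that attribute searches exactly one disk, while the other disks stay free for concurrent queries.

```python
def hash_partition(tuples, key, n_disks):
    """Route each tuple to the disk selected by hashing its key."""
    disks = [[] for _ in range(n_disks)]
    for t in tuples:
        disks[hash(key(t)) % n_disks].append(t)
    return disks

rows = [("alice", 10), ("bob", 20), ("carol", 30), ("dave", 40)]
disks = hash_partition(rows, key=lambda t: t[0], n_disks=3)

def point_query(name, n_disks=3):
    # A point query on the partitioning attribute probes one disk only.
    d = hash(name) % n_disks
    return [t for t in disks[d] if t[0] == name]
```

A range query such as "all names between 'b' and 'd'" gets no such benefit: hashing destroys ordering, so every disk must be searched.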
Comparison of Partitioning Techniques
Range partitioning:
Provides data clustering by partitioning attribute value.
Good for sequential access
Good for point queries on partitioning attribute: only one
disk needs to be accessed.
For range queries on partitioning attribute, one to a few
disks may need to be accessed
Remaining disks are available for other queries.
Good if result tuples are from one to a few blocks.
If many blocks are to be fetched, they are still fetched from one
to a few disks, and potential parallelism in disk access is wasted
Example of execution skew.
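The clustering property above can be sketched as follows (an assumed example): a partition vector v sends keys below v[0] to disk 0, keys in [v[0], v[1]) to disk 1, and so on, so nearby keys land on the same disk and a range query touches only one to a few disks.

```python
import bisect

def range_partition(tuples, key, vector):
    """Assign each tuple to the range delimited by the partition vector."""
    disks = [[] for _ in range(len(vector) + 1)]
    for t in tuples:
        disks[bisect.bisect_right(vector, key(t))].append(t)
    return disks

rows = [(k, None) for k in [3, 8, 12, 17, 25, 40]]
disks = range_partition(rows, key=lambda t: t[0], vector=[10, 20])
# disks[0]: keys < 10; disks[1]: keys in [10, 20); disks[2]: keys >= 20.
# A range query for keys 12..18 reads disks[1] only.
```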
Partitioning a Relation across Disks
If a relation contains only a few tuples that fit into a
single disk block, assign the relation to a single disk.
Large relations are preferably partitioned across
all the available disks.
If a relation consists of m disk blocks and there
are n disks available in the system, then the
relation should be allocated min(m,n) disks.
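The allocation rule above is simply min(m, n): small relations stay on one disk, while large relations spread across all available disks.

```python
def disks_to_allocate(m_blocks, n_disks):
    """A relation of m blocks on n disks is allocated min(m, n) disks."""
    return min(m_blocks, n_disks)

# A 3-block relation on a 100-disk system occupies only 3 disks;
# a 10,000-block relation spreads across all 100.
```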
Handling of Skew
Handling Skew in Range-Partitioning
To create a balanced partitioning vector (assuming
partitioning attribute forms a key of the relation):
Sort the relation on the partitioning attribute.
Construct the partition vector by scanning the relation in
sorted order as follows.
After every 1/nth of the relation has been read, the value of the
partitioning attribute of the next tuple is added to the partition
vector.
n denotes the number of partitions to be constructed.
Duplicate entries or imbalances can result if duplicates are
present in partitioning attributes.
Disadvantage: extra I/O overhead due to sorting the relation.
Alternative technique based on histograms used in
practice
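The sort-based construction above can be sketched as follows (an illustrative example): sort the keys, then record the key seen after every 1/n-th of the relation, yielding the n-1 cut points that separate n equal-sized partitions.

```python
def balanced_partition_vector(keys, n_partitions):
    """Build a partition vector by sorting and sampling every 1/n-th key."""
    s = sorted(keys)
    step = len(s) // n_partitions
    # The vector needs n-1 entries to delimit n partitions.
    return [s[i * step] for i in range(1, n_partitions)]

keys = [15, 3, 42, 8, 23, 4, 16, 35, 2, 19, 27, 11]
vec = balanced_partition_vector(keys, 3)  # 3 partitions -> 2 cut points
# Each resulting partition holds exactly 4 of the 12 keys.
```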
Handling Skew using Histograms
Balanced partitioning vector can be constructed from
histogram in a relatively straightforward fashion
Assume uniform distribution within each range of the
histogram
Histogram can be constructed by scanning relation, or
sampling (blocks containing) tuples of the relation
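A minimal sketch of the histogram approach (an assumed example, not from the source): walk the histogram buckets accumulating counts, and interpolate a cut point inside a bucket whenever the running total crosses a 1/n fraction of the tuples, relying on the uniformity assumption within each bucket.

```python
def vector_from_histogram(bounds, counts, n_partitions):
    """bounds[i]..bounds[i+1] is bucket i's key range; counts[i] its tuples."""
    total = sum(counts)
    target = total / n_partitions
    vector, need, acc = [], target, 0.0
    for i, c in enumerate(counts):
        while acc + c >= need and len(vector) < n_partitions - 1:
            # Interpolate inside bucket i (uniform-distribution assumption).
            frac = (need - acc) / c
            lo, hi = bounds[i], bounds[i + 1]
            vector.append(lo + frac * (hi - lo))
            need += target
        acc += c
    return vector

# Keys 0-100 in four equi-width buckets with skewed counts:
vec = vector_from_histogram([0, 25, 50, 75, 100], [10, 50, 30, 10], 2)
# The single cut point lands inside the heavy second bucket.
```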
Handling Skew using Virtual Processor Partitioning
Create a large number of virtual processors (many more than
real processors) and use range partitioning across the
virtual processors.
Map virtual processors to real processors, e.g., in
round-robin fashion.
Skewed ranges then get spread across several real
processors, evening out the load.
Projection
Projection without duplicate elimination can be
performed as tuples are read in from disk in parallel.
If duplicate elimination is required, any of the above
duplicate elimination techniques can be used.
Grouping/Aggregation
Partition the relation on the grouping attributes and then
compute the aggregate values locally at each processor.
Can reduce cost of transferring tuples during partitioning
by partly computing aggregate values before partitioning.
Consider the sum aggregation operation:
Perform aggregation operation at each processor Pi on those
tuples stored on disk Di
results in tuples with partial sums at each processor.
Result of the local aggregation is partitioned on the grouping
attributes, and the aggregation performed again at each
processor Pi to get the final result.
Fewer tuples need to be sent to other processors during
partitioning.
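The two-phase sum aggregation above can be sketched as follows (an illustrative example): each processor first sums its local tuples per group, and only the partial sums are repartitioned and merged, so far fewer tuples cross the network than if raw tuples were repartitioned.

```python
from collections import Counter

def local_partial_sums(tuples):
    # Phase 1: each processor Pi aggregates the tuples on its disk Di.
    partial = Counter()
    for group, value in tuples:
        partial[group] += value
    return partial

def merge_partials(partials):
    # Phase 2: partial sums are repartitioned by group and added up.
    final = Counter()
    for p in partials:
        final.update(p)
    return dict(final)

disk0 = [("a", 1), ("b", 2), ("a", 3)]
disk1 = [("b", 4), ("a", 5)]
result = merge_partials([local_partial_sums(disk0),
                         local_partial_sums(disk1)])
# Only 4 partial-sum tuples are transferred instead of 5 raw tuples;
# the saving grows with the number of duplicates per group.
```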
Cost of Parallel Evaluation of Operations
If there is no skew in the partitioning and no overhead
due to the parallel evaluation, the expected time taken is
1/n of the sequential time, i.e., a speed-up of n.
If skew and overheads are also to be taken into
account, the time taken by a parallel operation
can be estimated as
Tpart + Tasm + max (T0, T1, …, Tn-1)
Tpart is the time for partitioning the relations
Tasm is the time for assembling the results
Ti is the time taken for the operation at processor Pi
this needs to be estimated taking into account the skew, and
the time wasted in contentions.
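The cost formula above can be evaluated directly; the key point is that the max term makes the slowest processor dominate, so a skewed load hurts even when total work is unchanged.

```python
def parallel_op_time(t_part, t_asm, t_i):
    """Tpart + Tasm + max(T0, ..., Tn-1): the slowest processor dominates."""
    return t_part + t_asm + max(t_i)

balanced = parallel_op_time(2, 1, [10, 10, 10, 10])  # 13
skewed   = parallel_op_time(2, 1, [2, 2, 2, 34])     # 37, same total work
```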
2.4 Interoperator Parallelism
Pipelined parallelism
– Consider a join of four relations
• r1 ⋈ r2 ⋈ r3 ⋈ r4
– Set up a pipeline that computes the three joins in parallel
• Let P1 be assigned the computation of
temp1 = r1 ⋈ r2
• And P2 be assigned the computation of temp2 = temp1 ⋈ r3
• And P3 be assigned the computation of temp2 ⋈ r4
– Each of these operations can execute in parallel, sending
result tuples it computes to the next operation even as it
is computing further results
• Provided a pipelineable join evaluation algorithm (e.g. indexed
nested loops join) is used
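The pipeline above can be modeled with Python generators (an illustrative sketch, not from the source; real systems run each stage on a separate processor): each join stage yields result tuples as it computes them, so the next stage starts consuming before the previous one finishes.

```python
def scan(relation):
    for t in relation:
        yield t

def pipelined_join(left, right, lkey, rkey):
    # Indexed-nested-loops style: build an index on `right`, then
    # probe it once per left tuple, emitting matches immediately.
    index = {}
    for t in right:
        index.setdefault(t[rkey], []).append(t)
    for l in left:
        for r in index.get(l[lkey], []):
            yield l + r

r1 = [(1, "x"), (2, "y")]
r2 = [(1, "p"), (2, "q")]
r3 = [("p", 10), ("q", 20)]
# temp1 = r1 join r2 feeds the next join without being materialized.
temp1 = pipelined_join(scan(r1), r2, lkey=0, rkey=0)
result = list(pipelined_join(temp1, r3, lkey=3, rkey=0))
```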
Factors Limiting Utility of Pipeline Parallelism
Pipeline chains are usually not long enough to provide a
high degree of parallelism.
Some operators, such as sort and aggregation, are blocking:
they cannot produce any output until all their input has
been consumed, so they cannot be pipelined.
Speed-up is small if one operator's execution cost is much
higher than that of the others.
Independent Parallelism
Independent parallelism
– Consider a join of four relations
• r1 ⋈ r2 ⋈ r3 ⋈ r4
• Let P1 be assigned the computation of
temp1 = r1 ⋈ r2
• And P2 be assigned the computation of temp2 = r3 ⋈ r4
• And P3 be assigned the computation of temp1 ⋈ temp2
• P1 and P2 can work independently in parallel
• P3 has to wait for input from P1 and P2
– Can pipeline output of P1 and P2 to P3, combining independent parallelism
and pipelined parallelism
– Does not provide a high degree of parallelism
• useful with a lower degree of parallelism.
• less useful in a highly parallel system,
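The independent-parallelism pattern above can be sketched with threads (an illustrative example, not from the source): P1 and P2 run concurrently, while P3 must block until both inputs are ready.

```python
from concurrent.futures import ThreadPoolExecutor

def join(left, right, lkey, rkey):
    """Simple nested-loops equi-join on the given key positions."""
    return [l + r for l in left for r in right if l[lkey] == r[rkey]]

r1 = [(1, "a")]; r2 = [(1, "b")]
r3 = [("b", 5)]; r4 = [(5, "z")]

with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(join, r1, r2, 0, 0)  # P1: temp1 = r1 join r2
    f2 = pool.submit(join, r3, r4, 1, 0)  # P2: temp2 = r3 join r4
    temp1, temp2 = f1.result(), f2.result()  # P3 waits for both inputs

result = join(temp1, temp2, 3, 0)  # P3: temp1 join temp2
```

Only two operations ever run concurrently here, which is why the slides note that this form of parallelism is less useful in highly parallel systems.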
Query Optimization
Query optimization in parallel databases is significantly more
complex than query optimization in sequential databases.
Cost models are more complicated, since we must take into
account partitioning costs and issues such as skew and resource
contention.
When scheduling execution tree in parallel system, must
decide:
How to parallelize each operation and how many processors to use
for it.
What operations to pipeline, what operations to execute
independently in parallel, and what operations to execute
sequentially, one after the other.
Determining the amount of resources to allocate for each
operation is a problem.
E.g., allocating more processors than optimal can result in high
communication overhead.
Long pipelines should be avoided as the final operation may
wait a lot for inputs, while holding precious resources
Query Optimization
The number of parallel evaluation plans to choose from is much
larger than the number of sequential evaluation plans.
Therefore, heuristics are needed during optimization.
Two alternative heuristics for choosing parallel plans:
No pipelining and no inter-operation parallelism; just parallelize every
operation across all processors.
Finding best plan is now much easier --- use standard optimization
technique, but with new cost model
The Volcano parallel database popularized the exchange-operator model
exchange operator is introduced into query plans to partition and
distribute tuples
each operation works independently on local data on each
processor, in parallel with other copies of the operation
First choose most efficient sequential plan and then choose how best to
parallelize the operations in that plan.
Can explore pipelined parallelism as an option
Choosing a good physical organization (partitioning technique) is important to
speed up queries.
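The exchange-operator idea above can be sketched as follows (an assumed example, not from the source): a single `exchange` step partitions and distributes the tuple stream, and every other operator runs unchanged on its local partition, unaware of the parallelism.

```python
def exchange(stream, key, n):
    """The only parallelism-aware step: partition and distribute tuples."""
    outputs = [[] for _ in range(n)]
    for t in stream:
        outputs[hash(key(t)) % n].append(t)
    return outputs

def local_count(tuples):
    # Any ordinary sequential operator works per-partition unmodified.
    return len(tuples)

parts = exchange([("a", 1), ("b", 2), ("a", 3)],
                 key=lambda t: t[0], n=2)
total = sum(local_count(p) for p in parts)
```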
2.5 Design of Parallel Systems
Some issues in the design of parallel systems:
Parallel loading of data from external sources is
needed in order to handle large volumes of
incoming data.
Resilience to failure of some processors or disks.
Probability of some disk or processor failing is higher in a
parallel system.
Operation (perhaps with degraded performance) should
be possible in spite of failure.
Redundancy achieved by storing extra copy of every data
item at another processor.