0% found this document useful (0 votes)
8 views

Lect8 Parallel System

Sorting ordered data is easier to manipulate than random data. There are two main types of sorting algorithms: internal sorting which uses main memory and external sorting which uses auxiliary storage like hard disks. Comparison-based sorting algorithms repeatedly compare and exchange elements, with a lower bound of Θ(n log n) operations. Non-comparison based sorting uses known properties of elements like binary representation, with a lower bound of Θ(n) operations. Parallelizing sorting involves distributing elements among processors, which raises issues around where the input/output sequences reside and how comparisons are performed between elements on different processors.

Uploaded by

sama akram
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views

Lect8 Parallel System

Sorting ordered data is easier to manipulate than random data. There are two main types of sorting algorithms: internal sorting which uses main memory and external sorting which uses auxiliary storage like hard disks. Comparison-based sorting algorithms repeatedly compare and exchange elements, with a lower bound of Θ(n log n) operations. Non-comparison based sorting uses known properties of elements like binary representation, with a lower bound of Θ(n) operations. Parallelizing sorting involves distributing elements among processors, which raises issues around where the input/output sequences reside and how comparisons are performed between elements on different processors.

Uploaded by

sama akram
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 43

sorting

Sorting data are easier to manipulate than random


ordered data.

Sorting Algorithm
1- Internal 2- External
The number of Use auxiliary storage (such
elements is small hard disks) to store because
enough to fit into the number of elements to
the processor’s be sorted is too large to fit
main memory into memory
Sorting Algorithm
1- Comparison based
Repeatedly comparing pairs of elements and if they are
out of order, exchange them. This is called (compare-
exchange).
Lower bound is Θ(n log n) on the sequential computer.

2- non comparison based


Used certain know properties of elements (such as their
binary representation).
Lower bound is Θ(n) on the sequential computer.
sorting
*Parallelizing a sequential sorting algorithm involves distributing
the elements to be sorted into the available processors. This
process raises a number of issues that we must address in order to
make the presentation of parallel sorting algorithms clearer.

*In sequential sorting algorithms, the input and the sorting


sequences are sorted in the processor’s memory.

*In parallel sorting, there are two places where these sequences
can reside. They may be sorted on only one of the processors, or
they may be distributed among the processors.

*A general method of distributed is to enumerate the processors


and use this enumeration to specify a global ordering for the sorted
sequence.
sorting
Ex
If Pi comes before Pj in the enumeration, all elements
sorted in Pi will be smaller than those sorted in Pj .

How comparisons are performed:


A sequential sorting algorithm can easily perform a
compare- exchange on two elements because they are
sorted locally in the processor’s memory. In parallel
sorting algorithms this step is not so easy. As the
elements reside on different processors.
sorting
One elements per processor
ai aj ai, aj ai, aj
PPP i
i
i
Pj PPP i
i
i
Pj
Step 1 Step 2

Min(ai, aj) Max(ai, aj) Each compare- exchange


operation required one
PPP i i
i
Pj comparison step and one
communication step.

Step 3
sorting
More than one elements per processor

We refer to this operation of comparing and splitting two


sorted blocks as compare - split
sorting Network
The key component of sorting network is a comparator.
A comparator is a device with two inputs x, y and two outputs x’
and y’.
sorting Network
sorting Network
The key component of these networks is a comparator. A
comparator is a device with two inputs x and y and two outputs x'
and y'. For an increasing comparator, x' = min{x, y} and y' = max{x,
y}; for a decreasing comparator x' = max{x, y} and y' = min{x, y}.
As the two elements enter the input wires of the comparator, they
are compared and, if necessary, exchanged before they go to the
output wires. A sorting network is usually made up of a series of
columns, and each column contains a number of comparators
connected in parallel. Each column of comparators performs a
permutation, and the output obtained from the final column is
sorted in increasing or decreasing order. The depth of a network is
the number of columns it contains. Since the speed of a
comparator is a technology-dependent constant, the speed of the
network is proportional to its depth.
Bitonic sorting
The key operation of bitonic sorting network
1- convert input sequence into bitonic sequence.
2- rearrangement of a bitonic sequence into a sorting
sequence.
Bitonic sorting
A bitonic sequence : is a sequence of elements < a0, a1,
a2, … , an-1> with the property that either
1- there exists an index i, 0 ≤ i ≤ n-1, such that <a0, a1,
… , ai > is monotonically increasing and < ai+1, … , an-1>
is monotonically decreasing.
2- there exit a cyclic shift of indices so that (1) is
satisfy.
Bitonic sorting
-Therefore, every element of the first sequence is
smaller than every element of the second sequence
and each of them is bitonic sequence.
-Thus we reduce the problem of rearranging a bitonic
sequence of size n to that of rearranging two smaller
bitonic sequences and concatenating the result.
-This known as a bitonic split.
Bitonic sorting
-we can recursively obtain shorter bitonic sequences
for each the bitonic subsequences until obtain
subsequences of size one.
-at that point the output is sorted in monotonically
increasing order.
-the number of splits steps required to rearrange the
bitonic sequence into a sort sequence is log n.
The network that used that technique is called Bitonic
Merging Network.
Bitonic sorting

- Merging a 16 elements bitonic sequence through a series of


log 16 bitonic splits
- Using sequential program to make that algorithm, it will take
n log n unit time.
Bitonic sorting
-Bitonic merging network contain log n columns, each -
column contains n/2 comparators which perform one
step of the bitonic merge.
-This network take as input a bitonic sequence and
output the sequence in sorting order.
- a bitonic merging network with n input was defined by
ΘBM[n] if output sorted in increasing order
ΘBM[n] if output sorted in decreasing order
Bitonic sorting
Bitonic sorting
Bitonic sorting
Bitonic sorting
the depth, d(n), of the network

The algorithm perform a total number of steps


= ( 1 + log n ) log (n ) / 2

≈ Θ
Mapping Bitonic sort algorithm into a
Hypercube and Mesh Networks
Mapping Bitonic sort algorithm into a Hypercube
and Mesh Networks

The bitonic sort network for sorting n elements contain log n


stages, and each stage I consists of I columns of n/2 comparators.
Each column of comparator performs compare exchange
operation s on n wires.
On parallel computer the compare exchange function is
performed by a pair of processors.
1- one element per processor
Each of the n processors contain one element of the input
sequence.
Each wire of the bitonic sorting network represent a distinct
processor.
During each step of the algorithm, the compare exchange
operation performed by a column of comparators are performed by
n/2 pairs of processors.
To obtain a good mapping, we must investigate the way that input
wires are paired during each stage of bitonic sort.
In any step, the compare exchange operation is performed
between two wires if their labels differ in exactly one bit.
1- one element per processor
During each of the four stages, wires whose labels differ in the
least significant bit perform a compare exchange in the last step
of each stage.
During the last three stages, wires whose labels differ in the
second least significant bit perform a compare exchange in the
second to last step of each stage.
In general, wires whose lables differ in the ith least significant
bit perform a compare exchange ( log (n) – i + 1 ) times.
This observation helps us efficiency map wires into processor
by mapping wires that perform compare exchange operations
more frequently to processors that are close to each other.
* Hypercube
Mapping wires into processors of a hypercube connected parallel
computer is straight forward.
Compare exchange operations take place between wires whose
labels differ in only one bit.
In hypercube, processors whose labels differ in only one bit are
neighbors.
Thus an optimal mapping of input wires to hypercube processors
is the one that maps an input wire with label l to a processor with
label l where l = 0, 1, 2, … , n-1.

* Hypercube
A hypercube with d dimensional ( that P = 2d ).
In the final stage of bitonic sort, the input has been converted into a
bitonic sequence.
During first step of this stage, processors that differ only in the dth bit
of the binary representation of their labels ( the most significant bit)
compare exchange their elements.
Thus the compare exchange operation takes place between
processors along the dth dimension.
During the second step of the algorithm, compare exchange
operation takes place among processors along the (d-1)st dimension.
Bitonic sorting
Bitonic sorting
* Hypercube
In general, during the ith step of the final stage, processors
communicate along the (d- (i-1))st dimension.
During each step of the algorithm, every processor performs
a compare exchange operation.
The algorithm performs a total of ( 1 + log n ) ( log n) /2 such
steps.
The parallel run time


* Mesh
The connectivity of a mesh is lower than that of a hypercube, so it is
impossible to map wires to processors such that each compare
exchange operation occurs only between neighboring processors.
Instead, we map wires such that the most frequent compare exchange
operations occur between neighboring processors.
There are several ways to map the input wires into the mesh
processors.
Each processor is label by the wire that is mapped into it.
In this formula, in general wires that differ in the ith least significant bit
are mapped into processors that are 2[(i-1)/2] communication links away
* Mesh – row major shuffled mapping
The advantage of row major shuffled mapping is that perform
compare exchange operation reside on square subsections of the
mesh.
For every stage of bitonic sort (corresponding to wires that differ in
the least significant bit ) are neighbors.
total amount of communication performed by each processor = 7
A block of elements per processors

Let p be the number of processors


n be the number of elements, n > p
Each processor is assigned a block (n/p) elements to be sorted.
The main difference between this formulation and the one that
uses virtual processors is that (n/p) elements assigned to each
processor are initially sorted locally, using a fast sequential sorting
algorithm.
A block of elements per processors
Hypercubev

Mesh
Bubble sort and its variants
The sequential bubble sort algorithm compares and exchanges
adjacent elements in the sequence to be sorted.

Bubble sort is difficult to paralyzed

You might also like