Slides Sorting
Slides Sorting
(Slides based on the book ‘Parallel Programming: Techniques and Applications Using Networked Workstations
and Parallel Computers. B. Wilkinson, M. Allen, Prentice Hall’)
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 1 / 41
Sorting in Parallel
Why?
Sorting, or rearranging a list of numbers into increasing (decreasing)
order, is a fundamental operation that appears in many applications
Potential speedup?
Best sequential sorting algorithms (mergesort and quicksort) have
(respectively worst-case and average) time complexity of O(n log(n))
The best we can aim with a parallel sorting algorithm using n
processing units is thus a time complexity of
O(n log(n))/n = O(log(n))
But, in general, a realistic O(log(n)) algorithm with n processing
units is a goal that is not easy to achieve with comparasion-based
sorting algorithms
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 2 / 41
Compare-and-Exchange
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 4 / 41
Parallel Compare-and-Exchange
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 5 / 41
Data Partitioning
So far, we have assumed that there is one processing unit for each
number, but normally there would be many more numbers (n) than
processing units (p) and, in such cases, a list of n/p numbers would be
assigned to each processing unit.
When dealing with lists of numbers, the operation of merging two
sorted lists is a common operation in sorting algorithms.
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 6 / 41
Parallel Merging
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 7 / 41
Parallel Merging
Version 2 – both processing units exchange their lists, then both perform
the merge operation and P1 keeps the lower half of the merged list and P2
keeps the higher half of the merged list.
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 8 / 41
Bubble Sort
In bubble sort, the largest number is first moved to the very end of the list
by a series of compare-and-exchange operations, starting at the opposite
end. The procedure repeats, stopping just before the previously positioned
largest number, to get the next-largest number. In this way, the larger
numbers move (like a bubble) toward the end of the list.
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 10 / 41
Parallel Bubble Sort
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 11 / 41
Odd-Even Transposition Sort
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 12 / 41
Odd-Even Transposition Sort
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 13 / 41
Odd-Even Transposition Sort
rank = process_id();
A = initial_value();
for (i = 0; i < N; i++) {
if (i % 2 == 0) { // even phase
if (rank % 2 == 0) { // even process
recv(B, rank + 1); send(A, rank + 1);
A = min(A,B);
} else { // odd process
send(A, rank - 1); recv(B, rank - 1);
A = max(A,B);
}
} else if (rank > 0 && rank < N - 1) { // odd phase
if (rank % 2 == 0) { // even process
recv(B, rank - 1); send(A, rank - 1);
A = max(A,B);
} else { // odd process
send(A, rank + 1); recv(B, rank + 1);
A = min(A,B);
}
}
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 14 / 41
Mergesort
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 15 / 41
Mergesort
Computations only occur when merging the sublists. In the worst case, it
takes 2s − 1 steps to merge two sorted sublists of size s. If we have m = ns
sorted sublists in a merging step, it takes
m m m
(2s − 1) = ms − =n−
2 2 2
steps to merge all sublists (two by two).
Since in total there are log(n) merging steps, this corresponds to a time
complexity of O(n log(n)).
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 16 / 41
Parallel Mergesort
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 17 / 41
Parallel Mergesort
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 18 / 41
Quicksort
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 19 / 41
Quicksort
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 20 / 41
Quicksort: Lomuto Partition Scheme
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 21 / 41
Quicksort: Hoare Partition Scheme
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 22 / 41
Parallel Quicksort
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 23 / 41
Parallel Quicksort
Allocating processes in a tree structure leads to two fundamental problems:
In general, the partition tree is not perfectly balanced (selecting
good pivot candidates is crucial for efficiency)
The process of assigning work to processes seriously limits the
efficient usage of the available processes (the initial partition only
involves one process, then the second partition involves two processes,
then four processes, and so on)
Again, if we ignore communication time and consider that pivot selection
is ideal, i.e., creating sublists of equal size, then it takes
log(n)
X n
≈ 2n
i=0
2i
steps to obtain the final sorted list in a parallel implementation, which
corresponds to a time complexity of O(n). The worst-case pivot selection
degenerates to a time complexity of O(n2 ).
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 24 / 41
Odd-Even Merge
Odd-even merge is a merging algorithm for sorted lists which can merge
two sorted lists into one sorted list. It works as follows:
Let A = [a0 , . . . , an−1 ] and B = [b0 , . . . , bn−1 ] be two sorted lists
Consider E (A) = [a0 , a2 , . . . , an−2 ], E (B) = [b0 , b2 , . . . , bn−2 ] and
O(A) = [a1 , a3 , . . . , an−1 ], O(B) = [b1 , b3 , . . . , bn−1 ] as the sublists
with the elements of A and B in even and odd positions, respectively
Recursively merge E (A) with E (B) to obtain C
Recursively merge O(A) with O(B) to obtain D
Interleave C with D to get an almost-sorted list E
Rearrange unordered neighbors (using compare-and-exchange
operations) to completely sort E
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 25 / 41
Odd-Even Merge
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 26 / 41
Odd-Even Merge
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 27 / 41
Odd-Even Merge
For example, the call to odd_even_merge([2, 4, 5, 8], [1, 3, 6, 7]) leads to:
One call to odd_even_merge([2, 5], [1, 6])
Another call to odd_even_merge([4, 8], [3, 7])
3 compare-and-exchange operations
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 28 / 41
4x4 Odd-Even Merge
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 29 / 41
Parallel Odd-Even Merge
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 30 / 41
Parallel Odd-Even Merge
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 31 / 41
Odd-Even Mergesort
Odd-even mergesort is a parallel sorting algorithm based on the recursive
application of the odd-even merge algorithm that merges sorted
sublists bottom up – starting with sublists of size 2 and merging them into
bigger sublists – until the final sorted list is obtained.
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 33 / 41
Bitonic Mergesort
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 34 / 41
Bitonic Mergesort
In addition to both sequences being bitonic sequences, all values in the
left sequence are less than all the values in the right sequence.
Hence, if we apply recursively these compare-and-exchange
operations to a given bitonic sequence, we will get a sorted sequence.
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 35 / 41
Bitonic Mergesort
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 36 / 41
Bitonic Mergesort
To sort an unsorted sequence, we first transform it in a bitonic
sequence. Starting from adjacent pairs of values of the given unsorted
sequence, bitonic sequences are created and then recursively merged
into (twice the size) larger bitonic sequences. At the end, a single
bitonic sequence is obtained, which is then sorted (as described before)
into the final increasing sequence.
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 37 / 41
Bitonic Mergesort
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 38 / 41
Bitonic Mergesort
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 39 / 41
Bitonic Mergesort
The bitonic mergesort algorithm can be split into k phases (with n = 2k ):
Phase 1: convert adjacent pair of values into increasing/decreasing
sequences, i.e., into 4-bit bitonic sequences
Phases 2 to k-1: sort each m-bit bitonic sequence into
increasing/decreasing sequences and merge adjacent sequences into a
2m-bit bitonic sequence, until reaching a n-bit bitonic sequence
Phase k: sort the n-bit bitonic sequence
Given an unsorted list of size n = 2k , there are k phases, with each phase i
taking i parallel steps. Therefore, in total it takes
k
X k(k + 1) log(n)(log(n) + 1)
i= =
i=1
2 2
R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 41 / 41