This document discusses parallel sorting algorithms. It begins by explaining why sorting is important and how parallelization can provide speedup over sequential algorithms. It then examines how basic sorting operations like compare-and-exchange and merging can be performed in parallel. Several parallel sorting algorithms are described, including bubble sort, odd-even transposition sort, mergesort, quicksort, odd-even mergesort and bitonic mergesort. Parallel mergesort and quicksort reduce the time complexity to O(n) by distributing work across processing units, while odd-even mergesort and bitonic mergesort achieve O(log²(n)).

Parallel Sorting Algorithms

Ricardo Rocha and Fernando Silva

Computer Science Department


Faculty of Sciences
University of Porto

Parallel Computing 2015/2016

(Slides based on the book ‘Parallel Programming: Techniques and Applications Using Networked Workstations
and Parallel Computers. B. Wilkinson, M. Allen, Prentice Hall’)

R. Rocha and F. Silva (DCC-FCUP) Parallel Sorting Algorithms Parallel Computing 15/16 1 / 41
Sorting in Parallel

Why?
Sorting, or rearranging a list of numbers into increasing (decreasing)
order, is a fundamental operation that appears in many applications

Potential speedup?
Best sequential sorting algorithms (mergesort and quicksort) have
(respectively worst-case and average) time complexity of O(n log(n))
The best we can aim for with a parallel sorting algorithm using n
processing units is thus a time complexity of
O(n log(n))/n = O(log(n))
But, in general, a realistic O(log(n)) algorithm with n processing
units is a goal that is not easy to achieve with comparison-based
sorting algorithms

Compare-and-Exchange

An operation that forms the basis of several classical sequential sorting
algorithms is the compare-and-exchange (or compare-and-swap) operation.
In a compare-and-exchange operation, two numbers, say A and B, are
compared and if they are not ordered, they are exchanged. Otherwise, they
remain unchanged.

if (A > B) { // sorting in increasing order
    temp = A;
    A = B;
    B = temp;
}

Question: how can compare-and-exchange be done in parallel?


Parallel Compare-and-Exchange

Version 1 – P1 sends A to P2, which then compares A and B and sends
back to P1 the min(A, B).

Parallel Compare-and-Exchange

Version 2 – P1 sends A to P2 and P2 sends B to P1, then both perform
comparisons and P1 keeps the min(A, B) and P2 keeps the max(A, B).

Data Partitioning

So far, we have assumed that there is one processing unit for each
number, but normally there would be many more numbers (n) than
processing units (p) and, in such cases, a list of n/p numbers would be
assigned to each processing unit.
When dealing with lists of numbers, the operation of merging two
sorted lists is a common operation in sorting algorithms.

Parallel Merging

Version 1 – P1 sends its list to P2, which then performs the merge
operation and sends back to P1 the lower half of the merged list.

Parallel Merging

Version 2 – both processing units exchange their lists, then both perform
the merge operation and P1 keeps the lower half of the merged list and P2
keeps the higher half of the merged list.
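Ignoring communication, the work each processing unit performs in Version 2 can be sketched in C. This is an illustrative sketch, not code from the slides: the function names and the idea of passing the "received" list as an array argument are assumptions.

```c
#include <assert.h>

/* Merge two sorted lists of size n each into out[0..2n-1]. */
static void merge(const int *a, const int *b, int n, int *out) {
    int i = 0, j = 0, k = 0;
    while (i < n && j < n)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < n) out[k++] = a[i++];
    while (j < n) out[k++] = b[j++];
}

/* Version 2: after exchanging lists, P1 merges and keeps the lower
   half, P2 merges and keeps the upper half. 'keep_lower' selects
   which role this processing unit plays. */
void merge_and_keep(int *mine, const int *other, int n, int keep_lower) {
    int merged[2 * n];               /* C99 variable-length array */
    merge(mine, other, n, merged);
    for (int i = 0; i < n; i++)
        mine[i] = keep_lower ? merged[i] : merged[n + i];
}
```

With P1 holding [1, 5, 8, 9] and P2 holding [2, 3, 6, 7], both units merge the same pair of lists; P1 ends with [1, 2, 3, 5] and P2 with [6, 7, 8, 9], at the cost of each unit performing the full merge.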

Bubble Sort
In bubble sort, the largest number is first moved to the very end of the list
by a series of compare-and-exchange operations, starting at the opposite
end. The procedure repeats, stopping just before the previously positioned
largest number, to get the next-largest number. In this way, the larger
numbers move (like a bubble) toward the end of the list.

for (i = N - 1; i > 0; i--)
    for (j = 0; j < i; j++) {
        k = j + 1;
        if (a[j] > a[k]) {
            temp = a[j];
            a[j] = a[k];
            a[k] = temp;
        }
    }

The total number of compare-and-exchange operations is
∑_{i=1}^{n−1} i = n(n−1)/2, which corresponds to a time complexity of O(n²).
Parallel Bubble Sort

A possible idea is to run multiple iterations in a pipeline fashion, i.e.,
start the bubbling action of the next iteration before the preceding
iteration has finished, in such a way that it does not overtake it.

Odd-Even Transposition Sort

Odd-even transposition sort is a variant of bubble sort which operates in
two alternating phases:
Even Phase: even processes exchange values with right neighbors
(P0 ↔ P1, P2 ↔ P3, ...)
Odd Phase: odd processes exchange values with right neighbors
(P1 ↔ P2, P3 ↔ P4, ...)

For sequential programming, odd-even transposition sort has no particular
advantage over normal bubble sort. However, its parallel implementation
corresponds to a time complexity of O(n).

Odd-Even Transposition Sort

rank = process_id();
A = initial_value();
for (i = 0; i < N; i++) {
    if (i % 2 == 0) { // even phase
        if (rank % 2 == 0) { // even process
            recv(B, rank + 1); send(A, rank + 1);
            A = min(A, B);
        } else { // odd process
            send(A, rank - 1); recv(B, rank - 1);
            A = max(A, B);
        }
    } else if (rank > 0 && rank < N - 1) { // odd phase
        if (rank % 2 == 0) { // even process
            recv(B, rank - 1); send(A, rank - 1);
            A = max(A, B);
        } else { // odd process
            send(A, rank + 1); recv(B, rank + 1);
            A = min(A, B);
        }
    }
}
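The same phases can be mirrored by a sequential, shared-array simulation, which is useful for checking the phase logic. The array-based formulation and function name below are illustrative assumptions, not code from the slides:

```c
#include <assert.h>

/* Sequential simulation of odd-even transposition sort: N phases,
   alternating compare-and-exchange on even and odd index pairs. */
void odd_even_transposition_sort(int *a, int n) {
    for (int phase = 0; phase < n; phase++) {
        /* even phases start at index 0, odd phases at index 1 */
        int start = (phase % 2 == 0) ? 0 : 1;
        for (int j = start; j + 1 < n; j += 2) {
            if (a[j] > a[j + 1]) {      /* compare-and-exchange */
                int tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
            }
        }
    }
}
```

In the message-passing version each index pair is a pair of neighboring processes and all exchanges of one phase happen in parallel, which is where the O(n) bound comes from.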

Mergesort

Mergesort is a classical sorting algorithm using a divide-and-conquer
approach. The initial unsorted list is first divided in half, and the same
division method is then applied to each half sublist until individual
elements are obtained. Pairs of adjacent elements/sublists are then
merged into sorted sublists until a single fully merged and sorted list
is obtained.

Mergesort

Computations only occur when merging the sublists. In the worst case, it
takes 2s − 1 steps to merge two sorted sublists of size s. If we have
m = n/s sorted sublists in a merging step, it takes

(m/2)(2s − 1) = ms − m/2 = n − m/2

steps to merge all sublists (two by two).

Since in total there are log(n) merging steps, this corresponds to a time
complexity of O(n log(n)).
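To make the 2s − 1 bound concrete, here is a merge routine that counts its comparison steps; the last element of the longer-surviving list is placed without a comparison, so merging two sorted sublists of size s never needs more than 2s − 1 steps. The function name and step-counting are illustrative assumptions, not from the slides:

```c
#include <assert.h>

/* Merge two sorted lists of size s each into out[0..2s-1]; returns
   the number of comparison steps performed (at most 2s - 1). */
int merge_count(const int *a, const int *b, int s, int *out) {
    int i = 0, j = 0, k = 0, steps = 0;
    while (i < s && j < s) {
        steps++;                        /* one compare-and-copy step */
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    }
    while (i < s) out[k++] = a[i++];    /* remainder copied without compares */
    while (j < s) out[k++] = b[j++];
    return steps;
}
```

For the fully interleaved input [1, 3, 5, 7] and [2, 4, 6, 8] the routine performs exactly 2·4 − 1 = 7 comparison steps, which is the worst case for s = 4.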

Parallel Mergesort

The idea is to take advantage of the tree structure of the algorithm
to assign work to processes.

Parallel Mergesort

If we ignore communication time, computations still only occur when
merging the sublists. But now, in the worst case, it takes 2s − 1 steps to
merge all sublists (two by two) of size s in a merging step.

Since in total there are log(n) merging steps, it takes

∑_{i=1}^{log(n)} (2^i − 1)

steps to obtain the final sorted list in a parallel implementation, which
corresponds to a time complexity of O(n).

Quicksort

Quicksort is a popular sorting algorithm also using a divide-and-conquer
approach. The initial unsorted list is first divided into two sublists in such
a way that all elements in the first sublist are smaller than all the
elements in the second sublist. This is achieved by selecting one
element, called a pivot, against which every other element is compared
(the pivot could be any element in the list, but often the first or last
element is chosen). The same division method is then applied to each
sublist until individual elements are obtained. With proper ordering of the
sublists, the final sorted list is then obtained.
On average, the quicksort algorithm shows a time complexity of
O(n log(n)) and, in the worst case, it shows a time complexity of O(n²),
though this behavior is rare.

Quicksort: Lomuto Partition Scheme
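The figure for this slide is not reproducible here; as a hedged reference (assuming, as is common for the Lomuto scheme, that the last element is the pivot), the partition step can be sketched as:

```c
#include <assert.h>

/* Lomuto partition: uses a[high] as pivot and returns the pivot's
   final index; elements <= pivot end up to its left. */
int lomuto_partition(int *a, int low, int high) {
    int pivot = a[high];
    int i = low;                        /* next slot for a small element */
    for (int j = low; j < high; j++) {
        if (a[j] <= pivot) {
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            i++;
        }
    }
    int tmp = a[i]; a[i] = a[high]; a[high] = tmp;
    return i;
}
```

Quicksort then recurses on the ranges [low, p − 1] and [p + 1, high], where p is the returned index.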

Quicksort: Hoare Partition Scheme
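Again as a hedged reference for the missing figure (assuming the first element is the pivot), the Hoare scheme can be sketched as below; note it returns the end of the lower part, not the pivot's final position:

```c
#include <assert.h>

/* Hoare partition: uses a[low] as pivot and returns an index p such
   that every element of a[low..p] is <= every element of a[p+1..high]. */
int hoare_partition(int *a, int low, int high) {
    int pivot = a[low];
    int i = low - 1, j = high + 1;
    for (;;) {
        do { i++; } while (a[i] < pivot);   /* scan from the left */
        do { j--; } while (a[j] > pivot);   /* scan from the right */
        if (i >= j) return j;
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
    }
}
```

With this scheme quicksort recurses on [low, p] and [p + 1, high]; Hoare's version typically performs fewer swaps than Lomuto's.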

Parallel Quicksort

As for mergesort, the idea is to take advantage of the tree structure of
the algorithm to assign work to processes.

Parallel Quicksort
Allocating processes in a tree structure leads to two fundamental problems:
In general, the partition tree is not perfectly balanced (selecting
good pivot candidates is crucial for efficiency)
The process of assigning work to processes seriously limits the
efficient usage of the available processes (the initial partition only
involves one process, then the second partition involves two processes,
then four processes, and so on)
Again, if we ignore communication time and consider that pivot selection
is ideal, i.e., creating sublists of equal size, then it takes

∑_{i=0}^{log(n)} n/2^i ≈ 2n

steps to obtain the final sorted list in a parallel implementation, which
corresponds to a time complexity of O(n). The worst-case pivot selection
degenerates to a time complexity of O(n²).
Odd-Even Merge

Odd-even merge is a merging algorithm for sorted lists which can merge
two sorted lists into one sorted list. It works as follows:
Let A = [a0 , . . . , an−1 ] and B = [b0 , . . . , bn−1 ] be two sorted lists
Consider E (A) = [a0 , a2 , . . . , an−2 ], E (B) = [b0 , b2 , . . . , bn−2 ] and
O(A) = [a1 , a3 , . . . , an−1 ], O(B) = [b1 , b3 , . . . , bn−1 ] as the sublists
with the elements of A and B in even and odd positions, respectively
Recursively merge E (A) with E (B) to obtain C
Recursively merge O(A) with O(B) to obtain D
Interleave C with D to get an almost-sorted list E
Rearrange unordered neighbors (using compare-and-exchange
operations) to completely sort E

Odd-Even Merge

Let A = [2, 4, 5, 8] and B = [1, 3, 6, 7] then:
E(A) = [2, 5] and O(A) = [4, 8]
E(B) = [1, 6] and O(B) = [3, 7]

Odd-even merge algorithm:
Recursively merge E(A) with E(B) to obtain C
C = odd_even_merge([2, 5], [1, 6]) = [1, 2, 5, 6]
Recursively merge O(A) with O(B) to obtain D
D = odd_even_merge([4, 8], [3, 7]) = [3, 4, 7, 8]
Interleave C with D to obtain E
E = [1, 3, 2, 4, 5, 7, 6, 8]
Rearrange unordered neighbors (c, d) as (min(c, d), max(c, d))
E = [1, 2, 3, 4, 5, 6, 7, 8]
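The steps above can be sketched recursively in C. This is an illustrative implementation, assuming both lists have the same power-of-two size; the interleave is written out as c0, d0, c1, d1, ... before the neighbor fix-up:

```c
#include <assert.h>

/* Merge sorted lists a and b (each of size n, n a power of two)
   into out[0..2n-1] using odd-even merge. */
void odd_even_merge(const int *a, const int *b, int n, int *out) {
    if (n == 1) {                     /* base case: one compare-and-exchange */
        out[0] = (a[0] < b[0]) ? a[0] : b[0];
        out[1] = (a[0] < b[0]) ? b[0] : a[0];
        return;
    }
    int h = n / 2;
    int ea[h], oa[h], eb[h], ob[h];   /* even/odd sublists (C99 VLAs) */
    for (int i = 0; i < h; i++) {
        ea[i] = a[2 * i]; oa[i] = a[2 * i + 1];
        eb[i] = b[2 * i]; ob[i] = b[2 * i + 1];
    }
    int c[n], d[n];
    odd_even_merge(ea, eb, h, c);     /* merge the even sublists */
    odd_even_merge(oa, ob, h, d);     /* merge the odd sublists */
    for (int i = 0; i < n; i++) {     /* interleave C with D */
        out[2 * i] = c[i];
        out[2 * i + 1] = d[i];
    }
    for (int i = 1; i + 1 < 2 * n; i += 2) {  /* fix unordered neighbors */
        if (out[i] > out[i + 1]) {
            int tmp = out[i]; out[i] = out[i + 1]; out[i + 1] = tmp;
        }
    }
}
```

Calling odd_even_merge with A = [2, 4, 5, 8] and B = [1, 3, 6, 7] produces [1, 2, 3, 4, 5, 6, 7, 8], matching the worked example.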

Odd-Even Merge

In summary, merging two sorted lists of size n requires:
Two recursive calls to odd_even_merge() with each call merging two
sorted sublists of size n/2
n − 1 compare-and-exchange operations

For example, the call to odd_even_merge([2, 4, 5, 8], [1, 3, 6, 7]) leads to:
One call to odd_even_merge([2, 5], [1, 6])
Another call to odd_even_merge([4, 8], [3, 7])
3 compare-and-exchange operations

4x4 Odd-Even Merge

Parallel Odd-Even Merge

For each odd_even_merge() call with lists of size n, we can:
Run in parallel the two recursive calls to odd_even_merge() for
merging sorted sublists of size n/2
Run in parallel the n − 1 compare-and-exchange operations

Since in total there are log(n) − 1 recursive merging steps, it takes
log(n) + 1 parallel steps to obtain the final sorted list in a parallel
implementation, which corresponds to a time complexity of O(log(n)):
2 parallel steps for the compare-and-exchange operations of the last
recursive calls to odd_even_merge() (sublists of size 2)
log(n) − 1 parallel steps for the remaining compare-and-exchange
operations

Odd-Even Mergesort
Odd-even mergesort is a parallel sorting algorithm based on the recursive
application of the odd-even merge algorithm that merges sorted
sublists bottom up – starting with sublists of size 2 and merging them into
bigger sublists – until the final sorted list is obtained.
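The bottom-up description can equally be written top-down: sort each half recursively, then combine the halves with odd-even merge. This sketch is illustrative, not code from the slides, and includes its own odd_even_merge() helper so it is self-contained:

```c
#include <assert.h>
#include <string.h>

/* Odd-even merge of two sorted lists of size n (n a power of two). */
static void odd_even_merge(const int *a, const int *b, int n, int *out) {
    if (n == 1) {
        out[0] = (a[0] < b[0]) ? a[0] : b[0];
        out[1] = (a[0] < b[0]) ? b[0] : a[0];
        return;
    }
    int h = n / 2;
    int ea[h], oa[h], eb[h], ob[h];   /* even/odd sublists */
    int c[n], d[n];
    for (int i = 0; i < h; i++) {
        ea[i] = a[2 * i]; oa[i] = a[2 * i + 1];
        eb[i] = b[2 * i]; ob[i] = b[2 * i + 1];
    }
    odd_even_merge(ea, eb, h, c);
    odd_even_merge(oa, ob, h, d);
    for (int i = 0; i < n; i++) {     /* interleave, then fix neighbors */
        out[2 * i] = c[i];
        out[2 * i + 1] = d[i];
    }
    for (int i = 1; i + 1 < 2 * n; i += 2)
        if (out[i] > out[i + 1]) {
            int t = out[i]; out[i] = out[i + 1]; out[i + 1] = t;
        }
}

/* Odd-even mergesort: sort a[0..n-1], n a power of two. */
void odd_even_mergesort(int *a, int n) {
    if (n == 1) return;
    odd_even_mergesort(a, n / 2);           /* sort the two halves */
    odd_even_mergesort(a + n / 2, n / 2);
    int out[n];
    odd_even_merge(a, a + n / 2, n / 2, out);
    memcpy(a, out, n * sizeof(int));
}
```

In the parallel algorithm the two recursive sorts run concurrently and each merge is itself a parallel network, which yields the O(log²(n)) bound.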

How many steps does the algorithm take?

Since in total there are log(n) sorting steps and each step (odd-even merge
algorithm) corresponds to a time complexity of O(log(n)), the odd-even
mergesort time complexity is thus O(log²(n)).
Bitonic Mergesort
The basis of bitonic mergesort is the bitonic sequence, a list having
specific properties that are exploited by the sorting algorithm.
A sequence is considered bitonic if it contains two subsequences, one
increasing and one decreasing, such that:
a1 < a2 < . . . < ai−1 < ai > ai+1 > ai+2 > . . . > an
for some value i (1 ≤ i ≤ n). A sequence is also considered bitonic if this
property is attained by shifting the numbers cyclically (left or right).

Bitonic Mergesort

A special characteristic of bitonic sequences is that if we perform a
compare-and-exchange operation with elements ai and ai+n/2 for all i
(0 ≤ i < n/2) in a sequence of size n, we obtain two bitonic sequences
in which all the values in one sequence are smaller than the values
of the other.

Bitonic Mergesort
In addition to both sequences being bitonic sequences, all values in the
left sequence are less than all the values in the right sequence.
Hence, if we apply recursively these compare-and-exchange
operations to a given bitonic sequence, we will get a sorted sequence.
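This compare-and-exchange step (often called a bitonic split) can be sketched as follows; the function name is an illustrative assumption. Applied once to a bitonic sequence of size n, it yields two bitonic halves with every element of the left half no larger than every element of the right half:

```c
#include <assert.h>

/* One bitonic split: compare-and-exchange a[i] with a[i + n/2]
   for all i in [0, n/2). */
void bitonic_split(int *a, int n) {
    for (int i = 0; i < n / 2; i++) {
        if (a[i] > a[i + n / 2]) {
            int tmp = a[i];
            a[i] = a[i + n / 2];
            a[i + n / 2] = tmp;
        }
    }
}
```

For the bitonic sequence [3, 5, 8, 9, 7, 4, 2, 1] one split produces [3, 4, 2, 1, 7, 5, 8, 9]: both halves are bitonic and every value in the left half is below every value in the right half.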

Bitonic Mergesort
To sort an unsorted sequence, we first transform it into a bitonic
sequence. Starting from adjacent pairs of values of the given unsorted
sequence, bitonic sequences are created and then recursively merged
into larger (twice the size) bitonic sequences. At the end, a single
bitonic sequence is obtained, which is then sorted (as described before)
into the final increasing sequence.

Bitonic Mergesort
The bitonic mergesort algorithm can be split into k phases (with n = 2^k):
Phase 1: convert adjacent pairs of values into increasing/decreasing
sequences, i.e., into 4-element bitonic sequences
Phases 2 to k−1: sort each m-element bitonic sequence into
increasing/decreasing sequences and merge adjacent sequences into a
2m-element bitonic sequence, until reaching an n-element bitonic sequence
Phase k: sort the n-element bitonic sequence

Given an unsorted list of size n = 2^k, there are k phases, with each phase i
taking i parallel steps. Therefore, in total it takes

∑_{i=1}^{k} i = k(k+1)/2 = log(n)(log(n)+1)/2

parallel steps to obtain the final sorted list in a parallel implementation,
which corresponds to a time complexity of O(log²(n)).
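Putting the phases together, the algorithm admits a compact recursive formulation: sort one half ascending and the other descending to form a bitonic sequence, then sort it with recursive compare-and-exchange steps. This is a sequential sketch of the sorting network, not code from the slides:

```c
#include <assert.h>

/* Recursively sort a bitonic sequence a[0..n-1] in direction dir
   (1 = increasing, 0 = decreasing) by repeated bitonic splits. */
static void bitonic_merge(int *a, int n, int dir) {
    if (n == 1) return;
    for (int i = 0; i < n / 2; i++)
        if ((a[i] > a[i + n / 2]) == dir) {   /* compare-and-exchange */
            int tmp = a[i]; a[i] = a[i + n / 2]; a[i + n / 2] = tmp;
        }
    bitonic_merge(a, n / 2, dir);
    bitonic_merge(a + n / 2, n / 2, dir);
}

/* Bitonic mergesort (n a power of two): sort the two halves in
   opposite directions to build a bitonic sequence, then merge it. */
void bitonic_sort(int *a, int n, int dir) {
    if (n == 1) return;
    bitonic_sort(a, n / 2, 1);                /* ascending half */
    bitonic_sort(a + n / 2, n / 2, 0);        /* descending half */
    bitonic_merge(a, n, dir);
}
```

In a parallel implementation, the compare-and-exchange loop of each bitonic_merge level runs as one parallel step, giving the phase counts summed above.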
Time Complexity Summary

Algorithm              Sequential         Parallel
Bubble Sort            O(n²)              O(n) *
Mergesort              O(n log(n))        O(n)
Quicksort              O(n log(n)) **     O(n) ***
Odd-Even Mergesort     –                  O(log²(n)) ****
Bitonic Mergesort      –                  O(log²(n))

* odd-even transposition sort
** average time complexity
*** ideal pivot selection
**** recursive odd-even merge
