Sorting
Sorting
Sorting
• Ground rules:
• sort the values in increasing order
• sort “in place,” using only a small amount of additional storage
• Terminology:
• position: one of the memory locations in the array
• element: one of the data items stored in the array
• element i: the element at position i
• Goal: minimize the number of comparisons C and the number
of moves M needed to sort the array.
• move = copying an element from one position to another
example: arr[3] = arr[5];
Defining a Class for our Sort Methods
public class Sort {
public static void bubbleSort(int[] arr) {
...
}
public static void insertionSort(int[] arr) {
...
}
...
}
• ~cscie119/examples/sorting/Sort.java
stack heap
arr
15 7 ...
0 1 2 3 4
2 6 15 12 44
0 1 2 3 4 0 1 2 3 4
2 4 15 12 66 2 4 6 12 15
Selecting an Element
• When we consider position i, the elements in positions
0 through i – 1 are already in their final positions.
0 1 2 3 4 5 6
example for i = 3: 2 4 7 21 25 10 17
Time Analysis
• Some algorithms are much more efficient than others.
• The time efficiency or time complexity of an algorithm is some
measure of the number of “operations” that it performs.
• for sorting algorithms, we’ll focus on two types of
operations: comparisons and moves
• The number of operations that an algorithm performs typically
depends on the size, n, of its input.
• for sorting algorithms, n is the # of elements in the array
• C(n) = number of comparisons
• M(n) = number of moves
(n - 1)((n - 1) 1)
=
2
(n - 1)n 2
= C(n) = n 2 - n 2
2
Focusing on the Largest Term
• When n is large, mathematical expressions of n are dominated
by their “largest” term — i.e., the term that grows fastest as a
function of n.
• example: n n2/2 n/2 n2/2 – n/2
10 50 5 45
100 5000 50 4950
10000 50,000,000 5000 49,995,000
Big-O Notation
• We specify the largest term using big-O notation.
• e.g., we say that C(n) = n2/2 – n/2 is O(n2)
• Common classes of algorithms:
name example expressions big-O notation
constant time 1, 7, 10 O(1)
logarithmic time 3log10n, log2n + 5 O(log n)
linear time 5n, 10n – 2log2n O(n)
slower
140
120
100 n^2
n log n
80
n
60 log n
40
20
0
0 1 2 3 4 5 6 7 8 9 10 11 12
n
g(n) = n2
n
• Big-O notation specifies an upper bound on a function f(n)
as n grows large.
g(n) = n2
• Example:
0 1 2 3 4
15 4 2 12 6
4 15 2 12 6
2 4 15 12 6
2 4 12 15 66
2 4 6 12 15
• Sorting by selection:
• consider position 0: find the element (2) that belongs there
• consider position 1: find the element (9) that belongs there
• …
• Sorting by insertion:
• consider the 12: determine where to insert it
• consider the 15; determine where to insert it
• …
Inserting an Element
• When we consider element i, elements 0 through i – 1
are already sorted with respect to each other.
0 1 2 3 4
example for i = 3: 6 14 19 9 …
• To insert element i:
• make a copy of element i, storing it in the variable toInsert:
0 1 2 3
toInsert 9 6 14 19 9
int j = i;
do {
arr[j] = arr[j-1];
j = j - 1;
} while (j > 0 && toInsert < arr[j-1]);
arr[j] = toInsert;
}
}
}
}
Sorting Subarrays
• Basic idea:
• use insertion sort on subarrays that contain elements
separated by some increment
• increments allow the data items to make larger “jumps”
• repeat using a decreasing sequence of increments
• Example for an initial increment of 3:
0 1 2 3 4 5 6 7
36 18 10 27 3 20 9 8
• three subarrays:
1) elements 0, 3, 6 2) elements 1, 4, 7 3) elements 2 and 5
27 3 10 36 18 20 9 8
27 3 10 36 18 20 9 8
9 3 10 27 18 20 36 8
9 3 10 27 8 20 36 18
• Our version uses values that are one less than a power of two.
• 2k – 1 for some k
• … 63, 31, 15, 7, 3, 1
• can get to the next lower increment using integer division:
incr = incr/2;
int j = i;
do {
arr[j] = arr[j-incr];
j = j - incr;
} while (j > incr-1 &&
toInsert < arr[j-incr]);
arr[j] = toInsert;
} (If you replace incr with 1
} in the for-loop, you get the
incr = incr/2; code for insertion sort.)
}
}
Time Analysis of Shell Sort
• Difficult to analyze precisely
• typically use experiments to measure its efficiency
• With a bad interval sequence, it’s O(n2) in the worst case.
• With a good interval sequence, it’s better than O(n2).
• at least O(n1.5) in the average and worst case
• some experiments have shown average-case running times
of O(n1.25) or even O(n7/6)
• Significantly better than insertion or selection for large n:
n n2 n1.5 n1.25
10 100 31.6 17.8
100 10,000 1000 316
10,000 100,000,000 1,000,000 100,000
106 1012 109 3.16 x 107
• Running time:
Sorting by Exchange II: Quicksort
• Like bubble sort, quicksort uses an approach based on
exchanging out-of-order elements, but it’s more efficient.
• A recursive, divide-and-conquer algorithm:
• divide: rearrange the elements so that we end up with two
subarrays that meet the following criterion:
each element in the left array <= each element in the right array
example:
12 8 14 4 6 13 6 8 4 14 12 13
7 15 4 9 6 18 9 12
partition using a pivot of 9
7 9 4 6 9 18 15 12
all values <= 9 all values >= 9
4 8 14 12 6 18 4 8 14 12 6 18
pivot = 18
Partitioning Example 2
• Start i j
(pivot = 13): 24 5 2 13 18 4 20 19
i j
• Find: 24 5 2 13 18 4 20 19
i j
• Swap: 4 5 2 13 18 24 20 19
i j
• Find: 4 5 2 13 18 24 20 19
and now the indices are equal, so we return j.
i j
• Subarrays: 4 5 2 13 18 24 20 19
Partitioning Example 3 (done together)
• Start i j
(pivot = 5): 4 14 7 5 2 19 26 6
• Find: 4 14 7 5 2 19 26 6
• This approach benefits from the fact that you perform the
algorithm in parallel with each other.
1 1 1 1 1 1 … 1 1 1 1 0
1 n-1 n-1
1 n-2 n-2
1 n-3 n-3
....... ...
1 2 2
1 1
n
• C(n) = i = O(n2). M(n) and run time are also O(n2).
i 2
Mergesort
• All of the comparison-based sorting algorithms that we've seen
thus far have sorted the array in place.
• used only a small amount of additional memory
2 5 7 8 9 11 14 24
5 7 9 11
Merging Sorted Arrays
• To merge sorted arrays A and B into an array C, we maintain
three indices, which start out on the first elements of the arrays:
i
A 2 8 14 24 k
j C
B 5 7 9 11
12 8 14 4 6 33 2 27
split 12 8 14 4 6 33 2 27
split 12 8 14 4 6 33 2 27
split 12 8 14 4 6 33 2 27
merge 8 12 4 14 6 33 2 27
merge 4 8 12 14 2 6 27 33
merge 2 4 6 8 12 14 27 33
12 8 14 4 6 33 2 27
split into two 4-element subarrays, and make a recursive call to sort the left subarray:
12 8 14 4 6 33 2 27
12 8 14 4
split into two 2-element subarrays, and make a recursive call to sort the left subarray:
12 8 14 4 6 33 2 27
12 8 14 4
12 8
Tracing the Calls to Mergesort
split into two 1-element subarrays, and make a recursive call to sort the left subarray:
12 8 14 4 6 33 2 27
12 8 14 4
12 8
12
base case, so return to the call for the subarray {12, 8}:
12 8 14 4 6 33 2 27
12 8 14 4
12 8
12 8 14 4
12 8
base case, so return to the call for the subarray {12, 8}:
12 8 14 4 6 33 2 27
12 8 14 4
12 8
Tracing the Calls to Mergesort
merge the sorted halves of {12, 8}:
12 8 14 4 6 33 2 27
12 8 14 4
12 8 8 12
end of the method, so return to the call for the 4-element subarray, which now has
a sorted left subarray:
12 8 14 4 6 33 2 27
8 12 14 4
8 12 14 4
14 4
split it into two 1-element subarrays, and make a recursive call to sort the left subarray:
12 8 14 4 6 33 2 27
8 12 14 4
14 4
14 base case…
Tracing the Calls to Mergesort
return to the call for the subarray {14, 4}:
12 8 14 4 6 33 2 27
8 12 14 4
14 4
12 8 14 4 6 33 2 27
8 12 14 4
14 4
4 base case…
8 12 14 4
14 4
12 8 14 4 6 33 2 27
8 12 14 4
14 4 4 14
Tracing the Calls to Mergesort
end of the method, so return to the call for the 4-element subarray, which now has
two sorted 2-element subarrays:
12 8 14 4 6 33 2 27
8 12 4 14
12 8 14 4 6 33 2 27
8 12 4 14 4 8 12 14
4 8 12 14 6 33 2 27
perform a similar set of recursive calls to sort the right subarray. here's the result:
4 8 12 14 2 6 27 33
finally, merge the sorted 4-element subarrays to get a fully sorted 8-element array:
4 8 12 14 2 6 27 33
2 4 6 8 12 14 27 33
Implementing Mergesort
• One approach is to create new arrays for each new set of
subarrays, and to merge them back into the array that was split.
• Instead, we'll create a temp. array of the same size as the original.
• pass it to each call of the recursive mergesort method
• use it when merging subarrays of the original array:
arr 8 12 4 14 6 33 2 27
temp 4 8 12 14
• after each merge, copy the result back into the original array:
arr 4 8 12 14 6 33 2 27
temp 4 8 12 14
1 1 1 1 1 1 … 1 1 1 1
• at all but the last level of the call tree, there are 2n moves
• how many levels are there?
• M(n) = ?
• C(n) = ?
Summary: Comparison-Based Sorting Algorithms
algorithm best case avg case worst case extra
memory
selection sort O(n2) O(n2) O(n2) O(1)
insertion sort O(n) O (n2) O (n2) O(1)
Shell sort O(n log n) O (n1.5) O (n1.5) O(1)
bubble sort O (n2) O (n2) O (n2) O(1)
quicksort O(n log n) O(n log n) O(n2) O(1)
mergesort O(n log n) O(n log n) O(nlog n) O(n)
• O(n2)-time
• O(n3)-time
• O(log2n)-time
• O(2n)-time
How Does the Actual Running Time Scale?
• How much time is required to solve a problem of size n?
• assume that each operation requires 1 sec (1 x 10-6 sec)
time problem size (n)
function 10 20 30 40 50 60
n .00001 s .00002 s .00003 s .00004 s .00005 s .00006 s
n2 .0001 s .0004 s .0009 s .0016 s .0025 s .0036 s
n5 .1 s 3.2 s 24.3 s 1.7 min 5.2 min 13.0 min
2n .001 s 1.0 s 17.9 min 12.7 days 35.7 yrs 36,600 yrs
• sample computations:
• when n = 10, an n2 algorithm performs 102 operations.
102 * (1 x 10-6 sec) = .0001 sec
• when n = 30, a 2n algorithm performs 230 operations.
230 * (1 x 10-6 sec) = 1073 sec = 17.9 min
• sample computations:
• 1 hour = 3600 sec
that's enough time for 3600/(1 x 10-6) = 3.6 x 109 operations
• n2 algorithm:
n2 = 3.6 x 109 n = (3.6 x 109)1/2 = 60,000
n
• 2 algorithm:
2n = 3.6 x 109 n = log2(3.6 x 109) ~= 31