Chapter 1:
Introduction to Algorithms
Part IV

Contents
● Algorithms (Part I)
● Mathematical preliminaries
○ Order of growth (Part I)
○ Summation (Part II)
○ Recurrence (Part II)
● Proof of correctness (Part III)
● Analysis of sorting algorithms
● Sorting in linear time
Analysis of Sorting Algorithms
Sorting
One of the most common data-processing applications
The process of arranging a collection of items/records in a specific order
The items/records consist of one or more fields or members
One of these fields is designated as the "sort key" by which the records are ordered
Sort order: Data may be sorted in either ascending or descending sequence
Sorting
Input: a sequence of n numbers ⟨a1, a2, a3, ..., an⟩
Output: a permutation ⟨b1, b2, b3, ..., bn⟩ of the input sequence such that b1 ≤ b2 ≤ ... ≤ bn
Example: input 2 5 6 1 12 10 → output 1 2 5 6 10 12
Sorting
Types
1. Internal sort
○ All of the data are held in primary memory during the sorting process
○ Examples: insertion, selection, heap, bubble, quick, shell sort
2. External sort
○ Uses primary memory for the data currently being sorted and secondary storage for any data that does not fit in primary memory
○ Example: merge sort
Sorting
Sort stability
Indicates that data with equal keys maintain their relative input order in the output

Unsorted data    Stable sort     Unstable sort
365 blue         119 purple      119 purple
212 green        212 green       212 blue
876 white        212 yellow      212 green
212 yellow       212 blue        212 yellow
119 purple       365 blue        365 blue
212 blue         876 white       876 white
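Stability is easy to observe in code. Python's built-in `sorted` is guaranteed stable, so feeding it the unsorted records above reproduces the stable-sort column (the tuple representation is ours, not the slide's):

```python
# (key, colour) records from the stability example above
records = [(365, "blue"), (212, "green"), (876, "white"),
           (212, "yellow"), (119, "purple"), (212, "blue")]

# sorted() is guaranteed stable: records with equal keys
# keep their relative input order
stable = sorted(records, key=lambda r: r[0])
print(stable)
# [(119, 'purple'), (212, 'green'), (212, 'yellow'),
#  (212, 'blue'), (365, 'blue'), (876, 'white')]
```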
Sorting algorithms
● Selection Sort ✔
● Insertion Sort ✔
● Merge Sort ✔
● Quick Sort ✔
● Heap Sort
Sorting algorithms
Algorithm       Best case    Worst case   Average case
Selection sort  O(n²)        O(n²)        O(n²)
Insertion sort  O(n)         O(n²)        O(n²)
Quick sort      O(n log n)   O(n²)        O(n log n)
Merge sort      O(n log n)   O(n log n)   O(n log n)
Heap sort       O(n log n)   O(n log n)   O(n log n)
Quick sort vs merge sort
In merge sort, the divide step does hardly anything, and all the real work happens in the combine step. In quick sort, by contrast, the real work happens in the divide (partition) step.
Quicksort works in place.
In practice, quicksort typically outperforms merge sort, and it significantly outperforms selection sort and insertion sort.
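As a minimal sketch (not the lecture's pseudocode), an in-place quicksort with a Lomuto-style partition looks like this; note that all the work happens in the partition (divide) step and there is no combine step:

```python
def quicksort(a, lo=0, hi=None):
    """In-place quicksort: partitioning (the divide step) does the real work."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        # Lomuto partition around the last element as pivot
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        # Recurse on the two sides; nothing to combine afterwards
        quicksort(a, lo, i - 1)
        quicksort(a, i + 1, hi)

data = [2, 5, 6, 1, 12, 10]
quicksort(data)
print(data)  # [1, 2, 5, 6, 10, 12]
```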
Heap sort
● Is among the fastest sorting algorithms
● Is used with very large arrays
● Uses heap data structure for sorting
Heap
● A nearly complete binary tree in which the root contains the largest (or
smallest) element in the tree.
● Is completely filled on all levels except possibly the lowest, which is filled from
the left up to a point.
Heap property
There are two kinds of binary heaps: max-heaps and min-heaps.
In both kinds, the values in the nodes satisfy a heap property.
In a max-heap, the max-heap property is that for every node other than the root,
the value of a node is at most the value of its parent. Thus, the largest element in
a max-heap is stored at the root.
Similarly, in a min-heap, the smallest element is at the root.
Heap

            78
          /    \
        56      32
       /  \    /
     45    8  23

Heap in its array form:
 78  56  32  45   8  23
[1] [2] [3] [4] [5] [6]
Heap sort
To implement heap sort using a max-heap, we need two basic algorithms:
1. Max-Heapify
Maintains the max-heap property by pushing the root down the tree until it is in its correct position in the heap.
2. Build-Max-Heap
Produces a max-heap from an unordered input array.
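A hedged sketch of the two routines in Python (following the CLRS-style 1-based array diagrams used in these slides; the function names are the standard ones, but this is not the slide's own pseudocode):

```python
def max_heapify(a, i, heap_size):
    """Push a[i] down until the subtree rooted at i is a max-heap.
    Assumes the subtrees rooted at its children already are max-heaps.
    Uses 1-based logical indices over a 0-based Python list."""
    left, right = 2 * i, 2 * i + 1
    largest = i
    if left <= heap_size and a[left - 1] > a[largest - 1]:
        largest = left
    if right <= heap_size and a[right - 1] > a[largest - 1]:
        largest = right
    if largest != i:
        a[i - 1], a[largest - 1] = a[largest - 1], a[i - 1]
        max_heapify(a, largest, heap_size)  # recurse on the affected child

def build_max_heap(a):
    """Make a max-heap out of an unordered array, bottom-up."""
    n = len(a)
    for i in range(n // 2, 0, -1):  # last non-leaf node down to the root
        max_heapify(a, i, n)

a = [8, 32, 45, 78, 23, 56]
build_max_heap(a)
print(a)  # [78, 32, 56, 8, 23, 45]
```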
Heap sort
[Heap-Sort pseudocode (figure)]
Heap sort
Steps:
1. Convert the array into a max heap
2. Find the largest element of the list (i.e., the root of the heap) and place it at the end of the list. Decrement the heap size by 1 and readjust the heap
3. Repeat Step 2 until the unsorted list is empty
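The three steps can be sketched as follows (a minimal Python version with an inline reheap-down helper; 0-based indexing, unlike the slide diagrams):

```python
def heap_sort(a):
    """In-place heap sort on a 0-based list."""
    n = len(a)

    def sift_down(i, size):
        # Push a[i] down until the max-heap property holds within a[:size]
        while True:
            left, right, largest = 2 * i + 1, 2 * i + 2, i
            if left < size and a[left] > a[largest]:
                largest = left
            if right < size and a[right] > a[largest]:
                largest = right
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    # Step 1: convert the array into a max heap
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    # Steps 2-3: repeatedly move the root (the maximum) to the end
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)

data = [8, 32, 45, 78, 23, 56]
heap_sort(data)
print(data)  # [8, 23, 32, 45, 56, 78]
```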
Heap sort
Example:
Sort the following data using heap sort:
8 32 45 78 23 56
Heap sort
Convert the array into a max heap

Initial array: 8 32 45 78 23 56, viewed as a binary tree:

            8 [1]
          /      \
      32 [2]    45 [3]
      /    \     /
  78 [4] 23 [5] 56 [6]

Starting from the index i of the last non-leaf node (the node just above the leaf level), check whether the subtree rooted at i is a max-heap. If not, fix it by reheaping down: swap the root of the subtree with its largest child and repeat.

1. i = 3: the heap property is not satisfied (45 < 56), so swap 45 with its largest child 56
   → 8 32 56 78 23 45
2. i = 2: the heap property is not satisfied (32 < 78), so swap 32 with its largest child 78
   → 8 78 56 32 23 45
3. i = 1: the heap property is not satisfied (8 < 78), so swap 8 with its largest child 78
   → 78 8 56 32 23 45
   The 8 (now at index 2) is still smaller than its largest child 32, so swap again
   → 78 32 56 8 23 45

The array is now a max heap: 78 32 56 8 23 45
Heap sort
Swap the root of the heap with the element at the end of the list. Decrement the heap size by 1 and readjust the heap. Repeat until the heap is empty. (Elements after the "|" are outside the heap, in their final sorted positions.)

1. Swap 78 with 45 → 45 32 56 8 23 | 78; readjust → 56 32 45 8 23 | 78
2. Swap 56 with 23 → 23 32 45 8 | 56 78; readjust → 45 32 23 8 | 56 78
3. Swap 45 with 8  → 8 32 23 | 45 56 78; readjust → 32 8 23 | 45 56 78
4. Swap 32 with 23 → 23 8 | 32 45 56 78; the heap property already holds
5. Swap 23 with 8  → 8 | 23 32 45 56 78

The list is sorted: 8 23 32 45 56 78
Analysis of Max-Heapify
[Max-Heapify pseudocode, lines 1-10 (figure)]
Analysis of Max-Heapify
Running time of lines 1-9 = Θ(1)
Running time of line 10 = ?
The running time T(n) of Max-Heapify on a subtree of size n rooted at a given node i is
T(n) = Θ(1) + time to run Max-Heapify on a subtree rooted at one of the children of node i
Analysis of Max-Heapify
Max-Heapify may need to be called all
the way down to the bottom level.
Recall
What is the maximum number of nodes on level i of a binary tree? (2^i)
What is the maximum number of nodes in a binary tree of height h? (2^(h+1) - 1)
Analysis of Max-Heapify
Max-Heapify may need to be called all the way down to the bottom level.
In that case, the children's subtrees each have size n/2 - 1 if the heap is a complete binary tree, and at most 2n/3 if the bottom level of the tree is exactly half full. (The latter is the worst case.)
This gives the recurrence T(n) ≤ T(2n/3) + Θ(1).
From the master theorem (case 2), this recurrence solves to T(n) = O(lg n).
Analysis of Max-Heapify
Alternatively, we can characterize the running time T(n) of Max-Heapify on a node of height h as O(h).
The height of a heap with n nodes is ⌊lg n⌋, so Max-Heapify runs in O(lg n) time.
Analysis of Build-Max-Heap
Trivial analysis: each call to Max-Heapify requires O(log n) time, and we make ⌊n/2⌋ such calls ⇒ O(n log n).
Analysis of Build-Max-Heap
Tighter bound: when Max-Heapify is called, the running time depends on how far an element might shift down before the process terminates. In the worst case, the element might shift down all the way to the leaf level.
To simplify the analysis, assume that the heap is a complete binary tree of height h, i.e.
n = 2^(h+1) - 1
Analysis of build_heap
Tighter bound (contd.)
Number of nodes at the bottommost level = 2^h, but we do not call heapify on any of these.
Number of nodes at the next-to-bottommost level = 2^(h-1); each might shift down 1 level.
And so on.
Analysis of build_heap
Tighter bound (contd.)
In general, the number of nodes at level j from the bottom = 2^(h-j); each might shift down j levels.
So, the total time is proportional to
T(n) = Σ_{j=0..h} j · 2^(h-j) = 2^h · Σ_{j=0..h} j/2^j
Analysis of build_heap
Recall: the infinite geometric series, for any constant |x| < 1:
Σ_{j=0..∞} x^j = 1/(1 - x)
Taking the derivative of both sides w.r.t. x (and multiplying by x) gives
Σ_{j=0..∞} j · x^j = x/(1 - x)²
Analysis of build_heap
When x = ½, we get Σ_{j=0..∞} j/2^j = (½)/(½)² = 2
Using this as an upper bound, we get
T(n) ≤ 2^h · 2 = 2^(h+1)
Since n = 2^(h+1) - 1, we have T(n) ≤ n + 1, which implies T(n) is O(n).
Also, T(n) is Ω(n) since every element of the array must be accessed at least once.
Therefore, T(n) is Θ(n).
Analysis of heap sort
[Heap-Sort pseudocode (figure)]
Running time of line 1 = O(n)
Running time of lines 3-4 = O(1)
Running time of line 5 = O(log n)
Lines 3-5 are executed n - 1 times.
Therefore, the running time of Heap-Sort is
T(n) = O(n) + (n - 1)(O(1) + O(log n)) = O(n log n)
Stability
Algorithm       Stable?
Selection sort  No
Insertion sort  Yes
Heap sort       No
Merge sort      Yes
Quick sort      No
Sorting in linear time
Sorting algorithms that run in linear time
● Counting sort
● Radix sort
● Bucket sort
Readings
Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Chapter 8: Sorting in
Linear Time. In Introduction to algorithms. MIT press.
Andersson, A., Hagerup, T., Nilsson, S., & Raman, R. (1998). Sorting in linear time?.
Journal of Computer and System Sciences, 57(1), 74-93.
Comparison sorts
Sorting algorithms studied so far are comparison sorts, i.e. they are based on comparisons of input elements.

Sorting algorithm  Time complexity (worst case)
Insertion sort     O(n²)
Selection sort     O(n²)
Merge sort         O(n lg n)
Heap sort          O(n lg n)
Quick sort         O(n²)

Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case.
Comparison sorts
To get information about an input sequence ⟨a1, a2, …, an⟩, comparison sort algorithms only use comparisons of two elements ai and aj:
● ai < aj
● ai ≤ aj
● ai = aj
● ai ≥ aj
● ai > aj
If we assume all input elements are distinct, then comparisons of the form ai = aj are useless.
Also, ai < aj, ai ≤ aj, ai ≥ aj, and ai > aj are equivalent, as they yield identical information about the relative order of ai and aj.
Comparison sorts can be viewed abstractly in terms of decision trees.
The decision tree model
● A decision tree is a full binary tree that represents the comparisons
between elements that are performed by a particular sorting algorithm
operating on an input of a given size.
● It does not consider control, data movement, and all other aspects of
the algorithm but only comparisons.
● In a decision tree, for an input of size n,
○ Each internal node is annotated by i : j for some i and j in the range 1 ≤ i, j
≤ n, indicating a comparison between ai and aj
○ Each leaf is annotated by a permutation ⟨𝛑(1), 𝛑(2), …, 𝛑(n)⟩, indicating the
ordering a𝛑(1) ≤ a𝛑(2) ≤ a𝛑(3)... ≤ a𝛑(n)
The decision tree model
[decision tree figure]
The decision tree model
Because any correct sorting algorithm
must be able to produce each
permutation of its input, each of the n!
permutations on n elements must
appear as one of the leaves of the
decision tree.
Lower bound of comparison sort algorithms
The length of the longest simple path from the root of a decision tree to any of its
reachable leaves
= The worst-case number of comparisons that the corresponding sorting algorithm
performs
= The height of its decision tree
Lower bound of comparison sort algorithms
Consider a decision tree of height h with l reachable leaves.
Since each of the n! permutations of the input appears as some leaf, we have
n! ≤ l
Since a binary tree of height h has no more than 2h leaves, we have
n! ≤ l ≤ 2h
By taking logarithms, we get h ≥ lg (n!) = Ω(n lg n)
∴ Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case.
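The lower bound h ≥ lg(n!) can be checked numerically; lg_factorial below is a helper name introduced here for illustration (lg(n!) grows like n lg n, with the ratio approaching 1 as n grows):

```python
import math

def lg_factorial(n):
    """lg(n!) computed as a sum of logs, avoiding huge intermediate factorials."""
    return sum(math.log2(k) for k in range(1, n + 1))

# Compare lg(n!) against n lg n for increasing n
for n in (10, 100, 1000):
    print(n, lg_factorial(n), n * math.log2(n))
```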
Counting sort
● Assumes that each of the n input elements is an integer in the range 0 to k, for some integer k
● When k = O(n), the sort runs in Θ(n) time
● Determines, for each input element x, the number of elements less than x
○ This is how it positions x in its place in the output array
○ If there are 17 elements smaller than x, then x will be assigned position 18
● To sort an array A[1..n], we need two additional arrays
○ B[1..n] holds the sorted output
○ C[0..k] stores, for each value, the number of times it occurs in A
Counting sort
[Counting-Sort pseudocode and step-by-step worked example (figures)]
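A sketch of counting sort in Python (the loop structure mirrors the standard CLRS pseudocode whose line numbers the analysis refers to, but this code is an illustration, not the slide's figure):

```python
def counting_sort(A, k):
    """Counting sort for integers in the range 0..k. Returns the sorted output B.
    Stable: equal keys keep their relative input order."""
    n = len(A)
    C = [0] * (k + 1)
    for a in A:                 # count occurrences of each value
        C[a] += 1
    for v in range(1, k + 1):   # now C[v] = number of elements <= v
        C[v] += C[v - 1]
    B = [0] * n
    for a in reversed(A):       # walk backwards to preserve stability
        C[a] -= 1
        B[C[a]] = a
    return B

print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))  # [0, 0, 2, 2, 3, 3, 3, 5]
```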
Counting sort
Lines 2-3 → Θ(k)
Lines 4-5 → Θ(n)
Lines 7-8 → Θ(k)
Lines 10-12 → Θ(n)
Total running time: Θ(k + n); when k = O(n), this is Θ(n).
Counting sort
● Is not a comparison sort
● Beats the lower bound of Ω(n lg n)
● Is stable
● Is often used as a subroutine in radix sort
Radix sort
● The idea of radix sort is to sort digit by digit, starting from the least significant digit and moving to the most significant digit.
● In order for radix sort to work correctly, the digit sorts must be stable.
● Radix sort uses counting sort as a subroutine.
Radix sort
Each element in the n-element array A has d digits, where digit 1 is the lowest-order
digit and digit d is the highest-order digit.
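A sketch of LSD radix sort in Python, with a stable counting sort on each digit (an illustration; extracting decimal digits via powers of 10 is our choice here, not something fixed by the slides):

```python
def radix_sort(A, d):
    """LSD radix sort for non-negative integers with at most d decimal digits.
    Each pass is a stable counting sort on one digit (k = 10 possible values)."""
    for exp in range(d):            # digit 1 (lowest-order) .. digit d
        divisor = 10 ** exp
        C = [0] * 10
        for a in A:
            C[(a // divisor) % 10] += 1
        for v in range(1, 10):
            C[v] += C[v - 1]
        B = [0] * len(A)
        for a in reversed(A):       # backwards keeps the pass stable
            digit = (a // divisor) % 10
            C[digit] -= 1
            B[C[digit]] = a
        A = B
    return A

print(radix_sort([329, 457, 657, 839, 436, 720, 355], 3))
# [329, 355, 436, 457, 657, 720, 839]
```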
Radix sort
When each digit is in the range 0 to k - 1 (so that it can take on k possible values), and k is not too large, counting sort is the obvious choice.
Each pass over a digit takes Θ(n + k).
For d digits, the total time for radix sort is Θ(d(n + k)).
When d is constant and k = O(n), radix sort runs in linear time.
Bucket sort
● Assumes that the input is drawn from a uniform distribution over the interval
[0, 1)
● Divides the interval [0, 1) into n equal-sized subintervals, or buckets, and then
distributes the n input numbers into the buckets
● To produce the output, we sort numbers in each bucket, then go through
buckets in order
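A sketch of bucket sort in Python (here each bucket is sorted with the built-in sorted, standing in for the insertion sort usually used on the small buckets):

```python
def bucket_sort(A):
    """Bucket sort for inputs drawn uniformly from [0, 1).
    Distributes the n inputs into n equal-width buckets, sorts each
    bucket, then concatenates the buckets in order."""
    n = len(A)
    buckets = [[] for _ in range(n)]
    for x in A:
        buckets[int(n * x)].append(x)   # bucket i covers [i/n, (i+1)/n)
    out = []
    for b in buckets:
        out.extend(sorted(b))           # stand-in for insertion sort
    return out

data = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]
print(bucket_sort(data))
```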
Bucket sort
[Bucket-Sort pseudocode and example (figures)]