PDSA Week 2
Measuring performance
Example of validating SIM cards against Aadhar data
• Naive approach takes thousands of years
• Smarter solution takes a few minutes
Two main resources of interest
• Running time - how long the algorithm takes
• Space - memory requirement
Time depends on processing power
• Impossible to change for given hardware
• Enhancing hardware has only a limited impact at a practical level
Storage is limited by available memory
• Easier to configure, augment
Typically, we focus on time rather than space
Input size
Running time depends on input size
• Larger arrays will take longer to sort
Measure time efficiency as function of input size
• Input size n
• Running time t(n)
Different inputs of size n may take different amounts of time
• We will return to this point later
Example 1 SIM cards vs Aadhar cards
• n = 10^9 - number of cards
• Naive algorithm: t(n) = n^2
• Clever algorithm: t(n) = n log n
○ log n denotes log to the base 2 (log2 n): the number of times you need to divide n by 2 to reach 1
○ log n = k => n = 2^k
Example 2 Video game
• Several objects on screen
• Basic step: find closest pair of objects
• n objects - naive algorithm is n^2
○ For each pair of objects, compute their distance
○ Report minimum distance across all pairs
• There is a clever algorithm that takes time n log2 n
• High resolution gaming console may have 4000x2000 pixels
○ 8 * 10^6 points - 8 million
• Suppose we have 100,000 = 1 * 10^5 objects
• Naive algorithm takes 10^10 steps
○ 1000 seconds, or 16.7 minutes in Python
○ Unacceptable response time!
• log2 100,000 is under 20, so n log2 n takes a fraction of a second
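As a rough sanity check (this sketch is not from the lecture; the rate of 10^7 steps per second is an assumption consistent with the timing quoted above), the step counts for both examples can be computed directly:
from math import log2
steps_per_second = 10**7              # assumed Python speed: 10^10 steps ~ 1000 seconds
for n in (10**9, 10**5):              # Example 1: SIM/Aadhar cards, Example 2: game objects
    naive = n * n                     # t(n) = n^2
    clever = n * log2(n)              # t(n) = n log2 n
    print(n, naive/steps_per_second, "sec vs", clever/steps_per_second, "sec")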
Orders of magnitude
When comparing t(n), focus on orders of magnitude
• Ignore constant factors
f(n) = n^3 eventually grows faster than g(n) = 5000 n^2
• For small values of n, f(n) < g(n)
• After n = 5000, f(n) overtakes g(n)
Asymptotic complexity
• What happens in the limit, as n becomes large
Typical growth functions
• Is t(n) proportional to log n, …, n^2, n^3, …, 2^n?
○ Note: log n means log2 n by default
• Logarithmic, polynomial, exponential, …
2^n is the number of possible subsets of n elements
Measuring the average running time over all inputs is problematic
• Average over what? Are all inputs equally likely?
• Need a probability distribution over inputs
Instead, worst case input
• Input that forces algorithm to take longest possible time
○ Search for a value that is not present in an unsorted list
○ Must scan all elements
• Pessimistic - worst case may be rare
• Upper bound for worst case guarantees good performance
Summary
Two important parameters when measuring algorithm performance
• Running time, memory requirement (space)
• We mainly focus on time
Running time t(n) is a function of input size n
• Interested in orders of magnitude
• Asymptotic complexity, as n becomes large
From running time, we can estimate feasible input sizes
We focus on worst case inputs
• Pessimistic, but easier to calculate than average case
• Upper bound on worst case gives us an overall guarantee on performance
Comparing Orders of Magnitude
07 October 2021 23:50
Orders of Magnitude
When comparing t(n), focus on orders of magnitude
• Ignore constant factors
f(n) = n^3 eventually grows faster than g(n) = 5000 n^2
How do we compare functions with respect to orders of magnitude?
Upper bounds
f(x) is said to be O(g(x)) if we can find constants c and x0 such that c·g(x) is an upper bound for f(x) for all x beyond x0
f(x) <= c·g(x) for every x >= x0
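For example (a quick check, not from the original notes): f(x) = 100x^2 + 5x is O(x^2). Take c = 101 and x0 = 5; for every x >= 5 we have 5x <= x^2, so 100x^2 + 5x <= 101x^2.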
Calculating complexity
08 October 2021 00:05
Calculating complexity
• Iterative programs
• Recursive programs
Example 1
Find the maximum element in a list
• Input size is length of the list
• Single loop scans all elements
• Always takes n steps
• Overall time is O(n)
def maxElement(L):
    maxval = L[0]
    for i in range(len(L)):
        if L[i] > maxval:
            maxval = L[i]
    return(maxval)
Example 2
Check whether a list contains duplicates
• Input size is length of the list
• Nested loop scans all pairs of elements
• A duplicate may be found in the very first iteration
• Worst case - no duplicates, both loops run fully
• Time is (n-1) + (n-2) + … + 1 = n(n-1)/2
• Overall time is O(n^2)
def noDuplicates(L):
    for i in range(len(L)):
        for j in range(i+1, len(L)):
            if L[i] == L[j]:
                return(False)
    return(True)
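A quick check (illustrative values):
>>> noDuplicates([1, 2, 3, 4])
True
>>> noDuplicates([1, 2, 3, 1])
False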
Example 3
Matrix multiplication
• Matrix is represented as list of lists
○ |1 2 3|
○ |4 5 6|
○ [[1,2,3],[4,5,6]]
• Input matrices have size m x n, n x p
• Output matrix is m x p
• Three nested loops
• Overall time is O(mnp) - O(n^3) if both are n x n
def matrixMultiply(A,B):
    (m,n,p) = (len(A),len(B),len(B[0]))
    C = [[ 0 for i in range(p) ] for j in range(m) ]
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i][j] = C[i][j] + A[i][k]*B[k][j]
    return(C)
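For instance (illustrative values), multiplying a 2 x 3 matrix by a 3 x 2 matrix gives a 2 x 2 result:
>>> matrixMultiply([[1,2,3],[4,5,6]], [[1,0],[0,1],[1,1]])
[[4, 5], [10, 11]]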
Example 4
Number of bits in binary representation of n
• log n steps for n to reach 1
• For number theoretic problems, input size is number of digits
• This algorithm is linear in input size
def numberOfBits(n):
    count = 1
    while n > 1:
        count = count + 1
        n = n//2
    return(count)
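For example, 16 is 10000 in binary, so:
>>> numberOfBits(16)
5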
Example 5
Towers of Hanoi
• Three pegs A, B, C
• Move n disks from A to B, use C as transit peg
• Never put a larger disk on a smaller one
Recursive solution
• Move n - 1 disks from A to C, use B as transit peg
• Move larger disk from A to B
• Move n - 1 disks from C to B, use A as transit peg
Recurrence
• M(n) - number of moves to transfer n disks
• M(1) = 1
• M(n) = M(n-1) + 1 + M(n-1) = 2 M(n-1) + 1
Unwind and solve
• M(n) = 2M(n-1) + 1
= 2(2M(n-2) + 1) + 1 = 2^2 M(n-2) + (2+1)
= 2^3 M(n-3) + (4+2+1)
…
= 2^k M(n-k) + (2^k - 1)
= 2^(n-1) M(1) + (2^(n-1) - 1)
= 2^(n-1) + 2^(n-1) - 1 = 2^n - 1
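A minimal sketch of this recursive solution in Python (not part of the original notes; the peg names are illustrative):
def hanoi(n, A, B, C):
    # Move n disks from peg A to peg B, using peg C as the transit peg
    # Return the number of moves made
    if n == 0:
        return 0
    moves = hanoi(n-1, A, C, B)             # move n-1 disks from A to C, via B
    print("Move a disk from", A, "to", B)   # move the largest disk from A to B
    moves = moves + 1
    moves = moves + hanoi(n-1, C, B, A)     # move n-1 disks from C to B, via A
    return moves
For example, hanoi(3, 'A', 'B', 'C') prints the moves and returns 7 = 2^3 - 1, matching M(n) = 2^n - 1.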
Summary
Iterative programs
• Focus on loops
Recursive programs
• Write and solve a recurrence
Need to be clear about accounting for "basic" operations
Searching in a List
08 October 2021 00:28
Search problem
Is value v present in list l?
Naive solution scans list
Input size n, the length of the list
Worst case is when v is not present in l
Worst case complexity is O(n)
def naivesearch(v,l):
    for x in l:
        if v == x:
            return(True)
    return(False)
def binarysearch(v,l):
    if l == []:
        return(False)
    m = len(l)//2
    if v == l[m]:
        return(True)
    if v < l[m]:
        return(binarysearch(v,l[:m]))
    else:
        return(binarysearch(v,l[m+1:]))
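Note that binarysearch assumes l is sorted in ascending order. A quick check (illustrative values):
>>> binarysearch(5, [1, 3, 5, 7, 9])
True
>>> binarysearch(4, [1, 3, 5, 7, 9])
False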
Binary search
How long does this take?
• Each call halves the interval to search
• Stop when the interval becomes empty
log n - number of times to divide n by 2 to reach 1
• 1 // 2 = 0, so the next call reaches an empty interval
O(log n) steps
Alternative calculation
T(n) : the time to search a list of length n
• If n = 0, we exit, so T(n) = 1
• If n > 0, T(n) = T(n//2) + 1
Recurrence for T(n)
• T(0) = 1
• T(n) = T(n // 2) + 1, n > 0
Solve by "unwinding"
T(n) = T(n // 2) + 1
= (T(n // 4) + 1) + 1
= T(n // 2^k) + 1 + … + 1
= T(n // 2^k) + k
= T(1) + k, for k = log n
= T(0) + 1 + log n
= 2 + log n
First non-trivial algorithm in this course
Summary
Search in an unsorted list takes time O(n)
• Need to scan the entire list
• Worst case is when the value is not present in the list
For a sorted list, binary search takes time O(log n)
• Halve the interval to search each time
In a sorted list, we can determine that v is absent by examining just log n values!
Selection Sort
08 October 2021 10:23
Sorting a list
Sorting a list makes many other computations easier
• Binary search
• Finding the median
• Checking for duplicates
• Building a frequency table of values
How do we sort a list?
Selection sort
Select the next element in sorted order
Append it to the final sorted list
Avoid using a second list
• Swap the minimum element into the first position
• Swap the second minimum element into the second position
• …
Eventually the list is rearranged in place in ascending order
def SelectionSort(L):
    n = len(L)
    if n < 1:
        return(L)
    for i in range(n):
        # Assume L[:i] is sorted
        mpos = i
        # mpos: position of minimum in L[i:]
        for j in range(i+1, n):
            if L[j] < L[mpos]:
                mpos = j
        # L[mpos] : smallest value in L[i:]
        # Exchange L[mpos] and L[i]
        (L[i],L[mpos]) = (L[mpos],L[i])
        # Now L[:i+1] is sorted
    return(L)
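A quick check (illustrative values):
>>> SelectionSort([74, 32, 89, 55, 21, 64])
[21, 32, 55, 64, 74, 89]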
Correctness follows from the invariant
Efficiency
• Outer loop iterates n times
• Inner loop: n - i steps to find minimum in L[i:]
• T(n) = n + n-1 + n-2 + … + 1 = n(n+1)/2
• T(n) is O(n^2)
Summary
Selection sort is an intuitive algorithm to sort a list
Repeatedly find the minimum (or maximum) and append to sorted list
Worst case complexity is O(n^2)
• Every input takes this much time
• No advantage even if list is arranged carefully before sorting
Insertion Sort
08 October 2021 10:40
Sorting a list
Strategy 2
Move the first paper to a new pile
Second paper
• Lower marks than first paper? Place below first paper in new pile
• Higher marks than first paper? Place above first paper in new pile
Third paper
• Insert into correct position with respect to first two
Do this for the remaining papers
• Insert each one into correct position in the second pile
Insertion sort
Start building a new sorted list
Pick next element and insert it into the sorted list
An iterative formulation
• Assume L[:i] is sorted
• Insert L[i] in L[:i]
def InsertionSort(L):
    n = len(L)
    if n < 1:
        return(L)
    for i in range(n):
        # Assume L[:i] is sorted
        # Move L[i] to correct position in L[:i+1]
        j = i
        while (j > 0 and L[j] < L[j-1]):
            (L[j],L[j-1]) = (L[j-1],L[j])
            j = j-1
        # Now L[:i+1] is sorted
    return(L)
A recursive formulation
• Inductively sort L[:i]
• Insert L[i] in L[:i]
def Insert(L,v):
    n = len(L)
    if n == 0:
        return([v])
    if v >= L[-1]:
        return(L+[v])
    else:
        return(Insert(L[:-1],v)+L[-1:])
def Isort(L):
    n = len(L)
    if n < 1:
        return(L)
    L = Insert(Isort(L[:-1]),L[-1])
    return(L)
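Both formulations produce the same result; a quick check (illustrative values):
>>> InsertionSort([74, 32, 89, 55, 21, 64])
[21, 32, 55, 64, 74, 89]
>>> Isort([74, 32, 89, 55, 21, 64])
[21, 32, 55, 64, 74, 89]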
Merge Sort
08 October 2021 11:04
Strategy 3
Divide the list into two halves
Separately sort the left and right half
Combine the two sorted halves to get a fully sorted list
Combining two sorted lists
Combine two sorted lists A and B into a single sorted list C
• Compare first elements of A and B
• Move the smaller of the two to C
• Repeat till you exhaust A and B
Merging A and B
Merge sort
Let n be the length of A
Sort A[:n//2]
Sort A[n//2:]
Merge the sorted halves into B
How do we sort A[:n//2] and A[n//2:]?
• Recursively, same strategy!
Divide and Conquer
• Break up the problem into disjoint parts
• Solve each part separately
• Combine the solutions efficiently
Merge sort is a Divide and Conquer approach
Merging sorted lists
Combine two sorted lists A and B into C
• If A is empty, copy B into C
• If B is empty, copy A into C
• Otherwise, compare first elements of A and B
○ Move the smaller of the two to C
• Repeat till all elements of A and B have been moved
def merge(A,B):
    (m,n) = (len(A),len(B))
    (C,i,j,k) = ([],0,0,0)
    while k < m+n:
        if i == m:
            C.extend(B[j:])
            k = k + (n-j)
        elif j == n:
            C.extend(A[i:])
            k = k + (m-i)
        elif A[i] < B[j]:
            C.append(A[i])
            (i,k) = (i+1,k+1)
        else:
            C.append(B[j])
            (j,k) = (j+1,k+1)
    return(C)
Merge sort
To sort a list A of length n into B
If n <= 1, nothing to be done
Otherwise
• Sort A[:n//2] into L
• Sort A[n//2:] into R
• Merge L and R into B
def mergesort(A):
    n = len(A)
    if n <= 1:
        return(A)
    L = mergesort(A[:n//2])
    R = mergesort(A[n//2:])
    B = merge(L,R)
    return(B)
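A quick check (illustrative values); note that mergesort returns a new sorted list rather than rearranging A in place:
>>> mergesort([74, 32, 89, 55, 21, 64])
[21, 32, 55, 64, 74, 89]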
Summary
Merge sort uses divide and conquer to sort a list
Divide the list into two halves
Sort each half
Merge the sorted halves
Next, we check that its complexity is better than O(n^2)
Analysis of Merge Sort
08 October 2021 15:23
Analysing merge
Merge A of length m, B of length n
Output list C has length m + n
In each iteration we add (at least) one element to C
Hence merge takes time O(m + n)
Recall that m + n <= 2(max(m,n))
If m = n, merge takes time O(n)
Analysing mergesort
Let T(n) be the time taken for input of size n
• For simplicity, assume n = 2^k for some k
Recurrence
• T(0) = T(1) = 1
• T(n) = 2T(n/2) + n
○ Solve two subproblems of size n/2
○ Merge the solutions in time n/2 + n/2 = n
Unwind the recurrence to solve
Recurrence
• T(0) = T(1) = 1
• T(n) = 2T(n/2) + n
= 2[2T(n/4) + n/2] + n
= 2^2 T(n/2^2) + 2n
= 2^3 T(n/2^3) + 3n
…
= 2^k T(n/2^k) + kn
• When k = log n, T(n/2^k) = T(1) = 1
• T(n) = n + n log n
• Hence T(n) is O(n log n)
Summary
Merge sort takes time O(n log n) so can be used effectively on large inputs
Variations on merge are possible
• Union of two sorted lists - discard duplicates: if A[i] == B[j], move just one copy to C and increment both i and j (see the sketch after this list)
• Intersection of two sorted lists - when A[i] == B[j], move one copy to C; otherwise discard the smaller of A[i], B[j]
• List difference - elements in A but not in B
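A minimal sketch of the first variation, the union of two sorted lists (not from the notes; it assumes neither input list has internal duplicates):
def mergeUnion(A, B):
    # Union of two sorted lists A and B, keeping one copy of common elements
    (m, n) = (len(A), len(B))
    (C, i, j) = ([], 0, 0)
    while i < m and j < n:
        if A[i] == B[j]:
            C.append(A[i])          # move just one copy, advance both indices
            (i, j) = (i+1, j+1)
        elif A[i] < B[j]:
            C.append(A[i])
            i = i+1
        else:
            C.append(B[j])
            j = j+1
    C.extend(A[i:])                 # at most one of these two is non-empty
    C.extend(B[j:])
    return(C)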
Merge needs to create a new list to hold merged elements
• No obvious way to efficiently merge two lists in place
• Extra storage can be costly
Inherently recursive
• Recursive calls and returns are expensive
Implementation of Searching and Sorting algorithms
08 October 2021 15:37
Programming assignments
08 October 2021 18:23
PPA 1
Solution:
def binarySearchPrintIndexAndComparisons(L, k):
    s = len(L)
    if (s < 1):
        return (False, 0)
    left = 0
    right = s - 1
    c = 0
    while (left <= right):
        mid = (left + right)//2
        c += 1
        if (k == L[mid]):
            return (True, c)
        elif (k < L[mid]):
            right = mid - 1
        else:
            left = mid + 1
    return (False, c)
PPA 2
Solution:
def sortInRange(L, r):
    # Create a dictionary with r keys, one for each integer in range(r); initialize every value to 0
    countDict = dict.fromkeys(range(r), 0)
    # Iterate over the list and count the occurrences of each integer
    for num in L:
        countDict[num] += 1
    # Overwrite L in sorted order using the counts
    index = 0
    for key in countDict.keys():
        for i in range(countDict[key]):
            L[index] = key
            index += 1
GrPA 1
Code:
def combinationSort(strList):
    n = len(strList)
    L1 = strList.copy()
    L2 = strList.copy()
    if n <= 1:
        return (L1, L2)
    # sorting L1 by first character (insertion sort)
    for i in range(n):
        j = i
        while (j > 0 and L1[j][0] < L1[j - 1][0]):
            (L1[j], L1[j - 1]) = (L1[j - 1], L1[j])
            j -= 1
    # sorting L2 by first character; ties broken by the numeric suffix in descending order (selection sort)
    for i in range(n):
        mpos = i
        for j in range(i + 1, n):
            if L2[j][0] < L2[mpos][0]:
                mpos = j
            if L2[j][0] == L2[mpos][0] and int(L2[j][1:]) > int(L2[mpos][1:]):
                mpos = j
        # Exchange L2[mpos] and L2[i]
        (L2[i], L2[mpos]) = (L2[mpos], L2[i])
    return (L1, L2)
Solution:
import string
def combinationSort(strList):
    # Create a dictionary with 26 keys from characters 'a' to 'z'; each key has an empty list as value.
    groups = {k: [] for k in string.ascii_lowercase}
    strList = []
    # Recreate the list from all the strings in groups, iterating on groups from a to z.
    for char in groups.keys():
        for s in groups[char]:
            strList.append(s)
    return L1, strList
GrPA 2
Code:
def findLargest(L):
    left = 0
    right = len(L) - 1
    if left == right:
        return L[left]
    else:
        left = left + 1
        return L[left]
Solution:
def findLargest(L):
    left = 0
    s = len(L)
    right = s - 1
    while (left <= right):
        mid = (left + right)//2
GrPA 3
Code:
def InsertionSort(L):
    n = len(L)
    if n <= 1:
        return L
    for i in range(n):
        j = i
        while (j > 0 and L[j] < L[j - 1]):
            (L[j], L[j - 1]) = (L[j - 1], L[j])
            j -= 1
    return L
(i, j) = (m - 1, 0)
else:
break
A = InsertionSort(A)
B = InsertionSort(B)
return (A, B)
Solution:
def mergeInPlace(A, B):
    m = len(A)
    n = len(B)
    if (m < 1 or n < 1):
        return
Assignments
08 October 2021 19:09
Practice Assignment
1. Select the correct recurrence relation and time complexity for the given code.
a. O(n)
b. O(logn)
c. O(nlogn)
d. O(n^2)
b. f(n) is O(n^3)
c. f(n) is Ω(n^3)
d. f(n) is θ(n^3)
e. f(n) is Ω(n^4)
Which of the options below complete the missing lines 13 and 15 in the above program
respectively?
a. right = mid + 1 # line 13
left = mid - 1 # line 15
b. right = mid - 1 # line 13
left = mid + 1 # line 15
c. right = mid - left # line 13
left = mid + right # line 15
d. right = mid - right # line 13
left = mid + left # line 15
5. Given below is a Python function to merge two sorted lists of integers.
Function mergeLists() accepts two lists sorted in ascending order and returns a list which is also
sorted in ascending order.
For list A of size m and list B of size n, what is the time complexity of the above algorithm?
a. O(m), if m>n
O(n), if m < n
b. O(m+n)
c. O(m*n)
d. O(log m + log n)
Graded assignments
1. What is the time complexity of the given function fun ?
a. O(n^6)
b. O(n^2)
c. O(n^3)
d. O(n^5)
2. Let B(n), A(n), and W(n) be the best-case, average-case, and worst-case running times of an algorithm, executed on an input of size n. Which of the following is/are always true?
a. A(n)=O(B(n))
b. A(n)=O(W(n))
c. A(n)=Ω(W(n))
d. A(n)=Ω(B(n))
e. B(n)=O(W(n))
f. B(n) = O(A(n))
3. What is the asymptotic complexity of merge sort when the input is sorted in reverse order?
a. O(n log n^2)
b. O(n^2 log n)
c. O(n log n)
d. O(n^2)
4. What is the value of i when the list [9, 4, 5, 2, 3, 7, 6, 8, 1] becomes completely sorted for the first
time?
Accepted answer:
Numeric: 7
5. Which of the following statement(s) is/are correct with regard to the given insertion sort? [MSQ]
d. After m iterations of the for-loop, the first m elements in the list are in sorted order
e. After m iterations of the for-loop, the first m elements in the list are the m smallest
elements of the list
6. A program is written in 3 stages, where the first stage is of O(n log n), the second stage is of O(n^2) which is based on the result of the first stage, and the third stage is of O(n).
What will be the asymptotic complexity of the entire program?
a. O(n)
b. O(n^2)
c. O(n^3)
d. O(n^2 log n)
7. A school wants to maintain a database of its students. Each student has a unique id and it is stored
along with other details. Adding a new student with a unique id, searching for a student using
their id, and removal of students are the frequent operations performed on the database. From
the options given below, choose the most efficient technique to store the data.
a. Maintain a sorted list with id. Whenever a new student is added, append the student details
at the end, and sort the entire list using selection sort.
b. Maintain a sorted list with id. Whenever a new student is added, append the student details
at the end, and sort the entire list using insertion sort.
c. Maintain a sorted list with id. Whenever a new student is added, append the student details
at the end, and sort the entire list using merge sort.
d. Maintain a sorted list with id. Whenever a new student is added, insert the student details
into the respective position in the sorted list by id.
a. O(sqrt(n))
b. O(log n / n^2)
c. O(log n)
d. O(log n / n)
a. Only I is true.
b. Only II is true.
a. f3(n), f4(n), f2(n), f1(n), f5(n)