Advanced_Algorithm_Lab_II_Experiment_CO mapping
2. Implement Depth First Search (DFS) and Breadth First Search (BFS) for a given graph using both recursive and non-recursive techniques. (CO1)
3. Use linked lists to handle collisions and implement dictionary operations: insert, delete, and search. (CO2)
4. Implement a hash table using open addressing with linear and quadratic probing. Compare collision resolution effectiveness. (CO2)
6. Use disjoint sets to build an MST of a weighted undirected graph using Kruskal's algorithm. (CO3)
7. Implement and compare the performance (time and space) of Bubble Sort, Merge Sort, and Quick Sort. (CO4)
8. Implement insertion and deletion operations in an AVL tree and display balancing at each step. (CO4)
10. Implement the Rabin-Karp algorithm and demonstrate its efficiency with multiple patterns and large texts. (CO5)
1. Implement in-order, pre-order, and post-order traversals of a binary tree using both recursive and non-recursive (stack-based) approaches.
Recursive Traversals:
1. In-order Traversal: Left, Root, Right
2. Pre-order Traversal: Root, Left, Right
3. Post-order Traversal: Left, Right, Root
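The traversal routines below operate on a simple binary-tree node; a minimal sketch (the attribute names value, left, and right match the code that follows) is:

class Node:
    def __init__(self, value):
        self.value = value   # data stored in this node
        self.left = None     # left child
        self.right = None    # right child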
Recursive Approach:
def in_order_recursive(root):
    if root is None:
        return
    # Visit left subtree
    in_order_recursive(root.left)
    # Visit root
    print(root.value, end=' ')
    # Visit right subtree
    in_order_recursive(root.right)

def pre_order_recursive(root):
    if root is None:
        return
    # Visit root
    print(root.value, end=' ')
    # Visit left subtree
    pre_order_recursive(root.left)
    # Visit right subtree
    pre_order_recursive(root.right)

def post_order_recursive(root):
    if root is None:
        return
    # Visit left subtree
    post_order_recursive(root.left)
    # Visit right subtree
    post_order_recursive(root.right)
    # Visit root
    print(root.value, end=' ')
For non-recursive traversals, we use a stack to simulate the function call stack.
def in_order_non_recursive(root):
    stack = []
    current = root
    while current or stack:
        # Reach the leftmost node of the current subtree
        while current:
            stack.append(current)
            current = current.left
        current = stack.pop()
        print(current.value, end=' ')
        # Move to the right subtree before the next iteration
        current = current.right
def pre_order_non_recursive(root):
    if root is None:
        return
    stack = [root]
    while stack:
        node = stack.pop()
        print(node.value, end=' ')
        # Push the right child first so the left child is processed first
        if node.right: stack.append(node.right)
        if node.left: stack.append(node.left)

def post_order_non_recursive(root):
    if root is None:
        return
    # Two-stack approach: stack1 yields reverse post-order into stack2
    stack1 = [root]
    stack2 = []
    while stack1:
        node = stack1.pop()
        stack2.append(node)
        if node.left: stack1.append(node.left)
        if node.right: stack1.append(node.right)
    while stack2:
        print(stack2.pop().value, end=' ')
Example Usage:
# Create the binary tree
root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.left = Node(4)
root.left.right = Node(5)
root.right.left = Node(6)
root.right.right = Node(7)
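The outputs listed below can then be produced with calls such as these (the print labels are assumed to match the sample output):

print("In-order Non-recursive:")
in_order_non_recursive(root)
print("\nPre-order Non-recursive:")
pre_order_non_recursive(root)
print("\nPost-order Non-recursive:")
post_order_non_recursive(root)
print()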
In-order Non-recursive:
4 2 5 1 6 3 7
Pre-order Non-recursive:
1 2 4 5 3 6 7
Post-order Non-recursive:
4 5 2 6 7 3 1
Explanation:
In both recursive and non-recursive methods, the order of traversal remains the same. The stack-based methods ensure that the traversal happens without explicit recursion, relying on a manual stack to manage the nodes.
Graph Representation:
We'll represent the graph as an adjacency list using a dictionary, where the key is a node, and
the value is a list of adjacent nodes.
class Graph:
    def __init__(self):
        self.graph = {}  # adjacency list: node -> list of neighbors
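The add_edge method was not reproduced above; a minimal sketch for an undirected, unweighted graph (to be placed inside the Graph class) might be:

    def add_edge(self, u, v):
        # Undirected edge: record each node in the other's adjacency list
        self.graph.setdefault(u, []).append(v)
        self.graph.setdefault(v, []).append(u)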
Recursive DFS:
In the recursive version, we traverse as far as possible along a branch, backtracking when we
reach a node with no unvisited neighbors.
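A recursive DFS consistent with the example usage later in this section could be sketched as follows (the visited default argument is an assumption):

def dfs_recursive(graph, node, visited=None):
    if visited is None:
        visited = set()
    if node not in visited:
        print(node, end=" ")
        visited.add(node)
        # Recurse into each unvisited neighbor, going as deep as possible
        for neighbor in graph[node]:
            if neighbor not in visited:
                dfs_recursive(graph, neighbor, visited)
    return visited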
The non-recursive version uses an explicit stack to simulate the function call stack.
def dfs_non_recursive(graph, start):
    visited = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in visited:
            print(node, end=" ")
            visited.add(node)
            # Add all unvisited neighbors to the stack
            # (in reverse order so they are visited in the original order)
            for neighbor in reversed(graph[node]):
                if neighbor not in visited:
                    stack.append(neighbor)
Recursive BFS:
BFS is typically implemented iteratively using a queue, but here's a recursive approach using a
helper function:
    if queue:
        # Dequeue a node
        node = queue.popleft()
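A complete recursive BFS built around this step could look like the sketch below (the queue and visited keyword arguments are assumptions; the signature matches the call bfs_recursive(graph.graph, 1) in the example usage):

from collections import deque

def bfs_recursive(graph, start, queue=None, visited=None):
    if queue is None:
        queue, visited = deque([start]), set()
    if queue:
        # Dequeue a node, process it, then recurse on the remaining queue
        node = queue.popleft()
        if node not in visited:
            print(node, end=" ")
            visited.add(node)
            queue.extend(n for n in graph[node] if n not in visited)
        bfs_recursive(graph, start, queue, visited)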
The non-recursive version uses a queue, which ensures that we explore all nodes level by level.
def bfs_non_recursive(graph, start):
    visited = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node not in visited:
            print(node, end=" ")
            visited.add(node)
            # Enqueue all unvisited neighbors so they are explored level by level
            queue.extend(n for n in graph[node] if n not in visited)
Example Usage:
Let's create a sample graph and perform both DFS and BFS traversals.
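The construction of the sample graph was not shown; one edge list that reproduces the expected output below is (an assumption inferred from the printed traversal orders):

graph = Graph()
for u, v in [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7)]:
    graph.add_edge(u, v)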
# DFS Recursive
print("DFS Recursive:")
dfs_recursive(graph.graph, 1) # Start from node 1
print()
# DFS Non-recursive
print("DFS Non-recursive:")
dfs_non_recursive(graph.graph, 1)
print()
# BFS Recursive
print("BFS Recursive:")
bfs_recursive(graph.graph, 1) # Start from node 1
print()
# BFS Non-recursive
print("BFS Non-recursive:")
bfs_non_recursive(graph.graph, 1)
Expected Output:
DFS Recursive:
1 2 4 5 3 6 7
DFS Non-recursive:
1 2 4 5 3 6 7
BFS Recursive:
1 2 3 4 5 6 7
BFS Non-recursive:
1 2 3 4 5 6 7
1. DFS Recursive: This depth-first search starts at node 1, visits the leftmost branch (node
2 and its children), then backtracks to node 1 and explores the right side (node 3 and its
children).
2. DFS Non-recursive: This version mimics the recursive DFS but uses an explicit stack.
The stack ensures that nodes are explored in depth-first order.
3. BFS Recursive: This version uses a queue to explore nodes level by level, starting from
node 1. We visit all the neighbors of a node before moving to the next level.
4. BFS Non-recursive: This is the standard BFS implementation, where we use a queue to
manage the nodes to visit next. It explores nodes in a breadth-first manner.
Differences between DFS and BFS:
DFS explores as deep as possible along a branch before backtracking, whereas BFS
explores all nodes at the current depth before moving on to nodes at the next depth level.
DFS is typically implemented using a stack (or recursion), and BFS is typically
implemented using a queue.
Both DFS and BFS can be applied to both directed and undirected graphs, and they can be
modified to handle weighted graphs or graphs with cycles.
We'll define a hash table where each bucket is a linked list, and each node in the list stores a key-value pair.
Each node will store the key-value pair and a reference to the next node in the chain.
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None
The hash table will have an array of linked list buckets, and we will define methods for inserting,
deleting, and searching elements.
class HashTable:
    def __init__(self, size=10):
        # Initialize the hash table with a fixed size and empty buckets
        self.size = size
        self.table = [None] * self.size

The insert, delete, and search methods each hash the key to a bucket index and then walk the chain starting at self.table[index], keeping current (and, for deletion, prev) references while traversing; a sketch of these methods follows the display method below.
    def display(self):
        for i in range(self.size):
            print(f"Bucket {i}:", end=" ")
            current = self.table[i]
            while current:
                print(f"({current.key}: {current.value})", end=" -> ")
                current = current.next
            print("None")
Now we can test the insert, delete, and search operations in the hash table.
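For example, the insertions behind the sample output could be issued like this (key/value pairs are taken from the expected output; the table size is an assumption):

hash_table = HashTable(size=10)
for key, value in [("apple", 1), ("banana", 2), ("grape", 3), ("cherry", 4), ("mango", 5)]:
    hash_table.insert(key, value)

print("Hash Table after insertions:")
hash_table.display()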
# Delete a key
print("\nDeleting 'banana':", hash_table.delete("banana"))  # Should return True
print("Deleting 'orange':", hash_table.delete("orange"))    # Should return False
Expected Output:
Hash Table after insertions:
Bucket 0: None
Bucket 1: (banana: 2) -> None
Bucket 2: (grape: 3) -> None
Bucket 3: (apple: 1) -> None
Bucket 4: (cherry: 4) -> (mango: 5) -> None
Explanation:
1. Insert:
o When inserting, the insert method hashes the key and checks the corresponding
bucket. If there's no collision, the key-value pair is directly inserted. If there's a
collision (i.e., the bucket is already occupied), it is inserted at the beginning or
end of the linked list in that bucket.
2. Delete:
o The delete method finds the node corresponding to the key and removes it from
the linked list by adjusting pointers. If the node is the first in the list, the pointer to
the bucket is updated to the next node.
3. Search:
o The search method finds the node corresponding to the key by traversing the
linked list at the hashed index and returns the value if found. If not, it returns
None.
4. Display:
o The display method prints the contents of the hash table for visualization
purposes. Each bucket is shown along with its linked list of key-value pairs.
Complexity:
With separate chaining, insert, delete, and search each take O(1 + α) time on average, where α = n/m is the load factor (n keys spread over m buckets); in the worst case, when every key lands in the same bucket, the operations degrade to O(n).
This approach using linked lists for chaining is a simple and effective way to handle hash collisions in a hash table.
Open addressing is a collision resolution technique where, upon a collision, the algorithm
searches for another open slot within the hash table by following a certain probing sequence. In
the case of linear probing and quadratic probing, we use different strategies to find the next
available slot.
1. Linear Probing: If a collision occurs at index i, check the next index i + 1, and if it's
occupied, check i + 2, and so on. Essentially, we move linearly through the table.
2. Quadratic Probing: If a collision occurs at index i, check the next index i + 1^2, then
i + 2^2, then i + 3^2, and so on. The idea is to probe at positions that grow
quadratically.
We will implement the hash table with both linear and quadratic probing methods and compare
their collision resolution effectiveness.
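The class skeleton (constructor, probe sequence, and insert) is not shown in the fragments below; a minimal sketch consistent with them is given here (the class name, the probing parameter, and the _hash/_probe helpers are assumptions; self.deleted is the tombstone sentinel referenced by search and delete):

class OpenAddressingHashTable:
    def __init__(self, size=7, probing="linear"):
        self.size = size
        self.table = [None] * size
        self.deleted = object()          # tombstone marking a removed slot
        self.probing = probing           # "linear" or "quadratic"

    def _hash(self, key):
        return sum(ord(c) for c in str(key)) % self.size

    def _probe(self, index, i):
        # i-th probe: index + i for linear probing, index + i**2 for quadratic probing
        offset = i if self.probing == "linear" else i * i
        return (index + offset) % self.size

    def insert(self, key, value):
        index = self._hash(key)
        for i in range(self.size):
            idx = self._probe(index, i)
            if self.table[idx] is None or self.table[idx] is self.deleted:
                self.table[idx] = (key, value)
                return True
        return False                     # table is full (or probe sequence exhausted)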
    # search and delete use the _hash and _probe helpers from the sketch above
    def search(self, key):
        index = self._hash(key)
        for i in range(self.size):
            idx = self._probe(index, i)
            if self.table[idx] is None:
                return None              # Key not found
            elif self.table[idx] is not self.deleted and self.table[idx][0] == key:
                return self.table[idx][1]  # Return the value
        return None

    def delete(self, key):
        index = self._hash(key)
        for i in range(self.size):
            idx = self._probe(index, i)
            if self.table[idx] is None:
                return False             # Key not found
            elif self.table[idx] is not self.deleted and self.table[idx][0] == key:
                self.table[idx] = self.deleted  # Mark the slot as deleted
                return True
        return False
    def display(self):
        """Display the current state of the hash table."""
        for i, slot in enumerate(self.table):
            if slot is None:
                print(f"Bucket {i}: Empty")
            elif slot is self.deleted:
                print(f"Bucket {i}: Deleted")
            else:
                print(f"Bucket {i}: {slot}")
Step 3: Testing the Hash Table with Linear and Quadratic Probing
Let's create a hash table and perform insertions, deletions, and searches using both linear
probing and quadratic probing. We'll also compare the effectiveness by displaying the hash
table after each operation.
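A test along these lines exercises both strategies (key/value pairs are taken from the sample output; exact bucket positions depend on the hash function and table size, so they may not match the listing below exactly):

for probing in ("linear", "quadratic"):
    print(f"\nInserting with {probing.capitalize()} Probing:")
    table = OpenAddressingHashTable(size=7, probing=probing)
    for key, value in [("apple", 1), ("banana", 2), ("grape", 3), ("cherry", 4), ("mango", 5)]:
        table.insert(key, value)
    table.display()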
Expected Output:
Inserting with Linear Probing:
Bucket 0: (banana, 2)
Bucket 1: Empty
Bucket 2: (grape, 3)
Bucket 3: (apple, 1)
Bucket 4: (mango, 5)
Bucket 5: (cherry, 4)
Bucket 6: Empty
1. Linear Probing:
o When a collision occurs, the algorithm checks the next available index linearly (i
+ 1, i + 2, etc.). This can cause clustering if multiple keys hash to the same index.
o In the example above, keys are inserted into a hash table with linear probing, and
collisions are resolved by looking for the next available slot.
2. Quadratic Probing:
o In quadratic probing, if a collision occurs, the algorithm probes indices in a
quadratic sequence: i + 1^2, i + 2^2, i + 3^2, etc. This helps reduce clustering
compared to linear probing by spreading out the probe sequence.
o As seen in the example, quadratic probing spreads out the keys more effectively
and reduces the chance of clustering compared to linear probing.
Linear Probing: The table becomes prone to primary clustering (keys clustering
together), especially when there are several consecutive collisions. This can lead to
performance degradation.
Quadratic Probing: Quadratic probing helps reduce primary clustering by probing
slots that are further apart. However, it may still suffer from secondary clustering (keys
that hash to the same index might still collide), though it's less severe than with linear
probing.
Conclusion:
Linear Probing is simpler but can suffer from primary clustering, making it less efficient under high collision scenarios.
Quadratic Probing spreads the probe sequence out, reducing primary clustering and handling heavy collisions more gracefully, though the table size must be chosen carefully (e.g., a prime) so that the probe sequence can reach enough free slots.
Dijkstra's algorithm is a well-known shortest path algorithm that computes the shortest paths
from a source node to all other nodes in a graph with non-negative edge weights. The algorithm
works by progressively selecting the node with the smallest known distance and updating the
distances to its neighbors. It can be optimized using a priority queue (often implemented with a
min-heap) to efficiently get the node with the smallest tentative distance.
We will represent the graph as an adjacency list, implement the dijkstra method with a min-heap priority queue (heapq), and reconstruct the shortest path using a dictionary of previous pointers.
import heapq

class Graph:
    def __init__(self):
        self.graph = {}  # Adjacency list: node -> list of (neighbor, weight) tuples

    def add_edge(self, u, v, weight):
        self.graph.setdefault(u, []).append((v, weight))
        self.graph.setdefault(v, []).append((u, weight))  # For undirected graph
The heart of the dijkstra method is the loop below; the full method is sketched after this fragment.

        while pq:
            # Extract the node with the smallest distance from the priority queue
            current_distance, current_node = heapq.heappop(pq)
            # Skip processing if the current node's distance is outdated
            if current_distance > distances[current_node]:
                continue
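The full dijkstra method and the path reconstruction were not reproduced; a sketch consistent with the explanation and the example output is (the get_shortest_path signature is an assumption):

    def dijkstra(self, start):
        # Distances start at infinity, except for the source node
        distances = {node: float('inf') for node in self.graph}
        distances[start] = 0
        previous = {node: None for node in self.graph}
        pq = [(0, start)]                          # min-heap of (distance, node)
        while pq:
            current_distance, current_node = heapq.heappop(pq)
            if current_distance > distances[current_node]:
                continue                           # stale queue entry
            for neighbor, weight in self.graph[current_node]:
                distance = current_distance + weight
                if distance < distances[neighbor]:
                    distances[neighbor] = distance
                    previous[neighbor] = current_node
                    heapq.heappush(pq, (distance, neighbor))
        return distances, previous

    def get_shortest_path(self, previous, start, end):
        # Walk backwards from end to start along the previous pointers
        path, node = [], end
        while node is not None:
            path.append(node)
            node = previous[node]
        path.reverse()
        return path if path and path[0] == start else []   # empty if end is unreachable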
# Example Usage:
# Create the graph and add edges
graph = Graph()
graph.add_edge("A", "B", 1)
graph.add_edge("A", "C", 4)
graph.add_edge("B", "C", 2)
graph.add_edge("B", "D", 5)
graph.add_edge("C", "D", 1)
1. Graph Representation:
o The graph is represented as an adjacency list stored in the self.graph dictionary.
Each key in the dictionary is a node, and the value is a list of tuples where each
tuple represents a neighbor node and the edge weight.
2. Dijkstra’s Algorithm:
o The dijkstra method takes a start node as input and computes the shortest
distance from this node to all other nodes using a priority queue (min-heap).
o The algorithm starts by initializing the distance to all nodes as infinity, except for
the source node which is set to 0. It uses a priority queue to extract the node with
the smallest tentative distance.
o For each node, it explores its neighbors and updates their tentative distances if a
shorter path is found.
o The algorithm uses a dictionary previous to keep track of the path for each node,
which is useful for reconstructing the shortest path later.
3. Shortest Path Reconstruction:
o The get_shortest_path method reconstructs the path from the start node to
the end node by following the previous pointers. The path is then reversed
because it’s reconstructed from the destination to the source.
Example Output:
Shortest distances from A: {'A': 0, 'B': 1, 'C': 3, 'D': 4}
Shortest path from A to D: ['A', 'B', 'C', 'D']
Distance: 4
Explanation of Output:
Shortest Distances:
o The distances from node "A" to all other nodes are calculated and stored in the
distances dictionary.
o For example, the shortest path from "A" to "B" is 1, and the shortest path from
"A" to "C" is 3.
Shortest Path:
o The shortest path from "A" to "D" is ['A', 'B', 'C', 'D'], and the total
distance is 4.
The priority queue (heapq) stores tuples of (distance, node), where distance is the
tentative shortest distance of the node, and node is the node itself.
heapq.heappop(pq) extracts the node with the smallest distance efficiently.
heapq.heappush(pq, (distance, neighbor)) adds neighbors to the queue with their
updated distances.
Time Complexity:
With a binary-heap priority queue, Dijkstra's algorithm runs in O((V + E) log V) time, where V is the number of vertices and E is the number of edges.
Conclusion:
This implementation of Dijkstra's algorithm with a priority queue is efficient and computes
the shortest path in a weighted graph with non-negative edge weights. The use of the priority
queue minimizes the time complexity for finding the node with the smallest tentative distance,
making the algorithm scalable to large graphs.
Kruskal's algorithm is a greedy algorithm used to find the Minimum Spanning Tree (MST) of a
connected, undirected graph with weighted edges. The key idea behind Kruskal’s algorithm is to
always pick the smallest edge that does not form a cycle, using a disjoint-set (union-find) data
structure to efficiently manage the merging of sets of nodes.
The union-find or disjoint-set data structure supports two main operations efficiently: find(x), which returns the representative (root) of the set containing x, and union(x, y), which merges the sets containing x and y.
To improve performance, we use path compression (flattening the structure of the tree when
performing find) and union by rank/size (always attach the smaller tree under the larger tree to
keep the tree flat).
Kruskal's Algorithm with Disjoint Sets
Here is the implementation of Kruskal's algorithm using the disjoint-set data structure.
Python Code:
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))  # Parent pointer for each node
        self.rank = [0] * n           # Rank of each node for union by rank

    def find(self, x):
        # Path compression: point x directly at the root of its set
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        root_x, root_y = self.find(x), self.find(y)
        if root_x != root_y:
            # Union by rank: attach the smaller tree under the larger tree
            if self.rank[root_x] > self.rank[root_y]:
                self.parent[root_y] = root_x
            elif self.rank[root_x] < self.rank[root_y]:
                self.parent[root_x] = root_y
            else:
                self.parent[root_y] = root_x
                self.rank[root_x] += 1
class KruskalMST:
    def __init__(self, vertices):
        self.V = vertices   # Number of vertices
        self.edges = []     # Edges stored as (weight, u, v) so sort() orders by weight

    def add_edge(self, u, v, weight):
        self.edges.append((weight, u, v))

    def kruskal(self):
        # Step 1: Sort edges by weight
        self.edges.sort()
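The remainder of the kruskal method (steps 2-4 in the explanation below) was not reproduced; it could be completed as follows, continuing the method above (returning the MST edge list and total weight is an assumption based on the example output):

        # Step 2: one disjoint set per vertex
        ds = DisjointSet(self.V)
        mst_edges, total_weight = [], 0
        # Step 3: take the cheapest remaining edge that does not form a cycle
        for weight, u, v in self.edges:
            if ds.find(u) != ds.find(v):
                ds.union(u, v)
                mst_edges.append((u, v, weight))
                total_weight += weight
                if len(mst_edges) == self.V - 1:   # MST is complete
                    break
        # Step 4: return the MST edges and their total weight
        return mst_edges, total_weight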
# Example Usage:
graph = KruskalMST(6)
graph.add_edge(0, 1, 4)
graph.add_edge(0, 2, 4)
graph.add_edge(1, 2, 2)
graph.add_edge(1, 3, 5)
graph.add_edge(2, 3, 3)
graph.add_edge(3, 4, 2)
graph.add_edge(4, 5, 6)
graph.add_edge(2, 5, 4)
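The MST can then be computed and printed as follows (assuming the kruskal sketch above):

mst_edges, total_weight = graph.kruskal()
print("Minimum Spanning Tree (MST):")
for u, v, weight in mst_edges:
    print(f"Edge: {u} - {v}, Weight: {weight}")
print("Total weight of MST:", total_weight)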
1. DisjointSet Class:
o find(x): This function finds the root of the set containing x. Path compression is
used to make future queries faster by directly linking each node to its root.
o union(x, y): This function merges the sets containing x and y. It uses union by
rank, attaching the tree with smaller rank to the root of the tree with a larger rank,
thus keeping the tree height balanced.
2. KruskalMST Class:
o add_edge(u, v, weight): Adds an edge with the given vertices u, v and weight
to the graph.
o kruskal(): This is the main function to compute the MST. It:
1. Sorts all edges by their weight.
2. Initializes the disjoint-set data structure to track which vertices belong to
the same set.
3. Iterates over the sorted edges, adding each edge to the MST if it doesn't
form a cycle (i.e., if the vertices are not already in the same set).
4. Returns the list of edges in the MST and the total weight of the MST.
Example Graph: the 6-vertex graph built in the example usage above (vertices 0-5 with the eight weighted edges listed).
Example Output:
Minimum Spanning Tree (MST):
Edge: 1 - 2, Weight: 2
Edge: 3 - 4, Weight: 2
Edge: 2 - 3, Weight: 3
Edge: 0 - 1, Weight: 4
Edge: 2 - 5, Weight: 4
Total weight of MST: 15
1. Step 1: The edges are sorted in non-decreasing order of weight.
2. Step 2: The disjoint-set is initialized, with each vertex as its own set.
3. Step 3: The edges are processed one by one, adding them to the MST if they do not form a cycle (i.e., if the two vertices of the edge are in different sets). The union operation is used to merge the sets.
4. The algorithm stops when V - 1 edges are added to the MST.
Time Complexity:
Sorting the edges takes O(E log E), where E is the number of edges.
For each edge, the find and union operations take O(α(V)) amortized time, where α(V) is the inverse Ackermann function, which is nearly constant for practical values of V.
Overall, the time complexity of Kruskal's algorithm is dominated by the sorting step: O(E log E).
Conclusion:
Kruskal’s algorithm with a disjoint-set data structure is an efficient and elegant way to compute
the Minimum Spanning Tree (MST) of a weighted undirected graph. By using union by rank
and path compression, we can keep the union-find operations nearly constant, ensuring that
Kruskal’s algorithm runs efficiently even on large graphs.
We will implement the three sorting algorithms — Bubble Sort, Merge Sort, and Quick Sort
— and compare their performance in terms of both time complexity and space complexity.
Then, we'll measure the actual runtime for each algorithm on random arrays of different sizes to
evaluate their real-world performance.
1. Bubble Sort:
Bubble Sort is a simple sorting algorithm where each pass through the array compares adjacent
elements and swaps them if they are in the wrong order. This process repeats until no more
swaps are needed.
Time Complexity: O(n^2) in the average and worst case; O(n) in the best case when the array is already sorted and an early-exit check is used. Space Complexity: O(1), since the sort is in place.
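Bubble Sort Implementation (a sketch; the swapped early-exit flag is an optimization, not required by the description above):

def bubble_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        # After pass i, the largest i elements are already in their final positions
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:          # no swaps in a full pass: the array is sorted
            break
    return arr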
2. Merge Sort:
Merge Sort is a divide-and-conquer algorithm that splits the array into halves, recursively sorts
each half, and then merges the sorted halves.
Time Complexity:
Best, average, and worst case: O(n log n) (due to the divide and merge process).
Space Complexity: O(n) (because it requires additional space to merge arrays).
Merge Sort Implementation:
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    # ... split, recursively sort, and merge (see the full sketch below) ...
    return result
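The elided splitting and merging steps could be filled in like this (the merge helper and the variable names mid, left, right are assumptions):

def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])      # sort each half recursively
    right = merge_sort(arr[mid:])
    return merge(left, right)         # merge the two sorted halves

def merge(left, right):
    result = []
    i = j = 0
    # Repeatedly take the smaller front element from the two sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])           # append whatever remains
    result.extend(right[j:])
    return result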
3. Quick Sort:
Quick Sort is another divide-and-conquer algorithm that selects a "pivot" element and partitions
the array around the pivot. The pivot element is placed in its correct position, and the sub-arrays
are sorted recursively.
Time Complexity:
Best and average case: O(n log n) (when the pivot divides the array evenly).
Worst case: O(n^2) (when the pivot is always the smallest or largest element, causing unbalanced partitions).
Space Complexity: O(log n) (due to recursive function calls).
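Quick Sort Implementation (a sketch using an in-place Lomuto partition with the last element as pivot; the pivot choice and the helper name partition are assumptions):

def quick_sort(arr, low=0, high=None):
    if high is None:
        high = len(arr) - 1
    if low < high:
        p = partition(arr, low, high)     # pivot ends up at its final position p
        quick_sort(arr, low, p - 1)
        quick_sort(arr, p + 1, high)
    return arr

def partition(arr, low, high):
    pivot = arr[high]                     # Lomuto partition: last element as pivot
    i = low - 1
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1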
Time Complexity:
Bubble Sort: O(n^2) in the worst case, which makes it inefficient for large datasets.
Merge Sort: O(n log n) in all cases (best, average, and worst).
Quick Sort: O(n log n) on average, but O(n^2) in the worst case if the pivot selection is poor.
Space Complexity:
Bubble Sort: O(1) (in-place). Merge Sort: O(n) (auxiliary arrays for merging). Quick Sort: O(log n) on average (recursion stack).
Performance Measurement
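The measurement harness itself was not included; one way to time the three implementations on random arrays is sketched below (the array sizes and the use of time.perf_counter are choices made here, not taken from the original):

import random
import time

def time_sort(sort_fn, arr):
    data = arr.copy()                     # each algorithm sorts its own copy
    start = time.perf_counter()
    sort_fn(data)
    return time.perf_counter() - start

for size in (100, 1000, 5000, 10000):
    arr = [random.randint(0, 100000) for _ in range(size)]
    print(f"n = {size}:")
    for name, fn in (("Bubble Sort", bubble_sort),
                     ("Merge Sort", merge_sort),
                     ("Quick Sort", quick_sort)):
        print(f"  {name}: {time_sort(fn, arr):.4f} s")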
Expected Output:
For small sizes (like 100), all algorithms should run quickly. For larger sizes (like 5000 or 10000), the gap between Bubble Sort and the other two becomes obvious: Bubble Sort's runtime grows roughly quadratically with n, while Merge Sort and Quick Sort grow roughly in proportion to n log n.
Summary of Results:
Bubble Sort:
o Time Complexity: O(n^2), making it inefficient for larger datasets.
o Space Complexity: O(1), as it is an in-place algorithm.
o Performance: Becomes impractical for large datasets (e.g., for n = 10000, the time taken is very high).
Merge Sort:
o Time Complexity: O(n log n) in all cases.
o Space Complexity: O(n); requires extra space for merging.
o Performance: Performs well even on larger datasets (e.g., n = 10000).
Quick Sort:
o Time Complexity: O(n log n) on average, O(n^2) in the worst case (though it can be improved with randomized pivot selection).
o Space Complexity: O(log n) (due to the recursive stack).
o Performance: Generally fast and more efficient than Merge Sort for large datasets. However, its worst-case performance can be a concern unless the pivot is chosen wisely.
Conclusion:
Bubble Sort is only useful for small datasets or when simplicity is preferred.
Merge Sort and Quick Sort are both efficient and preferred for large datasets, with
Quick Sort usually performing better in practice due to lower overhead, despite its
worst-case complexity.
For practical usage on large datasets, Quick Sort is often the go-to algorithm, but in
cases where stability or guaranteed worst-case performance is required, Merge Sort is a
good alternative.
An AVL tree (Adelson-Velsky and Landis tree) is a self-balancing binary search tree (BST). It
maintains the balance factor of each node, which is defined as the difference between the
heights of the left and right subtrees. The balance factor must be between -1 and 1 for the tree to
remain balanced. When the balance factor of a node becomes less than -1 or greater than 1,
rotations are performed to restore balance.
Operations:
1. Insertion:
o Insert a node like in a regular BST.
o After insertion, check the balance factor of each ancestor node and perform
necessary rotations to maintain the AVL property.
2. Deletion:
o Remove a node like in a regular BST.
o After deletion, check the balance factor of each ancestor node and perform
necessary rotations to maintain the AVL property.
Balancing Rotations:
1. Left Rotation (Single rotation): Used when a right-heavy subtree becomes unbalanced.
2. Right Rotation (Single rotation): Used when a left-heavy subtree becomes unbalanced.
3. Left-Right Rotation (Double rotation): A left rotation on the left child followed by a right rotation on the node, used when the right subtree of the left child is too heavy (LR imbalance).
4. Right-Left Rotation (Double rotation): A right rotation on the right child followed by a left rotation on the node, used when the left subtree of the right child is too heavy (RL imbalance).
class AVLTree:
    def __init__(self):
        self.root = None

    def left_rotate(self, z):
        y = z.right
        T2 = y.left
        # Perform rotation
        y.left = z
        z.right = T2
        # Update heights
        z.height = max(self.get_height(z.left), self.get_height(z.right)) + 1
        y.height = max(self.get_height(y.left), self.get_height(y.right)) + 1
        return y  # new subtree root

    def right_rotate(self, y):
        x = y.left
        T2 = x.right
        # Perform rotation
        x.right = y
        y.left = T2
        # Update heights
        y.height = max(self.get_height(y.left), self.get_height(y.right)) + 1
        x.height = max(self.get_height(x.left), self.get_height(x.right)) + 1
        return x  # new subtree root
Both insert and delete end with the same rebalancing step: after updating the node's height, get the balance factor to check whether the node became unbalanced, apply the appropriate rotation, and return the (possibly new) subtree root:

        balance = self.get_balance(node)
        # ... rotation cases (see the insert sketch below) ...
        return node
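The node class, the height and balance helpers, and the insert method with its four rebalancing cases were not reproduced; an insert-only sketch consistent with the fragments above is given here (deletion follows the same pattern; the names AVLNode and pre_order are assumptions):

class AVLNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.height = 1                   # a new leaf has height 1

and, inside the AVLTree class defined above:

    def get_height(self, node):
        return node.height if node else 0

    def get_balance(self, node):
        return self.get_height(node.left) - self.get_height(node.right) if node else 0

    def insert(self, node, value):
        # 1. Standard BST insertion
        if node is None:
            return AVLNode(value)
        if value < node.value:
            node.left = self.insert(node.left, value)
        else:
            node.right = self.insert(node.right, value)
        # 2. Update the height of this ancestor node
        node.height = 1 + max(self.get_height(node.left), self.get_height(node.right))
        # 3. Rebalance if the balance factor left the range [-1, 1]
        balance = self.get_balance(node)
        if balance > 1 and value < node.left.value:       # Left-Left: right rotation
            return self.right_rotate(node)
        if balance < -1 and value > node.right.value:     # Right-Right: left rotation
            return self.left_rotate(node)
        if balance > 1 and value > node.left.value:       # Left-Right: double rotation
            node.left = self.left_rotate(node.left)
            return self.right_rotate(node)
        if balance < -1 and value < node.right.value:     # Right-Left: double rotation
            node.right = self.right_rotate(node.right)
            return self.left_rotate(node)
        return node

    def pre_order(self, node):
        if node:
            print(node.value, end=" ")
            self.pre_order(node.left)
            self.pre_order(node.right)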
# Example Usage:
avl_tree = AVLTree()
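The insertions that produce the trace below could be driven like this (assuming the insert and pre_order sketches above; the loop structure is an assumption, and the rotation messages in the trace would come from print statements inside the rebalancing cases of insert, which are omitted in the sketch):

for value in [20, 15, 25, 10, 5, 1, 30]:
    print(f"Inserting {value}:")
    avl_tree.root = avl_tree.insert(avl_tree.root, value)
    print("Pre-order traversal:", end=" ")
    avl_tree.pre_order(avl_tree.root)
    print()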
Explanation:
Insert Method:
o We perform regular BST insertion first.
o After insertion, we update the height of the nodes and check the balance factor.
o If the balance factor is outside the range [-1, 1], we perform the appropriate
rotation to balance the tree.
Delete Method:
o We perform regular BST deletion (handle cases for nodes with 0, 1, or 2
children).
o After deletion, we update the height and balance of ancestor nodes and perform
rotations if necessary.
Rotations:
o The tree uses single or double rotations to maintain balance. Depending on the
balance factor of the node, we perform:
Right Rotation (LL imbalance),
Left Rotation (RR imbalance),
Left-Right Rotation (LR imbalance),
Right-Left Rotation (RL imbalance).
Example Output:
Inserting 20:
Pre-order traversal: 20
Inserting 15:
Pre-order traversal: 20 15
Inserting 25:
Pre-order traversal: 20 15 25
Inserting 10:
Pre-order traversal: 20 15 10 25
Inserting 5:
Left-Left Rotation at node 15
Pre-order traversal: 20 10 5 15 25
Inserting 1:
Left-Left Rotation at node 20
Pre-order traversal: 10 5 1 20 15 25
Inserting 30:
Pre-order traversal: 10 5 1 20 15 25 30
Steps:
1. Preprocessing the pattern: Create the partial match table (or the prefix function) that
stores the longest proper prefix of the pattern that is also a suffix for each position in the
pattern.
2. Searching the text: Use the preprocessed partial match table to skip unnecessary
comparisons while searching through the text.
Key Concepts:
Prefix Function: For each position in the pattern, we compute the length of the longest
prefix that is also a suffix up to that position. This helps in skipping unnecessary
comparisons during the search phase.
Matching Process: When a mismatch occurs, instead of starting over, the pattern is
shifted based on the information in the partial match table, so the algorithm skips
redundant checks.
KMP Algorithm
Here’s the implementation of the KMP algorithm to find all occurrences of a pattern in a given
text:
def compute_prefix_function(pattern):
    m = len(pattern)
    lps = [0] * m    # lps[i] = length of the longest proper prefix of pattern[:i+1] that is also a suffix
    length = 0
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]  # fall back to the previous valid prefix length
            else:
                lps[i] = 0
                i += 1
    return lps
def KMPSearch(text, pattern):
    n, m = len(text), len(pattern)
    occurrences = []
    # Preprocess the pattern to get the LPS (longest prefix suffix) array
    lps = compute_prefix_function(pattern)
    i = j = 0    # i indexes the text, j indexes the pattern
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                # Found a match, append the starting index of the match (i - j)
                occurrences.append(i - j)
                j = lps[j - 1]   # use the LPS array to skip unnecessary comparisons
        else:
            if j != 0:
                j = lps[j - 1]   # shift the pattern using the LPS array
            else:
                i += 1
    return occurrences
# Example usage:
text = "ABABDABACDABABCABAB"
pattern = "ABAB"
matches = KMPSearch(text, pattern)
print("Pattern found at indices:", matches)
Explanation:
1. compute_prefix_function:
This function computes the Longest Prefix Suffix (LPS) array for the given pattern.
The LPS array helps us avoid unnecessary comparisons when a mismatch occurs by
allowing us to shift the pattern efficiently.
2. KMPSearch:
This function searches for all occurrences of the pattern in the text.
It uses the precomputed LPS array to skip characters in the pattern and the text, speeding
up the search.
When a match is found, it appends the starting index of the match to the occurrences
list.
Example Walkthrough:
Consider the example with the text "ABABDABACDABABCABAB" and the pattern "ABAB".
1. The pattern "ABAB" is preprocessed with compute_prefix_function.
2. This yields the LPS array [0, 0, 1, 2].
o The value at each index in the LPS array represents the length of the longest proper prefix of the substring ending at that index that is also a suffix.
3. Then, we perform the search using the KMP algorithm:
o We compare the text with the pattern, and each time a mismatch occurs, we shift
the pattern using the LPS array to avoid unnecessary comparisons.
o When a match is found, we record the index in the occurrences list.
4. Output:
Pattern found at indices: [0, 10, 15]
This means the pattern "ABAB" occurs at indices 0, 10, and 15 in the text.
Time Complexity:
Preprocessing the pattern (LPS array) takes O(m), where m is the length of the pattern.
Searching the text takes O(n), where n is the length of the text.
Space Complexity:
O(m) for the LPS array.
The KMP algorithm efficiently finds all occurrences of a pattern in a given text by
preprocessing the pattern to create a partial match table (LPS array) and using it to avoid
unnecessary comparisons during the search. The algorithm runs in linear time, making it much
faster than brute force string matching algorithms.
1. Hashing the Pattern(s): Compute a hash value for the pattern and store it.
2. Rolling Hash: Compute hash values for substrings of the text of the same length as the
pattern.
3. Compare Hashes: If the hash value of a substring matches the hash value of the pattern,
check the substring character-by-character to confirm the match.
4. Efficiency: The algorithm is efficient for multiple patterns, as it can calculate the hashes
of all substrings of the text and compare them with the pattern hashes in linear time.
Rolling Hash:
We use a rolling hash to compute the hash for a substring efficiently. As we slide the
window across the text, we can compute the hash for the next substring using the hash of
the previous one by removing the effect of the first character and adding the effect of the
new character.
The hash function typically uses a base (e.g., 256 for ASCII) and a large prime number
as a modulus.
1. Preprocessing:
o Calculate the hash of the pattern.
o Prepare the base and modulus for the rolling hash.
2. Search:
o Slide the window across the text, calculating hashes for substrings of the same
length as the pattern.
o Compare hashes and verify matches.
Rabin-Karp Algorithm Implementation:
def rabin_karp_search(text, pattern, base=256, modulus=101):
    """
    Implements the Rabin-Karp algorithm for string matching.
    Finds all occurrences of the pattern in the given text.
    """
    n = len(text)
    m = len(pattern)
    if m == 0 or m > n:
        return []
    matches = []
    # Calculate hash of the pattern and the first window of the text
    pattern_hash = 0
    window_hash = 0
    h = 1
    # h = base^(m-1) % modulus, the weight of the leading character in a window
    for _ in range(m - 1):
        h = (h * base) % modulus
    # Compute the hash for the pattern and the first window of the text
    for i in range(m):
        pattern_hash = (base * pattern_hash + ord(pattern[i])) % modulus
        window_hash = (base * window_hash + ord(text[i])) % modulus
    # Slide the window over the text one character at a time
    for i in range(n - m + 1):
        # On a hash match, confirm character-by-character to rule out collisions
        if pattern_hash == window_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Rolling hash: remove text[i], add text[i + m]
            window_hash = (base * (window_hash - ord(text[i]) * h) + ord(text[i + m])) % modulus
    return matches
# Example usage:
text = "ABABDABACDABABCABAB"
patterns = ["ABAB", "DABAC", "ABCAB"]
print(f"Text: {text}\n")
for pattern in patterns:
matches = rabin_karp_search(text, pattern)
print(f"Pattern: {pattern} found at indices: {matches}")
Explanation of Code:
1. Hashing:
o The pattern_hash and window_hash store the hash values of the pattern and the
current window of the text, respectively.
o The rolling hash allows efficient calculation of the hash for the next substring by
removing the first character of the previous window and adding the new
character.
2. Hashing Details:
o The base (base = 256) represents the size of the character set (ASCII).
o The modulus (modulus = 101) is a large prime number to reduce the likelihood
of hash collisions.
3. Rolling Hash:
o The window hash is updated as the window slides over the text:
window_hash = (base * (window_hash - ord(old_char) * h) + ord(new_char)) % modulus, where h = base^(m - 1) % modulus.
4. Character-by-character Comparison:
o After matching hashes, we perform a character-by-character comparison to
confirm that the substrings actually match, avoiding false positives from hash
collisions.
5. Sliding the Window:
o We slide the window one character at a time, updating the hash and checking for
matches.
Efficiency:
Time Complexity:
o Preprocessing: O(m), where m is the length of the pattern.
o Search: O(n) on average, where n is the length of the text; in the worst case (many hash collisions), the verification step pushes this toward O(n·m).
o Thus, the overall average time complexity is O(n + m) for a single pattern search.
Space Complexity: O(m) for storing the pattern and hash values.
Test with Multiple Patterns and Large Text:
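The large test string itself was not included; a driver of the following shape runs every pattern against whatever text is supplied (dna_text is a placeholder for the elided test string, so the printed indices depend on it):

dna_text = "..."   # placeholder for the elided large DNA-like test string
dna_patterns = ["GATC", "AGCT", "CGAG", "TACG", "CTAG"]
for pattern in dna_patterns:
    print(f"Pattern '{pattern}' found at indices: {rabin_karp_search(dna_text, pattern)}")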
Output:
Pattern 'GATC' found at indices: [0, 4, 40, 56]
Pattern 'AGCT' found at indices: [6, 14, 46, 62]
Pattern 'CGAG' found at indices: [10, 50]
Pattern 'TACG' found at indices: [8, 42]
Pattern 'CTAG' found at indices: [12, 54]
Analysis:
The Rabin-Karp algorithm efficiently finds the occurrences of each pattern by using
hash values.
The rolling hash allows the algorithm to move through the text in linear time without
rehashing the entire substring for every window, which is a significant improvement over
brute-force approaches.
Even with large texts and multiple patterns, the algorithm remains efficient.
Conclusion:
The Rabin-Karp algorithm is well-suited for searching multiple patterns in large texts,
especially when hash collisions are minimal. While it is not as fast as other algorithms like KMP
or Boyer-Moore in the worst case (due to hash collisions), its simplicity and ability to handle
multiple patterns make it an effective choice for certain use cases.