Final Report
Varun Luhadia
July 29, 2024
CONTENTS
Abstract & Timeline
Chapter 1: Basics of Programming
1.1 Data Types
1. Definition
2. Importance
3. Basic Data Types
4. Complex Data Types
5. Application in Algorithms
1.2 If-Else Condition Statements
1. Definition
2. Importance
3. Basic Syntax
4. Use Cases
1.3 For and While Loops
1. Definition
2. Importance
3. Types of Loops
4. Basic Syntax
5. Use Cases
6. Implementation Code in C
1.4 Functions
1. Definition
2. Importance
3. Basic Syntax
4. Example in C
1.5 Arrays
1. Definition
2. Importance
3. Types of Arrays
4. Usage and Operations
5. Example Code in C
1.6 Strings
1. Initialization
2. Input and Output
3. Length Calculation
4. String Copying
5. String Concatenation
6. String Comparison
7. Substring Extraction
8. Searching
9. Tokenization
10. Case Conversion
11. String Formatting
12. Memory Management
13. Error Handling
1.7 Pointers
1. Definition
2. Importance
3. Basic Operations
4. Pointer Arithmetic
5. Arrays and Pointers
6. Dynamic Memory Allocation
7. Example Code in C
1.8 Sorting Algorithms
1. Bubble Sort
2. Insertion Sort
3. Merge Sort
4. Quick Sort
Chapter 2: Recursion, Backtracking & OOPS
2.1 Recursion
1. Definition
2. Types of Recursion
3. Importance
4. Example Code in C
2.2 Backtracking
1. Definition
2. Importance
3. Key Concepts
4. Implementation Example: N-Queens Problem
2.3 Object Oriented Programming System (OOPS)
1. Classes and Objects
2. Encapsulation
3. Inheritance
4. Polymorphism
5. Abstraction
Chapter 3: Linear Data Structures
3.1 Linked List
1. Introduction
2. Inserting and deleting nodes in a list
3. Output of the code
3.2 Stack
1. Definition
2. Types of Stack
3. Basic Operations on Stack
4. Implementation of Stack in C
3.3 Queue
1. Introduction
2. Types of Queue
3. Operating and managing a queue (Enqueue and Dequeue)
4. Output of the code
Chapter 4: Binary Tree & Heap
4.1 Binary Search Tree
1. Introduction
2. Implementation of Binary Search Tree in C
3. Output of the code
4. Explanation of the code
5. Duplicate Elimination
6. Binary Tree Search
7. Other Binary Tree Operations
4.2 Heap
1. Introduction
2. Types of Heaps
3. Properties of Heap
4. Heap Operations
5. Implementation of Heap in C++
6. Time Complexity of Heap
7. Applications of Heaps
8. Advantages of Heaps
9. Disadvantages of Heaps
10. Comparison between Heap and Tree
11. Other Types of Heaps
Chapter 5: Hashing & Tries
5.1 Hashing
1. Introduction
2. Components of Hashing
3. Working of Hashing
4. Hash Function
5. Types of Hash Functions
6. Time and Space Complexity of a Hash Function
7. Collision in Hashing
8. Separate Chaining
9. Open Addressing
10. Load Factor in Hashing
11. Re-Hashing
12. Applications of Hash Data Structure
13. Advantages of Hash Data Structure
14. Disadvantages of Hash Data Structure
5.2 Tries
1. Introduction
2. Advantages of Trie over Hash Table
3. Properties of Trie
4. Working of Trie
5. Representation of Trie Node
6. Basic Operations on Trie
7. Implementation of Trie
8. Complexity Analysis of Trie
9. Applications of Trie
10. Advantages of Trie
11. Disadvantages of Trie
Chapter 6: Dynamic Programming & Greedy Algorithm
6.1 Dynamic Programming
1. Introduction
2. Working of DP
3. Characteristics of DP
4. Approaches of DP
5. Advantages of DP
6. Applications of DP
7. Difference between DP and Recursion
8. Difference between Tabulation and Memoization
9. Implementation of DP in C++
10. Bitmasking and DP: Count ways to assign unique cap to every person
11. Sum over Subsets
12. DP Problem 1: Kadane's Algorithm (Largest Sum Contiguous Subarray)
13. DP Problem 2: Assembly Line Scheduling Problem
6.2 Greedy Algorithm
1. Introduction
2. Steps for implementing a Greedy Algorithm
3. Characteristics of Greedy Algorithms
4. Difference between Greedy Algorithm and Dynamic Programming
5. Greedy Algorithm Problem 1: Fractional Knapsack Problem
6. Greedy Algorithm Problem 2: Huffman Coding
Chapter 7: Graphs
7.1 Introduction
7.2 Types of Graphs
7.3 Graph Representation
7.4 Graph Traversal Algorithms
7.5 Shortest Path Algorithms
7.6 Minimum Spanning Tree (MST)
7.7 Applications of Graph Theory
7.8 Advanced Topics
7.9 Complexity Analysis
7.10 Graphs Problem 1: Implementation of Kruskal's Algorithm
7.11 Graphs Problem 2: Minimize the cash flow among a given set of friends who have borrowed money from each other

Abstract & Timeline

Week      Target
Week 1    Basics (Data Types, If-Else, Looping, Functions, Arrays, Strings, Pointers)
Week 5    Heap
Week 6    Hashmaps
Chapter 1
Basics of Programming
1.1 Data Types
1. Definition
● Data types specify the kind of data that can be stored and manipulated within a program.
2. Importance
● The choice of data type determines how much memory a value occupies, which operations are valid on it, and how efficiently an algorithm can process it.
3. Basic Data Types
● Integer (int):
○ Represents whole numbers without a fractional component.
○ Commonly used in counting, indexing, and loop iterations.
○ Example: 5, -3, 42.
● Floating Point (float):
○ Represents numbers with a fractional component.
○ Used in calculations involving precision, such as scientific computations.
○ Example: 3.14, -0.001, 2.0.
● Character (char):
○ Represents single characters.
○ Often used in algorithms that process textual data.
○ Example: 'a', 'Z', '#'.
● String (str):
○ Represents a sequence of characters.
○ Used in text manipulation, searching, and sorting algorithms.
○ Example: "Hello, World!", "Python123".
● Boolean (bool):
○ Represents truth values.
○ Crucial in decision-making processes in algorithms.
○ Example: True, False.
4. Complex Data Types
● Array:
○ Collection of elements of the same type, stored in contiguous memory locations.
○ Efficient for indexing and iterating over elements.
○ Example: [1, 2, 3], ['a', 'b', 'c'].
● List:
○ Ordered collection of elements, can contain mixed data types.
○ Dynamic in size, useful in algorithms requiring frequent insertion and deletion.
○ Example: [1, "two", 3.0, True].
● Tuple:
○ Ordered collection of elements, immutable.
○ Used for fixed collections of items.
○ Example: (1, 2, "three").
● Set:
○ Unordered collection of unique elements.
○ Useful in algorithms that require membership testing and eliminating duplicates.
○ Example: {1, 2, 3, 4}.
● Dictionary (dict):
○ Collection of key-value pairs.
○ Allows fast lookups, insertions, and deletions.
○ Example: {"name": "Alice", "age": 25}.
● Stack:
○ Collection of elements with Last In, First Out (LIFO) access pattern.
○ Used in algorithms for expression evaluation, backtracking, and function call
management.
○ Example operations: push, pop.
● Queue:
○ Collection of elements with First In, First Out (FIFO) access pattern.
○ Used in scheduling algorithms, breadth-first search (BFS).
○ Example operations: enqueue, dequeue.
● Tree:
○ Hierarchical data structure with nodes connected by edges.
○ Used in searching and sorting algorithms, such as binary search trees.
○ Example: binary tree, AVL tree.
● Graph:
○ Collection of nodes (vertices) and edges connecting pairs of nodes.
○ Used in network analysis, shortest path algorithms, and social network modeling.
○ Example: directed graph, undirected graph.
5. Application in Algorithms
● Sorting Algorithms:
○ Utilize arrays and lists to reorder elements based on specific criteria.
○ Example: Quick Sort, Merge Sort.
● Searching Algorithms:
○ Employ various data structures like arrays, trees, and graphs to locate specific
elements.
○ Example: Binary Search, Depth-First Search (DFS).
● Dynamic Programming:
○ Often uses arrays and matrices to store intermediate results for optimization
problems.
○ Example: Fibonacci sequence, Knapsack problem.
● Graph Algorithms:
○ Utilize graph data structures for traversal and pathfinding.
○ Example: Dijkstra's Algorithm, Kruskal's Algorithm.
1.2 If-Else Condition Statements
1. Definition
● If-else statements are control structures used to execute different blocks of code based on
certain conditions.
2. Importance
● They allow a program to branch on conditions evaluated at run time, which is essential for input validation, error handling, and decision-making in algorithms.
3. Basic Syntax
● If Statement:
○ Evaluates a condition; if true, executes the associated block of code.
○ Example:
if (condition) {
// code to execute if condition is true
}
● Else Statement:
○ Follows an if statement; executes when the if condition is false.
○ Example:
if (condition) {
// code if condition is true
} else {
// code if condition is false
}
● Else If Statement:
○ Short for "else if"; checks additional conditions if previous if or else if conditions
are false.
○ Example:
if (condition1) {
// code if condition1 is true
} else if (condition2) {
// code if condition2 is true
} else {
// code if none of the above conditions are true
}
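A minimal, self-contained C program combining the three forms above (the score variable and grade thresholds are illustrative, not from the original text):

#include <stdio.h>

int main() {
    int score = 72; // sample input value
    if (score >= 80) {
        printf("Grade A\n");
    } else if (score >= 60) {
        printf("Grade B\n"); // executes for score = 72
    } else {
        printf("Grade C\n");
    }
    return 0;
}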
4. Use Cases
● Typical uses include choosing between alternative computations, validating input, and guarding against edge cases such as division by zero.
1.3 For and While Loops
1. Definition
● Loops are control structures used to repeat a block of code multiple times.
2. Importance
● Loops eliminate repeated code and make it possible to process collections of data whose size is not fixed in advance.
3. Types of Loops
● For Loop:
○ Iterates a block of code a specific number of times.
○ Typically used when the number of iterations is known beforehand.
● While Loop:
○ Repeats a block of code as long as a specified condition is true.
○ Used when the number of iterations is not known in advance.
4. Basic Syntax
● For Loop:
○ Initializes a counter variable, checks a condition, and updates the counter after
each iteration.
● While Loop:
○ Continuously checks a condition before each iteration and stops when the
condition becomes false.
5. Use Cases
● For loops suit counted iteration, such as traversing an array; while loops suit condition-driven repetition, such as reading input until a sentinel value appears.
6. Implementation Code in C
#include <stdio.h>

int main() {
// For Loop Example
printf("For Loop:\n");
for (int i = 0; i < 5; i++) {
printf("Iteration %d\n", i);
}
// While Loop Example
printf("\nWhile Loop:\n");
int j = 0;
while (j < 5) {
printf("Iteration %d\n", j);
j++;
}
return 0;
}
1.4 Functions
1. Definition
● Functions are modular blocks of code that perform a specific task. They allow code
reusability and enhance program organization.
2. Importance
● Functions promote code reuse, simplify testing and debugging, and break large programs into smaller, named units.
3. Basic Syntax
● Function Declaration:
○ Specifies the function name, return type, and parameters (if any).
● Function Definition:
○ Contains the actual implementation of the function.
● Function Call:
○ Executes the function’s code at a specific point in the program.
4. Example in C
#include <stdio.h>
// Function declaration
int add(int a, int b);
int main() {
int result = add(3, 5); // Function call
printf("Result: %d\n", result);
return 0;
}
// Function definition
int add(int a, int b) {
return a + b;
}
1.5 Arrays
1. Definition
● An array is a collection of elements of the same type, stored in contiguous memory locations and accessed by an index.
2. Importance
● Arrays provide a convenient way to store and access multiple values of the same type
under a single name.
● They facilitate efficient data storage and retrieval, especially when dealing with a large
number of similar data items.
● Arrays are fundamental for implementing other data structures and algorithms, such as
lists, matrices, and sorting algorithms.
3. Types of Arrays
● One-dimensional Arrays:
○ Simplest form where elements are arranged in a single row or column.
○ Accessed using a single index.
○ Example:
int numbers[5]; // Declaration of an array of 5 integers
● Multi-dimensional Arrays:
○ Arrays of arrays where each element is itself an array.
○ Commonly used to represent matrices or tables.
○ Accessed using multiple indices.
○ Example:
int matrix[3][3]; // Declaration of a 3x3 matrix
4. Usage and Operations
● Initialization:
○ Arrays can be initialized at the time of declaration or later using assignment
statements.
● Accessing Elements:
○ Elements of an array are accessed using square brackets [] and the index of the
element.
○ Example:
int numbers[5] = {10, 20, 30, 40, 50};
printf("Third element: %d\n", numbers[2]); // Accessing the third element
● Iterating through Arrays:
○ Loops such as for and while are commonly used to iterate through arrays to
perform operations on each element.
● Manipulating Arrays:
○ Elements can be modified, sorted, or searched within an array using various
algorithms and techniques.
5. Example Code in C
#include <stdio.h>
int main() {
// Example of a one-dimensional array
int numbers[5] = {10, 20, 30, 40, 50};
// Accessing and printing elements of the array
printf("Elements of the array: ");
for (int i = 0; i < 5; i++) {
printf("%d ", numbers[i]);
}
printf("\n");
// Example of a multidimensional array (2D matrix)
int matrix[3][3] = {
{1, 2, 3},
{4, 5, 6},
{7, 8, 9}
};
// Accessing and printing elements of the matrix
printf("Elements of the matrix:\n");
for (int i = 0; i < 3; i++) {
for (int j = 0; j < 3; j++) {
printf("%d ", matrix[i][j]);
}
printf("\n");
}
return 0;
}
1.6 Strings
1. Initialization
● Initialize strings as character arrays terminated by the null character '\0', e.g., char s[] = "hello";.
2. Input and Output
● Read and print strings using scanf()/fgets() and printf().
3. Length Calculation
● Compute the length of a string (excluding the null terminator) using strlen().
4. String Copying
● Copy one string into another using strcpy() or the bounded strncpy().
5. String Concatenation
● Append one string to another using strcat() or the bounded strncat().
6. String Comparison
● Compare strings using strcmp() to determine equality, greater than, or less than.
7. Substring Extraction
● Extract a substring from a string using strncpy() or manipulate using pointer arithmetic.
8. Searching
● Locate a substring within a string using strstr() or find individual characters using strchr() or strrchr().
9. Tokenization
● Split a string into tokens separated by delimiters using strtok().
10. Case Conversion
● Convert the case of characters within a string using toupper() and tolower().
11. String Formatting
● Build formatted strings using sprintf() or the bounded snprintf().
12. Memory Management
● Allocate memory dynamically for strings using malloc(), calloc(), and manage it using free().
13. Error Handling
● Handle errors during string operations using strerror() to retrieve error messages associated with specific error codes.
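A short, self-contained sketch exercising several of these functions (the sample strings are illustrative):

#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main() {
    char first[32] = "Hello";
    char second[] = ", World!";
    // Concatenation and length calculation
    strcat(first, second);
    printf("%s (length %zu)\n", first, strlen(first)); // Hello, World! (length 13)
    // Comparison: negative result means "abc" sorts before "abd"
    printf("strcmp: %d\n", strcmp("abc", "abd"));
    // Searching for a substring
    char *found = strstr(first, "World");
    if (found != NULL)
        printf("Found \"World\" at offset %ld\n", (long)(found - first));
    // Case conversion, character by character
    for (size_t i = 0; first[i] != '\0'; i++)
        first[i] = (char)toupper((unsigned char)first[i]);
    printf("%s\n", first); // HELLO, WORLD!
    return 0;
}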
1.7 Pointers
1. Definition
● A pointer is a variable that stores the memory address of another variable. Pointers are
declared using the * operator.
2. Importance
● Pointers enable pass-by-reference semantics, dynamic memory allocation, and efficient traversal of arrays and linked data structures.
3. Basic Operations
● Declaration: Pointers are declared by specifying the type of data they point to, followed
by an asterisk (*). For example, int *p; declares a pointer to an integer.
● Initialization: Pointers are initialized by assigning them the address of a variable using
the address-of operator (&). For example, p = &x; assigns the address of x to p.
● Dereferencing: The value stored at the memory address pointed to by a pointer is
accessed using the dereference operator (*). For example, *p gives the value of x if p is
pointing to x.
4. Pointer Arithmetic
● Pointers support arithmetic operations like addition and subtraction. This is useful for
iterating through arrays. For example, incrementing a pointer (p++) moves it to the next
memory location of its type.
5. Arrays and Pointers
● Arrays and pointers are closely related. The name of an array is a pointer to its first
element. Accessing elements via pointers can be done using pointer arithmetic.
6. Dynamic Memory Allocation
● Pointers are essential for dynamic memory allocation functions such as malloc(), calloc(),
realloc(), and free(). These functions manage memory at runtime, providing flexibility for
data structures whose size cannot be determined at compile time.
7. Example Code in C
#include <stdio.h>
#include <stdlib.h>

int main() {
    // Pointer declaration and initialization
    int x = 10;
    int *p = &x;
    // Dereferencing: *p accesses the value stored at the address held in p
    printf("Value of x via pointer: %d\n", *p);
    // Dynamic memory allocation
    int *arr = malloc(3 * sizeof(int));
    if (arr != NULL) {
        for (int i = 0; i < 3; i++)
            *(arr + i) = i * 10; // pointer arithmetic instead of arr[i]
        printf("arr[2] = %d\n", arr[2]);
        free(arr);
    }
    return 0;
}
In this example, p stores the address of x, dereferencing p retrieves the value of x, and malloc()/free() manage a small dynamically allocated array accessed through pointer arithmetic.
Pointers are a powerful feature in C that provide significant control over memory and enable
efficient and flexible programming techniques. Understanding pointers is essential for advanced
C programming, especially in systems programming, embedded systems, and
performance-critical applications.
1.8 Sorting Algorithms
1. Bubble Sort
#include <stdio.h>

// Repeatedly steps through the array swapping adjacent out-of-order
// elements; each pass bubbles the largest remaining element to the end
void bubbleSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++)
        for (int j = 0; j < n - i - 1; j++)
            if (arr[j] > arr[j + 1]) {
                int temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
}

void printArray(int arr[], int n) {
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

int main() {
    int arr[] = {64, 34, 25, 12, 22, 11, 90};
    int n = sizeof(arr)/sizeof(arr[0]);
    bubbleSort(arr, n);
    printf("Sorted array: \n");
    printArray(arr, n);
    return 0;
}
2. Insertion Sort
#include <stdio.h>

// Builds the sorted array one element at a time by inserting each
// element into its correct position within the sorted prefix
void insertionSort(int arr[], int n) {
    for (int i = 1; i < n; i++) {
        int key = arr[i];
        int j = i - 1;
        // Move elements of arr[0..i-1], that are greater than key, to one
        // position ahead of their current position
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j = j - 1;
        }
        arr[j + 1] = key;
    }
}

void printArray(int arr[], int n) {
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

int main() {
    int arr[] = {12, 11, 13, 5, 6};
    int n = sizeof(arr) / sizeof(arr[0]);
    insertionSort(arr, n);
    printf("Sorted array: \n");
    printArray(arr, n);
    return 0;
}
3. Merge Sort
#include <stdio.h>
#include <stdlib.h>

// Merge two sorted subarrays arr[l..m] and arr[m+1..r]
void merge(int arr[], int l, int m, int r) {
    int n1 = m - l + 1;
    int n2 = r - m;
    int L[n1], R[n2]; // temporary arrays
    for (int i = 0; i < n1; i++)
        L[i] = arr[l + i];
    for (int j = 0; j < n2; j++)
        R[j] = arr[m + 1 + j];
    int i = 0, j = 0, k = l;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k] = L[i];
            i++;
        } else {
            arr[k] = R[j];
            j++;
        }
        k++;
    }
    while (i < n1) // copy any remaining elements of L[]
        arr[k++] = L[i++];
    while (j < n2) // copy any remaining elements of R[]
        arr[k++] = R[j++];
}

void mergeSort(int arr[], int l, int r) {
    if (l < r) {
        int m = l + (r - l) / 2; // avoids overflow of (l + r) / 2
        mergeSort(arr, l, m);
        mergeSort(arr, m + 1, r);
        merge(arr, l, m, r);
    }
}

void printArray(int arr[], int n) {
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

int main() {
    int arr[] = {12, 11, 13, 5, 6, 7};
    int arr_size = sizeof(arr) / sizeof(arr[0]);
    mergeSort(arr, 0, arr_size - 1);
    printf("Sorted array: \n");
    printArray(arr, arr_size);
    return 0;
}
4. Quick Sort
#include <stdio.h>

void swap(int *a, int *b) {
    int t = *a;
    *a = *b;
    *b = t;
}

// Lomuto partition: places the pivot (last element) at its final
// sorted position and returns that position
int partition(int arr[], int low, int high) {
    int pivot = arr[high];
    int i = low - 1;
    for (int j = low; j < high; j++) {
        if (arr[j] < pivot) {
            i++;
            swap(&arr[i], &arr[j]);
        }
    }
    swap(&arr[i + 1], &arr[high]);
    return i + 1;
}

void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high);
        quickSort(arr, low, pi - 1);  // sort elements before the pivot
        quickSort(arr, pi + 1, high); // sort elements after the pivot
    }
}

void printArray(int arr[], int n) {
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

int main() {
    int arr[] = {10, 7, 8, 9, 1, 5};
    int n = sizeof(arr) / sizeof(arr[0]);
    quickSort(arr, 0, n - 1);
    printf("Sorted array: \n");
    printArray(arr, n);
    return 0;
}
Chapter 2
Recursion, Backtracking & OOPS
2.1 Recursion
1. Definition
● Recursion is a technique in which a function calls itself to solve smaller instances of the same problem until a base case is reached.
2. Types of Recursion
● Functional Recursion:
○ Involves a function that calls itself to solve smaller instances of the same problem
until a base case is reached.
● Parameterized Recursion:
○ Extends functional recursion by passing additional parameters that change with
each recursive call, influencing the problem-solving process.
3. Importance
● Simplifies complex problems by breaking them down into smaller, more manageable
sub-problems.
● Facilitates elegant solutions in scenarios like tree traversal, factorial calculations, and
sorting algorithms.
4. Example Code in C
#include <stdio.h>
// Functional Recursion Example: Factorial
int factorial(int n) {
if (n <= 1)
return 1; // Base case: factorial of 0 and 1 is 1
else
return n * factorial(n - 1); // Recursive call
}
// Parameterized Recursion Example: Fibonacci Series
int fibonacci(int n, int a, int b) {
if (n == 0)
return a; // Base case: return first number of Fibonacci series
else if (n == 1)
return b; // Base case: return second number of Fibonacci series
else
return fibonacci(n - 1, b, a + b); // Recursive call with parameters updated
}
int main() {
// Example of Functional Recursion: Calculating Factorial
int num = 5;
printf("Factorial of %d is %d\n", num, factorial(num));
// Example of Parameterized Recursion: Generating Fibonacci Series
int terms = 7;
printf("Fibonacci series up to %d terms: ", terms);
for (int i = 0; i < terms; i++) {
printf("%d ", fibonacci(i, 0, 1));
}
printf("\n");
return 0;
}
2.2 Backtracking
1. Definition
● Backtracking is an algorithmic technique that builds a solution incrementally and abandons (backtracks from) a partial candidate as soon as it cannot possibly lead to a valid solution.
2. Importance
● Useful for solving problems that involve finding a sequence of decisions or choices that
lead to a solution.
● Commonly used in constraint satisfaction problems like Sudoku, N-Queens, and graph
traversal.
3. Key Concepts
● Decision Space: Each step in backtracking involves making a decision and exploring all
possible choices.
● Recursion: The backtracking algorithm typically uses recursion to explore each decision
path.
● Backtracking: If a path does not lead to a solution, the algorithm backtracks to explore
other paths.
4. Implementation Example: N-Queens Problem
The N-Queens problem involves placing N queens on an N×N chessboard such that no two
queens threaten each other. Here’s an implementation of the N-Queens problem using
backtracking in C:
#include <stdio.h>
#include <stdbool.h>

#define N 8 // board size (the value used in the original listing was lost)

// Check whether a queen can be placed at board[row][col], given that
// queens occupy only columns to the left of col
bool isSafe(int board[N][N], int row, int col) {
    // same row, to the left
    for (int j = 0; j < col; j++) {
        if (board[row][j])
            return false;
    }
    // upper-left diagonal
    for (int i = row, j = col; i >= 0 && j >= 0; i--, j--) {
        if (board[i][j])
            return false;
    }
    // lower-left diagonal
    for (int i = row, j = col; i < N && j >= 0; i++, j--) {
        if (board[i][j])
            return false;
    }
    return true;
}

// Place queens column by column; backtrack when no row is safe
bool solveNQueens(int board[N][N], int col) {
    if (col >= N)
        return true; // all queens placed
    for (int row = 0; row < N; row++) {
        if (isSafe(board, row, col)) {
            board[row][col] = 1; // place queen
            if (solveNQueens(board, col + 1))
                return true;
            board[row][col] = 0; // backtrack
        }
    }
    return false;
}

void printSolution(int board[N][N]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            printf("%d ", board[i][j]);
        printf("\n");
    }
}

// Main function
int main() {
    int board[N][N] = {0};
    if (solveNQueens(board, 0)) {
        printf("Solution found:\n");
        printSolution(board);
    } else {
        printf("No solution exists for %dx%d board.\n", N, N);
    }
    return 0;
}
In this example, solveNQueens() tries each row of the current column, recursing to the next column after every safe placement and undoing the placement (backtracking) whenever the recursion fails.
2.3 Object Oriented Programming System (OOPS)
1. Classes and Objects
● Definition: A class is a blueprint for creating objects that encapsulates data (attributes)
and behaviors (methods) into a single unit.
● Example:
// Class declaration
class Car {
private:
    // Attributes
    int speed;
    string model;
public:
    // Constructor
    Car(int s, string m) : speed(s), model(m) {}
    // Member function (behavior)
    void display() {
        cout << model << " travels at " << speed << " km/h" << endl;
    }
};
2. Encapsulation
● Definition: Encapsulation binds data (attributes) and methods (behaviors) that operate
on the data into a single unit (class), protecting the internal state of an object from
outside interference.
● Example:
class BankAccount {
private:
float balance;
public:
void deposit(float amount) {
balance += amount;
}
float getBalance() {
return balance;
}
};
3. Inheritance
● Definition: Inheritance allows one class (derived or child class) to inherit the attributes
and methods of another class (base or parent class), promoting code reuse and
establishing a hierarchy.
● Example:
// Base class
class Animal {
public:
void sound() {
cout << "Animal makes a sound" << endl;
}
};
// Derived class
class Dog : public Animal {
public:
void sound() {
cout << "Dog barks" << endl;
}
};
// Creating objects
Animal animal;
Dog dog;
// Calling methods
animal.sound(); // Output: Animal makes a sound
dog.sound(); // Output: Dog barks
4. Polymorphism
● Definition: Polymorphism lets objects of different derived classes be used through a common base-class interface, with the correct method selected at run time via virtual functions.
● Example:
// Base class
class Shape {
public:
    virtual void draw() {
        cout << "Drawing shape" << endl;
    }
};
// Derived classes
class Circle : public Shape {
public:
void draw() override {
cout << "Drawing circle" << endl;
}
};
class Rectangle : public Shape {
public:
    void draw() override {
        cout << "Drawing rectangle" << endl;
    }
};
// Function using polymorphism
void drawShape(Shape *shape) {
shape->draw();
}
// Usage
Circle circle;
Rectangle rectangle;
drawShape(&circle); // Output: Drawing circle
drawShape(&rectangle); // Output: Drawing rectangle
5. Abstraction
● Definition: Abstraction means exposing only the essential interface of a type while hiding the implementation details, typically through an abstract class with pure virtual functions.
● Example:
// Abstract class
class Shape {
public:
    virtual void draw() = 0; // pure virtual function makes Shape abstract
};
// Concrete classes
class Circle : public Shape {
public:
void draw() override {
cout << "Drawing circle" << endl;
}
};
class Rectangle : public Shape {
public:
    void draw() override {
        cout << "Drawing rectangle" << endl;
    }
};
// Usage
Circle circle;
Rectangle rectangle;
circle.draw(); // Output: Drawing circle
rectangle.draw(); // Output: Drawing rectangle
Chapter 3
Linear Data Structures
3.1 Linked List
1. Introduction
A linked list is a linear collection of self-referential structures, called nodes, connected by pointer
links—hence, the term “linked” list. A linked list is accessed via a pointer to the first node of the
list. Subsequent nodes are accessed via the link pointer member stored in each node. By
convention, the link pointer in the last node of a list is set to NULL to mark the end of the list.
Data is stored in a linked list dynamically—each node is created as necessary. A node can contain
data of any type including other structs. Lists of data can be stored in arrays, but linked lists
provide several advantages. A linked list is appropriate when the number of data elements to be
represented in the data structure is unpredictable. Linked lists are dynamic, so the length of a
list can increase or decrease at execution time as necessary. The size of an array created at
compile time, however, cannot be altered. Arrays can become full. Linked lists become full only
when the system has insufficient memory to satisfy dynamic storage allocation requests.
2. Inserting and deleting nodes in a list
// self-referential structure
struct listNode {
char data; // each listNode contains a character
struct listNode *nextPtr; // pointer to next node
};

typedef struct listNode ListNode; // synonym for struct listNode
typedef ListNode *ListNodePtr;    // synonym for ListNode*
// prototypes
void insert(ListNodePtr *sPtr, char value);
char delete(ListNodePtr *sPtr, char value);
int isEmpty(ListNodePtr sPtr);
void printList(ListNodePtr currentPtr);
void instructions(void);
int main(void) {
ListNodePtr startPtr = NULL; // initially there are no nodes
char item; // char entered by user
scanf("\n%c", &item);
insert(&startPtr, item); // insert item in list
printList(startPtr);
break;
case 2: // delete an element
// if list is not empty
if (!isEmpty(startPtr)) {
printf("%s", "Enter character to be deleted: ");
scanf("\n%c", &item);
puts("End of run.");
}
}
return '\0';
}
else {
puts("The list is:");
puts("NULL\n");
}
}
3. Output of the code
? 1
Enter a character: A
The list is:
A --> B --> NULL
? 1
Enter a character: C
The list is:
A --> B --> C --> NULL
? 2
Enter character to be deleted: D
D not found.
? 2
Enter character to be deleted: B
B deleted.
The list is:
A --> C --> NULL
? 2
Enter character to be deleted: C
C deleted.
The list is:
A --> NULL
? 2
Enter character to be deleted: A
A deleted.
List is empty.
? 4
Invalid choice.
End of run.
3.2 Stack
1. Definition
● Stack is a linear data structure based on LIFO Principle (Last In First Out) in which the
insertion of a new element and removal of an existing element takes place at the same
end represented as the top of the stack.
● To implement the stack, it is required to maintain a pointer to the top of the stack, which is the last element to be inserted, because we can access elements only at the top of the stack.
● LIFO(Last In First Out) Principle in Stack Data Structure:
This strategy states that the element that is inserted last will come out first. You can
take a pile of plates kept on top of each other as a real-life example. The plate which we
put last is on the top and since we remove the plate that is at the top, we can say that the
plate that was put last comes out first.
2. Types of Stack
● Fixed Size Stack : As the name suggests, a fixed size stack has a fixed size and cannot
grow or shrink dynamically. If the stack is full and an attempt is made to add an element
to it, an overflow error occurs. If the stack is empty and an attempt is made to remove an
element from it, an underflow error occurs.
● Dynamic Size Stack : A dynamic size stack can grow or shrink dynamically. When the
stack is full, it automatically increases its size to accommodate the new element, and
when the stack is empty, it decreases its size. This type of stack is implemented using a
linked list, as it allows for easy resizing of the stack.
3. Basic Operations on Stack
In order to manipulate a stack, the following operations are provided:
● push() to insert an element into the stack
● pop() to remove an element from the stack
● top() Returns the top element of the stack.
● isEmpty() returns true if stack is empty else false.
● isFull() returns true if the stack is full else false.
(1) Push Operation in Stack Data Structure:
● Adds an item to the stack. If the stack is full, then it is said to be an Overflow condition.
● Algorithm for Push Operation:
Before pushing the element to the stack, we check if the stack is full .
If the stack is full (top == capacity-1) , then Stack Overflows and we cannot insert the
element to the stack.
Otherwise, we increment the value of top by 1 (top = top + 1) and the new value is
inserted at top position .
The elements can be pushed into the stack till we reach the capacity of the stack.
(2) Pop Operation in Stack Data Structure:
● Removes an item from the stack. The items are popped in the reversed order in which
they are pushed. If the stack is empty, then it is said to be an Underflow condition.
● Algorithm for Pop Operation:
Before popping the element from the stack, we check if the stack is empty .
If the stack is empty (top == -1), then Stack Underflows and we cannot remove any
element from the stack.
Otherwise, we store the value at top, decrement the value of top by 1 (top = top - 1) and
return the stored top value.
(3) Top or Peek Operation in Stack Data Structure:
● Returns the top element of the stack.
● Algorithm for Top Operation:
Before returning the top element from the stack, we check if the stack is empty.
If the stack is empty (top == -1), we simply print “Stack is empty”.
Otherwise, we return the element stored at index = top .
(4) isEmpty Operation in Stack Data Structure:
● Returns true if the stack is empty, else false.
● Algorithm for isEmpty Operation:
Check for the value of top in stack.
If (top == -1) , then the stack is empty so return true.
Otherwise, the stack is not empty so return false.
(5) isFull Operation in Stack Data Structure:
● Returns true if the stack is full, else false.
● Algorithm for isFull Operation:
Check for the value of top in stack.
If (top == capacity-1), then the stack is full so return true.
Otherwise, the stack is not full so return false.
4. Implementation of Stack in C
The basic operations that can be performed on a stack include push, pop, and peek. There are two
ways to implement a stack –
● Using Array
● Using Linked List
Code in C (Array Implementation):
// C program for array implementation of stack
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

// A structure to represent a stack
struct Stack {
    int top;
    int capacity;
    int* array;
};

// Function to create a stack of given capacity; initializes top as -1
struct Stack* createStack(int capacity)
{
    struct Stack* stack = malloc(sizeof(struct Stack));
    stack->capacity = capacity;
    stack->top = -1;
    stack->array = malloc(capacity * sizeof(int));
    return stack;
}

// Stack is full when top is equal to the last index
int isFull(struct Stack* stack)
{
    return stack->top == stack->capacity - 1;
}

// Stack is empty when top is equal to -1
int isEmpty(struct Stack* stack)
{
    return stack->top == -1;
}

// Function to add an item to stack. It increases top by 1
void push(struct Stack* stack, int item)
{
    if (isFull(stack))
        return;
    stack->array[++stack->top] = item;
    printf("%d pushed to stack\n", item);
}

// Function to remove an item from stack. It decreases top by 1
int pop(struct Stack* stack)
{
    if (isEmpty(stack))
        return INT_MIN;
    return stack->array[stack->top--];
}

// Function to return the top of the stack without removing it
int peek(struct Stack* stack)
{
    if (isEmpty(stack))
        return INT_MIN;
    return stack->array[stack->top];
}

int main()
{
    struct Stack* stack = createStack(100);
    push(stack, 10);
    push(stack, 20);
    push(stack, 30);
    printf("%d Popped from stack\n", pop(stack));
    printf("Top element is : %d\n", peek(stack));
    printf("Elements present in stack : ");
    while (!isEmpty(stack))
        printf("%d ", pop(stack));
    printf("\n");
    return 0;
}
Output:
10 pushed to stack
20 pushed to stack
30 pushed to stack
30 Popped from stack
Top element is : 20
Elements present in stack : 20 10
Code in C (Linked List Implementation):
// C program for linked list implementation of stack
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

// A structure to represent a stack node
struct StackNode {
    int data;
    struct StackNode* next;
};

struct StackNode* newNode(int data)
{
    struct StackNode* stackNode = malloc(sizeof(struct StackNode));
    stackNode->data = data;
    stackNode->next = NULL;
    return stackNode;
}

// Stack is empty when root is NULL
int isEmpty(struct StackNode* root)
{
    return !root;
}

// Push: the new node becomes the new head of the list
void push(struct StackNode** root, int data)
{
    struct StackNode* stackNode = newNode(data);
    stackNode->next = *root;
    *root = stackNode;
    printf("%d pushed to stack\n", data);
}

// Pop: remove the head node and return its data
int pop(struct StackNode** root)
{
    if (isEmpty(*root))
        return INT_MIN;
    struct StackNode* temp = *root;
    *root = (*root)->next;
    int popped = temp->data;
    free(temp);
    return popped;
}

// Peek: return the data of the head node without removing it
int peek(struct StackNode* root)
{
    if (isEmpty(root))
        return INT_MIN;
    return root->data;
}

int main()
{
    struct StackNode* root = NULL;
    push(&root, 10);
    push(&root, 20);
    push(&root, 30);
    printf("%d popped from stack\n", pop(&root));
    printf("Top element is %d\n", peek(root));
    printf("Elements present in stack : ");
    while (!isEmpty(root))
        printf("%d ", pop(&root));
    printf("\n");
    return 0;
}
Output:
10 pushed to stack
20 pushed to stack
30 pushed to stack
30 popped from stack
Top element is 20
Elements present in stack : 20 10
3.3 Queue
1. Introduction
A queue is similar to a checkout line in a grocery store—the first person in line is serviced first,
and other customers enter the line only at the end and wait to be serviced. Queue nodes are
removed only from the head of the queue and are inserted only at the tail of the queue. For this
reason, a queue is referred to as a first-in, first-out (FIFO) data structure. The insert and remove
operations are known as enqueue (pronounced “en-cue”) and dequeue (pronounced “dee-cue”),
respectively. Queues have many applications in computer systems. For computers that have only
a single processor, only one user at a time may be serviced. Entries for the other users are placed
in a queue. Each entry gradually advances to the front of the queue as users receive service. The
entry at the front of the queue is the next to receive service. Similarly, for today’s multicore
systems, there could be more users than there are processors, so the users not currently running
are placed in a queue until a currently busy processor becomes available.
2. Types of Queue
Common variants include the simple (linear) queue, the circular queue, the priority queue, and the double-ended queue (deque).
3. Operating and managing a queue (Enqueue and Dequeue)
// self-referential structure
struct queueNode {
char data; // define data as a char
struct queueNode *nextPtr; // queueNode pointer
};

typedef struct queueNode QueueNode; // synonym for struct queueNode
typedef QueueNode *QueueNodePtr;    // synonym for QueueNode*
// function prototypes
void printQueue(QueueNodePtr currentPtr);
int isEmpty(QueueNodePtr headPtr);
char dequeue(QueueNodePtr *headPtr, QueueNodePtr *tailPtr);
void enqueue(QueueNodePtr *headPtr, QueueNodePtr *tailPtr, char value);
void instructions(void);
else {
puts("Queue is empty.\n");
}
break;
default:
puts("Invalid choice.\n");
instructions();
break;
}
puts("End of run.");
}
*tailPtr = newPtr;
}
else {
printf("%c not inserted. No memory available.\n", value);
}
}
value = (*headPtr)->data;
tempPtr = *headPtr;
*headPtr = (*headPtr)->nextPtr;
// if queue is empty
if (*headPtr == NULL) {
*tailPtr = NULL;
}
puts("NULL\n");
}
}
4. Output of the code
? 1
Enter a character: B
The queue is:
A --> B --> NULL
? 1
Enter a character: C
The queue is:
A --> B --> C --> NULL
? 2
A has been dequeued.
The queue is:
B --> C --> NULL
? 2
B has been dequeued.
The queue is:
C --> NULL
? 2
C has been dequeued.
Queue is empty.
? 2
Queue is empty.
? 4
Invalid choice.
Chapter 4
Binary Tree & Heap
4.1 Binary Search Tree
1. Introduction
Linked lists, stacks, and queues are examples of linear data structures. In contrast, a tree is a
nonlinear, two-dimensional data structure with distinct properties. Tree nodes have two or more
links, and this section focuses on binary trees. Binary trees are a type of tree where each node
contains two links, either, both, or neither of which may be NULL.
The root node, the starting point of a tree, has links that refer to its child nodes. The left child is
the initial node of the left subtree, and the right child is the initial node of the right subtree.
Nodes with the same parent are known as siblings. A node without any children is termed a leaf
node. Unlike natural trees, computer scientists typically depict trees with the root node at the
top.
This section specifically addresses the creation of a binary search tree (BST), a special type of
binary tree. In a binary search tree, each node follows the rule that all values in the left subtree
are less than the node’s value, and all values in the right subtree are greater. Additionally, binary
search trees do not contain duplicate values. The structure of a binary search tree for a given set
of data can vary based on the sequence in which values are inserted into the tree.
The code creates a binary search tree and traverses it three ways—inorder, preorder and
postorder. The program generates 10 random numbers and inserts each in the tree, except that
duplicate values are discarded.
2. Implementation of Binary Search Tree in C
// self-referential structure
struct treeNode {
struct treeNode *leftPtr; // pointer to left subtree
int data; // node value
struct treeNode *rightPtr; // pointer to right subtree
};

typedef struct treeNode TreeNode; // synonym for struct treeNode
typedef TreeNode *TreeNodePtr;    // synonym for TreeNode*
// prototypes
void insertNode(TreeNodePtr *treePtr, int value);
void inOrder(TreeNodePtr treePtr);
void preOrder(TreeNodePtr treePtr);
void postOrder(TreeNodePtr treePtr);
srand(time(NULL));
puts("The numbers being placed in the tree are:");
return 0;
}
4. Explanation of the code
Function insertNode:
The functions used in Fig. 12.19 to create a binary search tree and traverse it are recursive.
Function insertNode (lines 53–83) receives the address of the tree and an integer to be stored in
the tree as arguments. A node can be inserted only as a leaf node in a binary search tree. The
steps for inserting a node in a binary search tree are as follows:
1. If *treePtr is NULL (line 56), create a new node (line 57). Call malloc, assign the allocated
memory to *treePtr, assign to (*treePtr)->data the integer to be stored (line 61), assign to
(*treePtr)->leftPtr and (*treePtr)->rightPtr the value NULL (lines 62–63), and return control to
the caller (either main or a previous call to insertNode).
2. If the value of *treePtr is not NULL and the value to be inserted is less than (*treePtr)->data,
function insertNode is called with the address of (*treePtr)->leftPtr (line 72) to insert the node in
the left subtree of the node pointed to by treePtr. If the value to be inserted is greater than
(*treePtr)-> data, function insertNode is called with the address of (*treePtr)->rightPtr (line 77)
to insert the node in the right subtree of the node pointed to by treePtr.
The recursive steps continue until a NULL pointer is found, then Step 1 inserts the new node.
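A compact sketch of insertNode following these two steps (it reuses the TreeNodePtr synonym declared with the structure above; the line numbers cited in the explanation refer to the original listing):

void insertNode(TreeNodePtr *treePtr, int value) {
    // Step 1: an empty subtree has been reached; insert the new node here
    if (*treePtr == NULL) {
        *treePtr = malloc(sizeof(TreeNode));
        if (*treePtr != NULL) {
            (*treePtr)->data = value;
            (*treePtr)->leftPtr = NULL;
            (*treePtr)->rightPtr = NULL;
        } else {
            printf("%d not inserted. No memory available.\n", value);
        }
    }
    // Step 2: recurse left or right; equal values (duplicates) are ignored
    else if (value < (*treePtr)->data) {
        insertNode(&((*treePtr)->leftPtr), value);
    } else if (value > (*treePtr)->data) {
        insertNode(&((*treePtr)->rightPtr), value);
    }
}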
The inOrder traversal of a binary search tree prints the node values in ascending order. The
process of creating a binary search tree actually sorts the data—and thus this process is called
the binary tree sort.
The steps for a preOrder traversal are:
1. Process the value in the node.
2. Traverse the left subtree preOrder.
3. Traverse the right subtree preOrder.
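The three traversals differ only in where the node's value is processed relative to the two recursive calls. A minimal sketch matching the prototypes given earlier:

void inOrder(TreeNodePtr treePtr) {
    if (treePtr != NULL) {
        inOrder(treePtr->leftPtr);    // traverse left subtree
        printf("%3d", treePtr->data); // process node between the subtrees
        inOrder(treePtr->rightPtr);   // traverse right subtree
    }
}

void preOrder(TreeNodePtr treePtr) {
    if (treePtr != NULL) {
        printf("%3d", treePtr->data); // process node first
        preOrder(treePtr->leftPtr);
        preOrder(treePtr->rightPtr);
    }
}

void postOrder(TreeNodePtr treePtr) {
    if (treePtr != NULL) {
        postOrder(treePtr->leftPtr);
        postOrder(treePtr->rightPtr);
        printf("%3d", treePtr->data); // process node last
    }
}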
5. Duplicate Elimination
The binary search tree facilitates duplicate elimination. As the tree is being created, an attempt
to insert a duplicate value will be recognized because a duplicate will follow the same “go left” or
“go right” decisions on each comparison as the original value did. Thus, the duplicate will
eventually be compared with a node in the tree containing the same value. The duplicate value
may simply be discarded at this point.
6. Binary Tree Search
Searching a binary tree for a value that matches a key value is also fast. If the tree is tightly packed, each level contains about twice as many elements as the previous level. So a binary search tree with n elements would have a maximum of log2(n) levels, and thus a maximum of log2(n) comparisons would have to be made either to find a match or to determine that no match exists. This means, for example, that when searching a (tightly packed) 1,000-element binary search tree, no more than 10 comparisons need to be made because 2^10 > 1,000. When searching a (tightly packed) 1,000,000-element binary search tree, no more than 20 comparisons need to be made because 2^20 > 1,000,000.
7. Other Binary Tree Operations
Algorithms are presented for several other binary tree operations, such as printing a binary tree in a two-dimensional tree format and performing a level-order traversal of a binary tree. The level-order traversal visits the nodes of the tree row by row, starting at the root node level. On each level of the tree, the nodes are visited from left to right. Other binary tree algorithms include allowing a binary search tree to contain duplicate values, inserting string values in a binary tree, and determining how many levels are contained in a binary tree.
4.2 Heap
1. Introduction
A heap is a binary tree-based data structure that satisfies the heap property: in a max-heap the value of each node is greater than or equal to the values of its children, and in a min-heap it is less than or equal to them. This property ensures that the root node contains the maximum or minimum value (depending on the type of heap), and the values decrease or increase as you move down the tree.
2. Types of Heaps
● Max-Heap: the key at the root is the maximum, and the same property holds recursively for every subtree.
● Min-Heap: the key at the root is the minimum, and the same property holds recursively for every subtree.
3. Properties of Heap
● Complete Binary Tree: A heap tree is a complete binary tree, meaning all levels of the tree
are fully filled except possibly the last level, which is filled from left to right. This property
ensures that the tree is efficiently represented using an array.
● Heap Property: This property ensures that the minimum (or maximum) element is always at
the root of the tree according to the heap type.
● Parent-Child Relationship: The relationship between a parent node at index 'i' and its
children is given by the formulas: left child at index 2i+1 and right child at index 2i+2 for
0-based indexing of node numbers (see the index sketch after this list).
● Efficient Insertion and Removal: Insertion and removal operations in heap trees are efficient.
New elements are inserted at the next available position in the bottom-rightmost level, and
the heap property is restored by comparing the element with its parent and swapping if
necessary. Removal of the root element involves replacing it with the last element and
heapifying down.
● Efficient Access to Extremal Elements: The minimum or maximum element is always at the
root of the heap, allowing constant-time access.
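A minimal sketch of this array-based index arithmetic for a binary heap (the array contents are illustrative):

#include <stdio.h>

// 0-based binary-heap index arithmetic
int leftChild(int i) { return 2 * i + 1; }
int rightChild(int i) { return 2 * i + 2; }
int parent(int i) { return (i - 1) / 2; }

int main() {
    // A max-heap stored level by level in an array
    int heap[] = {50, 30, 40, 10, 20};
    int i = 1; // the node holding 30
    printf("Node %d = %d, left = %d, right = %d\n",
           i, heap[i], heap[leftChild(i)], heap[rightChild(i)]); // 30, 10, 20
    printf("Parent of node %d = %d\n", i, heap[parent(i)]);      // 50
    return 0;
}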
4. Heap Operations
Heapify:
It is the process to rearrange the elements to maintain the property of heap data structure. It is
done when a certain node creates an imbalance in the heap due to some operations on that node.
It takes O(log N) to balance the tree.
● For max-heap, it balances in such a way that the maximum element is the root of that binary
tree and
● For min-heap, it balances in such a way that the minimum element is the root of that binary
tree.
Insertion:
● If we insert a new element into the heap, it will distort the properties of the heap. Therefore,
we need to perform the heapify operation to maintain the heap's properties.
This operation also takes O(logN) time.
Deletion:
● If we delete the element from the heap it always deletes the root element of the tree and
replaces it with the last element of the tree.
● Since we delete the root element from the heap it will distort the properties of the heap so we
need to perform heapify operations so that it maintains the property of the heap.
It takes O(logN) time.
getMax (For max-heap) OR getMin (For min-heap):
It finds the maximum element or minimum element for max-heap and min-heap respectively and
as we know minimum and maximum elements will always be the root node itself for min-heap
and max-heap respectively. It takes O(1) time.
removeMin or removeMax:
This operation returns and deletes the maximum element and minimum element from the
max-heap and min-heap respectively. In short, it deletes the root element of the heap binary tree.
5. Implementation of Heap in C++
maxHeapify is the function responsible for restoring the property of the max-heap. It arranges node i and its subtrees so that the heap property is maintained:
1. Suppose we are given an array, arr[] representing the complete binary tree. The left and the
right child of ith node are in indices 2*i+1 and 2*i+2.
2. We set the index of the current element, i, as the ‘MAXIMUM’.
3. If arr[2 * i + 1] > arr[i], i.e., the left child is larger than the current value, it is set as
‘MAXIMUM’.
4. Similarly if arr[2 * i + 2] > arr[i], i.e., the right child is larger than the current value, it is set
as ‘MAXIMUM’.
5. Swap the ‘MAXIMUM’ with the current element.
6. Repeat steps 2 to 5 till the property of the heap is restored.
The following code shows the implementation of a max-heap.
#include <bits/stdc++.h>
using namespace std;

// A class for a Max Heap stored in an array.
class MaxHeap {
    int* arr;     // pointer to array of elements in heap
    int maxSize;  // maximum possible size of the heap
    int heapSize; // current number of elements in the heap

public:
    // Constructor function.
    MaxHeap(int totSize)
    {
        heapSize = 0;
        maxSize = totSize;
        arr = new int[totSize];
    }

    int parent(int i) { return (i - 1) / 2; }
    int lChild(int i) { return 2 * i + 1; }
    int rChild(int i) { return 2 * i + 2; }

    int curSize()
    {
        return heapSize;
    }

    // Inserts a new key, then restores the heap property bottom-up.
    void insertKey(int x)
    {
        if (heapSize == maxSize) {
            cout << "\nOverflow: Could not insertKey\n";
            return;
        }
        int i = heapSize++;
        arr[i] = x;
        while (i != 0 && arr[parent(i)] < arr[i]) {
            swap(arr[i], arr[parent(i)]);
            i = parent(i);
        }
    }

    // Restores the max-heap property for the subtree rooted at index i.
    void maxHeapify(int i)
    {
        int l = lChild(i), r = rChild(i);
        int largest = i;
        if (l < heapSize && arr[l] > arr[largest]) largest = l;
        if (r < heapSize && arr[r] > arr[largest]) largest = r;
        if (largest != i) {
            swap(arr[i], arr[largest]);
            maxHeapify(largest);
        }
    }

    // Removes and returns the maximum element (the root).
    int removeMax()
    {
        // Checking whether the heap array is empty or not.
        if (heapSize <= 0)
            return INT_MIN;
        if (heapSize == 1) {
            heapSize--;
            return arr[0];
        }
        int root = arr[0];
        arr[0] = arr[--heapSize];
        maxHeapify(0);
        return root;
    }
};

int main()
{
    MaxHeap h(15);
    h.insertKey(8);
    h.insertKey(2);
    h.insertKey(14);
    cout << "Current size of the heap is " << h.curSize() << "\n";
    cout << "Maximum element extracted: " << h.removeMax() << "\n";
    return 0;
}
6. Time Complexity of Heap
For finding the time complexity of building a heap, we must know the number of nodes having height h. For this we use the fact that a heap of size n has at most ⌈n / 2^(h+1)⌉ nodes with height h.
To derive the time complexity, we express the total cost of Build-Heap as

T(n) = Σ_{h=0}^{log n} ⌈n / 2^(h+1)⌉ · O(h) = O( n · Σ_{h=0}^{∞} h / 2^h )

Step 2 uses the properties of the Big-Oh notation to ignore the ceiling function and the constant 2 (from 2^(h+1)). Similarly, in Step 3 the upper limit of the summation can be increased to infinity, since we are using Big-Oh notation. The sum of an infinite G.P. (x < 1) is

Σ_{n=0}^{∞} x^n = 1 / (1 − x)

On differentiating both sides and multiplying by x, we get

Σ_{n=0}^{∞} n · x^n = x / (1 − x)^2

Putting this result (with x = 1/2) back into our derivation, we get

T(n) = O( n · (1/2) / (1 − 1/2)^2 ) = O(2n) = O(n)

Hence proved that the time complexity for building a binary heap is O(n).
7. Applications of Heaps
● Priority queues: The heap data structure is commonly used to implement priority queues,
where elements are stored in a heap and ordered based on their priority. This allows
constant-time access to the highest-priority element, making it an efficient data structure for
managing tasks or events that require prioritization.
● Heapsort algorithm: The heap data structure is the basis for the heapsort algorithm, which is
an efficient sorting algorithm with a worst-case time complexity of O(n log n). The heapsort
algorithm is used in various applications, including database indexing and numerical
analysis.
● Memory management: The heap data structure is used in memory management systems to
allocate and deallocate memory dynamically. The heap is used to store the memory blocks,
and the heap data structure is used to efficiently manage the memory blocks and allocate
them to programs as needed.
● Graph algorithms: The heap data structure is used in various graph algorithms, including
Dijkstra’s algorithm, Prim’s algorithm, and Kruskal’s algorithm. These algorithms require
efficient priority queue implementation, which can be achieved using the heap data
structure.
● Job scheduling: The heap data structure is used in job scheduling algorithms, where tasks
are scheduled based on their priority or deadline. The heap data structure allows efficient
access to the highest-priority task, making it a useful data structure for job scheduling
applications.
Real-Time Application of Heap:
● Patient treatment: In a hospital, an emergency patient, or the patient with more injury is
treated first. Here the priority is the degree of injury.
● Systems concerned with security use heap sort, like the Linux kernel.
8. Advantages of Heaps
● Efficient priority queue:
The heap data structure is commonly used to implement a priority queue, where the highest
priority element is always at the top of the heap. The heap allows constant-time access to the
highest priority element, making it an efficient data structure for implementing priority
queues.
● Guaranteed access to the maximum or minimum element:
In a max-heap, the top element is always the maximum element, and in a min-heap, the top
element is always the minimum element. This provides guaranteed access to the maximum
or minimum element in the heap, making it useful in algorithms that require access to the
extreme values.
● Space efficiency:
The heap data structure requires less memory compared to other data structures, such as
linked lists or arrays, as it stores elements in a complete binary tree structure.
● Heap-sort algorithm:
The heap data structure forms the basis for the heap-sort algorithm, which is an efficient
sorting algorithm that has a worst-case time complexity of O(n log n).
9. Disadvantages of Heaps
● Lack of flexibility: The heap data structure is not very flexible, as it is designed to maintain a
specific order of elements. This means that it may not be suitable for some applications that
require more flexible data structures.
● Not ideal for searching: While the heap data structure allows efficient access to the top
element, it is not ideal for searching for a specific element in the heap. Searching for an
element in a heap requires traversing the entire tree, which has a time complexity of O(n).
● Not a stable data structure: The heap data structure is not a stable data structure, which
means that the relative order of equal elements may not be preserved when the heap is
constructed or modified.
● Memory management: The heap data structure requires dynamic memory allocation, which
can be a challenge in some systems with limited memory. In addition, managing the memory
allocated to the heap can be complex and error-prone.
● Complexity: While the heap data structure allows efficient insertion, deletion, and priority
queue implementation, it has a worst-case time complexity of O(n log n), which may not be
optimal for some applications that require faster algorithms.
10. Comparison between Heap and Tree

S. No   Heap                                          Tree
5       Finding the Min/Max value in a Heap is        Finding the Min/Max value in a BST is
        O(1) in the respective Min/Max heap.          O(log(N)), and in a Binary Tree it is O(N).
7       A Heap can be built in linear time            BST: O(N * log(N)) and Binary Tree:
        complexity.                                   O(N).
11. Other Types of Heaps
Binomial Heap:
A Binomial Tree of order 0 has 1 node. A Binomial Tree of order k can be constructed by taking
two binomial trees of order k-1 and making one the leftmost child of the other.
A Binomial Tree of order k has the following properties.
● It has exactly 2^k nodes.
● It has depth k.
● There are exactly C(k, i) ("k choose i") nodes at depth i for i = 0, 1, . . . , k.
● The root has degree k and children of the root are themselves Binomial Trees with order k-1,
k-2,.. 0 from left to right.
Fibonacci Heap:
A Fibonacci heap is a data structure used for implementing priority queues. It is a type of heap
data structure, but with several improvements over the traditional binary heap and binomial
heap data structures.
The key advantage of a Fibonacci heap over other heap data structures is its fast amortized
running time for operations such as insert, merge and extract-min, making it one of the most
efficient data structures for these operations. The running time of these operations in a Fibonacci
heap is O(1) for insert, O(log n) for extract-min and O(1) amortized for merge.
A Fibonacci heap is a collection of trees, where each tree is a heap-ordered multi-tree, meaning
that each tree has a single root node with its children arranged in a heap-ordered manner. The
trees in a Fibonacci heap are organized in such a way that the root node with the smallest key is
always at the front of the list of trees.
In a Fibonacci heap, when a new element is inserted, it is added as a singleton tree. When two
heaps are merged, the root list of one heap is simply appended to the root list of the other heap.
When the extract-min operation is performed, the tree with the minimum root node is removed
from the root list and its children are added to the root list.
One unique feature of a Fibonacci heap is the use of lazy consolidation, which is a technique for
improving the efficiency of the merge operation. In lazy consolidation, the merging of trees is
postponed until it is necessary, rather than performed immediately. This allows for the merging
of trees to be performed more efficiently in batches, rather than one at a time.
Leftist Heap:
A leftist tree, also known as a leftist heap, is a type of binary heap data structure used for implementing priority queues. Like other heap data structures it maintains the heap ordering property, but unlike a binary heap it need not be a complete binary tree; its shape is instead governed by the leftist property described below.
1. In a leftist tree, the priority of the node is determined by its key value, and the node with the
smallest key value is designated as the root node. The left subtree of a node in a leftist tree is
always larger than the right subtree, based on the number of nodes in each subtree. This is
known as the “leftist property.”
2. One of the key features of a leftist tree is the calculation and maintenance of the “null path
length” of each node, which is defined as the distance from the node to the nearest null
(empty) child. The root node of a leftist tree has the shortest null path length of any node in
the tree.
3. The main operations performed on a leftist tree include insert, extract-min and merge. The
insert operation simply adds a new node to the tree, while the extract-min operation removes
the root node and updates the tree structure to maintain the leftist property. The merge
operation combines two leftist trees into a single leftist tree by linking the root nodes and
maintaining the leftist property.
In summary, a leftist tree is a type of binary heap data structure used for implementing priority
queues. Its key features include the leftist property, which ensures that the left subtree of a node
is always larger than the right subtree, and the calculation and maintenance of the null path
length, which is used to maintain the efficiency of operations such as extract-min and merge.
K-ary Heap:
K-ary heaps are a generalization of binary heaps (K=2) in which each node has K children
instead of 2. Just like binary heap, it follows two properties:
1. It is a nearly complete tree, with all levels having the maximum number of nodes except the last, which is filled in a left-to-right manner.
2. Like Binary Heap, it can be divided into two categories:
○ Max k-ary heap (key at root is greater than all descendants and same is recursively true
for all nodes).
○ Min k-ary heap (key at root is lesser than all descendants and same is recursively true for
all nodes)
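A small sketch of the corresponding index arithmetic for a k-ary heap, with 0-based indexing (the values of k and i are illustrative):

#include <stdio.h>

// In a 0-based array representation of a k-ary heap, the children of
// node i sit at indices k*i + 1 ... k*i + k, and its parent at (i - 1) / k.
int kthChild(int i, int k, int j) { return k * i + j; } // j ranges over 1..k
int parent(int i, int k) { return (i - 1) / k; }

int main() {
    int k = 3, i = 2;
    printf("Children of node %d: %d ... %d\n",
           i, kthChild(i, k, 1), kthChild(i, k, k)); // 7 ... 9
    printf("Parent of node 7: %d\n", parent(7, k));  // 2
    return 0;
}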
Chapter 5
Hashing & Tries
5.1 Hashing
1. Introduction
Hashing in Data Structures refers to the process of transforming a given key to another value. It
involves mapping data to a specific index in a hash table using a hash function that enables fast
retrieval of information based on its key. The transformation of a key to the corresponding value
is done using a Hash Function and the value obtained from the hash function is called Hash
Code.
Need for Hash Data Structure
Every day, the data on the internet is increasing multifold and it is always a struggle to store this
data efficiently. In day-to-day programming, this amount of data might not be that big, but still,
it needs to be stored, accessed, and processed easily and efficiently. A very common data
structure that is used for such a purpose is the Array data structure.
Now the question arises if Array was already there, what was the need for a new data structure!
The answer to this is in the word ” efficiency “. Though storing in Array takes O(1) time,
searching in it takes at least O(log n) time. This time appears to be small, but for a large data set,
it can cause a lot of problems and this, in turn, makes the Array data structure inefficient.
So now we are looking for a data structure that can store the data and search in it in constant
time, i.e. in O(1) time. This is how Hashing data structure came into play. With the introduction
of the Hash data structure, it is now possible to easily store data in constant time and retrieve it
in constant time as well.
2. Components of Hashing
● Key: the input value to be stored or looked up.
● Hash Function: maps a key to an index in the table.
● Hash Table: the array-like structure that stores the values at the computed indices.
3. Working of Hashing
Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to store it in a table.
Our main objective here is to search or update the values stored in the table quickly in O(1) time
and we are not concerned about the ordering of strings in the table. So the given set of strings
can act as a key and the string itself will act as the value of the string but how to store the value
corresponding to the key?
● Step 1: We know that hash functions (which is some mathematical formula) are used to
calculate the hash value which acts as the index of the data structure where the value will be
stored.
● Step 2: So, let’s assign
○ “a” = 1,
○ “b”=2, .. etc, to all alphabetical characters.
● Step 3: Therefore, the numerical value by summation of all characters of the string:
“ab” = 1 + 2 = 3,
“cd” = 3 + 4 = 7 ,
“efg” = 5 + 6 + 7 = 18
● Step 4: Now, assume that we have a table of size 7 to store these strings. The hash function
that is used here is the sum of the characters in key mod Table size . We can compute the
location of the string in the array by taking the sum(string) mod 7 .
● Step 5: So we will then store
○ “ab” in 3 mod 7 = 3,
○ “cd” in 7 mod 7 = 0, and
○ “efg” in 18 mod 7 = 4.
The above technique enables us to calculate the location of a given string by using a simple hash
function and rapidly find the value that is stored in that location. Therefore the idea of hashing
seems like a great way to store (key, value) pairs of the data in a table.
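A small sketch of the toy scheme described in these steps, mapping a = 1, b = 2, ... and reducing the sum modulo the table size of 7:

#include <stdio.h>

// Sum the letter values (a = 1, b = 2, ...) and reduce modulo the table size
int toyHash(const char *key, int tableSize) {
    int sum = 0;
    for (int i = 0; key[i] != '\0'; i++)
        sum += key[i] - 'a' + 1;
    return sum % tableSize;
}

int main() {
    const char *keys[] = {"ab", "cd", "efg"};
    for (int i = 0; i < 3; i++)
        printf("%s -> slot %d\n", keys[i], toyHash(keys[i], 7));
    // ab -> 3, cd -> 0, efg -> 4, matching the worked example above
    return 0;
}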
4. Hash Function
The hash function creates a mapping between key and value, this is done through the use of
mathematical formulas known as hash functions. The result of the hash function is referred to as
a hash value or hash. The hash value is a representation of the original string of characters but
usually smaller than the original.
For example: Consider an array as a Map where the key is the index and the value is the value at
that index. So for an array A if we have index i which will be treated as the key then we can find
the value by simply looking at the value at A[i].
Key Properties of Hash Function
● Deterministic: A hash function must consistently produce the same output for the same
input.
● Fixed Output Size: The output of a hash function should have a fixed size, regardless of the
size of the input.
● Efficiency: The hash function should be able to process input quickly.
● Uniformity: The hash function should distribute the hash values uniformly across the output
space to avoid clustering.
● Preimage Resistance: It should be computationally infeasible to reverse the hash function,
i.e., to find the original input given a hash value.
● Collision Resistance: It should be difficult to find two different inputs that produce the same
hash value.
● Avalanche Effect: A small change in the input should produce a significantly different hash
value.
Properties of Good Hash Function
A good hash function should have the following properties:
1. Efficiently computable.
2. Should uniformly distribute the keys (each table position should be equally likely for each key).
3. Should minimize collisions.
4. Should keep the load factor low (the number of items in the table divided by the size of the table).
5. Types of Hash Functions
1. Division Method
The division method involves dividing the key by a prime number and using the remainder as the
hash value.
h(k) = k mod m
Where k is the key and m is a prime number.
Advantages:
● Simple to implement.
● Works well when m is a prime number.
Disadvantages:
● Poor distribution if m is not chosen wisely.
2. Multiplication Method
In the multiplication method, a constant A (0 < A < 1) is used to multiply the key. The fractional
part of the product is then multiplied by m to get the hash value.
h(k) = ⌊m (kA mod 1)⌋
Where ⌊ ⌋ denotes the floor function.
Advantages:
● Less sensitive to the choice of m.
Disadvantages:
● More complex than the division method.
3. Mid-Square Method
In the mid-square method, the key is squared, and the middle digits of the result are taken as the
hash value.
Steps:
1. Square the key.
2. Extract the middle digits of the squared value.
Advantages:
● Produces a good distribution of hash values.
Disadvantages:
● May require more computational effort.
4. Folding Method
The folding method involves dividing the key into equal parts, summing the parts, and then
taking the result modulo m.
Steps:
1. Divide the key into parts.
2. Sum the parts.
3. Take the sum modulo m.
Advantages:
● Simple and easy to implement.
Disadvantages:
● Depends on the choice of partitioning scheme.
5. Cryptographic Hash Functions
Cryptographic hash functions are designed to be secure and are used in cryptography. Examples
include MD5, SHA-1, and SHA-256.
Characteristics:
● Preimage resistance.
● Second pre-image resistance.
● Collision resistance.
Advantages:
● High security.
Disadvantages:
● Computationally intensive.
6. Universal Hashing
Universal hashing uses a family of hash functions to minimize the chance of collision for any
given set of inputs.
h(k) = ((a · k + b) mod p) mod m
Where a and b are randomly chosen constants, p is a prime number greater than m, and k is the key.
Advantages:
● Reduces the probability of collisions.
Disadvantages:
● Requires more computation and storage.
7. Perfect Hashing
Perfect hashing aims to create a collision-free hash function for a static set of keys. It guarantees
that no two keys will hash to the same value.
Types:
● Minimal Perfect Hashing: Ensures that the range of the hash function is equal to the
number of keys.
● Non-minimal Perfect Hashing: The range may be larger than the number of keys.
Advantages:
● No collisions.
Disadvantages:
● Complex to construct.
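As a brief illustration, here is a sketch of the two simplest schemes above, the division and multiplication methods (the table size m = 11 and the constant A ≈ (√5 − 1)/2 are illustrative choices, the latter being the commonly suggested value):

#include <stdio.h>
#include <math.h>

// Division method: h(k) = k mod m (m ideally prime)
int hashDivision(int k, int m) {
    return k % m;
}

// Multiplication method: h(k) = floor(m * (k*A mod 1))
int hashMultiplication(int k, int m) {
    const double A = 0.6180339887; // (sqrt(5) - 1) / 2
    double frac = fmod(k * A, 1.0);
    return (int)floor(m * frac);
}

int main() {
    int m = 11;
    for (int k = 100; k < 105; k++)
        printf("k = %d: division -> %d, multiplication -> %d\n",
               k, hashDivision(k, m), hashMultiplication(k, m));
    return 0;
}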
7. Collision in Hashing
Collision in hashing occurs when two different keys map to the same hash value. Hash collisions
can be intentionally created for many hash algorithms. The probability of a hash collision
depends on the size of the hash output, the distribution of hash values, and the quality of the
hash function.
The hashing process generates a small number for a big key, so there is a possibility that two
keys could produce the same value. The situation where a newly inserted key maps to an
already occupied slot is called a collision, and it must be handled using some collision handling
technique.
How to handle Collisions?
There are mainly two methods to handle collision:
1. Separate Chaining
2. Open Addressing
8. Separate Chaining
The idea behind separate chaining is to make each slot of the hash table point to a linked list
called a chain. When multiple elements are hashed to the same slot index, these elements are
inserted into a singly-linked list, which is known as a chain.
Here, all those elements that hash into the same slot index are inserted into the same linked list.
We can then search for a key K in that linked list by linearly traversing it. If the key of some
entry is equal to K, we have found our entry. If we reach the end of the linked list without
finding it, the entry does not exist.
Hence, the conclusion is that in separate chaining, if two different elements have the same hash
value, we store both elements in the same linked list one after the other.
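A minimal sketch of this scheme, assuming integer keys and the simple hash function key % m; the std::list chains play the role of the singly-linked lists described above:

#include <iostream>
#include <list>
#include <vector>
using namespace std;

// Each slot holds a chain (linked list) of the keys hashing there
class ChainedHash {
    int m;                   // number of slots
    vector<list<int>> table; // table[i] is the chain for slot i
public:
    ChainedHash(int m) : m(m), table(m) {}
    int hash(int key) { return key % m; }
    void insert(int key) { table[hash(key)].push_back(key); }
    bool search(int key) {
        for (int k : table[hash(key)]) // linear traversal of the chain
            if (k == key) return true;
        return false;
    }
    void erase(int key) { table[hash(key)].remove(key); }
};

int main() {
    ChainedHash h(7);
    for (int k : { 50, 15, 58, 17 }) h.insert(k); // 50 and 15 collide (both 1 mod 7)
    cout << h.search(15) << " " << h.search(99) << "\n"; // prints 1 0
    return 0;
}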
Performance of Chaining:
Performance of hashing can be evaluated under the assumption that each key is equally likely to
be hashed to any slot of the table (simple uniform hashing).
m = Number of slots in hash table
n = Number of keys to be inserted in hash table
Load factor α = n/m
Expected time to search = O(1 + α)
Expected time to delete = O(1 + α)
Time to insert = O(1)
Time complexity of search, insert, and delete is O(1) if α is O(1).
9. Open Addressing
Open Addressing is a method for handling collisions. In Open Addressing, all elements are stored
in the hash table itself. So at any point, the size of the table must be greater than or equal to the
total number of keys (Note that we can increase table size by copying old data if needed). This
approach is also known as closed hashing. This entire procedure is based upon probing. We will
understand the types of probing ahead:
● Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.
● Search(k): Keep probing until either the slot’s key becomes equal to k or an empty slot is
reached.
● Delete(k): Delete operation is interesting. If we simply delete a key, then the search may fail.
So slots of deleted keys are marked specially as “deleted”.
The insert can insert an item in a deleted slot, but the search doesn’t stop at a deleted slot.
Different ways of Open Addressing:
1. Linear Probing:
In linear probing, the hash table is searched sequentially, starting from the original hash
location. If the location we get is already occupied, then we check the next location.
The function used for probing is: rehash(key) = (n + 1) % table_size, where n is the index
previously tried.
The typical gap between two probes is 1, as seen in the example below:
Let hash(x) be the slot index computed using a hash function and S be the table size
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
Example: Let us consider a simple hash function “key mod 5” and the sequence of keys 50, 70,
76, 85, 93 inserted into a table of size 5:
50 % 5 = 0, so 50 goes to slot 0.
70 % 5 = 0, slot 0 is occupied, so 70 probes to slot 1.
76 % 5 = 1, slot 1 is occupied, so 76 probes to slot 2.
85 % 5 = 0, slots 0, 1, 2 are occupied, so 85 probes to slot 3.
93 % 5 = 3, slot 3 is occupied, so 93 probes to slot 4.
The resulting table is [50, 70, 76, 85, 93], showing how collisions cluster under linear probing.
2. Quadratic Probing:
If you observe carefully, you will see that the interval between probes increases with the number
of attempts. Quadratic probing helps reduce the primary clustering produced by linear probing.
In this method, we look for the i²-th slot in the i-th iteration. We always start from the original
hash location; if that location is occupied, we check the slots at quadratic offsets from it.
Let hash(x) be the slot index computed using the hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
Example: Let table size = 7, the hash function be Hash(x) = x % 7, and the collision resolution
strategy be f(i) = i². Insert 22, 30, and 50:
22 % 7 = 1, so 22 goes to slot 1.
30 % 7 = 2, so 30 goes to slot 2.
50 % 7 = 1, slot 1 is occupied; (1 + 1*1) % 7 = 2 is also occupied; (1 + 2*2) % 7 = 5 is free,
so 50 goes to slot 5.
3. Double Hashing:
The intervals that lie between probes are computed by another hash function. Double hashing is
a technique that reduces clustering in an optimized way. In this technique, the increments for the
probing sequence are computed by using another hash function. We use another hash function
hash2(x) and look for the slot at offset i*hash2(x) in the i-th iteration.
let hash(x) be the slot index computed using the hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
Example: Insert the keys 27, 43, 692, 72 into the Hash Table of size 7. where first hash-function
is h1(k) = k mod 7 and second hash-function is h2(k) = 1 + (k mod 5)
Code for Double Hashing:
/*
** Handling of collisions via open addressing
** Method for Probing: Double Hashing
*/
#include <iostream>
#include <vector>
using namespace std;

class doubleHash {
    int TABLE_SIZE, keysPresent, PRIME;
    vector<int> hashTable; // -1 marks an empty slot, -2 a deleted one

    int hash1(int value) { return value % TABLE_SIZE; }
    int hash2(int value) { return PRIME - (value % PRIME); }

    // Simple trial-division primality test (kept self-contained
    // instead of a precomputed sieve)
    bool isPrimeNumber(int n) {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

public:
    doubleHash(int n) {
        TABLE_SIZE = n;
        /* Find the largest prime number smaller than the hash table's size. */
        PRIME = TABLE_SIZE - 1;
        while (!isPrimeNumber(PRIME))
            PRIME--;
        keysPresent = 0;
        hashTable.assign(TABLE_SIZE, -1);
    }

    bool isFull() { return keysPresent == TABLE_SIZE; }

    void insert(int value) {
        if (isFull()) {
            cout << "ERROR : Hash Table Full\n";
            return;
        }
        int probe = hash1(value), offset = hash2(value); // in linear probing, offset = 1
        while (hashTable[probe] != -1) {
            if (hashTable[probe] == -2)
                break; // insert at a deleted element's location
            probe = (probe + offset) % TABLE_SIZE;
        }
        hashTable[probe] = value;
        keysPresent += 1;
    }

    void erase(int value) {
        if (!search(value))
            return; // element not present
        int probe = hash1(value), offset = hash2(value);
        while (hashTable[probe] != -1)
            if (hashTable[probe] == value) {
                hashTable[probe] = -2; // mark element as deleted (rather than unvisited, -1)
                keysPresent--;
                return;
            }
            else
                probe = (probe + offset) % TABLE_SIZE;
    }

    bool search(int value) {
        int probe = hash1(value), offset = hash2(value);
        int initialPos = probe;
        bool firstItr = true;
        while (1) {
            if (hashTable[probe] == -1)                // stop search if -1 (empty) is encountered
                break;
            else if (hashTable[probe] == value)        // stop search after finding the element
                return true;
            else if (probe == initialPos && !firstItr) // stop search after one complete
                return false;                          // traversal of the hash table
            else
                probe = (probe + offset) % TABLE_SIZE;
            firstItr = false;
        }
        return false;
    }

    void print() {
        for (int i = 0; i < TABLE_SIZE; i++)
            cout << hashTable[i] << ", ";
        cout << "\n";
    }
};

int main() {
    doubleHash myHash(13); // creates an empty hash table of size 13
    int insertions[] = { 115, 12, 87, 66, 123 };
    for (int x : insertions)
        myHash.insert(x);
    cout << "Status of hash table after initial insertions : ";
    myHash.print();
    /*
    ** Searches for an element in the hash table,
    ** and prints whether it was found.
    */
    cout << "Search 87 : " << (myHash.search(87) ? "found" : "not found") << "\n";
    return 0;
}
Output of the code:
Status of hash table after initial insertions : -1, 66, -1, -1, -1, -1, 123, -1, -1, 87,
-1, 115, 12,
10. Load Factor in Hashing
The access time of an element in a hashmap with chaining consists of two steps: computing the
hash of the key, and traversing the list at the resulting index.
● For the first step, the time taken depends on the key K and the hash function.
For example, if the key is a string “abcd”, then its hash function may depend on the length of
the string. But for very large values of n (the number of entries in the map), the length of
the keys is almost negligible in comparison to n, so hash computation can be considered to
take place in constant time, i.e., O(1).
● For the second step, traversal of the list of K-V pairs present at that index needs to be done.
For this, the worst case may be that all the n entries are at the same index. So, time
complexity would be O(n). But, enough research has been done to make hash functions
uniformly distribute the keys in the array so this almost never happens.
● So, on an average, if there are n entries and b is the size of the array there would be n/b
entries on each index. This value n/b is called the load factor that represents the load that is
there on our map.
● This Load Factor needs to be kept low, so that the number of entries at one index is less and
so is the complexity almost constant, i.e., O(1).
11. Re-Hashing
Rehashing is the process of increasing the size of a hashmap and redistributing the elements to
new buckets based on their new hash values. It is done to improve the performance of the
hashmap and to prevent collisions caused by a high load factor.
When a hashmap becomes full, the load factor (i.e., the ratio of the number of elements to the
number of buckets) increases. As the load factor increases, the number of collisions also
increases, which can lead to poor performance. To avoid this, the hashmap can be resized and the
elements can be rehashed to new buckets, which decreases the load factor and reduces the
number of collisions.
During rehashing, all elements of the hashmap are iterated and their new bucket positions are
calculated using the new hash function that corresponds to the new size of the hashmap. This
process can be time-consuming but it is necessary to maintain the efficiency of the hashmap.
Why Rehashing?
Rehashing is needed in a hashmap to prevent collision and to maintain the efficiency of the data
structure.
As elements are inserted into a hashmap, the load factor (i.e., the ratio of the number of elements
to the number of buckets) increases. If the load factor exceeds a certain threshold (often set to
0.75), the hashmap becomes inefficient as the number of collisions increases. To avoid this, the
hashmap can be resized and the elements can be rehashed to new buckets, which decreases the
load factor and reduces the number of collisions. This process is known as rehashing.
Rehashing can be costly in terms of time and space, but it is necessary to maintain the efficiency
of the hashmap.
How is Rehashing done?
Rehashing can be done as follows:
● For each addition of a new entry to the map, check the load factor.
● If it’s greater than its pre-defined value (or default value of 0.75 if not given), then Rehash.
● For Rehash, make a new array of double the previous size and make it the new bucket array.
● Then traverse to each element in the old bucketArray and call the insert() for each so as to
insert it into the new larger bucket array.
Program to implement Rehashing:
#include <iostream>
#include <vector>

class Map {
private:
    // Node storing a single key-value pair in a chain
    class MapNode {
    public:
        int key;
        int value;
        MapNode* next;
        MapNode(int key, int value) : key(key), value(value), next(nullptr) {}
    };

    // The bucket array where
    // the nodes containing K-V pairs are stored
    std::vector<MapNode*> buckets;
    int numBuckets;
    int size; // number of K-V pairs currently stored

    // Default loadFactor threshold that triggers rehashing
    const double DEFAULT_LOAD_FACTOR = 0.75;

    int getBucketIndex(int key) const {
        return key % numBuckets; // simple modular hash
    }

public:
    Map() {
        numBuckets = 5;
        size = 0;
        buckets.assign(numBuckets, nullptr);
        std::cout << "HashMap created" << std::endl;
        std::cout << "Number of pairs in the Map: " << size << std::endl;
        std::cout << "Size of Map: " << numBuckets << std::endl;
        std::cout << "Default Load Factor : " << DEFAULT_LOAD_FACTOR << std::endl;
    }

    void insert(int key, int value) {
        int bucketInd = getBucketIndex(key);
        // Prepend the new node to the chain of its bucket
        MapNode* newElementNode = new MapNode(key, value);
        newElementNode->next = buckets[bucketInd];
        buckets[bucketInd] = newElementNode;
        std::cout << "Pair(" << key << ", " << value << ") inserted successfully."
                  << std::endl;

        // Incrementing size
        // as new K-V pair is added to the map
        size++;
        double loadFactor = (double)size / numBuckets;
        std::cout << "Current Load factor = " << loadFactor << std::endl;
        if (loadFactor > DEFAULT_LOAD_FACTOR) {
            // Rehash
            rehash();
            std::cout << "New Size of Map: " << numBuckets << std::endl;
        }
        std::cout << "Number of pairs in the Map: " << size << std::endl;
    }

    void rehash() {
        std::cout << "\n***Rehashing Started***\n" << std::endl;
        std::vector<MapNode*> oldBuckets = buckets;
        numBuckets *= 2; // double the bucket count
        buckets.assign(numBuckets, nullptr);
        size = 0;
        // Re-insert every node from the old buckets into the new array
        for (MapNode* head : oldBuckets) {
            while (head != nullptr) {
                insert(head->key, head->value);
                MapNode* old = head;
                head = head->next;
                delete old;
            }
        }
        std::cout << "\n***Rehashing Done***\n" << std::endl;
    }
};

int main() {
    Map map;
    // Inserting elements
    map.insert(1, 1);
    map.insert(2, 2);
    map.insert(3, 3);
    map.insert(4, 4);
    map.insert(5, 5);
    map.insert(6, 6);
    map.insert(7, 7);
    map.insert(8, 8);
    map.insert(9, 9);
    map.insert(10, 10);
    return 0;
}
Output of the code (abridged):
HashMap created
Number of pairs in the Map: 0
Size of Map: 5
Default Load Factor : 0.75
Pair(1, 1) inserted successfully.
Current Load factor = 0.2
Number of pairs in the Map: 1
Pair(2, 2) inserted successfully.
Current Load factor = 0.4
Number of pairs in the Map: 2
Pair(3, 3) inserted successfully.
Current Load factor = 0.6
Number of pairs in the Map: 3
Pair(4, 4) inserted successfully.
Current Load factor = 0.8

***Rehashing Started***
...
***Rehashing Done***

New Size of Map: 10
Number of pairs in the Map: 4
...
12. Applications of Hash Data Structure
● Hash is used in cryptography as a message digest.
● Rabin-Karp algorithm for pattern matching in a string.
● Calculating the number of different substrings of a string.
5.2 Tries
1. Introduction
Trie data structure is defined as a Tree based data structure that is used for storing some
collection of strings and performing efficient search operations on them. The word Trie is derived
from reTRIEval, which means finding something or obtaining it.
The Trie data structure has the property that if two strings have a common prefix, then they
will share the same ancestor nodes in the trie. A trie can be used to sort a collection of
strings alphabetically, as well as to check whether a string with a given prefix is present in the
trie or not.
Need of Trie
Storing and retrieving data could also be done with a hash table, but a trie can perform these
operations more efficiently in important cases. Moreover, a trie has its own advantages over a
hash table: a trie can be used for prefix-based searching, whereas a hash table cannot be used
in the same way.
2. Advantages of Trie over Hash Table
The trie data structure has the following advantages over a hash table:
● We can efficiently do prefix search (or auto-complete) with Trie.
● We can easily print all words in alphabetical order which is not easily possible with hashing.
● There is no overhead of Hash functions in a Trie data structure.
● Searching for a string even in a large collection of strings in a trie can be
done in O(L) time complexity, where L is the length of the query string. The search can
finish even sooner if the query string does not exist in the trie.
3. Properties of Trie
Trie data structure can contain any number of characters including alphabets, numbers, and
special characters. But for this section, we will discuss strings with characters a-z. Therefore, only
26 pointers are needed for every node, where the 0th index represents ‘a’ and the 25th index
represents ‘z’ characters.
Any lowercase English word can start with a-z, then the next letter of the word could be a-z, the
third letter of the word again could be a-z, and so on. So for storing a word, we need to take an
array (container) of size 26 and initially, all the characters are empty as there are no words and it
will look as shown below.
Let’s see how the words “and” and “ant” are stored in the Trie data structure:
1. Store “and” in Trie data structure:
● The word “and” starts with “a“, So we will mark the position “a” as filled in the Trie
node, which represents the use of “a”.
● After placing the first character, for the second character again there are 26
possibilities, So from “a“, again there is an array of size 26, for storing the 2nd
character.
● The second character is “n“, So from “a“, we will move to “n” and mark “n” in the 2nd
array as used.
● After “n“, the 3rd character is “d“, So mark the position “d” as used in the respective
array.
2. Store “ant” in the Trie data structure:
● The word “ant” starts with “a” and the position of “a” in the root node has already
been filled. So, no need to fill it again, just move to the node ‘a‘ in Trie.
● For the second character ‘n‘ we can observe that the position of ‘n’ in the ‘a’ node has
already been filled. So, no need to fill it again, just move to node ‘n’ in Trie.
● For the last character ‘t‘ of the word, The position for ‘t‘ in the ‘n‘ node is not filled. So,
fill the position of ‘t‘ in the ‘n‘ node and move to ‘t‘ node.
Every Trie node consists of a character pointer array or hashmap and a flag to represent if the
word is ending at that node or not. But if the words contain only lower-case letters (i.e. a-z), then
we can define Trie Node with an array instead of a hashmap.
Code in C++:
#define ALPHABET_SIZE 26

struct TrieNode {
    struct TrieNode* children[ALPHABET_SIZE];
    // isEndOfWord is true if the node marks the end of a word
    bool isEndOfWord;
};
6. Basic Operations on Trie
1. Insertion:
This operation is used to insert new strings into the Trie data structure.
Algorithm:
1. Define a function insert(TrieNode *root, string &word) which will take two parameters one
for the root and the other for the string that we want to insert in the Trie data structure.
2. Now take another pointer currentNode and initialize it with the root node.
3. Iterate over the length of the given string and check if the value is NULL or not in the array
of pointers at the current character of the string.
● If It’s NULL then, make a new node and point the current character to this newly created
node.
● Move the current to the newly created node.
4. Finally, increment the wordCount of the last currentNode; this indicates that a word ends at
currentNode.
Below is the implementation of the above algorithm:
void insert_key(TrieNode* root, string& key)
{
    // Initialize the currentNode pointer
    // with the root node
    TrieNode* currentNode = root;
    // Iterate across the length of the string
    for (char c : key) {
        // Create a new node if the path doesn't exist
        if (currentNode->childNode[c - 'a'] == NULL) {
            TrieNode* newNode = new TrieNode();
            currentNode->childNode[c - 'a'] = newNode;
        }
        // Now, move the current node pointer to the newly created node.
        currentNode = currentNode->childNode[c - 'a'];
    }
    // Increment wordCount: a word ends at this node
    currentNode->wordCount++;
}
2. Searching:
Search operation in a trie is performed similarly to insertion; the only difference is that
whenever we find that the pointer for the current character of the word is NULL in the current
node, we return false instead of creating a new node for that character.
This operation is used to search whether a string is present in the Trie data structure or not.
There are two search approaches in the Trie data structure.
1. Find whether the given word exists in Trie.
2. Find whether any word that starts with the given prefix exists in Trie.
There is a similar search pattern in both approaches. The first step in searching a given word in
Trie is to convert the word to characters and then compare every character with the trie node
from the root node. If the current character is present in the node, move forward to its children.
Repeat this process until all characters are found.
Below is the implementation of the first approach:

bool search_key(TrieNode* root, string& key)
{
    TrieNode* currentNode = root;
    for (char c : key) {
        // Check if the node exists for the current
        // character in the Trie.
        if (currentNode->childNode[c - 'a'] == NULL)
            return false; // given word does not exist in the Trie
        currentNode = currentNode->childNode[c - 'a'];
    }
    // The word exists only if a word actually ends at this node
    return currentNode->wordCount > 0;
}
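For the second approach (prefix search), a small sketch built on the same TrieNode suffices; unlike search_key, it does not require that a word end at the final node:

bool prefix_exists(TrieNode* root, string& prefix)
{
    TrieNode* currentNode = root;
    for (char c : prefix) {
        // Stop as soon as the path for the prefix breaks
        if (currentNode->childNode[c - 'a'] == NULL)
            return false;
        currentNode = currentNode->childNode[c - 'a'];
    }
    // The whole prefix path exists, so some word starts with it
    return true;
}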
3. Deletion:
This operation removes a string from the trie. While walking down the word's path, we remember
the deepest node that has more than one child; everything below that node belongs only to the
deleted word and can be unlinked.

bool delete_key(TrieNode* root, string& word)
{
    TrieNode* currentNode = root;
    TrieNode* lastBranchNode = NULL;
    char lastBrachChar = 'a';
    for (char c : word) {
        if (currentNode->childNode[c - 'a'] == NULL)
            return false; // word not present
        int count = 0;
        for (int i = 0; i < 26; i++)
            if (currentNode->childNode[i] != NULL)
                count++;
        // Remember the deepest node shared with other words
        if (count > 1) {
            lastBranchNode = currentNode;
            lastBrachChar = c;
        }
        currentNode = currentNode->childNode[c - 'a'];
    }
    int count = 0;
    for (int i = 0; i < 26; i++)
        if (currentNode->childNode[i] != NULL)
            count++;
    // Case 1: the deleted word is a prefix of other words
    if (count > 0) {
        currentNode->wordCount--;
        return true;
    }
    // Case 2: the word shares a partial path with other words
    if (lastBranchNode != NULL) {
        lastBranchNode->childNode[lastBrachChar - 'a'] = NULL;
        return true;
    }
    // Case 3: no shared path; unlink the word from the root
    root->childNode[word[0] - 'a'] = NULL;
    return true;
}
7. Implementation of Trie
#include <bits/stdc++.h>
using namespace std;

struct TrieNode {
    // Number of words ending at this node
    int wordCount;
    // Pointer array for the child node of each character
    TrieNode* childNode[26];

    TrieNode()
    {
        // constructor
        // initialize the wordCount variable with 0
        // initialize every index of childNode array with
        // NULL
        wordCount = 0;
        for (int i = 0; i < 26; i++) {
            childNode[i] = NULL;
        }
    }
};

// insert_key(), search_key() and delete_key() are implemented
// exactly as shown in the previous section.

// Driver code
int main()
{
    // Make a root node for the Trie
    TrieNode* root = new TrieNode();

    // Stores the strings that we want to insert in the Trie
    vector<string> inputStrings
        = { "and", "ant", "do", "geek", "dad", "ball" };
    for (string& s : inputStrings)
        insert_key(root, s);

    // Strings to search for in the Trie
    vector<string> searchQueryStrings = { "do", "geek", "bat" };
    for (string& s : searchQueryStrings)
        cout << "Query string: " << s << " -> "
             << (search_key(root, s) ? "Present" : "Not present")
             << "\n";
    return 0;
}
8. Complexity Analysis of Trie

Operation    Time Complexity    Auxiliary Space
Insertion    O(n)               O(n * m)
Searching    O(n)               O(1)
Deletion     O(n)               O(1)

Here n is the length of the key and m is the alphabet size (26 for lowercase English letters).
9. Applications of Trie
1. Autocomplete Feature: Autocomplete provides suggestions based on what you type in the
search box. Trie data structure is used to implement autocomplete functionality.
2. Spell Checkers: If the word typed does not appear in the dictionary, then it shows suggestions
based on what you typed.
It is a 3-step process that includes :
1. Checking for the word in the data dictionary.
2. Generating potential suggestions.
3. Sorting the suggestions with higher priority on top.
Trie stores the data dictionary and makes it easier to build an algorithm for searching the word
from the dictionary and provides the list of valid words for the suggestion.
3. Longest Prefix Matching Algorithm (Maximum Prefix Length Match): This algorithm is used in
networking by routing devices in IP networking. Optimization of network routes requires
contiguous masking, which bounds the complexity of lookup time to O(n), where n is the length
of the IP address in bits.
To speed up the lookup process, Multiple Bit trie schemes were developed that perform the
lookups of multiple bits faster.
10. Advantages of Trie
● Trie allows us to insert and find strings in O(l) time, where l is the length of a single word. It
is faster as compared to both hash tables and binary search trees.
● It provides alphabetical filtering of entries by the key of the node and hence makes it easier
to print all words in alphabetical order.
● Trie takes less space when compared to BST because the keys are not explicitly saved;
instead, each key requires just an amortized fixed amount of space to be stored.
● Prefix search/Longest prefix matching can be efficiently done with the help of trie data
structure.
● Since trie doesn’t need any hash function for its implementation, they are generally faster
than hash tables for small keys like integers and pointers.
● Tries support ordered iteration whereas iteration in a hash table will result in pseudorandom
order given by the hash function which is usually more cumbersome.
● Deletion is also a straightforward algorithm with O(l) as its time complexity, where l is the
length of the word to be deleted.
11. Disadvantages of Trie
● The main disadvantage of the trie is that it takes a lot of memory to store all the strings. For
each node, we have too many node pointers which are equal to the number of characters in
the worst case.
● An efficiently constructed hash table(i.e. a good hash function and a reasonable load factor)
has O(1) as lookup time which is way faster than O(l) in the case of a trie, where l is the
length of the string.
Chapter 6
Dynamic Programming & Greedy
Algorithm
6.1 Dynamic Programming
1. Introduction
Dynamic Programming (DP) is a method used in mathematics and computer science to solve
complex problems by breaking them down into simpler subproblems. By solving each subproblem
only once and storing the results, it avoids redundant computations, leading to more efficient
solutions for a wide range of problems.
2. Working of DP
● Identify Subproblems: Divide the main problem into smaller, independent subproblems.
● Store Solutions: Solve each subproblem and store the solution in a table or array.
● Build Up Solutions: Use the stored solutions to build up the solution to the main problem.
● Avoid Redundancy: By storing solutions, DP ensures that each subproblem is solved only
once, reducing computation time.
Examples of Dynamic Programming (DP)
Example 1: Consider the problem of finding the Fibonacci Sequence:
Fibonacci Sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, …
Brute Force Approach:
To find the nth Fibonacci number with a brute-force recursive approach, you would compute the
(n-1)th and (n-2)th Fibonacci numbers from scratch and add them. This works, but it is
inefficient for large values of n, because the same smaller Fibonacci numbers are recalculated
over and over.
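A minimal sketch of this naive recursion, for contrast with the DP approach that follows:

#include <iostream>

// Naive recursion: fib(n) recomputes fib(n-1) and fib(n-2)
// from scratch, giving roughly O(2^n) time
long long fib(int n)
{
    if (n <= 1) return n;
    return fib(n - 1) + fib(n - 2);
}

int main()
{
    std::cout << fib(10) << std::endl; // prints 55
    return 0;
}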
Dynamic Programming Approach:
Fibonacci Series using Dynamic Programming
● Subproblems: F(0), F(1), F(2), F(3), …
● Store Solutions: Create a table to store the values of F(n) as they are calculated.
● Build Up Solutions: For F(n), look up F(n-1) and F(n-2) in the table and add them.
● Avoid Redundancy: The table ensures that each subproblem (e.g., F(2)) is solved only once.
By using DP, we can efficiently calculate the Fibonacci sequence without having to recompute
subproblems.
3. Characteristics of DP
Dynamic programming is an optimization technique used for problems that exhibit the following
two characteristics:
A. Optimal Substructure:
Optimal substructure means that we combine the optimal results of subproblems to achieve the
optimal result of the bigger problem.
Example:
Consider the problem of finding the minimum cost path in a weighted graph from a source node to
a destination node. We can break this problem down into smaller subproblems:
● Find the minimum cost path from the source node to each intermediate node.
● Find the minimum cost path from each intermediate node to the destination node.
The solution to the larger problem (finding the minimum cost path from the source node to the
destination node) can be constructed from the solutions to these smaller subproblems.
B. Overlapping Subproblems:
The same subproblems are solved repeatedly in different parts of the problem.
Example:
Consider the problem of computing the Fibonacci series. To compute the Fibonacci number at
index n, we need to compute the Fibonacci numbers at indices n-1 and n-2. This means that the
subproblem of computing the Fibonacci number at index n-1 is used twice in the solution to the
larger problem of computing the Fibonacci number at index n.
4. Approaches of DP
5. Advantages of DP
6. Applications of DP
7. Dynamic Programming vs Recursion
● In dynamic programming, problems are solved by breaking them down into smaller
subproblems whose solutions are combined, while recursion is when a function calls and
executes itself. Dynamic programming can work without recursion, but since its purpose is
to optimize and accelerate computation, programmers usually express it with recursion for
conciseness.
● A function that accomplishes a specific task by calling itself is called a recursive function.
The case in which the function calls itself is the recursive case; the condition under which
it stops is the base case.
● Using dynamic programming, you break a problem into smaller parts, called subproblems,
solve each subproblem the first time it appears, and use memoization to store the solutions.
● Therefore, the main difference between the two techniques is their intended use: recursion
is a way of defining a computation in terms of itself, whereas dynamic programming is an
optimization technique used to avoid re-solving subproblems.
● By establishing states, dynamic programming identifies the problem and divides it into
subproblems in order to solve the whole. After solving these subproblems, the programmer
establishes a mathematical relationship (a recurrence) between them. Finally, the solutions
and results are stored, so they can be accessed in the future without having to solve the
whole problem again.
9. Implementation of DP in C++
A. Memoization
Code:
#include <iostream>
#include <unordered_map>

// Memoization: cache fibonacci(n) so that each subproblem
// is computed only once
std::unordered_map<int, long long> memo;

long long fibonacci(int n)
{
    if (n == 0) {
        return 0;
    }
    else if (n == 1) {
        return 1;
    }
    else {
        // Return the cached answer if this subproblem was solved before
        auto it = memo.find(n);
        if (it != memo.end())
            return it->second;
        long long result = fibonacci(n - 1) + fibonacci(n - 2);
        memo[n] = result;
        return result;
    }
}

int main()
{
    std::cout << fibonacci(10) << std::endl; // prints 55
    return 0;
}
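B. Tabulation
The bottom-up counterpart of the memoized version above: instead of caching recursive calls, a minimal version fills a table from the base cases upward.

Code:
#include <iostream>
#include <vector>

// Tabulation: build the table bottom-up from the base cases
int fibonacci(int n)
{
    if (n == 0) return 0;
    if (n == 1) return 1;
    std::vector<int> table(n + 1, 0);
    table[0] = 0;
    table[1] = 1;
    for (int i = 2; i <= n; i++) {
        table[i] = table[i - 1] + table[i - 2];
    }
    return table[n];
}

int main()
{
    std::cout << fibonacci(10) << std::endl; // prints 55
    return 0;
}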
10. Bitmasking and DP: Count ways to assign unique cap to every person
#include <bits/stdc++.h>
using namespace std;
const int MOD = 1000000007;

vector<int> capList[101]; // capList[i] : persons who own cap with id i
int dp[1025][101];        // memo over (mask of capped persons, cap id); n <= 10
// This is used for the base case, it has all the N bits set
// so, it tells whether all N persons are wearing a cap.
int allmask;

// mask = set of persons already wearing caps, i = current cap id
long long countWaysUtil(int mask, int i)
{
    if (mask == allmask) return 1; // everyone wears a cap
    if (i > 100) return 0;         // no cap kinds left
    if (dp[mask][i] != -1) return dp[mask][i];

    long long ways = countWaysUtil(mask, i + 1); // skip cap i entirely
    int size = capList[i].size();
    // So, assign one by one ith cap to all the possible persons
    // and recur for remaining caps.
    for (int j = 0; j < size; j++)
    {
        // if person capList[i][j] is already wearing a cap so continue as
        // we cannot assign him this cap
        if (mask & (1 << capList[i][j])) continue;
        // Else assign him this cap and recur for remaining caps with
        // new updated mask vector
        else ways += countWaysUtil(mask | (1 << capList[i][j]), i + 1);
        ways %= MOD;
    }
    return dp[mask][i] = ways;
}

void countWays(int n)
{
    string str;
    getline(cin, str); // consume the newline left after reading n
    for (int i = 0; i < n; i++) { // line i lists the caps person i owns
        getline(cin, str);
        stringstream ss(str);
        int x;
        while (ss >> x) capList[x].push_back(i);
    }
    allmask = (1 << n) - 1;
    memset(dp, -1, sizeof dp);
    cout << countWaysUtil(0, 1) << endl;
}

// Driver Program
int main()
{
    int n; // number of persons in every test case
    cin >> n;
    countWays(n);
    return 0;
}
11. Sum over Subsets
Consider the following problem where we will use Sum over subset Dynamic Programming to
solve it.
Given an array A[] of 2^n integers, we need to calculate, for every mask x, the function
F(x) = the sum of A[i] over all i such that x & i == i, i.e., over all i that are bitwise subsets of x.
Examples:
Input: A[] = {7, 12, 14, 16} , n = 2
Output: 7, 19, 21, 49
Explanation: There will be 4 values of x: 0,1,2,3
So, we need to calculate F(0),F(1),F(2),F(3).
Now, F(0) = A0 = 7
F(1) = A0 + A1 = 19
F(2) = A0 + A2 = 21
F(3) = A0 + A1 + A2 + A3 = 49
1. Brute Force Approach:
The naive approach iterates over every mask x and every candidate i, adding A[i] whenever i is
a bitwise subset of x. With 2^n masks and 2^n candidates each, this takes O(4^n) time.

// CPP program for brute-force approach of SumOverSubsets DP
#include <bits/stdc++.h>
using namespace std;

void SumOverSubsets(int a[], int n)
{
    int tot = 1 << n;
    vector<int> sos(tot, 0);
    for (int x = 0; x < tot; x++) {
        for (int i = 0; i < tot; i++) {
            // if i is a bitwise subset of x
            if ((x & i) == i)
                sos[x] += a[i];
        }
    }
    for (int x = 0; x < tot; x++)
        cout << sos[x] << " ";
}
// Driver Code
int main() {
int a[] = {7, 12, 14, 16};
int n = 2;
SumOverSubsets(a, n);
return 0;
}
2. Sub Optimal Approach:
The brute-force algorithm can be easily improved by just iterating over bitwise subsets. Instead
of iterating for every i, we can simply iterate for the bitwise subsets only. Iterating backward for
i = (i - 1) & x gives us every bitwise subset, where i starts from x and ends at 1. If the mask x
has k set bits, we do 2^k iterations, since a mask with k set bits has 2^k bitwise subsets. The
number of masks x with k set bits is nCk, so the total number of iterations is the sum of
nCk * 2^k over all k, which by the binomial theorem equals 3^n.
Time Complexity: O(3^n)
Below is the implementation of the above idea:
// CPP program for sub-optimal
// approach of SumOverSubsets DP
#include <bits/stdc++.h>
using namespace std;

void SumOverSubsets(int a[], int n)
{
    int tot = 1 << n;
    vector<int> sos(tot, 0);
    for (int x = 0; x < tot; x++) {
        sos[x] = a[0]; // the empty subset i = 0
        // iterate backward over the non-empty bitwise subsets of x
        for (int i = x; i > 0; i = (i - 1) & x)
            sos[x] += a[i];
    }
    for (int x = 0; x < tot; x++)
        cout << sos[x] << " ";
}

// Driver Code
int main() {
    int a[] = {7, 12, 14, 16};
    int n = 2;
    SumOverSubsets(a, n);
    return 0;
}
12. Kadane's Algorithm: Largest Sum Contiguous Subarray
Given an array arr[] of size N, the task is to find the sum of the contiguous subarray within
arr[] with the largest sum.
Example:
Input: arr = {-2,-3,4,-1,-2,1,5,-3}
Output: 7
Explanation: The subarray {4,-1, -2, 1, 5} has the largest sum 7.
Input: arr = {2}
Output: 2
Explanation: The subarray {2} has the largest sum 2.
Input: arr = {5,4,1,7,8}
Output: 25
Explanation: The subarray {5,4,1,7,8} has the largest sum 25.
The idea of Kadane’s algorithm is to maintain a variable max_ending_here that stores the
maximum sum of a contiguous subarray ending at the current index, and a variable max_so_far
that stores the maximum sum of any contiguous subarray found so far. Whenever there is a
positive-sum value in max_ending_here, compare it with max_so_far and update max_so_far if
it is greater.
// C++ program to print largest contiguous array sum
#include <bits/stdc++.h>
using namespace std;

int maxSubArraySum(int a[], int size)
{
    int max_so_far = INT_MIN, max_ending_here = 0;
    for (int i = 0; i < size; i++) {
        max_ending_here = max_ending_here + a[i];
        if (max_so_far < max_ending_here)
            max_so_far = max_ending_here;
        // A negative running sum can never help a later subarray
        if (max_ending_here < 0)
            max_ending_here = 0;
    }
    return max_so_far;
}
// Driver Code
int main()
{
int a[] = { -2, -3, 4, -1, -2, 1, 5, -3 };
int n = sizeof(a) / sizeof(a[0]);
// Function Call
int max_sum = maxSubArraySum(a, n);
cout << "Maximum contiguous sum is " << max_sum;
return 0;
}
To print the subarray with the maximum sum, the idea is to additionally remember the start
index of max_ending_here at the current index, so that whenever max_so_far is updated with
max_ending_here, the start and end indices of the subarray can be updated with the stored start
index and the current index.
// C++ program to print largest contiguous array sum
// along with the start and end indices of that subarray
#include <climits>
#include <iostream>
using namespace std;

void maxSubArraySum(int a[], int size)
{
    int max_so_far = INT_MIN, max_ending_here = 0,
        start = 0, end = 0, s = 0;
    for (int i = 0; i < size; i++) {
        max_ending_here += a[i];
        if (max_so_far < max_ending_here) {
            max_so_far = max_ending_here;
            start = s; // the candidate start becomes the real start
            end = i;   // the subarray ends at the current index
        }
        if (max_ending_here < 0) {
            max_ending_here = 0;
            s = i + 1; // the next subarray can start after index i
        }
    }
    cout << "Maximum contiguous sum is " << max_so_far
         << endl;
    cout << "Starting index " << start << endl
         << "Ending index " << end << endl;
}

int main()
{
    int a[] = { -2, -3, 4, -1, -2, 1, 5, -3 };
    maxSubArraySum(a, sizeof(a) / sizeof(a[0]));
    return 0;
}
13. Assembly Line Scheduling
A car factory has two assembly lines, each with n stations. A station is denoted by Si,j where i is
either 1 or 2 and indicates the assembly line the station is on, and j indicates the number of the
station. The time taken per station is denoted by ai,j. Each station is dedicated to some sort of
work like engine fitting, body fitting, painting, and so on. So, a car chassis must pass through
each of the n stations in order before exiting the factory. The parallel stations of the two assembly
lines perform the same task. After it passes through station Si,j, it will continue to station Si,j+1
unless it decides to transfer to the other line. Continuing on the same line incurs no extra cost,
but transferring from line i at station j – 1 to station j on the other line takes time ti,j. Each
assembly line takes an entry time ei and exit time xi which may be different for the two lines.
Give an algorithm for computing the minimum time it will take to build a car chassis.
The following information can be extracted from the problem statement to make it simpler:
● Two assembly lines, 1 and 2, each with stations from 1 to n.
● A car chassis must pass through all stations from 1 to n in order(in any of the two assembly
lines). i.e. it cannot jump from station i to station j if they are not at one move distance.
● The car chassis can move one station forward in the same line, or one station diagonally in
the other line. It incurs an extra cost ti, j to move to station j from line i. No cost is incurred
for movement in the same line.
● The time taken in station j on line i is ai, j.
● Si, j represents a station j on line i.
Breaking the problem into smaller sub-problems:
We can easily find the i-th factorial if the (i-1)-th factorial is known. Can we apply a similar
idea here?
If the minimum time taken by the chassis to leave station Si, j-1 is known, the minimum time
taken to leave station Si, j can be calculated quickly by combining ai, j and ti, j.
T1(j) indicates the minimum time taken by the car chassis to leave station j on assembly line 1.
T2(j) indicates the minimum time taken by the car chassis to leave station j on assembly line 2.
Base cases:
The entry time ei comes into the picture only when the car chassis enters the car factory.
Time taken to leave the first station in line 1 is given by:
T1(1) = Entry time in Line 1 + Time spent in station S1,1
T1(1) = e1 + a1,1
Similarly, time taken to leave the first station in line 2 is given by:
T2(1) = e2 + a2,1
Recursive Relations:
If we look at the problem statement, it quickly boils down to the below observations:
The car chassis at station S1,j can come either from station S1, j-1 or station S2, j-1.
Case #1: Its previous station is S1, j-1
The minimum time to leave station S1,j is given by:
T1(j) = Minimum time taken to leave station S1, j-1 + Time spent in station S1, j
T1(j) = T1(j-1) + a1, j
Case #2: Its previous station is S2, j-1
The minimum time to leave station S1, j is given by:
T1(j) = Minimum time taken to leave station S2, j-1 + Extra cost incurred to change the assembly
line + Time spent in station S1, j
T1(j) = T2(j-1) + t2, j + a1, j
The minimum time T1(j) is given by the minimum of the two obtained in cases #1 and #2.
T1(j) = min((T1(j-1) + a1, j), (T2(j-1) + t2, j + a1, j))
Similarly, the minimum time to reach station S2, j is given by:
T2(j) = min((T2(j-1) + a2, j), (T1(j-1) + t1, j + a2, j))
The total minimum time taken by the car chassis to come out of the factory is given by:
Tmin = min(Time taken to leave station Si,n + Time taken to exit the car factory)
Tmin = min(T1(n) + x1, T2(n) + x2)
Code:
#include <iostream>
#include <vector>
using namespace std;

// Minimum time from just after station cs on line cl (0 or 1)
// to the factory exit
int fun(vector<vector<int>>& a, vector<vector<int>>& t,
        int cl, int cs, int x1, int x2, int n)
{
    if (cs == n - 1) // last station: only the exit time remains
        return (cl == 0) ? x1 : x2;
    // stay on the same line
    int same = fun(a, t, cl, cs + 1, x1, x2, n) + a[cl][cs + 1];
    // switch to the other line (transfer cost + station time)
    int diff = fun(a, t, !cl, cs + 1, x1, x2, n)
               + a[!cl][cs + 1] + t[cl][cs + 1];
    return min(same, diff);
}

int main()
{
    int n = 4;
    // station times a[i][j] and transfer times t[i][j]
    // (sample instance matching the entry/exit times below)
    vector<vector<int>> a = { { 4, 5, 3, 2 }, { 2, 10, 1, 4 } };
    vector<vector<int>> t = { { 0, 7, 4, 5 }, { 0, 9, 2, 8 } };
    int e1 = 10;
    int e2 = 12;
    int x1 = 18;
    int x2 = 7;
    // entry from 1st line
    int x = fun(a, t, 0, 0, x1, x2, n) + e1 + a[0][0];
    // entry from 2nd line
    int y = fun(a, t, 1, 0, x1, x2, n) + e2 + a[1][0];
    cout << min(x, y) << endl; // prints 35
    return 0;
}
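The same recurrences can also be filled bottom-up in O(n) time and O(1) extra space. A sketch on the same sample data:

#include <iostream>
#include <algorithm>
using namespace std;

// Bottom-up version of the same recurrences: T1 and T2 are
// rolled forward station by station
int carAssembly(int a[][4], int t[][4], int e[], int x[], int n)
{
    int T1 = e[0] + a[0][0]; // base cases: first station on each line
    int T2 = e[1] + a[1][0];
    for (int j = 1; j < n; j++) {
        int new1 = min(T1 + a[0][j], T2 + t[1][j] + a[0][j]);
        int new2 = min(T2 + a[1][j], T1 + t[0][j] + a[1][j]);
        T1 = new1;
        T2 = new2;
    }
    return min(T1 + x[0], T2 + x[1]);
}

int main()
{
    int a[][4] = { { 4, 5, 3, 2 }, { 2, 10, 1, 4 } };
    int t[][4] = { { 0, 7, 4, 5 }, { 0, 9, 2, 8 } };
    int e[] = { 10, 12 }, x[] = { 18, 7 };
    cout << carAssembly(a, t, e, x, 4) << endl; // prints 35
    return 0;
}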
6.2 Greedy Algorithm
A greedy algorithm is a type of optimization algorithm that makes locally optimal choices at each
step to find a globally optimal solution. It operates on the principle of “taking the best option
now” without considering the long-term consequences.
1. Fractional Knapsack
Given the weights and profits of N items, in the form of {profit, weight}, put these items in a
knapsack of capacity W to get the maximum total profit in the knapsack. In Fractional
Knapsack, we can break items for maximizing the total value of the knapsack.
An efficient solution is to use the Greedy approach.
The basic idea of the greedy approach is to calculate the ratio profit/weight for each item and
sort the items in decreasing order of this ratio. Then take the item with the highest ratio and
add as much of it as we can (the whole item, or a fraction of it).
This will always give the maximum profit because, in each step it adds an element such that this
is the maximum possible profit for that much weight.
Follow the given steps to solve the problem using the above approach:
● Calculate the ratio (profit/weight) for each item.
● Sort all the items in decreasing order of the ratio.
● Initialize res = 0, curr_cap = given_cap.
● Do the following for every item i in the sorted order:
○ If the weight of the current item is less than or equal to the remaining capacity then add
the value of that item into the result
○ Else add the current item as much as we can and break out of the loop.
● Return res.
Below is the implementation of the above approach:
// C++ program to solve fractional Knapsack Problem
#include <bits/stdc++.h>
using namespace std;

struct Item {
    int profit, weight;
    // Constructor
    Item(int profit, int weight)
    {
        this->profit = profit;
        this->weight = weight;
    }
};

// Comparison function: sort items by decreasing profit/weight ratio
static bool cmp(struct Item a, struct Item b)
{
    double r1 = (double)a.profit / (double)a.weight;
    double r2 = (double)b.profit / (double)b.weight;
    return r1 > r2;
}

double fractionalKnapsack(int W, struct Item arr[], int N)
{
    sort(arr, arr + N, cmp);
    double finalvalue = 0.0;
    for (int i = 0; i < N; i++) {
        // item fits completely: take all of it
        if (arr[i].weight <= W) {
            W -= arr[i].weight;
            finalvalue += arr[i].profit;
        }
        // otherwise add fractional part of it and stop
        else {
            finalvalue
                += arr[i].profit
                   * ((double)W / (double)arr[i].weight);
            break;
        }
    }
    return finalvalue;
}
// Driver code
int main()
{
int W = 50;
Item arr[] = { { 60, 10 }, { 100, 20 }, { 120, 30 } };
int N = sizeof(arr) / sizeof(arr[0]);
// Function call
cout << fractionalKnapsack(W, arr, N);
return 0;
}
2. Huffman Coding
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length
codes to input characters; lengths of the assigned codes are based on the frequencies of
corresponding characters.
The variable-length codes assigned to input characters are Prefix Codes, meaning the codes (bit
sequences) are assigned in such a way that the code assigned to one character is not the prefix of
code assigned to any other character. This is how Huffman Coding makes sure that there is no
ambiguity when decoding the generated bitstream.
Let us understand prefix codes with a counter example. Let there be four characters a, b, c and d,
and their corresponding variable length codes be 00, 01, 0 and 1. This coding leads to ambiguity
because code assigned to c is the prefix of codes assigned to a and b. If the compressed bit stream
is 0001, the de-compressed output may be “cccd” or “ccb” or “acd” or “ab”.
There are mainly two major parts in Huffman Coding
● Build a Huffman Tree from input characters.
● Traverse the Huffman Tree and assign codes to characters.
Algorithm of Huffman Coding
Algorithm Huffman(c)
{
    n = |c|
    Q = c                          // min-priority queue keyed on frequency
    for i <- 1 to n - 1
    do
    {
        temp <- get node()
        left(temp) <- Get_Min(Q)   // extract the two least-frequent nodes
        right(temp) <- Get_Min(Q)
        a <- left(temp)
        b <- right(temp)
        F(temp) <- f(a) + f(b)     // combined frequency of the new node
        Insert(Q, temp)
    }
    return Get_Min(Q)              // root of the Huffman tree
}
Code in C++:
// C++ program for Huffman Coding
// (uses std::priority_queue in place of a hand-rolled min-heap)
#include <bits/stdc++.h>
using namespace std;

// A Huffman tree node
struct MinHeapNode {
    char data;                 // an input character
    unsigned freq;             // frequency of the character
    MinHeapNode *left, *right; // left and right children
    MinHeapNode(char data, unsigned freq)
        : data(data), freq(freq), left(NULL), right(NULL) {}
};

// Ordering for the min-heap: smaller frequency = higher priority
struct compare {
    bool operator()(MinHeapNode* l, MinHeapNode* r)
    {
        return l->freq > r->freq;
    }
};

// Traverse the Huffman tree and print the code for each character
void printCodes(struct MinHeapNode* root, string str)
{
    if (!root)
        return;
    // '$' marks internal nodes that carry no character
    if (root->data != '$')
        cout << root->data << ": " << str << "\n";
    printCodes(root->left, str + "0");
    printCodes(root->right, str + "1");
}

// Build the Huffman tree and print codes by traversing it
void HuffmanCodes(char data[], int freq[], int size)
{
    struct MinHeapNode *left, *right, *top;
    priority_queue<MinHeapNode*, vector<MinHeapNode*>, compare> minHeap;
    for (int i = 0; i < size; ++i)
        minHeap.push(new MinHeapNode(data[i], freq[i]));

    // Iterate while the size of the heap is not 1
    while (minHeap.size() != 1) {
        // Extract the two nodes of minimum frequency
        left = minHeap.top();
        minHeap.pop();
        right = minHeap.top();
        minHeap.pop();
        // New internal node with frequency equal to the sum of the
        // two extracted nodes; make them its left and right children
        top = new MinHeapNode('$', left->freq + right->freq);
        top->left = left;
        top->right = right;
        minHeap.push(top);
    }
    // The remaining node is the root; print the codes
    printCodes(minHeap.top(), "");
}

// Driver code
int main()
{
    char arr[] = { 'a', 'b', 'c', 'd', 'e', 'f' };
    int freq[] = { 5, 9, 12, 13, 16, 45 };
    int size = sizeof(arr) / sizeof(arr[0]);
    HuffmanCodes(arr, freq, size);
    return 0;
}
Output
f: 0
c: 100
d: 101
a: 1100
b: 1101
e: 111
Chapter 7
Graphs
7.1 Introduction
Graph Theory is a fundamental area of study in computer science and mathematics that deals
with the study of graphs, which are mathematical structures used to model pairwise
relationships between objects. Graphs consist of vertices (nodes) and edges (connections between
nodes). In this comprehensive overview, we'll cover the major topics in Graph Theory, including
basic definitions, types of graphs, representations, traversal algorithms, shortest path
algorithms, and applications.
Graphs can represent various real-world scenarios such as social networks, road networks,
computer networks, etc.
7.2 Types of Graphs
1. Directed and Undirected Graphs:
● Undirected Graph: Edges have no direction. If there's an edge between vertices u and v,
it can be traversed in both directions.
● Directed Graph (Digraph): Edges have a direction. If there's a directed edge from
vertex u to vertex v, it can only be traversed from u to v.
2. Weighted Graphs:
● Edges have weights or costs associated with them, which represent the cost, distance, or
capacity between vertices.
3. Cyclic and Acyclic Graphs:
● Acyclic Graph: Contains no cycles (a cycle is a path that starts and ends at the same vertex).
● Cyclic Graph: Contains at least one cycle.
7.3 Graph Representations
1. Adjacency Matrix:
● A 2D array A of size |V| * |V|, where A[i][j] is non-zero if there's an edge from vertex i to
vertex j.
● Suitable for dense graphs but consumes O(V²) space.
2. Adjacency List:
● An array of lists (or a hashmap of lists), where each element i contains a list of vertices
adjacent to vertex i.
● Suitable for sparse graphs and consumes O(V + E) space.
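As a small illustration of the second representation, here is a sketch that builds and prints an adjacency list for an illustrative 4-vertex undirected graph:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    int V = 4;
    // adjacency list: adj[u] holds the neighbours of u
    vector<vector<int>> adj(V);
    auto addEdge = [&](int u, int v) { // undirected edge u-v
        adj[u].push_back(v);
        adj[v].push_back(u);
    };
    addEdge(0, 1); addEdge(0, 2); addEdge(1, 3);
    for (int u = 0; u < V; u++) {
        cout << u << ":";
        for (int v : adj[u]) cout << " " << v;
        cout << "\n";
    }
    return 0;
}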
7.4 Graph Traversal Algorithms
1. Depth-First Search (DFS):
● Explores as far as possible along each branch before backtracking to explore others.
● Uses a stack (usually via recursion).
2. Breadth-First Search (BFS):
● Explores all neighbors at the present depth level before moving on to nodes at the next
depth level.
● Uses a queue.
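Minimal sketches of both traversals, run on an illustrative 4-vertex adjacency list:

#include <bits/stdc++.h>
using namespace std;

void dfs(int u, vector<vector<int>>& adj, vector<bool>& vis) {
    vis[u] = true;
    cout << u << " ";
    for (int v : adj[u])
        if (!vis[v]) dfs(v, adj, vis); // recursion acts as the stack
}

void bfs(int src, vector<vector<int>>& adj) {
    vector<bool> vis(adj.size(), false);
    queue<int> q; // the queue drives level-order exploration
    q.push(src);
    vis[src] = true;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        cout << u << " ";
        for (int v : adj[u])
            if (!vis[v]) { vis[v] = true; q.push(v); }
    }
}

int main() {
    vector<vector<int>> adj = { { 1, 2 }, { 0, 3 }, { 0 }, { 1 } };
    vector<bool> vis(4, false);
    dfs(0, adj, vis); cout << "\n"; // prints 0 1 3 2
    bfs(0, adj);      cout << "\n"; // prints 0 1 2 3
    return 0;
}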
7.5 Shortest Path Algorithms
1. Dijkstra's Algorithm:
● Finds the shortest paths from a source vertex to all other vertices in a graph with
non-negative edge weights.
● Uses a priority queue (min-heap); a sketch appears after this list.
2. Bellman-Ford Algorithm:
● Finds the shortest paths from a single source vertex to all other vertices, even in the
presence of negative edge weights.
● Uses relaxation technique over all edges |𝑉| − 1 times.
3. Floyd-Warshall Algorithm:
● Finds shortest paths between all pairs of vertices in a graph, including negative edge
weights.
● Uses dynamic programming.
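A compact sketch of Dijkstra's algorithm with a min-heap; the three-vertex weighted graph is illustrative:

#include <bits/stdc++.h>
using namespace std;

// adj[u] = vector of {v, w} pairs (edge u -> v with weight w)
vector<int> dijkstra(int src, vector<vector<pair<int, int>>>& adj) {
    int V = adj.size();
    vector<int> dist(V, INT_MAX);
    priority_queue<pair<int, int>, vector<pair<int, int>>,
                   greater<pair<int, int>>> pq; // {distance, vertex}
    dist[src] = 0;
    pq.push({ 0, src });
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue; // skip stale heap entries
        for (auto [v, w] : adj[u])
            if (dist[u] + w < dist[v]) { // relax the edge u -> v
                dist[v] = dist[u] + w;
                pq.push({ dist[v], v });
            }
    }
    return dist;
}

int main() {
    // 3 vertices: 0->1 (weight 4), 0->2 (weight 1), 2->1 (weight 2)
    vector<vector<pair<int, int>>> adj(3);
    adj[0] = { { 1, 4 }, { 2, 1 } };
    adj[2] = { { 1, 2 } };
    for (int d : dijkstra(0, adj)) cout << d << " "; // prints 0 3 1
    return 0;
}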
7.6 Minimum Spanning Tree (MST) Algorithms
1. Kruskal's Algorithm:
● Finds an MST by sorting all edges and adding the smallest edge that doesn’t form a cycle
until |V| − 1 edges are added.
● Uses the Union-Find data structure for cycle detection.
2. Prim's Algorithm:
● Finds an MST by starting from an arbitrary vertex and greedily adding the minimum
weight edge that connects the tree to an outside vertex.
● Uses a priority queue (min-heap); see the sketch below.
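A sketch of Prim's algorithm with a min-heap, run on the same 4-vertex graph used by the Kruskal driver later in this chapter (total MST weight 19):

#include <bits/stdc++.h>
using namespace std;

// Grow the MST from vertex 0, always taking the cheapest
// edge that reaches a vertex not yet in the tree
int primMST(vector<vector<pair<int, int>>>& adj) {
    int V = adj.size(), cost = 0;
    vector<bool> inMST(V, false);
    priority_queue<pair<int, int>, vector<pair<int, int>>,
                   greater<pair<int, int>>> pq; // {weight, vertex}
    pq.push({ 0, 0 });
    while (!pq.empty()) {
        auto [w, u] = pq.top();
        pq.pop();
        if (inMST[u]) continue; // already connected
        inMST[u] = true;
        cost += w;
        for (auto [v, wt] : adj[u])
            if (!inMST[v]) pq.push({ wt, v });
    }
    return cost;
}

int main() {
    vector<vector<pair<int, int>>> adj(4);
    auto addEdge = [&](int u, int v, int w) {
        adj[u].push_back({ v, w });
        adj[v].push_back({ u, w });
    };
    addEdge(0, 1, 10); addEdge(1, 3, 15); addEdge(2, 3, 4);
    addEdge(2, 0, 6);  addEdge(0, 3, 5);
    cout << primMST(adj) << endl; // prints 19
    return 0;
}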
7.8 Advanced Topics
1. Strongly Connected Components (SCC):
● Maximal subgraphs where each vertex is reachable from every other vertex in the
subgraph.
2. Eulerian and Hamiltonian Paths:
● Eulerian Path/Cycle: Visits each edge exactly once (the cycle returns to the starting vertex).
● Hamiltonian Path/Cycle: Visits each vertex exactly once (the cycle returns to the starting
vertex).
7.10 Graphs Problem 1: Minimum Spanning Tree using Kruskal's Algorithm
Code in C++
// C++ program for the above approach
#include <bits/stdc++.h>
using namespace std;

// Disjoint Set Union with union by rank and path compression
class DSU {
    int* parent;
    int* rank;

public:
    DSU(int n)
    {
        parent = new int[n];
        rank = new int[n];
        // Initially every vertex is its own parent
        for (int i = 0; i < n; i++) {
            parent[i] = -1;
            rank[i] = 1;
        }
    }

    // Find function (with path compression)
    int find(int i)
    {
        if (parent[i] == -1)
            return i;
        return parent[i] = find(parent[i]);
    }

    // Union function (by rank)
    void unite(int x, int y)
    {
        int s1 = find(x);
        int s2 = find(y);
        if (s1 != s2) {
            if (rank[s1] < rank[s2]) {
                parent[s1] = s2;
            }
            else if (rank[s1] > rank[s2]) {
                parent[s2] = s1;
            }
            else {
                parent[s2] = s1;
                rank[s1] += 1;
            }
        }
    }
};

class Graph {
    vector<vector<int> > edgelist;
    int V;

public:
    Graph(int V) { this->V = V; }

    // Edges stored as {weight, u, v} so sort() orders by weight
    void addEdge(int x, int y, int w)
    {
        edgelist.push_back({ w, x, y });
    }

    void kruskals_mst()
    {
        // Sort all edges
        sort(edgelist.begin(), edgelist.end());

        DSU s(V);
        int ans = 0;
        cout << "Following are the edges in the constructed MST" << endl;
        for (auto edge : edgelist) {
            int w = edge[0];
            int x = edge[1];
            int y = edge[2];
            // Take this edge only if it doesn't create a cycle
            if (s.find(x) != s.find(y)) {
                s.unite(x, y);
                ans += w;
                cout << x << " -- " << y << " == " << w
                     << endl;
            }
        }
        cout << "Minimum Cost Spanning Tree: " << ans;
    }
};

// Driver code
int main()
{
    Graph g(4);
    g.addEdge(0, 1, 10);
    g.addEdge(1, 3, 15);
    g.addEdge(2, 3, 4);
    g.addEdge(2, 0, 6);
    g.addEdge(0, 3, 5);

    // Function call
    g.kruskals_mst();
    return 0;
}
7.11 Graphs Problem 2: Minimize the cash flow among a given
set of friends who have borrowed money from each other
Given a number of friends who have to give or take some amount of money from one another,
design an algorithm by which the total cash flow among all the friends is minimized.
The idea is to use Greedy Algorithm where at every step, settle all amounts of one person and
recur for remaining n-1 persons.
How to pick the first person? To pick the first person, calculate the net amount for every person,
where the net amount is obtained by subtracting all debts (amounts to pay) from all credits
(amounts to receive). Once the net amount for every person is evaluated, find the two persons
with the maximum and minimum net amounts: these are the maximum creditor and the maximum
debtor. The smaller of the two amounts, call it x, is settled first: we pay x from the maximum
debtor to the maximum creditor, which settles one person. If x equals the maximum debit, the
maximum debtor is settled and removed from the list; otherwise the maximum creditor is settled
and removed.
The following is a detailed algorithm.
Do the following for every person Pi where i is from 0 to n-1.
1. Compute the net amount for every person. The net amount for person ‘i’ can be computed by
subtracting sum of all debts from sum of all credits.
2. Find the two persons that are the maximum creditor and maximum debtor. Let the maximum
amount to be credited to the maximum creditor be maxCredit, and the maximum amount to be
debited from the maximum debtor be maxDebit. Let the maximum debtor be Pd and the
maximum creditor be Pc.
3. Find the minimum of maxDebit and maxCredit. Let a minimum of two be x. Debit ‘x’ from Pd
and credit this amount to Pc
4. If x is equal to maxCredit, then remove Pc from the set of persons and recur for remaining
(n-1) persons.
5. If x is equal to maxDebit, then remove Pd from the set of persons and recur for remaining
(n-1) persons.
Code in C++ 14
// C++ program to minimize cash flow among a set of persons
#include <bits/stdc++.h>
using namespace std;

// Number of persons (or vertices in the graph)
#define N 3

// Index of the person with the minimum net amount
int getMinIndex(int amount[])
{
    int minInd = 0;
    for (int i = 1; i < N; i++)
        if (amount[i] < amount[minInd]) minInd = i;
    return minInd;
}

// Index of the person with the maximum net amount
int getMaxIndex(int amount[])
{
    int maxInd = 0;
    for (int i = 1; i < N; i++)
        if (amount[i] > amount[maxInd]) maxInd = i;
    return maxInd;
}

// amount[p] = net amount of person p (credits minus debts)
void minCashFlowRec(int amount[])
{
    int mxCredit = getMaxIndex(amount), mxDebit = getMinIndex(amount);
    // If all amounts are settled, stop the recursion
    if (amount[mxCredit] == 0 && amount[mxDebit] == 0)
        return;

    // Settle the smaller of the two amounts
    int x = min(-amount[mxDebit], amount[mxCredit]);
    amount[mxCredit] -= x;
    amount[mxDebit] += x;
    cout << "Person " << mxDebit << " pays " << x
         << " to Person " << mxCredit << endl;

    // Recur for the rest: after the settlement above,
    // either amount[mxCredit] or amount[mxDebit] becomes 0
    minCashFlowRec(amount);
}

// graph[i][j] = amount that person i needs to pay person j
void minCashFlow(int graph[][N])
{
    int amount[N] = { 0 };
    // Net amount of p = total owed to p minus total p owes
    for (int p = 0; p < N; p++)
        for (int i = 0; i < N; i++)
            amount[p] += graph[i][p] - graph[p][i];
    minCashFlowRec(amount);
}

// Driver code
int main()
{
    // Sample instance: person 0 owes 1000 to person 1 and 2000 to
    // person 2; person 1 owes 5000 to person 2
    int graph[N][N] = { { 0, 1000, 2000 },
                        { 0, 0, 5000 },
                        { 0, 0, 0 } };
    minCashFlow(graph);
    return 0;
}