
Data Structures and Algorithms

Final Report

Varun Luhadia
July 29, 2024

CONTENTS
Abstract & Timeline
Chapter 1: Basics of Programming
1.1 Data Types
1. Definition
2. Importance
3. Basic Data Types
4. Complex Data Types
5. Application in Algorithms
1.2 If-Else Condition Statements
1. Definition
2. Importance
3. Basic Syntax
4. Use Cases
1.3 For and While Loops
1. Definition
2. Importance
3. Types of Loops
4. Basic Syntax
5. Use Cases
6. Implementation Code in C
1.4 Functions
1. Definition
2. Importance
3. Basic Syntax
4. Example in C
1.5 Arrays
1. Definition
2. Importance
3. Types of Arrays
4. Usage and Operations
5. Example Code in C
1.6 Strings
1. Initialization
2. Input and Output
3. Length Calculation
4. String Copying
5. String Concatenation
6. String Comparison
7. Substring Extraction
8. Searching
9. Tokenization
10. Case Conversion
11. String Formatting
12. Memory Management
13. Error Handling
1.7 Pointers
1. Definition
2. Importance
3. Basic Operations
4. Pointer Arithmetic
5. Arrays and Pointers
6. Dynamic Memory Allocation
7. Example Code in C
1.8 Sorting Algorithms
1. Bubble Sort
2. Insertion Sort
3. Merge Sort
4. Quick Sort
Chapter 2: Recursion, Backtracking & OOPS
2.1 Recursion
1. Definition
2. Types of Recursion
3. Importance
4. Example Code in C
2.2 Backtracking
1. Definition
2. Importance
3. Key Concepts
4. Implementation Example: N-Queens Problem
2.3 Object Oriented Programming System (OOPS)
1. Classes and Objects
2. Encapsulation
3. Inheritance
4. Polymorphism
5. Abstraction
Chapter 3: Linear Data Structures
3.1 Linked List
1. Introduction
2. Inserting and deleting nodes in a list
3. Output of the code
3.2 Stack
1. Definition
2. Types of Stack
3. Basic Operations on Stack
4. Implementation of Stack in C
3.3 Queue
1. Introduction
2. Types of Queue
3. Operating and managing a queue (Enqueue and Dequeue)
4. Output of the code
Chapter 4: Binary Tree & Heap
4.1 Binary Search Tree
1. Introduction
2. Implementation of Binary Search Tree in C
3. Output of the code
4. Explanation of the code
5. Duplicate Elimination
6. Binary Tree Search
7. Other Binary Tree Operations
4.2 Heap
1. Introduction
2. Types of Heaps
3. Heap Operations
4. Properties of Heap
5. Implementation of Heap in C++
6. Time Complexity of Heap
7. Applications of Heaps
8. Advantages of Heaps
9. Disadvantages of Heaps
10. Comparison between Heap and Tree
11. Other Types of Heaps
Chapter 5: Hashing & Tries
5.1 Hashing
1. Introduction
2. Components of Hashing
3. Working of Hashing
4. Hash Function
5. Types of Hash Functions
6. Time and Space Complexity of a Hash Function
7. Collision in Hashing
8. Separate Chaining
9. Open Addressing
10. Load Factor in Hashing
11. Re-Hashing
12. Applications of Hash Data Structure
13. Advantages of Hash Data Structure
14. Disadvantages of Hash Data Structure
5.2 Tries
1. Introduction
2. Advantages of Trie over Hash Table
3. Properties of Trie
4. Working of Trie
5. Representation of Trie Node
6. Basic Operations on Trie
7. Implementation of Trie
8. Complexity Analysis of Trie
9. Applications of Trie
10. Advantages of Trie
11. Disadvantages of Trie
Chapter 6: Dynamic Programming & Greedy Algorithm
6.1 Dynamic Programming
1. Introduction
2. Working of DP
3. Characteristics of DP
4. Approaches of DP
5. Advantages of DP
6. Applications of DP
7. Difference between DP and Recursion
8. Difference between Tabulation and Memoization
9. Implementation of DP in C++
10. Bitmasking and DP: Count ways to assign unique cap to every person
11. Sum over Subsets
12. DP Problem 1: Kadane’s Algorithm (Largest Sum Contiguous Subarray)
13. DP Problem 2: Assembly Line Scheduling Problem
6.2 Greedy Algorithm
1. Introduction
2. Steps for implementing a Greedy Algorithm
3. Characteristics of Greedy Algorithms
4. Difference between Greedy Algorithm and Dynamic Programming
5. Greedy Algorithm Problem 1: Fractional Knapsack Problem
6. Greedy Algorithm Problem 2: Huffman Coding
Chapter 7: Graphs
7.1 Introduction
7.2 Types of Graphs
7.3 Graph Representation
7.4 Graph Traversal Algorithms
7.5 Shortest Path Algorithms
7.6 Minimum Spanning Tree (MST)
7.7 Applications of Graph Theory
7.8 Advanced Topics
7.9 Complexity Analysis
7.10 Graphs Problem 1: Implementation of Kruskal’s Algorithm
7.11 Graphs Problem 2: Minimize the cash flow among a given set of friends who have borrowed money from each other

Abstract & Timeline


This final report highlights the basics of Data Structures and Algorithms, consisting of
introductory theory and generic code examples. My learning so far is based entirely on the
YouTube series DSA Playlist by Striver (Raj Vikramaditya) and on C How to Program with
an Introduction to C++ by Paul Deitel and Harvey Deitel.

The detailed timeline followed during the course:

Week 1: Basics (Data Types, If-Else, Looping, Functions, Arrays, Strings, Pointers)
Week 2: Recursion, Backtracking, Object Oriented Programming
Week 3: Linked List, Stack, Queue
Week 4: Binary Search Tree
25th June: Mid Term Report Submission
Week 5: Heap
Week 6: Hashmaps
Week 7: Tries, Dynamic Programming
Week 8: Greedy Algorithm, Graph Theory
29th July: Final Report and Video Submission

Chapter 1
Basics of Programming
1.1 Data Types
1. Definition

● Data types specify the kind of data that can be stored and manipulated within a program.

2. Importance

● Ensures efficient use of memory.
● Helps in preventing errors during data manipulation.
● Facilitates proper data interpretation by the compiler or interpreter.
● Provides a foundation for creating and using data structures effectively.

3. Basic Data Types

● Integer (int):
○ Represents whole numbers without a fractional component.
○ Commonly used in counting, indexing, and loop iterations.
○ Example: 5, -3, 42.
● Floating Point (float):
○ Represents numbers with a fractional component.
○ Used in calculations involving precision, such as scientific computations.
○ Example: 3.14, -0.001, 2.0.
● Character (char):
○ Represents single characters.
○ Often used in algorithms that process textual data.
○ Example: 'a', 'Z', '#'.
● String (str):
○ Represents a sequence of characters.
○ Used in text manipulation, searching, and sorting algorithms.
○ Example: "Hello, World!", "Python123".
● Boolean (bool):
○ Represents truth values.
○ Crucial in decision-making processes in algorithms.
○ Example: True, False.

4. Complex Data Types

● Array:
○ Collection of elements of the same type, stored in contiguous memory locations.
○ Efficient for indexing and iterating over elements.
○ Example: [1, 2, 3], ['a', 'b', 'c'].
● List:
○ Ordered collection of elements, can contain mixed data types.
○ Dynamic in size, useful in algorithms requiring frequent insertion and deletion.
○ Example: [1, "two", 3.0, True].
● Tuple:
○ Ordered collection of elements, immutable.
○ Used for fixed collections of items.
○ Example: (1, 2, "three").

● Set:
○ Unordered collection of unique elements.
○ Useful in algorithms that require membership testing and eliminating duplicates.
○ Example: {1, 2, 3, 4}.
● Dictionary (dict):
○ Collection of key-value pairs.
○ Allows fast lookups, insertions, and deletions.
○ Example: {"name": "Alice", "age": 25}.
● Stack:
○ Collection of elements with Last In, First Out (LIFO) access pattern.
○ Used in algorithms for expression evaluation, backtracking, and function call
management.
○ Example operations: push, pop.
● Queue:
○ Collection of elements with First In, First Out (FIFO) access pattern.
○ Used in scheduling algorithms, breadth-first search (BFS).
○ Example operations: enqueue, dequeue.
● Tree:
○ Hierarchical data structure with nodes connected by edges.
○ Used in searching and sorting algorithms, such as binary search trees.
○ Example: binary tree, AVL tree.
● Graph:
○ Collection of nodes (vertices) and edges connecting pairs of nodes.
○ Used in network analysis, shortest path algorithms, and social network modeling.
○ Example: directed graph, undirected graph.

5. Application in Algorithms

● Sorting Algorithms:
○ Utilize arrays and lists to reorder elements based on specific criteria.
○ Example: Quick Sort, Merge Sort.
● Searching Algorithms:
○ Employ various data structures like arrays, trees, and graphs to locate specific
elements.
○ Example: Binary Search, Depth-First Search (DFS).
● Dynamic Programming:
○ Often uses arrays and matrices to store intermediate results for optimization
problems.
○ Example: Fibonacci sequence, Knapsack problem.
● Graph Algorithms:
○ Utilize graph data structures for traversal and pathfinding.
○ Example: Dijkstra's Algorithm, Kruskal's Algorithm.

1.2 If-Else Condition Statements


1. Definition

● If-else statements are control structures used to execute different blocks of code based on
certain conditions.

2. Importance

● Facilitates decision-making in programs.
● Enables execution of specific code paths depending on dynamic conditions.

3. Basic Syntax

● If Statement:
○ Evaluates a condition; if true, executes the associated block of code.
○ Example:
if (condition) {
// code to execute if condition is true
}
● Else Statement:
○ Follows an if statement; executes when the if condition is false.
○ Example:
if (condition) {
// code if condition is true
} else {
// code if condition is false
}
● Else If Statement:
○ Short for "else if"; checks additional conditions if previous if or else if conditions
are false.
○ Example:
if (condition1) {
// code if condition1 is true
} else if (condition2) {
// code if condition2 is true
} else {
// code if none of the above conditions are true
}

4. Use Cases

● Decision making in algorithms.
● Handling edge cases and exceptions.
● Implementing branching logic in applications.

1.3 For and While Loops


1. Definition

● Loops are control structures used to repeat a block of code multiple times.

2. Importance

● Automates repetitive tasks.
● Reduces code redundancy.
● Enhances code efficiency and readability.

3. Types of Loops

● For Loop:
○ Iterates a block of code a specific number of times.
○ Typically used when the number of iterations is known beforehand.
● While Loop:
○ Repeats a block of code as long as a specified condition is true.
○ Used when the number of iterations is not known in advance.

4. Basic Syntax

● For Loop:

○ Initializes a counter variable, checks a condition, and updates the counter after
each iteration.
● While Loop:
○ Continuously checks a condition before each iteration and stops when the
condition becomes false.

5. Use Cases

● Iterating through arrays or collections.
● Implementing algorithms that require repeated actions.
● Processing user inputs until a specific condition is met.

6. Implementation Code in C
#include <stdio.h>

int main() {
    // For Loop Example
    printf("For Loop:\n");
    for (int i = 0; i < 5; i++) {
        printf("Iteration %d\n", i);
    }

    // While Loop Example
    printf("\nWhile Loop:\n");
    int j = 0;
    while (j < 5) {
        printf("Iteration %d\n", j);
        j++;
    }
    return 0;
}

1.4 Functions
1. Definition

● Functions are modular blocks of code that perform a specific task. They allow code
reusability and enhance program organization.

2. Importance

● Encapsulates logic into reusable units.
● Reduces redundancy and promotes code maintenance.
● Enhances readability and debugging.

3. Basic Syntax

● Function Declaration:
○ Specifies the function name, return type, and parameters (if any).
● Function Definition:
○ Contains the actual implementation of the function.
● Function Call:
○ Executes the function’s code at a specific point in the program.

4. Example in C
#include <stdio.h>

// Function declaration
int add(int a, int b);

int main() {
    int result = add(3, 5); // Function call
    printf("Result: %d\n", result);
    return 0;
}

// Function definition
int add(int a, int b) {
    return a + b;
}

1.5 Arrays
1. Definition

● An array is a data structure that stores a fixed-size sequential collection of elements of the same type. Each element is accessed by its index, which represents its position in the array.

2. Importance

● Arrays provide a convenient way to store and access multiple values of the same type
under a single name.
● They facilitate efficient data storage and retrieval, especially when dealing with a large
number of similar data items.
● Arrays are fundamental for implementing other data structures and algorithms, such as
lists, matrices, and sorting algorithms.

3. Types of Arrays

● One-dimensional Arrays:
○ Simplest form where elements are arranged in a single row or column.
○ Accessed using a single index.
○ Example:
int numbers[5]; // Declaration of an array of 5 integers
● Multi-dimensional Arrays:
○ Arrays of arrays where each element is itself an array.
○ Commonly used to represent matrices or tables.
○ Accessed using multiple indices.
○ Example:
int matrix[3][3]; // Declaration of a 3x3 matrix

4. Usage and Operations

● Initialization:
○ Arrays can be initialized at the time of declaration or later using assignment
statements.
● Accessing Elements:
○ Elements of an array are accessed using square brackets [] and the index of the
element.
○ Example:
int numbers[5] = {10, 20, 30, 40, 50};
printf("Third element: %d\n", numbers[2]); // Accessing the third element
● Iterating through Arrays:
○ Loops such as for and while are commonly used to iterate through arrays to
perform operations on each element.
● Manipulating Arrays:
○ Elements can be modified, sorted, or searched within an array using various
algorithms and techniques.

5. Example Code in C
#include <stdio.h>

int main() {
    // Example of a one-dimensional array
    int numbers[5] = {10, 20, 30, 40, 50};

    // Accessing and printing elements of the array
    printf("Elements of the array: ");
    for (int i = 0; i < 5; i++) {
        printf("%d ", numbers[i]);
    }
    printf("\n");

    // Example of a multidimensional array (2D matrix)
    int matrix[3][3] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9}
    };

    // Accessing and printing elements of the matrix
    printf("Elements of the matrix:\n");
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            printf("%d ", matrix[i][j]);
        }
        printf("\n");
    }
    return 0;
}

1.6 Strings
1. Initialization

● Strings are initialized as character arrays terminated by a null character ('\0').

2. Input and Output

● Input: Reading strings from user input using scanf() or fgets().
● Output: Printing strings to the console using printf().

3. Length Calculation

● Determine the length of a string using strlen().

4. String Copying

● Copy one string to another using strcpy().

5. String Concatenation

● Append one string to another using strcat().

6. String Comparison

● Compare strings using strcmp() to determine equality, greater than, or less than.

7. Substring Extraction

● Extract a substring from a string using strncpy() or manipulate using pointer arithmetic.

8. Searching

● Locate a substring within a string using strstr() or find individual characters using
strchr() or strrchr().

9. Tokenization

● Split a string into tokens using strtok() based on delimiter characters.

10. Case Conversion

● Convert the case of characters within a string using toupper() and tolower().

11. String Formatting

● Format strings for specific output using sprintf() or snprintf().

12. Memory Management

● Allocate memory dynamically for strings using malloc(), calloc(), and manage it using
free().

13. Error Handling

● Handle errors during string operations using strerror() to retrieve error messages
associated with specific error codes.

1.7 Pointers
1. Definition

● A pointer is a variable that stores the memory address of another variable. Pointers are
declared using the * operator.

2. Importance

● Memory Management: Pointers provide direct access to memory, allowing efficient manipulation of arrays, dynamic memory allocation, and complex data structures like linked lists.
● Performance: Using pointers can lead to more efficient code, as it avoids copying large
amounts of data.
● Flexibility: Pointers enable functions to modify variables and structures outside their
local scope, allowing the implementation of more flexible and powerful functions.

3. Basic Operations

● Declaration: Pointers are declared by specifying the type of data they point to, followed
by an asterisk (*). For example, int *p; declares a pointer to an integer.
● Initialization: Pointers are initialized by assigning them the address of a variable using
the address-of operator (&). For example, p = &x; assigns the address of x to p.
● Dereferencing: The value stored at the memory address pointed to by a pointer is
accessed using the dereference operator (*). For example, *p gives the value of x if p is
pointing to x.

4. Pointer Arithmetic

● Pointers support arithmetic operations like addition and subtraction. This is useful for
iterating through arrays. For example, incrementing a pointer (p++) moves it to the next
memory location of its type.

5. Arrays and Pointers

● Arrays and pointers are closely related. The name of an array is a pointer to its first
element. Accessing elements via pointers can be done using pointer arithmetic.

6. Dynamic Memory Allocation

● Pointers are essential for dynamic memory allocation functions such as malloc(), calloc(),
realloc(), and free(). These functions manage memory at runtime, providing flexibility for
data structures whose size cannot be determined at compile time.

7. Example Code in C
#include <stdio.h>
#include <stdlib.h>

int main() {
    // Pointer declaration and initialization
    int x = 10;
    int *p = &x;

    // Accessing value using pointer (dereferencing)
    printf("Value of x: %d\n", *p);

    // Dynamic memory allocation
    int *arr = (int *)malloc(5 * sizeof(int));
    for (int i = 0; i < 5; i++) {
        arr[i] = i + 1;
    }

    // Accessing array elements using pointers
    for (int i = 0; i < 5; i++) {
        printf("arr[%d] = %d\n", i, *(arr + i));
    }

    // Freeing allocated memory
    free(arr);

    return 0;
}

In this example:

● A pointer to an integer is declared and initialized.
● The pointer is used to access the value of the integer.
● Dynamic memory allocation is demonstrated, with an array of integers allocated and accessed using pointers.
● Finally, the allocated memory is freed to avoid memory leaks.

Pointers are a powerful feature in C that provide significant control over memory and enable
efficient and flexible programming techniques. Understanding pointers is essential for advanced
C programming, especially in systems programming, embedded systems, and
performance-critical applications.

1.8 Sorting Algorithms
1. Bubble Sort
#include <stdio.h>

void bubbleSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        for (int j = 0; j < n - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                // Swap arr[j] and arr[j+1]
                int temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
        }
    }
}

void printArray(int arr[], int size) {
    for (int i = 0; i < size; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}

int main() {
    int arr[] = {64, 34, 25, 12, 22, 11, 90};
    int n = sizeof(arr) / sizeof(arr[0]);
    bubbleSort(arr, n);
    printf("Sorted array: \n");
    printArray(arr, n);
    return 0;
}

2. Insertion Sort
#include <stdio.h>

void insertionSort(int arr[], int n) {
    for (int i = 1; i < n; i++) {
        int key = arr[i];
        int j = i - 1;

        // Move elements of arr[0..i-1] that are greater than key
        // one position ahead of their current position
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j = j - 1;
        }
        arr[j + 1] = key;
    }
}

void printArray(int arr[], int size) {
    for (int i = 0; i < size; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}

int main() {
    int arr[] = {12, 11, 13, 5, 6};
    int n = sizeof(arr) / sizeof(arr[0]);
    insertionSort(arr, n);
    printf("Sorted array: \n");
    printArray(arr, n);
    return 0;
}

3. Merge Sort
#include <stdio.h>
#include <stdlib.h>

void merge(int arr[], int l, int m, int r) {
    int i, j, k;
    int n1 = m - l + 1;
    int n2 = r - m;

    int L[n1], R[n2];

    for (i = 0; i < n1; i++)
        L[i] = arr[l + i];
    for (j = 0; j < n2; j++)
        R[j] = arr[m + 1 + j];

    i = 0;
    j = 0;
    k = l;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k] = L[i];
            i++;
        } else {
            arr[k] = R[j];
            j++;
        }
        k++;
    }

    while (i < n1) {
        arr[k] = L[i];
        i++;
        k++;
    }

    while (j < n2) {
        arr[k] = R[j];
        j++;
        k++;
    }
}

void mergeSort(int arr[], int l, int r) {
    if (l < r) {
        int m = l + (r - l) / 2;

        mergeSort(arr, l, m);
        mergeSort(arr, m + 1, r);

        merge(arr, l, m, r);
    }
}

void printArray(int A[], int size) {
    for (int i = 0; i < size; i++)
        printf("%d ", A[i]);
    printf("\n");
}

int main() {
    int arr[] = {12, 11, 13, 5, 6, 7};
    int arr_size = sizeof(arr) / sizeof(arr[0]);

    printf("Given array is \n");
    printArray(arr, arr_size);

    mergeSort(arr, 0, arr_size - 1);

    printf("\nSorted array is \n");
    printArray(arr, arr_size);
    return 0;
}

4. Quick Sort
#include <stdio.h>

void swap(int* a, int* b) {
    int t = *a;
    *a = *b;
    *b = t;
}

int partition(int arr[], int low, int high) {
    int pivot = arr[high];
    int i = (low - 1);

    for (int j = low; j <= high - 1; j++) {
        if (arr[j] < pivot) {
            i++;
            swap(&arr[i], &arr[j]);
        }
    }
    swap(&arr[i + 1], &arr[high]);
    return (i + 1);
}

void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high);

        quickSort(arr, low, pi - 1);
        quickSort(arr, pi + 1, high);
    }
}

void printArray(int arr[], int size) {
    for (int i = 0; i < size; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

int main() {
    int arr[] = {10, 7, 8, 9, 1, 5};
    int n = sizeof(arr) / sizeof(arr[0]);
    quickSort(arr, 0, n - 1);
    printf("Sorted array: \n");
    printArray(arr, n);
    return 0;
}

Chapter 2

Recursion, Backtracking & OOPS


2.1 Recursion
1. Definition

● Recursion is a programming technique where a function calls itself directly or indirectly to solve a problem.

2. Types of Recursion

● Functional Recursion:
○ Involves a function that calls itself to solve smaller instances of the same problem
until a base case is reached.
● Parameterized Recursion:
○ Extends functional recursion by passing additional parameters that change with
each recursive call, influencing the problem-solving process.

3. Importance

● Simplifies complex problems by breaking them down into smaller, more manageable
sub-problems.
● Facilitates elegant solutions in scenarios like tree traversal, factorial calculations, and
sorting algorithms.

4. Example Code in C
#include <stdio.h>

// Functional Recursion Example: Factorial
int factorial(int n) {
    if (n <= 1)
        return 1; // Base case: factorial of 0 and 1 is 1
    else
        return n * factorial(n - 1); // Recursive call
}

// Parameterized Recursion Example: Fibonacci Series
int fibonacci(int n, int a, int b) {
    if (n == 0)
        return a; // Base case: return first number of Fibonacci series
    else if (n == 1)
        return b; // Base case: return second number of Fibonacci series
    else
        return fibonacci(n - 1, b, a + b); // Recursive call with parameters updated
}

int main() {
    // Example of Functional Recursion: Calculating Factorial
    int num = 5;
    printf("Factorial of %d is %d\n", num, factorial(num));

    // Example of Parameterized Recursion: Generating Fibonacci Series
    int terms = 7;
    printf("Fibonacci series up to %d terms: ", terms);
    for (int i = 0; i < terms; i++) {
        printf("%d ", fibonacci(i, 0, 1));
    }
    printf("\n");
    return 0;
}

2.2 Backtracking
1. Definition

● Backtracking is a technique for solving problems recursively by attempting to build a solution incrementally, one piece at a time, and backing out (backtracking) when a dead-end is reached.

2. Importance

● Useful for solving problems that involve finding a sequence of decisions or choices that
lead to a solution.
● Commonly used in constraint satisfaction problems like Sudoku, N-Queens, and graph
traversal.

3. Key Concepts

● Decision Space: Each step in backtracking involves making a decision and exploring all
possible choices.
● Recursion: The backtracking algorithm typically uses recursion to explore each decision
path.
● Backtracking: If a path does not lead to a solution, the algorithm backtracks to explore
other paths.

4. Implementation Example: N-Queens Problem

The N-Queens problem involves placing N queens on an N×N chessboard such that no two
queens threaten each other. Here’s an implementation of the N-Queens problem using
backtracking in C:
#include <stdio.h>
#include <stdbool.h>

#define N 8 // Size of the chessboard (8x8 for this example)

// Function to print the chessboard with queens placed
void printSolution(int board[N][N]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            printf("%c ", board[i][j] ? 'Q' : '.');
        }
        printf("\n");
    }
    printf("\n");
}

// Function to check if a queen can be placed safely at board[row][col]
bool isSafe(int board[N][N], int row, int col) {
    // Check if there's a queen in the same column
    for (int i = 0; i < row; i++) {
        if (board[i][col])
            return false;
    }

    // Check upper diagonal on left side
    for (int i = row, j = col; i >= 0 && j >= 0; i--, j--) {
        if (board[i][j])
            return false;
    }

    // Check upper diagonal on right side
    for (int i = row, j = col; i >= 0 && j < N; i--, j++) {
        if (board[i][j])
            return false;
    }

    return true;
}

// Recursive function to solve N-Queens problem
bool solveNQueens(int board[N][N], int row) {
    if (row >= N)
        return true; // All queens are placed

    for (int col = 0; col < N; col++) {
        if (isSafe(board, row, col)) {
            board[row][col] = 1; // Place queen

            if (solveNQueens(board, row + 1))
                return true; // If placing queen leads to a solution

            board[row][col] = 0; // Backtrack if no solution found
        }
    }

    return false; // No solution found for this row
}

// Main function
int main() {
    int board[N][N] = {0};

    if (solveNQueens(board, 0)) {
        printf("Solution found:\n");
        printSolution(board);
    } else {
        printf("No solution exists for %dx%d board.\n", N, N);
    }

    return 0;
}

In this example:

● solveNQueens() is a recursive function that attempts to place queens on the chessboard row by row.
● isSafe() checks whether a queen can be placed safely at a given position on the board.
● printSolution() prints the board with queens represented by 'Q' and empty squares by '.'.

2.3 Object Oriented Programming System (OOPS)

1. Classes and Objects

● Definition: A class is a blueprint for creating objects that encapsulates data (attributes)
and behaviors (methods) into a single unit.
● Example:
#include <iostream>
#include <string>
using namespace std;

// Class declaration
class Car {
private:
    // Attributes
    int speed;
    string model;

public:
    // Constructor
    Car(int s, string m) : speed(s), model(m) {}

    // Method to display information
    void displayInfo() {
        cout << "Model: " << model << ", Speed: " << speed << " km/h" << endl;
    }
};

int main() {
    // Creating objects of class Car
    Car car1(120, "Toyota");
    Car car2(150, "Honda");

    // Accessing object methods
    car1.displayInfo(); // Output: Model: Toyota, Speed: 120 km/h
    car2.displayInfo(); // Output: Model: Honda, Speed: 150 km/h
    return 0;
}

2. Encapsulation

● Definition: Encapsulation binds data (attributes) and methods (behaviors) that operate
on the data into a single unit (class), protecting the internal state of an object from
outside interference.
● Example:
#include <iostream>
using namespace std;

class BankAccount {
private:
    float balance = 0.0f; // initialized to avoid reading an indeterminate value

public:
    void deposit(float amount) {
        balance += amount;
    }

    void withdraw(float amount) {
        if (amount <= balance) {
            balance -= amount;
        } else {
            cout << "Insufficient balance." << endl;
        }
    }

    float getBalance() {
        return balance;
    }
};

3. Inheritance

● Definition: Inheritance allows one class (derived or child class) to inherit the attributes
and methods of another class (base or parent class), promoting code reuse and
establishing a hierarchy.
● Example:
#include <iostream>
using namespace std;

// Base class
class Animal {
public:
    void sound() {
        cout << "Animal makes a sound" << endl;
    }
};

// Derived class
class Dog : public Animal {
public:
    void sound() {
        cout << "Dog barks" << endl;
    }
};

int main() {
    // Creating objects
    Animal animal;
    Dog dog;

    // Calling methods
    animal.sound(); // Output: Animal makes a sound
    dog.sound();    // Output: Dog barks
    return 0;
}

4. Polymorphism

● Definition: Polymorphism allows objects of different classes to be treated as objects of a common superclass, enabling different behaviors to be invoked through the same interface.
● Example:
#include <iostream>
using namespace std;

// Base class
class Shape {
public:
    virtual void draw() {
        cout << "Drawing shape" << endl;
    }
};

// Derived classes
class Circle : public Shape {
public:
    void draw() override {
        cout << "Drawing circle" << endl;
    }
};

class Rectangle : public Shape {
public:
    void draw() override {
        cout << "Drawing rectangle" << endl;
    }
};

// Function using polymorphism
void drawShape(Shape *shape) {
    shape->draw();
}

int main() {
    // Usage
    Circle circle;
    Rectangle rectangle;
    drawShape(&circle);    // Output: Drawing circle
    drawShape(&rectangle); // Output: Drawing rectangle
    return 0;
}

5. Abstraction

● Definition: Abstraction focuses on hiding the complex implementation details of a class and providing only the essential features to the outside world.
● Example:
#include <iostream>
using namespace std;

// Abstract class
class Shape {
public:
    virtual void draw() = 0; // Pure virtual function
};

// Concrete classes
class Circle : public Shape {
public:
    void draw() override {
        cout << "Drawing circle" << endl;
    }
};

class Rectangle : public Shape {
public:
    void draw() override {
        cout << "Drawing rectangle" << endl;
    }
};

int main() {
    // Usage
    Circle circle;
    Rectangle rectangle;
    circle.draw();    // Output: Drawing circle
    rectangle.draw(); // Output: Drawing rectangle
    return 0;
}

Object-Oriented Programming (OOP) promotes modular, reusable, and maintainable code by organizing data and behaviors into classes and objects. It facilitates better software design, implementation, and scalability, making it a fundamental concept in modern object-oriented languages such as C++.

Chapter 3
Linear Data Structures
3.1 Linked List
1. Introduction

A linked list is a linear collection of self-referential structures, called nodes, connected by pointer links—hence, the term “linked” list. A linked list is accessed via a pointer to the first node of the list. Subsequent nodes are accessed via the link pointer member stored in each node. By convention, the link pointer in the last node of a list is set to NULL to mark the end of the list. Data is stored in a linked list dynamically—each node is created as necessary. A node can contain data of any type including other structs.

Lists of data can be stored in arrays, but linked lists provide several advantages. A linked list is appropriate when the number of data elements to be represented in the data structure is unpredictable. Linked lists are dynamic, so the length of a list can increase or decrease at execution time as necessary. The size of an array created at compile time, however, cannot be altered. Arrays can become full. Linked lists become full only when the system has insufficient memory to satisfy dynamic storage allocation requests.

2. Inserting and deleting nodes in a list


// Inserting and deleting nodes in a list
#include <stdio.h>
#include <stdlib.h>

// self-referential structure
struct listNode {
char data; // each listNode contains a character
struct listNode *nextPtr; // pointer to next node
};

typedef struct listNode ListNode; // synonym for struct listNode


typedef ListNode *ListNodePtr; // synonym for ListNode*

// prototypes
void insert(ListNodePtr *sPtr, char value);
char delete(ListNodePtr *sPtr, char value);
int isEmpty(ListNodePtr sPtr);
void printList(ListNodePtr currentPtr);
void instructions(void);

int main(void) {
ListNodePtr startPtr = NULL; // initially there are no nodes
char item; // char entered by user

instructions(); // display the menu


printf("%s", "? ");
unsigned int choice; // user's choice
scanf("%u", &choice);

// loop while user does not choose 3


while (choice != 3) {
switch (choice) {
case 1:
printf("%s", "Enter a character: ");

scanf("\n%c", &item);
insert(&startPtr, item); // insert item in list
printList(startPtr);
break;
case 2: // delete an element
// if list is not empty
if (!isEmpty(startPtr)) {
printf("%s", "Enter character to be deleted: ");
scanf("\n%c", &item);

// if character is found, remove it


if (delete(&startPtr, item)) { // remove item
printf("%c deleted.\n", item);
printList(startPtr);
}
else {
printf("%c not found.\n\n", item);
}
}
else {
puts("List is empty.\n");
}
break;
default:
puts("Invalid choice.\n");
instructions();
break;
}

printf("%s", "? ");


scanf("%u", &choice);
}

puts("End of run.");
}

// display program instructions to user


void instructions(void) {
puts("Enter your choice:\n"
" 1 to insert an element into the list.\n"
" 2 to delete an element from the list.\n"
" 3 to end.");
}

// insert a new value into the list in sorted order


void insert(ListNodePtr *sPtr, char value) {
ListNodePtr newPtr = malloc(sizeof(ListNode)); // create node

if (newPtr != NULL) { // is space available?


newPtr->data = value; // place value in node
newPtr->nextPtr = NULL; // node does not link to another node

ListNodePtr previousPtr = NULL;


ListNodePtr currentPtr = *sPtr;

// loop to find the correct location in the list


while (currentPtr != NULL && value > currentPtr->data) {
previousPtr = currentPtr; // walk to ...
currentPtr = currentPtr->nextPtr; // ... next node

}

// insert new node at beginning of list


if (previousPtr == NULL) {
newPtr->nextPtr = *sPtr;
*sPtr = newPtr;
}
else { // insert new node between previousPtr and currentPtr
previousPtr->nextPtr = newPtr;
newPtr->nextPtr = currentPtr;
}
}
else {
printf("%c not inserted. No memory available.\n", value);
}
}

// delete a list element


char delete(ListNodePtr *sPtr, char value) {
// delete first node if a match is found
if (value == (*sPtr)->data) {
ListNodePtr tempPtr = *sPtr; // hold onto node being removed
*sPtr = (*sPtr)->nextPtr; // de-thread the node
free(tempPtr); // free the de-threaded node
return value;
}
else {
ListNodePtr previousPtr = *sPtr;
ListNodePtr currentPtr = (*sPtr)->nextPtr;

// loop to find the correct location in the list


while (currentPtr != NULL && currentPtr->data != value) {
previousPtr = currentPtr; // walk to ...
currentPtr = currentPtr->nextPtr; // ... next node
}

// delete node at currentPtr


if (currentPtr != NULL) {
ListNodePtr tempPtr = currentPtr;
previousPtr->nextPtr = currentPtr->nextPtr;
free(tempPtr); // free the de-threaded node
return value;
}
}

return '\0';
}

// return 1 if the list is empty, 0 otherwise


int isEmpty(ListNodePtr sPtr) {
return sPtr == NULL;
}

// print the list


void printList(ListNodePtr currentPtr) {
// if list is empty
if (isEmpty(currentPtr)) {
puts("List is empty.\n");
}

else {
puts("The list is:");

// while not the end of the list


while (currentPtr != NULL) {
printf("%c --> ", currentPtr->data);
currentPtr = currentPtr->nextPtr;
}

puts("NULL\n");
}
}

3. Output of the code

Enter your choice:


1 to insert an element into the list.
2 to delete an element from the list.
3 to end.
? 1
Enter a character: B
The list is:
B --> NULL

? 1
Enter a character: A
The list is:
A --> B --> NULL

? 1
Enter a character: C
The list is:
A --> B --> C --> NULL

? 2
Enter character to be deleted: D
D not found.

? 2
Enter character to be deleted: B
B deleted.
The list is:
A --> C --> NULL

? 2
Enter character to be deleted: C
C deleted.
The list is:
A --> NULL

? 2
Enter character to be deleted: A
A deleted.
List is empty.

? 4
Invalid choice.

Enter your choice:


1 to insert an element into the list.
2 to delete an element from the list.
3 to end.
? 3

End of run.

3.2 Stack
1. Definition

● Stack is a linear data structure based on LIFO Principle (Last In First Out) in which the
insertion of a new element and removal of an existing element takes place at the same
end represented as the top of the stack.
● To implement a stack, we maintain a pointer to the top of the stack, which refers to the
most recently inserted element, because elements can be accessed only at the top of the
stack.
● LIFO(Last In First Out) Principle in Stack Data Structure:
This strategy states that the element that is inserted last will come out first. You can
take a pile of plates kept on top of each other as a real-life example. The plate which we
put last is on the top and since we remove the plate that is at the top, we can say that the
plate that was put last comes out first.

2. Types of Stack

● Fixed Size Stack : As the name suggests, a fixed size stack has a fixed size and cannot
grow or shrink dynamically. If the stack is full and an attempt is made to add an element
to it, an overflow error occurs. If the stack is empty and an attempt is made to remove an
element from it, an underflow error occurs.
● Dynamic Size Stack : A dynamic size stack can grow or shrink dynamically. When the
stack is full, it automatically increases its size to accommodate the new element, and
when the stack is empty, it decreases its size. This type of stack is implemented using a
linked list, as it allows for easy resizing of the stack.

3. Basic Operations on Stack

In order to make manipulations in a stack, there are certain operations provided to us.
● push() to insert an element into the stack
● pop() to remove an element from the stack
● top() Returns the top element of the stack.
● isEmpty() returns true if stack is empty else false.
● isFull() returns true if the stack is full else false.
(1) Push Operation in Stack Data Structure:
● Adds an item to the stack. If the stack is full, then it is said to be an Overflow condition.
● Algorithm for Push Operation:
Before pushing the element to the stack, we check if the stack is full .
If the stack is full (top == capacity-1) , then Stack Overflows and we cannot insert the
element to the stack.
Otherwise, we increment the value of top by 1 (top = top + 1) and the new value is
inserted at top position .
The elements can be pushed into the stack till we reach the capacity of the stack.
(2) Pop Operation in Stack Data Structure:
● Removes an item from the stack. The items are popped in the reversed order in which
they are pushed. If the stack is empty, then it is said to be an Underflow condition.
● Algorithm for Pop Operation:
Before popping the element from the stack, we check if the stack is empty .
If the stack is empty (top == -1), then Stack Underflows and we cannot remove any
element from the stack.
Otherwise, we store the value at top, decrement the value of top by 1 (top = top - 1) and
return the stored top value.

(3) Top or Peek Operation in Stack Data Structure:
● Returns the top element of the stack.
● Algorithm for Top Operation:
Before returning the top element from the stack, we check if the stack is empty.
If the stack is empty (top == -1), we simply print “Stack is empty”.
Otherwise, we return the element stored at index = top .
(4) isEmpty Operation in Stack Data Structure:
● Returns true if the stack is empty, else false.
● Algorithm for isEmpty Operation:
Check for the value of top in stack.
If (top == -1) , then the stack is empty so return true.
Otherwise, the stack is not empty so return false.
(5) isFull Operation in Stack Data Structure:
● Returns true if the stack is full, else false.
● Algorithm for isFull Operation:
Check for the value of top in stack.
If (top == capacity-1), then the stack is full so return true.
Otherwise, the stack is not full so return false.

4. Implementation of Stack in C

The basic operations that can be performed on a stack include push, pop, and peek. There are two
ways to implement a stack –
● Using Array
● Using Linked List
Code in C (Array Implementation):
// C program for array implementation of stack
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

// A structure to represent a stack


struct Stack {
int top;
unsigned capacity;
int* array;
};

// function to create a stack of given capacity. It initializes size of


// stack as 0
struct Stack* createStack(unsigned capacity)
{
struct Stack* stack = (struct Stack*)malloc(sizeof(struct Stack));
stack->capacity = capacity;
stack->top = -1;
stack->array = (int*)malloc(stack->capacity * sizeof(int));
return stack;
}

// Stack is full when top is equal to the last index


int isFull(struct Stack* stack)
{
return stack->top == stack->capacity - 1;
}

// Stack is empty when top is equal to -1


int isEmpty(struct Stack* stack)
{
return stack->top == -1;
}

// Function to add an item to stack. It increases top by 1
void push(struct Stack* stack, int item)
{
if (isFull(stack))
return;
stack->array[++stack->top] = item;
printf("%d pushed to stack\n", item);
}

// Function to remove an item from stack. It decreases top by 1


int pop(struct Stack* stack)
{
if (isEmpty(stack))
return INT_MIN;
return stack->array[stack->top--];
}

// Function to return the top from stack without removing it


int peek(struct Stack* stack)
{
if (isEmpty(stack))
return INT_MIN;
return stack->array[stack->top];
}

// Driver program to test above functions


int main()
{
struct Stack* stack = createStack(100);

push(stack, 10);
push(stack, 20);
push(stack, 30);

printf("%d popped from stack\n", pop(stack));

return 0;
}
Output:
10 pushed to stack
20 pushed to stack
30 pushed to stack
30 popped from stack
Code in C (Linked List Implementation):
// C program for linked list implementation of stack
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

// A structure to represent a stack


struct StackNode {
int data;
struct StackNode* next;
};

struct StackNode* newNode(int data)


{
struct StackNode* stackNode =
(struct StackNode*)
malloc(sizeof(struct StackNode));

stackNode->data = data;
stackNode->next = NULL;
return stackNode;
}

int isEmpty(struct StackNode* root)


{
return !root;
}

void push(struct StackNode** root, int data)


{
struct StackNode* stackNode = newNode(data);
stackNode->next = *root;
*root = stackNode;
printf("%d pushed to stack\n", data);
}

int pop(struct StackNode** root)


{
if (isEmpty(*root))
return INT_MIN;
struct StackNode* temp = *root;
*root = (*root)->next;
int popped = temp->data;
free(temp);

return popped;
}

int peek(struct StackNode* root)


{
if (isEmpty(root))
return INT_MIN;
return root->data;
}

int main()
{
struct StackNode* root = NULL;

push(&root, 10);
push(&root, 20);
push(&root, 30);

printf("%d popped from stack\n", pop(&root));

printf("Top element is %d\n", peek(root));

return 0;
}
Output:
10 pushed to stack
20 pushed to stack
30 pushed to stack
30 popped from stack
Top element is 20

3.3 Queue

1. Introduction

A queue is similar to a checkout line in a grocery store—the first person in line is serviced first,
and other customers enter the line only at the end and wait to be serviced. Queue nodes are
removed only from the head of the queue and are inserted only at the tail of the queue. For this
reason, a queue is referred to as a first-in, first-out (FIFO) data structure. The insert and remove
operations are known as enqueue (pronounced “en-cue”) and dequeue (pronounced “dee-cue”),
respectively. Queues have many applications in computer systems. For computers that have only
a single processor, only one user at a time may be serviced. Entries for the other users are placed
in a queue. Each entry gradually advances to the front of the queue as users receive service. The
entry at the front of the queue is the next to receive service. Similarly, for today’s multicore
systems, there could be more users than there are processors, so the users not currently running
are placed in a queue until a currently busy processor becomes available.

2. Types of Queue

Simple Queue (FIFO Queue)


● Definition: A standard queue where the first element added is the first one to be removed
(First In, First Out).
● Operations: Enqueue (add to the back), Dequeue (remove from the front), Peek (view the
front element without removing it).
Circular Queue
● Definition: A queue where the last position is connected back to the first position to make
a circle.
● Operations: Similar to a simple queue, but the rear and front pointers wrap around to the
beginning of the array when they reach the end.
● Use Case: Efficient utilization of storage, particularly useful in buffering and resource
scheduling.
Priority Queue
● Definition: A queue where each element is assigned a priority, and elements are
dequeued based on their priority rather than their position in the queue.
● Operations: Enqueue (insert based on priority), Dequeue (remove the highest priority
element), Peek (view the highest priority element).
● Use Case: Task scheduling, simulation systems, and algorithms like Dijkstra’s shortest
path.
Double-Ended Queue (Deque)
● Definition: A queue where elements can be added or removed from both ends (front and
rear).
● Operations: EnqueueFront (add to the front), EnqueueRear (add to the back),
DequeueFront (remove from the front), DequeueRear (remove from the back), PeekFront,
PeekRear.
● Use Case: Palindrome checking, implementing other data structures like stacks and
queues.
Input-Restricted Deque
● Definition: A deque where insertion is restricted to one end (either front or rear), but
deletion can occur from both ends.
● Operations: Enqueue (at one end only), DequeueFront, DequeueRear, PeekFront,
PeekRear.
● Use Case: Scenarios requiring more control over insertion points.
Output-Restricted Deque
● Definition: A deque where deletion is restricted to one end (either front or rear), but
insertion can occur at both ends.
● Operations: EnqueueFront, EnqueueRear, Dequeue (at one end only), PeekFront,
PeekRear.
● Use Case: Similar to input-restricted deque but with a focus on controlled deletion points.
Double-Ended Priority Queue (DEPQ)
● Definition: A combination of a priority queue and a double-ended queue where you can
enqueue elements with priorities and dequeue the highest or lowest priority element from
both ends.
● Operations: Enqueue with priority, DequeueHighestPriority, DequeueLowestPriority,
PeekHighestPriority, PeekLowestPriority.
● Use Case: Complex scheduling and simulation scenarios requiring flexible priority
handling from both ends.

3. Operating and managing a queue (Enqueue and Dequeue)


// Operating and maintaining a queue
#include <stdio.h>
#include <stdlib.h>

// self-referential structure
struct queueNode {
char data; // define data as a char
struct queueNode *nextPtr; // queueNode pointer
};

typedef struct queueNode QueueNode;


typedef QueueNode *QueueNodePtr;

// function prototypes
void printQueue(QueueNodePtr currentPtr);
int isEmpty(QueueNodePtr headPtr);
char dequeue(QueueNodePtr *headPtr, QueueNodePtr *tailPtr);
void enqueue(QueueNodePtr *headPtr, QueueNodePtr *tailPtr, char value);
void instructions(void);

// function main begins program execution


int main(void) {
QueueNodePtr headPtr = NULL; // initialize headPtr
QueueNodePtr tailPtr = NULL; // initialize tailPtr
char item; // char input by user

instructions(); // display the menu


printf("%s", "? ");
unsigned int choice; // user's menu choice
scanf("%u", &choice);

// while user does not enter 3


while (choice != 3) {
switch(choice) {
// enqueue value
case 1:
printf("%s", "Enter a character: ");
scanf("\n%c", &item);
enqueue(&headPtr, &tailPtr, item);
printQueue(headPtr);
break;
// dequeue value
case 2:
// if queue is not empty
if (!isEmpty(headPtr)) {
item = dequeue(&headPtr, &tailPtr);
printf("%c has been dequeued.\n", item);
printQueue(headPtr);
}

else {
puts("Queue is empty.\n");
}
break;
default:
puts("Invalid choice.\n");
instructions();
break;
}

printf("%s", "? ");


scanf("%u", &choice);
}

puts("End of run.");
}

// display program instructions to user


void instructions(void) {
printf ("Enter your choice:\n"
" 1 to add an item to the queue\n"
" 2 to remove an item from the queue\n"
" 3 to end\n");
}

// insert a node at queue tail


void enqueue(QueueNodePtr *headPtr, QueueNodePtr *tailPtr, char value) {
QueueNodePtr newPtr = malloc(sizeof(QueueNode));
if (newPtr != NULL) { // is space available?
newPtr->data = value;
newPtr->nextPtr = NULL;

// if empty, insert node at head


if (isEmpty(*headPtr)) {
*headPtr = newPtr;
}
else {
(*tailPtr)->nextPtr = newPtr;
}

*tailPtr = newPtr;
}
else {
printf("%c not inserted. No memory available.\n", value);
}
}

// remove node from queue head


char dequeue(QueueNodePtr *headPtr, QueueNodePtr *tailPtr) {
char value; // node value
QueueNodePtr tempPtr; // temporary node pointer

value = (*headPtr)->data;
tempPtr = *headPtr;
*headPtr = (*headPtr)->nextPtr;

// if queue is empty
if (*headPtr == NULL) {
*tailPtr = NULL;

}

free(tempPtr); // reclaim memory


return value;
}

// return 1 if the queue is empty, 0 otherwise


int isEmpty(QueueNodePtr headPtr) {
return headPtr == NULL;
}

// print the queue


void printQueue(QueueNodePtr currentPtr) {
// if queue is empty
if (currentPtr == NULL) {
puts("Queue is empty.\n");
}
else {
puts("The queue is:");

// while not end of queue


while (currentPtr != NULL) {
printf("%c --> ", currentPtr->data);
currentPtr = currentPtr->nextPtr;
}

puts("NULL\n");
}
}

4. Output of the code


Enter your choice:
1 to add an item to the queue
2 to remove an item from the queue
3 to end
? 1
Enter a character: A
The queue is:
A --> NULL

? 1
Enter a character: B
The queue is:
A --> B --> NULL

? 1
Enter a character: C
The queue is:
A --> B --> C --> NULL

? 2
A has been dequeued.
The queue is:
B --> C --> NULL

? 2
B has been dequeued.
The queue is:
C --> NULL

? 2
C has been dequeued.
Queue is empty.

? 2
Queue is empty.

? 4
Invalid choice.

Enter your choice:


1 to add an item to the queue
2 to remove an item from the queue
3 to end
? 3
End of run.

Chapter 4
Binary Tree & Heap
4.1 Binary Search Tree
1. Introduction

Linked lists, stacks, and queues are examples of linear data structures. In contrast, a tree is a
nonlinear, two-dimensional data structure with distinct properties. Tree nodes have two or more
links, and this section focuses on binary trees. Binary trees are a type of tree where each node
contains two links, which may be NULL, partially filled, or fully filled.

The root node, the starting point of a tree, has links that refer to its child nodes. The left child is
the initial node of the left subtree, and the right child is the initial node of the right subtree.
Nodes with the same parent are known as siblings. A node without any children is termed a leaf
node. Unlike natural trees, computer scientists typically depict trees with the root node at the
top.

This section specifically addresses the creation of a binary search tree (BST), a special type of
binary tree. In a binary search tree, each node follows the rule that all values in the left subtree
are less than the node’s value, and all values in the right subtree are greater. Additionally, binary
search trees do not contain duplicate values. The structure of a binary search tree for a given set
of data can vary based on the sequence in which values are inserted into the tree.

2. Implementation of Binary Search Tree in C

The code creates a binary search tree and traverses it three ways—inorder, preorder and
postorder. The program generates 10 random numbers and inserts each in the tree, except that
duplicate values are discarded.

// Implementation of Binary Search Tree in C


// Creating and traversing a binary tree
// preorder, inorder, and postorder
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// self-referential structure
struct treeNode {
struct treeNode *leftPtr; // pointer to left subtree
int data; // node value
struct treeNode *rightPtr; // pointer to right subtree
};

typedef struct treeNode TreeNode; // synonym for struct treeNode


typedef TreeNode *TreeNodePtr; // synonym for TreeNode*

// prototypes
void insertNode(TreeNodePtr *treePtr, int value);
void inOrder(TreeNodePtr treePtr);
void preOrder(TreeNodePtr treePtr);
void postOrder(TreeNodePtr treePtr);

// function main begins program execution


int main(void) {
TreeNodePtr rootPtr = NULL; // tree initially empty

srand(time(NULL));
puts("The numbers being placed in the tree are:");

// insert random values between 0 and 14 in the tree


for (unsigned int i = 1; i <= 10; ++i) {
int item = rand() % 15;
printf("%3d", item);
insertNode(&rootPtr, item);
}

// traverse the tree preOrder


puts("\n\nThe preOrder traversal is:");
preOrder(rootPtr);

// traverse the tree inOrder


puts("\n\nThe inOrder traversal is:");
inOrder(rootPtr);

// traverse the tree postOrder


puts("\n\nThe postOrder traversal is:");
postOrder(rootPtr);

return 0;
}

// insert node into tree


void insertNode(TreeNodePtr *treePtr, int value) {
// if tree is empty
if (*treePtr == NULL) {
*treePtr = malloc(sizeof(TreeNode));

// if memory was allocated, then assign data


if (*treePtr != NULL) {
(*treePtr)->data = value;
(*treePtr)->leftPtr = NULL;
(*treePtr)->rightPtr = NULL;
}
else {
printf("%d not inserted. No memory available.\n", value);
}
}
else { // tree is not empty
// data to insert is less than data in current node
if (value < (*treePtr)->data) {
insertNode(&((*treePtr)->leftPtr), value);
}
// data to insert is greater than data in current node
else if (value > (*treePtr)->data) {
insertNode(&((*treePtr)->rightPtr), value);
}
else { // duplicate data value ignored
printf("%s", "dup");
}
}
}

// begin inorder traversal of tree


void inOrder(TreeNodePtr treePtr) {
if (treePtr != NULL) {
inOrder(treePtr->leftPtr);
printf("%3d", treePtr->data);
inOrder(treePtr->rightPtr);

}
}

// begin preorder traversal of tree


void preOrder(TreeNodePtr treePtr) {
if (treePtr != NULL) {
printf("%3d", treePtr->data);
preOrder(treePtr->leftPtr);
preOrder(treePtr->rightPtr);
}
}

// begin postorder traversal of tree


void postOrder(TreeNodePtr treePtr) {
if (treePtr != NULL) {
postOrder(treePtr->leftPtr);
postOrder(treePtr->rightPtr);
printf("%3d", treePtr->data);
}
}

3. Output of the code

The numbers being placed in the tree are:


6 7 4 12 7dup 2 2dup 5 7dup 11
The preOrder traversal is:
6 4 2 5 7 12 11
The inOrder traversal is:
2 4 5 6 7 11 12
The postOrder traversal is:
2 5 4 11 12 7 6

4. Explanation of the code

Function insertNode:
The functions used to create a binary search tree and traverse it are recursive. Function
insertNode receives the address of the tree and an integer to be stored in the tree as arguments.
A node can be inserted only as a leaf node in a binary search tree. The steps for inserting a node
in a binary search tree are as follows:
1. If *treePtr is NULL, create a new node: call malloc and assign the allocated memory to
*treePtr, assign to (*treePtr)->data the integer to be stored, assign to (*treePtr)->leftPtr and
(*treePtr)->rightPtr the value NULL, and return control to the caller (either main or a previous
call to insertNode).
2. If *treePtr is not NULL and the value to be inserted is less than (*treePtr)->data, function
insertNode is called with the address of (*treePtr)->leftPtr to insert the node in the left subtree
of the node pointed to by treePtr. If the value to be inserted is greater than (*treePtr)->data,
function insertNode is called with the address of (*treePtr)->rightPtr to insert the node in the
right subtree.
The recursive steps continue until a NULL pointer is found; then Step 1 inserts the new node.

Functions inOrder, preOrder, postOrder:


Functions inOrder, preOrder and postOrder each receive a tree (i.e., a pointer to the root node
of the tree) and traverse it.
The steps for an inOrder traversal are:
1. Traverse the left subtree inOrder.
2. Process the value in the node.
3. Traverse the right subtree inOrder.
The value in a node is not processed until the values in its left subtree are processed.

The inOrder traversal of a binary search tree prints the node values in ascending order. The
process of creating a binary search tree actually sorts the data—and thus this process is called
the binary tree sort.
The steps for a preOrder traversal are:
1. Process the value in the node.
2. Traverse the left subtree preOrder.
3. Traverse the right subtree preOrder.

5. Duplicate Elimination

The binary search tree facilitates duplicate elimination. As the tree is being created, an attempt
to insert a duplicate value will be recognized because a duplicate will follow the same “go left” or
“go right” decisions on each comparison as the original value did. Thus, the duplicate will
eventually be compared with a node in the tree containing the same value. The duplicate value
may simply be discarded at this point.

6. Binary Tree Search

Searching a binary tree for a value that matches a key value is also fast. If the tree is tightly
packed, each level contains about twice as many elements as the previous level, so a binary
search tree with n elements has roughly log2(n) levels, and thus a maximum of about log2(n)
comparisons would have to be made either to find a match or to determine that no match
exists. This means, for example, that when searching a (tightly packed) 1,000-element binary
search tree, no more than 10 comparisons need to be made because 2^10 > 1,000. When
searching a (tightly packed) 1,000,000-element binary search tree, no more than 20 comparisons
need to be made because 2^20 > 1,000,000.

7. Other Binary Tree Operations

Several other binary tree operations are worth noting, such as printing a binary tree in a
two-dimensional tree format and performing a level-order traversal of a binary tree. The
level-order traversal visits the nodes of the tree row by row, starting at the root node level. On
each level of the tree, the nodes are visited from left to right. Other binary tree algorithms
include allowing a binary search tree to contain duplicate values, inserting string values in a
binary tree, and determining how many levels are contained in a binary tree.

4.2 Heap
1. Introduction

A heap is a binary tree-based data structure that satisfies the heap property: in a max heap,
the value of each node is greater than or equal to the values of its children; in a min heap, it is
less than or equal to them. This property ensures that the root node always contains the
maximum or minimum value (depending on the type of heap), and that values decrease or
increase as you move down the tree.

2. Types of Heaps

There are two main types of heaps:


● Max Heap: The root node contains the maximum value, and the values decrease as you move
down the tree.
● Min Heap: The root node contains the minimum value, and the values increase as you move
down the tree.

3. Properties of Heap

Heap has the following Properties:

● Complete Binary Tree: A heap tree is a complete binary tree, meaning all levels of the tree
are fully filled except possibly the last level, which is filled from left to right. This property
ensures that the tree is efficiently represented using an array.
● Heap Property: This property ensures that the minimum (or maximum) element is always at
the root of the tree according to the heap type.
● Parent-Child Relationship: The relationship between a parent node at index ‘i’ and its
children is given by the formulas: left child at index 2i+1 and right child at index 2i+2 for
0-based indexing of node numbers.
● Efficient Insertion and Removal: Insertion and removal operations in heap trees are efficient.
New elements are inserted at the next available position in the bottom-rightmost level, and
the heap property is restored by comparing the element with its parent and swapping if
necessary. Removal of the root element involves replacing it with the last element and
heapifying down.
● Efficient Access to Extremal Elements: The minimum or maximum element is always at the
root of the heap, allowing constant-time access.

4. Heap Operations

Heapify:
It is the process to rearrange the elements to maintain the property of heap data structure. It is
done when a certain node creates an imbalance in the heap due to some operations on that node.
It takes O(log N) to balance the tree.
● For max-heap, it balances in such a way that the maximum element is the root of that binary
tree and
● For min-heap, it balances in such a way that the minimum element is the root of that binary
tree.
Insertion:
● If we insert a new element into the heap, it will distort the properties of the heap. Therefore,
we need to perform the heapify operation to maintain the heap's properties.
This operation also takes O(logN) time.
Deletion:
● If we delete the element from the heap it always deletes the root element of the tree and
replaces it with the last element of the tree.
● Since we delete the root element from the heap it will distort the properties of the heap so we
need to perform heapify operations so that it maintains the property of the heap.
It takes O(logN) time.
getMax (For max-heap) OR getMin (For min-heap):
It finds the maximum element or minimum element for max-heap and min-heap respectively and
as we know minimum and maximum elements will always be the root node itself for min-heap
and max-heap respectively. It takes O(1) time.
removeMin or removeMax:
This operation returns and deletes the maximum element and minimum element from the
max-heap and min-heap respectively. In short, it deletes the root element of the heap binary tree.

5. Implementation of Heap in C++

maxHeapify is the function responsible for restoring the Max Heap property. It arranges
node i and its subtrees so that the heap property is maintained.
1. Suppose we are given an array, arr[] representing the complete binary tree. The left and the
right child of ith node are in indices 2*i+1 and 2*i+2.
2. We set the index of the current element, i, as 'MAXIMUM'.
3. If arr[2*i + 1] > arr[MAXIMUM], i.e., the left child is larger than the current maximum, its
index is set as 'MAXIMUM'.
4. Similarly, if arr[2*i + 2] > arr[MAXIMUM], i.e., the right child is larger than the current
maximum, its index is set as 'MAXIMUM'.
5. If 'MAXIMUM' differs from i, swap arr[i] with arr[MAXIMUM].
6. Repeat steps 2 to 5 on the affected subtree till the property of the heap is restored.
The following code shows the implementation of a max-heap.

// C++ code to depict the implementation of a max heap.

#include <bits/stdc++.h>
using namespace std;

// A class for Max Heap.
class MaxHeap {
    // Pointer to the array of heap elements.
    int* arr;

    // Maximum possible size of the Max Heap.
    int maxSize;

    // Number of elements currently in the Max Heap.
    int heapSize;

public:
    // Constructor function.
    MaxHeap(int maxSize);

    // Heapifies a sub-tree taking the given index as the root.
    void MaxHeapify(int);

    // Returns the index of the parent of the element at the ith index.
    int parent(int i)
    {
        return (i - 1) / 2;
    }

    // Returns the index of the left child.
    int lChild(int i)
    {
        return (2 * i + 1);
    }

    // Returns the index of the right child.
    int rChild(int i)
    {
        return (2 * i + 2);
    }

    // Removes the root, which in this case contains the maximum element.
    int removeMax();

    // Increases the value of the key at index i to some new value.
    void increaseKey(int i, int newVal);

    // Returns the maximum key (key at root) from the max heap.
    int getMax()
    {
        return arr[0];
    }

    // Returns the current number of elements in the heap.
    int curSize()
    {
        return heapSize;
    }

    // Deletes the key at the given index i.
    void deleteKey(int i);

    // Inserts a new key 'x' into the Max Heap.
    void insertKey(int x);
};

// Constructor builds an empty heap of the specified capacity.
MaxHeap::MaxHeap(int totSize)
{
    heapSize = 0;
    maxSize = totSize;
    arr = new int[totSize];
}

// Inserting a new key 'x'.
void MaxHeap::insertKey(int x)
{
    // Check whether the key can be inserted or not.
    if (heapSize == maxSize) {
        cout << "\nOverflow: Could not insertKey\n";
        return;
    }

    // The new key is initially inserted at the end.
    heapSize++;
    int i = heapSize - 1;
    arr[i] = x;

    // If the max heap property is violated, restore it
    // by moving the new key up towards the root.
    while (i != 0 && arr[parent(i)] < arr[i]) {
        swap(arr[i], arr[parent(i)]);
        i = parent(i);
    }
}

// Increases the value of the key at index 'i' to newVal.
void MaxHeap::increaseKey(int i, int newVal)
{
    arr[i] = newVal;
    while (i != 0 && arr[parent(i)] < arr[i]) {
        swap(arr[i], arr[parent(i)]);
        i = parent(i);
    }
}

// Removes the root node, which contains the maximum element of the Max Heap.
int MaxHeap::removeMax()
{
    // Check whether the heap is empty.
    if (heapSize <= 0)
        return INT_MIN;
    if (heapSize == 1) {
        heapSize--;
        return arr[0];
    }

    // Store the maximum element to return it.
    int root = arr[0];
    arr[0] = arr[heapSize - 1];
    heapSize--;

    // Restore the property of the Max Heap.
    MaxHeapify(0);

    return root;
}

// Deletes the key at a given index i.
void MaxHeap::deleteKey(int i)
{
    // It increases the value of the key to infinity
    // and then removes the maximum value.
    increaseKey(i, INT_MAX);
    removeMax();
}

// Called recursively to heapify the subtree rooted at index i.
void MaxHeap::MaxHeapify(int i)
{
    int l = lChild(i);
    int r = rChild(i);
    int largest = i;
    if (l < heapSize && arr[l] > arr[i])
        largest = l;
    if (r < heapSize && arr[r] > arr[largest])
        largest = r;
    if (largest != i) {
        swap(arr[i], arr[largest]);
        MaxHeapify(largest);
    }
}

// Driver program to test the above functions.
int main()
{
    // Assuming the maximum size of the heap to be 15.
    MaxHeap h(15);

    // Inserting the keys:
    cout << "Entered 6 keys:- 3, 10, 12, 8, 2, 14 \n";
    h.insertKey(3);
    h.insertKey(10);
    h.insertKey(12);
    h.insertKey(8);
    h.insertKey(2);
    h.insertKey(14);

    // Printing the current size of the heap.
    cout << "The current size of the heap is "
         << h.curSize() << "\n";

    // Printing the root element, which is the maximum element.
    cout << "The current maximum element is " << h.getMax()
         << "\n";

    // Deleting the key at index 2.
    h.deleteKey(2);

    // Printing the size of the heap after deletion.
    cout << "The current size of the heap is "
         << h.curSize() << "\n";

    // Inserting 2 new keys into the heap.
    h.insertKey(15);
    h.insertKey(5);
    cout << "The current size of the heap is "
         << h.curSize() << "\n";
    cout << "The current maximum element is " << h.getMax()
         << "\n";

    return 0;
}

Output of the code:
Entered 6 keys:- 3, 10, 12, 8, 2, 14
The current size of the heap is 6
The current maximum element is 14
The current size of the heap is 5
The current size of the heap is 7
The current maximum element is 15

6. Time Complexity of Heap

Consider the following algorithm for building a Heap of an input array A.


BUILD-HEAP(A)
heapsize := size(A);
for i := floor(heapsize/2) downto 1
do HEAPIFY(A, i);
end for
END
A quick look at the above algorithm suggests that the running time is O(n * log(n)), since each
call to Heapify costs O(log(n)) and Build-Heap makes O(n) such calls.
This upper bound, though correct, is not asymptotically tight.
We can derive a tighter bound by observing that the running time of Heapify depends on the
height of the tree ‘h’ (which is equal to lg(n), where n is a number of nodes) and the heights of
most sub-trees are small. The height ‘h’ increases as we move upwards along the tree. Line-3 of
Build-Heap runs a loop from the index of the last internal node (heapsize/2) with height=1, to the
index of root(1) with height = log(n). Hence, Heapify takes a different time for each node, which
is:

O(h), i.e., proportional to the height h of the node.
For finding the time complexity of building a heap, we must know the number of nodes having
height h. For this we use the fact that a heap of size n has at most ⌈n / 2^(h+1)⌉ nodes with height h.
To derive the time complexity, we express the total cost of Build-Heap as

T(n) = Σ_{h=0}^{log(n)} ⌈n / 2^(h+1)⌉ * O(h) = O(n * Σ_{h=0}^{∞} h / 2^h)    ... (1)

Step 2 uses the properties of Big-Oh notation to ignore the ceiling function and the constant 2
(from 2^(h+1)). Similarly, in Step 3 the upper limit of the summation can be increased to infinity,
since we are using Big-Oh notation. The sum of an infinite G.P. (for x < 1) is

Σ_{n=0}^{∞} x^n = 1 / (1 − x)    ... (2)

On differentiating both sides and multiplying by x, we get

Σ_{n=0}^{∞} n·x^n = x / (1 − x)^2    ... (3)

Putting the result obtained in (3) back into our derivation (1), with x = 1/2, we get

T(n) = O(n * (1/2) / (1 − 1/2)^2) = O(2n) = O(n)

Hence proved that the Time complexity for Building a Binary Heap is O(n).

7. Applications of Heaps

● Priority queues: The heap data structure is commonly used to implement priority queues,
where elements are stored in a heap and ordered based on their priority. This allows
constant-time access to the highest-priority element, making it an efficient data structure for
managing tasks or events that require prioritization.
● Heapsort algorithm: The heap data structure is the basis for the heapsort algorithm, which is
an efficient sorting algorithm with a worst-case time complexity of O(n log n). The heapsort
algorithm is used in various applications, including database indexing and numerical
analysis.
● Memory management: The heap data structure is used in memory management systems to
allocate and deallocate memory dynamically. The heap is used to store the memory blocks,
and the heap data structure is used to efficiently manage the memory blocks and allocate
them to programs as needed.
● Graph algorithms: The heap data structure is used in various graph algorithms, including
Dijkstra’s algorithm, Prim’s algorithm, and Kruskal’s algorithm. These algorithms require
efficient priority queue implementation, which can be achieved using the heap data
structure.
● Job scheduling: The heap data structure is used in job scheduling algorithms, where tasks
are scheduled based on their priority or deadline. The heap data structure allows efficient
access to the highest-priority task, making it a useful data structure for job scheduling
applications.
Real-Time Application of Heap:
● Patient treatment: In a hospital, an emergency patient, or the patient with more injury is
treated first. Here the priority is the degree of injury.
● Systems concerned with security use heap sort, like the Linux kernel.

8. Advantages of Heaps

● Efficient insertion and deletion:
The heap data structure allows efficient insertion and deletion of elements. When a new
element is added to the heap, it is placed at the bottom of the heap and moved up to its
correct position using the heapify operation. Similarly, when an element is removed from the
heap, it is replaced by the bottom element, and the heap is restructured using the heapify
operation.

● Efficient priority queue:
The heap data structure is commonly used to implement a priority queue, where the highest
priority element is always at the top of the heap. The heap allows constant-time access to the
highest priority element, making it an efficient data structure for implementing priority
queues.
● Guaranteed access to the maximum or minimum element:
In a max-heap, the top element is always the maximum element, and in a min-heap, the top
element is always the minimum element. This provides guaranteed access to the maximum
or minimum element in the heap, making it useful in algorithms that require access to the
extreme values.
● Space efficiency:
Because the heap is a complete binary tree, it can be stored compactly in an array with no
per-element pointers, making it more space efficient than linked structures such as linked lists
or pointer-based trees.
● Heap-sort algorithm:
The heap data structure forms the basis for the heap-sort algorithm, which is an efficient
sorting algorithm that has a worst-case time complexity of O(n log n).

9. Disadvantages of Heaps

● Lack of flexibility: The heap data structure is not very flexible, as it is designed to maintain a
specific order of elements. This means that it may not be suitable for some applications that
require more flexible data structures.
● Not ideal for searching: While the heap data structure allows efficient access to the top
element, it is not ideal for searching for a specific element in the heap. Searching for an
element in a heap requires traversing the entire tree, which has a time complexity of O(n).
● Not a stable data structure: The heap data structure is not a stable data structure, which
means that the relative order of equal elements may not be preserved when the heap is
constructed or modified.
● Memory management: The heap data structure requires dynamic memory allocation, which
can be a challenge in some systems with limited memory. In addition, managing the memory
allocated to the heap can be complex and error-prone.
● Complexity: While the heap data structure allows efficient insertion, deletion, and priority
queue implementation, individual insert and delete operations still take O(log n) time, and
heap-based sorting takes O(n log n), which may not be optimal for some applications that
require faster algorithms.

10. Comparison between Heap and Tree

S. No | Heap | Tree
1 | Heap is a kind of Tree itself. | The tree is not a kind of heap.
2 | Usually, Heap is of two types: Max-Heap and Min-Heap. | A Tree can be of various types, e.g. Binary Tree, BST (Binary Search Tree), AVL tree, etc.
3 | Heap is ordered. | A Binary Tree is not ordered, but a BST is ordered.
4 | Insert and remove take O(log(N)) time in the worst case. | Insert and remove take O(N) in the worst case if the tree is skewed.
5 | Finding the Min/Max value is O(1) in the respective Min/Max heap. | Finding the Min/Max value is O(log(N)) in a BST and O(N) in a Binary Tree.
6 | A Heap can also be referred to as a Priority Queue. | A Tree can also be referred to as a connected undirected graph with no cycle.
7 | A Heap can be built in linear time complexity. | Building takes O(N * log(N)) for a BST and O(N) for a Binary Tree.
8 | Applications: Prim's Algorithm and Dijkstra's algorithm. | Applications: Spanning Trees, Trie, B+ Tree, BST, Heap.

11. Other Types of Heaps

Binomial Heap:
A Binomial Tree of order 0 has 1 node. A Binomial Tree of order k can be constructed by taking
two binomial trees of order k-1 and making one the leftmost child of the other.
A Binomial Tree of order k has the following properties:
● It has exactly 2^k nodes.
● It has depth k.
● There are exactly C(k, i) (k choose i) nodes at depth i for i = 0, 1, . . . , k.
● The root has degree k, and the children of the root are themselves Binomial Trees of order k-1,
k-2, . . . , 0 from left to right.
Fibonacci Heap:
A Fibonacci heap is a data structure used for implementing priority queues. It is a type of heap
data structure, but with several improvements over the traditional binary heap and binomial
heap data structures.
The key advantage of a Fibonacci heap over other heap data structures is its fast amortized
running time for operations such as insert, merge and extract-min, making it one of the most
efficient data structures for these operations. The running time of these operations in a Fibonacci
heap is O(1) for insert, O(log n) for extract-min and O(1) amortized for merge.
A Fibonacci heap is a collection of trees, where each tree is a heap-ordered multi-tree, meaning
that each tree has a single root node with its children arranged in a heap-ordered manner. The
trees in a Fibonacci heap are organized in such a way that the root node with the smallest key is
always at the front of the list of trees.
In a Fibonacci heap, when a new element is inserted, it is added as a singleton tree. When two
heaps are merged, the root list of one heap is simply appended to the root list of the other heap.
When the extract-min operation is performed, the tree with the minimum root node is removed
from the root list and its children are added to the root list.
One unique feature of a Fibonacci heap is the use of lazy consolidation, which is a technique for
improving the efficiency of the merge operation. In lazy consolidation, the merging of trees is
postponed until it is necessary, rather than performed immediately. This allows for the merging
of trees to be performed more efficiently in batches, rather than one at a time.
Leftist Heap:
A leftist tree, also known as a leftist heap, is a type of binary heap data structure used for
implementing priority queues. Unlike a binary heap, it is not required to be a complete binary
tree; instead it is deliberately unbalanced, with the heavier subtree always kept on the left.
1. In a leftist tree, the priority of the node is determined by its key value, and the node with the
smallest key value is designated as the root node. The left subtree of a node in a leftist tree is
always larger than the right subtree, based on the number of nodes in each subtree. This is
known as the “leftist property.”

2. One of the key features of a leftist tree is the calculation and maintenance of the “null path
length” of each node, which is defined as the distance from the node to the nearest null
(empty) child. The root node of a leftist tree has the shortest null path length of any node in
the tree.
3. The main operations performed on a leftist tree include insert, extract-min and merge. The
insert operation simply adds a new node to the tree, while the extract-min operation removes
the root node and updates the tree structure to maintain the leftist property. The merge
operation combines two leftist trees into a single leftist tree by linking the root nodes and
maintaining the leftist property.
In summary, a leftist tree is a type of binary heap data structure used for implementing priority
queues. Its key features include the leftist property, which ensures that the left subtree of a node
is always larger than the right subtree, and the calculation and maintenance of the null path
length, which is used to maintain the efficiency of operations such as extract-min and merge.
K-ary Heap:
K-ary heaps are a generalization of binary heaps (K=2) in which each node has K children
instead of 2. Just like binary heap, it follows two properties:
1. It is a nearly complete k-ary tree, with all levels having the maximum number of nodes except
the last, which is filled in a left to right manner.
2. Like Binary Heap, it can be divided into two categories:
○ Max k-ary heap (key at root is greater than all descendants and same is recursively true
for all nodes).
○ Min k-ary heap (key at root is lesser than all descendants and same is recursively true for
all nodes)

Chapter 5
Hashing & Tries
5.1 Hashing
1. Introduction

Hashing in Data Structures refers to the process of transforming a given key to another value. It
involves mapping data to a specific index in a hash table using a hash function that enables fast
retrieval of information based on its key. The transformation of a key to the corresponding value
is done using a Hash Function and the value obtained from the hash function is called Hash
Code.
Need for Hash Data Structure
Every day, the data on the internet is increasing multifold and it is always a struggle to store this
data efficiently. In day-to-day programming, this amount of data might not be that big, but still,
it needs to be stored, accessed, and processed easily and efficiently. A very common data
structure that is used for such a purpose is the Array data structure.
Now the question arises if Array was already there, what was the need for a new data structure!
The answer to this is in the word “efficiency”. Though storing an element in an Array takes O(1)
time, searching for it takes at least O(log n) time even in a sorted array, and O(n) in an unsorted
one. This time appears to be small, but for a large data set it can cause a lot of problems, and
this, in turn, makes the Array data structure inefficient.
So now we are looking for a data structure that can store the data and search in it in constant
time, i.e. in O(1) time. This is how Hashing data structure came into play. With the introduction
of the Hash data structure, it is now possible to easily store data in constant time and retrieve it
in constant time as well.

2. Components of Hashing

There are mainly three components of hashing:
1. Key: A Key can be anything string or integer which is fed as input in the hash function, the
technique that determines an index or location for storage of an item in a data structure.
2. Hash Function: The hash function receives the input key and returns the index of an element
in an array called a hash table. The index is known as the hash index .
3. Hash Table: Hash table is a data structure that maps keys to values using a special function
called a hash function. Hash stores the data in an associative manner in an array where each
data value has its own unique index.

3. Working of Hashing

Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to store it in a table.
Our main objective here is to search or update the values stored in the table quickly in O(1) time
and we are not concerned about the ordering of strings in the table. So the given set of strings
can act as a key and the string itself will act as the value of the string but how to store the value
corresponding to the key?
● Step 1: We know that hash functions (which are mathematical formulas) are used to calculate
the hash value, which acts as the index of the data structure where the value will be stored.
● Step 2: So, let’s assign
○ “a” = 1,
○ “b”=2, .. etc, to all alphabetical characters.
● Step 3: Therefore, the numerical value by summation of all characters of the string:
“ab” = 1 + 2 = 3,

“cd” = 3 + 4 = 7 ,
“efg” = 5 + 6 + 7 = 18
● Step 4: Now, assume that we have a table of size 7 to store these strings. The hash function
used here is the sum of the characters of the key mod the table size. We can compute the
location of the string in the array by taking sum(string) mod 7.
● Step 5: So we will then store
○ “ab” in 3 mod 7 = 3,
○ “cd” in 7 mod 7 = 0, and
○ “efg” in 18 mod 7 = 4.
The above technique enables us to calculate the location of a given string by using a simple hash
function and rapidly find the value that is stored in that location. Therefore the idea of hashing
seems like a great way to store (key, value) pairs of the data in a table.

4. Hash Function

The hash function creates a mapping between key and value, this is done through the use of
mathematical formulas known as hash functions. The result of the hash function is referred to as
a hash value or hash. The hash value is a representation of the original string of characters but
usually smaller than the original.
For example: Consider an array as a Map where the key is the index and the value is the value at
that index. So for an array A if we have index i which will be treated as the key then we can find
the value by simply looking at the value at A[i].
Key Properties of Hash Function
● Deterministic: A hash function must consistently produce the same output for the same
input.
● Fixed Output Size: The output of a hash function should have a fixed size, regardless of the
size of the input.
● Efficiency: The hash function should be able to process input quickly.
● Uniformity: The hash function should distribute the hash values uniformly across the output
space to avoid clustering.
● Preimage Resistance: It should be computationally infeasible to reverse the hash function,
i.e., to find the original input given a hash value.
● Collision Resistance: It should be difficult to find two different inputs that produce the same
hash value.
● Avalanche Effect: A small change in the input should produce a significantly different hash
value.
Properties of Good Hash Function
A good hash function should have the following properties:
1. Efficiently computable.
2. Should uniformly distribute the keys (each table position is equally likely for each key).
3. Should minimize collisions.
4. Should have a low load factor (the number of items in the table divided by the size of the table).

5. Types of Hash Functions

1. Division Method
The division method involves dividing the key by a prime number and using the remainder as the
hash value.
h(k) = k mod m
Where k is the key and m is a prime number.
Advantages:
● Simple to implement.
● Works well when m is a prime number.
Disadvantages:
● Poor distribution if m is not chosen wisely.
2. Multiplication Method

In the multiplication method, a constant A (0 < A < 1) is used to multiply the key. The fractional
part of the product is then multiplied by m to get the hash value.
h(k) = ⌊m (kA mod 1)⌋
Where ⌊ ⌋ denotes the floor function.
Advantages:
● Less sensitive to the choice of m.
Disadvantages:
● More complex than the division method.
3. Mid-Square Method
In the mid-square method, the key is squared, and the middle digits of the result are taken as the
hash value.
Steps:
1. Square the key.
2. Extract the middle digits of the squared value.
Advantages:
● Produces a good distribution of hash values.
Disadvantages:
● May require more computational effort.
4. Folding Method
The folding method involves dividing the key into equal parts, summing the parts, and then
taking the result modulo m.
Steps:
1. Divide the key into parts.
2. Sum the parts.
3. Take the sum modulo m.
Advantages:
● Simple and easy to implement.
Disadvantages:
● Depends on the choice of partitioning scheme.
5. Cryptographic Hash Functions
Cryptographic hash functions are designed to be secure and are used in cryptography. Examples
include MD5, SHA-1, and SHA-256.
Characteristics:
● Preimage resistance.
● Second pre-image resistance.
● Collision resistance.
Advantages:
● High security.
Disadvantages:
● Computationally intensive.
6. Universal Hashing
Universal hashing uses a family of hash functions to minimize the chance of collision for any
given set of inputs.
h(k) = ((a · k + b) mod p) mod m
Where a and b are randomly chosen constants, p is a prime number greater than m, and k is the
key.
Advantages:
● Reduces the probability of collisions.
Disadvantages:
● Requires more computation and storage.
7. Perfect Hashing
Perfect hashing aims to create a collision-free hash function for a static set of keys. It guarantees
that no two keys will hash to the same value.
Types:

● Minimal Perfect Hashing: Ensures that the range of the hash function is equal to the
number of keys.
● Non-minimal Perfect Hashing: The range may be larger than the number of keys.
Advantages:
● No collisions.
Disadvantages:
● Complex to construct.

6. Time and Space Complexity of a Hash Function

● Time complexity: O(n), where n is the length of the key (effectively constant for fixed-size keys)
● Space complexity: O(1)

7. Collision in Hashing

Collision in Hashing occurs when two different keys map to the same hash value. Hash collisions
can be intentionally created for many hash algorithms. The probability of a hash collision
depends on the size of the algorithm, the distribution of hash values and the efficiency of the
Hash function.
The hashing process generates a small number for a big key, so there is a possibility that two
keys could produce the same value. The situation where a newly inserted key maps to an
already occupied slot is called a collision, and it must be handled using some collision handling
technique.
How to handle Collisions?
There are mainly two methods to handle collision:
1. Separate Chaining
2. Open Addressing

8. Separate Chaining

The idea behind separate chaining is to make each slot of the hash table point to a linked list of
entries, called a chain. The linked list data structure is used to implement this technique. When
multiple elements are hashed into the same slot index, these elements are inserted into a
singly-linked list which is known as a chain.
Here, all those elements that hash into the same slot index are inserted into a linked list. Now,
we can use a key K to search in the linked list by just linearly traversing. If the intrinsic key for
any entry is equal to K then it means that we have found our entry. If we have reached the end of
the linked list and yet we haven’t found our entry then it means that the entry does not exist.
Hence, the conclusion is that in separate chaining, if two different elements have the same hash
value then we store both the elements in the same linked list one after the other.
Performance of Chaining:
Performance of hashing can be evaluated under the assumption that each key is equally likely to
be hashed to any slot of the table (simple uniform hashing).
m = Number of slots in hash table
n = Number of keys to be inserted in hash table
Load factor α = n/m
Expected time to search = O(1 + α)
Expected time to delete = O(1 + α)
Time to insert = O(1)
Time complexity of search insert and delete is O(1) if α is O(1)

9. Open Addressing

Open Addressing is a method for handling collisions. In Open Addressing, all elements are stored
in the hash table itself. So at any point, the size of the table must be greater than or equal to the
total number of keys (Note that we can increase table size by copying old data if needed). This

approach is also known as closed hashing. This entire procedure is based upon probing. We will
understand the types of probing ahead:
● Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.
● Search(k): Keep probing until the slot’s key doesn’t become equal to k or an empty slot is
reached.
● Delete(k): Delete operation is interesting. If we simply delete a key, then the search may fail.
So slots of deleted keys are marked specially as “deleted”.
The insert can insert an item in a deleted slot, but the search doesn’t stop at a deleted slot.
Different ways of Open Addressing:
1. Linear Probing:
In linear probing, the hash table is searched sequentially that starts from the original location of
the hash. If the location that we get is already occupied, then we check for the next location.
The function used for rehashing is as follows: rehash(key) = (n+1)%table-size.
For example, The typical gap between two probes is 1 as seen in the example below:
Let hash(x) be the slot index computed using a hash function and S be the table size
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
Example: Let us consider a simple hash function as “key mod 5” and a sequence of keys that are
to be inserted are 50, 70, 76, 85, 93.
2. Quadratic Probing:
In quadratic probing, the interval between probes grows quadratically with the number of
attempts. Quadratic probing is a method with the help of which we can reduce the primary
clustering problem seen in linear probing. In this method, we look for the i²-th slot in the ith
iteration. We always start from the original hash location; if that location is occupied, we check
the subsequent slots.
let hash(x) be the slot index computed using the hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
Example: Let us consider table size = 7, hash function Hash(x) = x % 7 and collision resolution
strategy f(i) = i². Insert the keys 22, 30, and 50.
3. Double Hashing:
The intervals that lie between probes are computed by another hash function. Double hashing is
a technique that reduces clustering in an optimized way. In this technique, the increments for the
probing sequence are computed by using another hash function. We use another hash function
hash2(x) and look for the i*hash2(x) slot in the ith rotation.
let hash(x) be the slot index computed using the hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
Example: Insert the keys 27, 43, 692, 72 into the Hash Table of size 7. where first hash-function
is h1​(k) = k mod 7 and second hash-function is h2(k) = 1 + (k mod 5)
Code for Double Hashing:
/*
** Handling of collision via open addressing
** Method for Probing: Double Hashing
*/

#include <iostream>
#include <vector>
#include <bitset>
using namespace std;
#define MAX_SIZE 10000001ll

class doubleHash {

53
int TABLE_SIZE, keysPresent, PRIME;
vector<int> hashTable;
bitset<MAX_SIZE> isPrime;

/* Function to set sieve of Eratosthenes. */


void __setSieve(){
isPrime[0] = isPrime[1] = 1;
for(long long i = 2; i*i <= MAX_SIZE; i++)
if(isPrime[i] == 0)
for(long long j = i*i; j <= MAX_SIZE; j += i)
isPrime[j] = 1;

int inline hash1(int value){


return value%TABLE_SIZE;
}

int inline hash2(int value){


return PRIME - (value%PRIME);
}

bool inline isFull(){


return (TABLE_SIZE == keysPresent);
}

public:

doubleHash(int n){
__setSieve();
TABLE_SIZE = n;

/* Find the largest prime number smaller than hash table's size. */
PRIME = TABLE_SIZE - 1;
while(isPrime[PRIME] == 1)
PRIME--;

keysPresent = 0;

/* Fill the hash table with -1 (empty entries). */


for(int i = 0; i < TABLE_SIZE; i++)
hashTable.push_back(-1);
}

void __printPrime(long long n){


for(long long i = 0; i <= n; i++)
if(isPrime[i] == 0)
cout<<i<<", ";
cout<<endl;
}

/* Function to insert value in hash table */


void insert(int value){

if(value == -1 || value == -2){


cout<<("ERROR : -1 and -2 can't be inserted in the table\n");
}

if(isFull()){
cout<<("ERROR : Hash Table Full\n");
return;
}

54
int probe = hash1(value), offset = hash2(value); // in linear probing offset
= 1;

while(hashTable[probe] != -1){
if(-2 == hashTable[probe])
break; //
insert at deleted element's location
probe = (probe+offset) % TABLE_SIZE;
}

hashTable[probe] = value;
keysPresent += 1;
}

void erase(int value){


/* Return if element is not present */
if(!search(value))
return;

int probe = hash1(value), offset = hash2(value);

while(hashTable[probe] != -1)
if(hashTable[probe] == value){
hashTable[probe] = -2; // mark element as
deleted (rather than unvisited(-1)).
keysPresent--;
return;
}
else
probe = (probe + offset) % TABLE_SIZE;

bool search(int value){


int probe = hash1(value), offset = hash2(value), initialPos = probe;
bool firstItr = true;

while(1){
if(hashTable[probe] == -1) // Stop search if
-1 is encountered.
break;
else if(hashTable[probe] == value) // Stop search after
finding the element.
return true;
else if(probe == initialPos && !firstItr) // Stop search if one
complete traversal of hash table is completed.
return false;
else
probe = ((probe + offset) % TABLE_SIZE); // if none of the
above cases occur then update the index and check at it.

firstItr = false;
}
return false;
}

/* Function to display the hash table. */


void print(){
for(int i = 0; i < TABLE_SIZE; i++)
cout<<hashTable[i]<<", ";
cout<<"\n";
}

55
};

int main(){
doubleHash myHash(13); // creates an empty hash table of size 13

/* Inserts random element in the hash table */

int insertions[] = {115, 12, 87, 66, 123},


n1 = sizeof(insertions)/sizeof(insertions[0]);

for(int i = 0; i < n1; i++)


myHash.insert(insertions[i]);

cout<< "Status of hash table after initial insertions : "; myHash.print();

/*
** Searches for random element in the hash table,
** and prints them if found.
*/

    int queries[] = {1, 12, 2, 3, 69, 88, 115},
        n2 = sizeof(queries)/sizeof(queries[0]);

    cout<<"\n"<<"Search operation after insertion : \n";

    for(int i = 0; i < n2; i++)
        if(myHash.search(queries[i]))
            cout<<queries[i]<<" present\n";

/* Deletes random element from the hash table. */

    int deletions[] = {123, 87, 66},
        n3 = sizeof(deletions)/sizeof(deletions[0]);

    for(int i = 0; i < n3; i++)
        myHash.erase(deletions[i]);

cout<< "Status of hash table after deleting elements : "; myHash.print();

return 0;
}
Output of the code:
Status of hash table after initial insertions : -1, 66, -1, -1, -1, -1, 123, -1, -1, 87,
-1, 115, 12,

Search operation after insertion :


12 present
115 present
Status of hash table after deleting elements : -1, -2, -1, -1, -1, -1, -2, -1, -1, -2, -1,
115, 12,

10. Load Factor in Hashing

● For the first step, the time taken depends on the key K and the hash function.
For example, if the key is a string “abcd”, then its hash function may depend on the length of
the string. But for very large values of n, the number of entries into the map, and length of
the keys is almost negligible in comparison to n so hash computation can be considered to
take place in constant time, i.e, O(1).
● For the second step, traversal of the list of K-V pairs present at that index needs to be done.
For this, the worst case may be that all the n entries are at the same index. So, time

complexity would be O(n). But, enough research has been done to make hash functions
uniformly distribute the keys in the array so this almost never happens.
● So, on average, if there are n entries and b is the size of the array, there would be n/b
entries at each index. This value n/b is called the load factor and represents the load on our map.
● This load factor needs to be kept low, so that the number of entries at one index stays small
and the complexity remains almost constant, i.e., O(1).

11. Re-Hashing

Rehashing is the process of increasing the size of a hashmap and redistributing the elements to
new buckets based on their new hash values. It is done to improve the performance of the
hashmap and to prevent collisions caused by a high load factor.
When a hashmap becomes full, the load factor (i.e., the ratio of the number of elements to the
number of buckets) increases. As the load factor increases, the number of collisions also
increases, which can lead to poor performance. To avoid this, the hashmap can be resized and the
elements can be rehashed to new buckets, which decreases the load factor and reduces the
number of collisions.
During rehashing, all elements of the hashmap are iterated and their new bucket positions are
calculated using the new hash function that corresponds to the new size of the hashmap. This
process can be time-consuming but it is necessary to maintain the efficiency of the hashmap.
Why Rehashing?
Rehashing is needed in a hashmap to prevent collision and to maintain the efficiency of the data
structure.
As elements are inserted into a hashmap, the load factor (i.e., the ratio of the number of elements
to the number of buckets) increases. If the load factor exceeds a certain threshold (often set to
0.75), the hashmap becomes inefficient as the number of collisions increases. To avoid this, the
hashmap can be resized and the elements can be rehashed to new buckets, which decreases the
load factor and reduces the number of collisions. This process is known as rehashing.
Rehashing can be costly in terms of time and space, but it is necessary to maintain the efficiency
of the hashmap.
How is Rehashing done?
Rehashing can be done as follows:
● For each addition of a new entry to the map, check the load factor.
● If it’s greater than its pre-defined value (or default value of 0.75 if not given), then Rehash.
● For Rehash, make a new array of double the previous size and make it the new bucket array.
● Then traverse to each element in the old bucketArray and call the insert() for each so as to
insert it into the new larger bucket array.
Program to implement Rehashing:
#include <iostream>
#include <vector>
#include <functional>

class Map {

private:
class MapNode {
public:
int key;
int value;
MapNode* next;

MapNode(int key, int value) {


this->key = key;
this->value = value;
this->next = NULL;
}
};

// The bucket array where
// the nodes containing K-V pairs are stored
std::vector<MapNode*> buckets;

    // No. of pairs stored - n
    int size;

    // Size of the bucketArray - b
    int numBuckets;

// Default loadFactor
double DEFAULT_LOAD_FACTOR = 0.75;

    int getBucketInd(int key) {
        // Using the inbuilt hash function from the standard library
        int hashCode = std::hash<int>()(key);

        // array index = hashCode % numBuckets
        return (hashCode % numBuckets);
}

public:
    Map() {
        numBuckets = 5;
        size = 0; // must be initialized, otherwise the pair count starts as garbage
        buckets.resize(numBuckets);

std::cout << "HashMap created" << std::endl;


std::cout << "Number of pairs in the Map: " << size << std::endl;
std::cout << "Size of Map: " << numBuckets << std::endl;
std::cout << "Default Load Factor : " << DEFAULT_LOAD_FACTOR << std::endl;
}

void insert(int key, int value) {


// Getting the index at which it needs to be inserted
int bucketInd = getBucketInd(key);

// The first node at that index


MapNode* head = buckets[bucketInd];

// First, loop through all the nodes present at that index


// to check if the key already exists
while (head != NULL) {
// If already present the value is updated
if (head->key == key) {
head->value = value;
return;
}
head = head->next;
}

// new node with the K and V


MapNode* newElementNode = new MapNode(key, value);

// The head node at the index


head = buckets[bucketInd];

// the new node is inserted


// by making it the head
// and it's next is the previous head
newElementNode->next = head;

buckets[bucketInd] = newElementNode;

std::cout << "Pair(" << key << ", " << value << ") inserted successfully."
<< std::endl;

// Incrementing size
// as new K-V pair is added to the map
size++;

        // Load factor calculated (floating-point division is required;
        // integer division would always truncate to 0)
        double loadFactor = (1.0 * size) / numBuckets;

std::cout << "Current Load factor = " << loadFactor << std::endl;

// If the load factor is > 0.75, rehashing is done


if (loadFactor > DEFAULT_LOAD_FACTOR) {
std::cout << loadFactor << " is greater than " << DEFAULT_LOAD_FACTOR
<< std::endl;
std::cout << "Therefore Rehashing will be done." << std::endl;

// Rehash
rehash();

std::cout << "New Size of Map: " << numBuckets << std::endl;
}

std::cout << "Number of pairs in the Map: " << size << std::endl;
}

void rehash() {
std::cout << "\n***Rehashing Started***\n" << std::endl;

// The present bucket list is made temp


std::vector<MapNode*> temp = buckets;

// New bucketList of double the old size is created


buckets.resize(2 * numBuckets);

for (int i = 0; i < 2 * numBuckets; i++) {


// Initialised to null
buckets[i] = NULL;
}

// Now size is made zero


// and we loop through all the nodes in the original bucket list(temp)
// and insert it into the new list
size = 0;
numBuckets *= 2;

for (int i = 0; i < temp.size(); i++) {


// head of the chain at that index
MapNode* head = temp[i];

while (head != NULL) {


int key = head->key;
int val = head->value;

// calling the insert function for each node in temp


// as the new list is now the bucketArray
insert(key, val);
head = head->next;
}
}

std::cout << "***Rehashing Done***\n" << std::endl;
}
};

int main() {
Map map;
// Inserting elements
map.insert(1, 1);
map.insert(2, 2);
map.insert(3, 3);
map.insert(4, 4);
map.insert(5, 5);
map.insert(6, 6);
map.insert(7, 7);
map.insert(8, 8);
map.insert(9, 9);
map.insert(10, 10);

return 0;
}
Output of the code (with the integer keys and values inserted in main(); abridged):
HashMap created
Number of pairs in the Map: 0
Size of Map: 5
Default Load Factor : 0.75

Pair(1, 1) inserted successfully.
Current Load factor = 0.2
Number of pairs in the Map: 1

Pair(2, 2) inserted successfully.
Current Load factor = 0.4
Number of pairs in the Map: 2

Pair(3, 3) inserted successfully.
Current Load factor = 0.6
Number of pairs in the Map: 3

Pair(4, 4) inserted successfully.
Current Load factor = 0.8
0.8 is greater than 0.75
Therefore Rehashing will be done.

***Rehashing Started***

Pair(1, 1) inserted successfully.
Current Load factor = 0.1
Number of pairs in the Map: 1

Pair(2, 2) inserted successfully.
Current Load factor = 0.2
Number of pairs in the Map: 2

Pair(3, 3) inserted successfully.
Current Load factor = 0.3
Number of pairs in the Map: 3

Pair(4, 4) inserted successfully.
Current Load factor = 0.4
Number of pairs in the Map: 4

***Rehashing Done***

New Size of Map: 10
Number of pairs in the Map: 4

Pair(5, 5) inserted successfully.
Current Load factor = 0.5
Number of pairs in the Map: 5

... (the remaining insertions proceed the same way; a second rehash, to 20
buckets, occurs once the load factor again exceeds 0.75)
12. Applications of Hash Data Structure

● Hash is used in databases for indexing.


● Hash is used in disk-based data structures.
● In some programming languages, such as Python and JavaScript, hashing is used to implement objects.
Real-Time Applications of Hash Data Structure
● Hash is used for cache mapping for fast access to the data.
● Hash can be used for password verification.

● Hash is used in cryptography as a message digest.
● Rabin-Karp algorithm for pattern matching in a string.
● Calculating the number of different substrings of a string.

13. Advantages of Hash Data Structure

● Hash provides better synchronization than other data structures.


● Hash tables are more efficient than search trees or other data structures
● Hash provides constant time for searching, insertion, and deletion operations on average.

14. Disadvantages of Hash Data Structure

● Hash is inefficient when there are many collisions.


● Hash collisions are practically unavoidable for a large set of possible keys.
● Hash does not allow null values.

5.2 Tries
1. Introduction

Trie is a tree-based data structure used for storing a collection of strings and performing
efficient search operations on them. The word Trie is derived from reTRIEval, which means
finding or obtaining something.
The Trie data structure has the property that if two strings have a common prefix, they
share the same ancestor nodes in the trie. A trie can be used to sort a collection of
strings alphabetically, as well as to search whether a string with a given prefix is present
in the trie or not.

Need of Trie
A Trie data structure is used for storing and retrieval of data and the same operations could be
done using another data structure which is Hash Table but Trie data structure can perform these
operations more efficiently than a Hash Table. Moreover, Trie has its own advantage over the
Hash table. A Trie data structure can be used for prefix-based searching whereas a Hash table
can’t be used in the same way.

2. Advantages of Trie over Hash Table

The trie data structure has the following advantages over a hash table:
● We can efficiently do prefix search (or auto-complete) with Trie.
● We can easily print all words in alphabetical order which is not easily possible with hashing.
● There is no overhead of Hash functions in a Trie data structure.
● Searching for a string even in a large collection of strings in a Trie data structure can be
done in O(L) time complexity, where L is the length of the query string. The search can
terminate even sooner if the query string does not exist in the trie.

3. Properties of Trie

Below are some important properties of the Trie data structure:


● Each Trie has an empty root node, with links (or references) to other nodes
● Each node of a Trie represents a string and each edge represents a character.
● Every node consists of hashmaps or an array of pointers, with each index representing a
character and a flag to indicate if any string ends at the current node.
● Trie data structure can contain any number of characters including alphabets, numbers, and
special characters. But for this article, we will discuss strings with characters a-z. Therefore,
only 26 pointers are needed for every node, where the 0th index represents ‘a’ and the 25th
index represents ‘z’ characters.
● Each path from the root to any node represents a word or string.
4. Working of Trie

Trie data structure can contain any number of characters including alphabets, numbers, and
special characters. But for this article, we will discuss strings with characters a-z. Therefore, only
26 pointers are needed for every node, where the 0th index represents ‘a’ and the 25th index
represents ‘z’ characters.
Any lowercase English word can start with a-z, then the next letter of the word could be a-z, the
third letter of the word again could be a-z, and so on. So for storing a word, we need to take an
array (container) of size 26 and initially, all the characters are empty as there are no words and it
will look as shown below.
Let’s see how a word “and” and “ant” is stored in the Trie data structure:
1. Store “and” in Trie data structure:
● The word “and” starts with “a“, So we will mark the position “a” as filled in the Trie
node, which represents the use of “a”.
● After placing the first character, for the second character again there are 26
possibilities, So from “a“, again there is an array of size 26, for storing the 2nd
character.
● The second character is “n“, So from “a“, we will move to “n” and mark “n” in the 2nd
array as used.
● After “n“, the 3rd character is “d“, So mark the position “d” as used in the respective
array.
2. Store “ant” in the Trie data structure:
● The word “ant” starts with “a” and the position of “a” in the root node has already
been filled. So, no need to fill it again, just move to the node ‘a‘ in Trie.
● For the second character ‘n‘ we can observe that the position of ‘n’ in the ‘a’ node has
already been filled. So, no need to fill it again, just move to node ‘n’ in Trie.
● For the last character ‘t‘ of the word, The position for ‘t‘ in the ‘n‘ node is not filled. So,
fill the position of ‘t‘ in the ‘n‘ node and move to ‘t‘ node.

5. Representation of Trie Node

Every Trie node consists of a character pointer array or hashmap and a flag to represent if the
word is ending at that node or not. But if the words contain only lower-case letters (i.e. a-z), then
we can define Trie Node with an array instead of a hashmap.
Code in C++:
const int ALPHABET_SIZE = 26; // lower-case English letters

struct TrieNode {
    struct TrieNode* childNode[ALPHABET_SIZE];

    // This keeps track of the number of strings that
    // end at this Trie node.
    int wordCount = 0;
};

6. Basic Operations on Trie

1. Insertion:
This operation is used to insert new strings into the Trie data structure.
Algorithm:
1. Define a function insert(TrieNode *root, string &word) which will take two parameters one
for the root and the other for the string that we want to insert in the Trie data structure.
2. Now take another pointer currentNode and initialize it with the root node.
3. Iterate over the length of the given string and check if the value is NULL or not in the array
of pointers at the current character of the string.
● If It’s NULL then, make a new node and point the current character to this newly created
node.
● Move the current to the newly created node.

4. Finally, increment the wordCount of the last currentNode, this implies that there is a string
ending currentNode.
Below is the implementation of the above algorithm:
void insert_key(TrieNode* root, string& key)
{
// Initialize the currentNode pointer
// with the root node
TrieNode* currentNode = root;

// Iterate across the length of the string


for (auto c : key) {

// Check if the node exist for the current


// character in the Trie.
if (currentNode->childNode[c - 'a'] == NULL) {

// If node for current character does not exist


// then make a new node
TrieNode* newNode = new TrieNode();

// Keep the reference for the newly created


// node.
currentNode->childNode[c - 'a'] = newNode;
}

// Now, move the current node pointer to the newly created node.
currentNode = currentNode->childNode[c - 'a'];
}

// Increment the wordEndCount for the last currentNode


// pointer this implies that there is a string ending at
// currentNode.
currentNode->wordCount++;
}

2. Searching:
Search operation in Trie is performed in a similar way as the insertion operation but the only
difference is that whenever we find that the array of pointers in curr node does not point to the
current character of the word then return false instead of creating a new node for that current
character of the word.
This operation is used to search whether a string is present in the Trie data structure or not.
There are two search approaches in the Trie data structure.
1. Find whether the given word exists in Trie.
2. Find whether any word that starts with the given prefix exists in Trie.
There is a similar search pattern in both approaches. The first step in searching a given word in
Trie is to convert the word to characters and then compare every character with the trie node
from the root node. If the current character is present in the node, move forward to its children.
Repeat this process until all characters are found.

2.1 Searching Prefix in Trie Data Structure:

bool isPrefixExist(TrieNode* root, string& key)


{
// Initialize the currentNode pointer
// with the root node
TrieNode* currentNode = root;

// Iterate across the length of the string


for (auto c : key) {

// Check if the node exist for the current
// character in the Trie.
if (currentNode->childNode[c - 'a'] == NULL) {

// Given word as a prefix does not exist in Trie


return false;
}

// Move the currentNode pointer to the already


// existing node for current character.
currentNode = currentNode->childNode[c - 'a'];
}

// Prefix exist in the Trie


return true;
}

2.2 Searching Complete Word in Trie Data Structure:

bool search_key(TrieNode* root, string& key)


{
// Initialize the currentNode pointer
// with the root node
TrieNode* currentNode = root;

// Iterate across the length of the string


for (auto c : key) {

// Check if the node exist for the current


// character in the Trie.
if (currentNode->childNode[c - 'a'] == NULL) {

// Given word does not exist in Trie


return false;
}

// Move the currentNode pointer to the already


// existing node for current character.
currentNode = currentNode->childNode[c - 'a'];
}

return (currentNode->wordCount > 0);


}

3. Deletion:
bool delete_key(TrieNode* root, string& word)
{
    TrieNode* currentNode = root;
    TrieNode* lastBranchNode = NULL;
    char lastBranchChar = 'a';

    for (auto c : word) {
        if (currentNode->childNode[c - 'a'] == NULL) {
            return false;
        }
        else {
            int count = 0;
            for (int i = 0; i < 26; i++) {
                if (currentNode->childNode[i] != NULL)
                    count++;
            }

            if (count > 1) {
                lastBranchNode = currentNode;
                lastBranchChar = c;
            }
            currentNode = currentNode->childNode[c - 'a'];
        }
    }

    int count = 0;
    for (int i = 0; i < 26; i++) {
        if (currentNode->childNode[i] != NULL)
            count++;
    }

    // Case 1: The deleted word is a prefix of other words
    // in Trie.
    if (count > 0) {
        currentNode->wordCount--;
        return true;
    }

    // Case 2: The deleted word shares a common prefix with
    // other words in Trie. (The detached nodes are not freed
    // here, for brevity.)
    if (lastBranchNode != NULL) {
        lastBranchNode->childNode[lastBranchChar - 'a'] = NULL;
        return true;
    }
    // Case 3: The deleted word does not share any common
    // prefix with other words in Trie.
    else {
        root->childNode[word[0] - 'a'] = NULL;
        return true;
    }
}

7. Implementation of Trie

#include <bits/stdc++.h>
using namespace std;

struct TrieNode {

// pointer array for child nodes of each node


TrieNode* childNode[26];
int wordCount;

TrieNode()
{
// constructor
// initialize the wordCnt variable with 0
// initialize every index of childNode array with
// NULL
wordCount = 0;
for (int i = 0; i < 26; i++) {
childNode[i] = NULL;
}
}
};

void insert_key(TrieNode* root, string& key)


{
// Initialize the currentNode pointer
// with the root node

TrieNode* currentNode = root;

// Iterate across the length of the string


for (auto c : key) {

// Check if the node exist for the current


// character in the Trie.
if (currentNode->childNode[c - 'a'] == NULL) {

// If node for current character does not exist


// then make a new node
TrieNode* newNode = new TrieNode();

// Keep the reference for the newly created


// node.
currentNode->childNode[c - 'a'] = newNode;
}

// Now, move the current node pointer to the newly


// created node.
currentNode = currentNode->childNode[c - 'a'];
}

// Increment the wordEndCount for the last currentNode


// pointer this implies that there is a string ending at
// currentNode.
currentNode->wordCount++;
}

bool search_key(TrieNode* root, string& key)


{
// Initialize the currentNode pointer
// with the root node
TrieNode* currentNode = root;

// Iterate across the length of the string


for (auto c : key) {

// Check if the node exist for the current


// character in the Trie.
if (currentNode->childNode[c - 'a'] == NULL) {

// Given word does not exist in Trie


return false;
}

// Move the currentNode pointer to the already


// existing node for current character.
currentNode = currentNode->childNode[c - 'a'];
}

return (currentNode->wordCount > 0);


}

bool delete_key(TrieNode* root, string& word)
{
    TrieNode* currentNode = root;
    TrieNode* lastBranchNode = NULL;
    char lastBranchChar = 'a';

    for (auto c : word) {
        if (currentNode->childNode[c - 'a'] == NULL) {
            return false;
        }
        else {
            int count = 0;
            for (int i = 0; i < 26; i++) {
                if (currentNode->childNode[i] != NULL)
                    count++;
            }

            if (count > 1) {
                lastBranchNode = currentNode;
                lastBranchChar = c;
            }
            currentNode = currentNode->childNode[c - 'a'];
        }
    }

    int count = 0;
    for (int i = 0; i < 26; i++) {
        if (currentNode->childNode[i] != NULL)
            count++;
    }

    // Case 1: The deleted word is a prefix of other words
    // in Trie.
    if (count > 0) {
        currentNode->wordCount--;
        return true;
    }

    // Case 2: The deleted word shares a common prefix with
    // other words in Trie. (The detached nodes are not freed
    // here, for brevity.)
    if (lastBranchNode != NULL) {
        lastBranchNode->childNode[lastBranchChar - 'a'] = NULL;
        return true;
    }
    // Case 3: The deleted word does not share any common
    // prefix with other words in Trie.
    else {
        root->childNode[word[0] - 'a'] = NULL;
        return true;
    }
}

// Driver code
int main()
{
// Make a root node for the Trie
TrieNode* root = new TrieNode();

// Stores the strings that we want to insert in the


// Trie
vector<string> inputStrings
= { "and", "ant", "do", "geek", "dad", "ball" };

// number of insert operations in the Trie


int n = inputStrings.size();

for (int i = 0; i < n; i++) {


insert_key(root, inputStrings[i]);
}

// Stores the strings that we want to search in the Trie


vector<string> searchQueryStrings

= { "do", "geek", "bat" };

// number of search operations in the Trie


int searchQueries = searchQueryStrings.size();

for (int i = 0; i < searchQueries; i++) {


cout << "Query String: " << searchQueryStrings[i]
<< "\n";
if (search_key(root, searchQueryStrings[i])) {
// the queryString is present in the Trie
cout << "The query string is present in the "
"Trie\n";
}
else {
// the queryString is not present in the Trie
cout << "The query string is not present in "
"the Trie\n";
}
}

// stores the strings that we want to delete from the


// Trie
vector<string> deleteQueryStrings = { "geek", "tea" };

// number of delete operations from the Trie


int deleteQueries = deleteQueryStrings.size();

for (int i = 0; i < deleteQueries; i++) {


cout << "Query String: " << deleteQueryStrings[i]
<< "\n";
if (delete_key(root, deleteQueryStrings[i])) {
// The queryString is successfully deleted from
// the Trie
cout << "The query string is successfully "
"deleted\n";
}
else {
// The query string is not present in the Trie
cout << "The query string is not present in "
"the Trie\n";
}
}

return 0;
}

Output of the code:


Query String: do
The query string is present in the Trie
Query String: geek
The query string is present in the Trie
Query String: bat
The query string is not present in the Trie
Query String: geek
The query string is successfully deleted
Query String: tea
The query string is not present in the Trie

8. Complexity Analysis of Trie

Operation      Time Complexity      Auxiliary Space

Insertion      O(n)                 O(n * m)

Searching      O(n)                 O(1)

Deletion       O(n)                 O(1)

Here, n is the length of the key and m is the alphabet size (26 for lower-case English letters).

9. Applications of Trie

1. Autocomplete Feature: Autocomplete provides suggestions based on what you type in the
search box. Trie data structure is used to implement autocomplete functionality.
2. Spell Checkers: If the word typed does not appear in the dictionary, then it shows suggestions
based on what you typed.
It is a 3-step process that includes:
1. Checking for the word in the data dictionary.
2. Generating potential suggestions.
3. Sorting the suggestions with higher priority on top.
Trie stores the data dictionary and makes it easier to build an algorithm for searching the word
from the dictionary and provides the list of valid words for the suggestion.
3. Longest Prefix Matching Algorithm(Maximum Prefix Length Match): This algorithm is used in
networking by the routing devices in IP networking. Optimization of network routes requires
contiguous masking that bound the complexity of lookup time to O(n), where n is the length of
the URL address in bits.
To speed up the lookup process, Multiple Bit trie schemes were developed that perform the
lookups of multiple bits faster.

10. Advantages of Trie

● Trie allows us to insert and find strings in O(l) time, where l is the length of a single word. For
such operations it can be faster than both hash tables and binary search trees.
● It provides alphabetical filtering of entries by the key of the node and hence makes it easier
to print all words in alphabetical order.
● Trie takes less space when compared to BST because the keys are not explicitly saved;
instead, each key requires just an amortized fixed amount of space to be stored.
● Prefix search/Longest prefix matching can be efficiently done with the help of trie data
structure.
● Since trie doesn’t need any hash function for its implementation, they are generally faster
than hash tables for small keys like integers and pointers.
● Tries support ordered iteration whereas iteration in a hash table will result in pseudorandom
order given by the hash function which is usually more cumbersome.
● Deletion is also a straightforward algorithm with O(l) as its time complexity, where l is the
length of the word to be deleted.

11. Disadvantages of Trie

● The main disadvantage of the trie is that it takes a lot of memory to store all the strings. For
each node, we have too many node pointers which are equal to the number of characters in
the worst case.
● An efficiently constructed hash table(i.e. a good hash function and a reasonable load factor)
has O(1) as lookup time which is way faster than O(l) in the case of a trie, where l is the
length of the string.

Chapter 6
Dynamic Programming & Greedy
Algorithm
6.1 Dynamic Programming
1. Introduction

Dynamic Programming (DP) is a method used in mathematics and computer science to solve
complex problems by breaking them down into simpler subproblems. By solving each subproblem
only once and storing the results, it avoids redundant computations, leading to more efficient
solutions for a wide range of problems.

2. Working of DP

● Identify Subproblems: Divide the main problem into smaller, independent subproblems.
● Store Solutions: Solve each subproblem and store the solution in a table or array.
● Build Up Solutions: Use the stored solutions to build up the solution to the main problem.
● Avoid Redundancy: By storing solutions, DP ensures that each subproblem is solved only
once, reducing computation time.
Examples of Dynamic Programming (DP)
Example 1: Consider the problem of finding the Fibonacci Sequence:
Fibonacci Sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, …
Brute Force Approach:
To find the nth Fibonacci number using a brute force approach, you would simply add the (n-1)th
and (n-2)th Fibonacci numbers. This would work, but it would be inefficient for large values of n,
as it would require calculating all the previous Fibonacci numbers.
Dynamic Programming Approach:
Fibonacci Series using Dynamic Programming
● Subproblems: F(0), F(1), F(2), F(3), …
● Store Solutions: Create a table to store the values of F(n) as they are calculated.
● Build Up Solutions: For F(n), look up F(n-1) and F(n-2) in the table and add them.
● Avoid Redundancy: The table ensures that each subproblem (e.g., F(2)) is solved only once.
By using DP, we can efficiently calculate the Fibonacci sequence without having to recompute
subproblems.

3. Characteristics of DP

Dynamic programming is an optimization technique used when solving problems that consists of
the following characteristics:
A. Optimal Substructure:
Optimal substructure means that we combine the optimal results of subproblems to achieve the
optimal result of the bigger problem.
Example:
Consider the problem of finding the minimum cost path in a weighted graph from a source node to
a destination node. We can break this problem down into smaller subproblems:
● Find the minimum cost path from the source node to each intermediate node.
● Find the minimum cost path from each intermediate node to the destination node.
The solution to the larger problem (finding the minimum cost path from the source node to the
destination node) can be constructed from the solutions to these smaller subproblems.
B. Overlapping Subproblems:
The same subproblems are solved repeatedly in different parts of the problem.

Example:
Consider the problem of computing the Fibonacci series. To compute the Fibonacci number at
index n, we need to compute the Fibonacci numbers at indices n-1 and n-2. This means that the
subproblem of computing the Fibonacci number at index n-1 is used twice in the solution to the
larger problem of computing the Fibonacci number at index n.

4. Approaches of DP

Dynamic programming can be achieved using two approaches:


1. Top-Down Approach (Memoization):
In the top-down approach, also known as memoization, we start with the final solution and
recursively break it down into smaller subproblems. To avoid redundant calculations, we store
the results of solved subproblems in a memoization table.
Let’s breakdown Top down approach:
● Starts with the final solution and recursively breaks it down into smaller subproblems.
● Stores the solutions to subproblems in a table to avoid redundant calculations.
● Suitable when the number of subproblems is large and many of them are reused.
2. Bottom-Up Approach (Tabulation):
In the bottom-up approach, also known as tabulation, we start with the smallest subproblems
and gradually build up to the final solution. We store the results of solved subproblems in a table
to avoid redundant calculations.
Let’s breakdown Bottom-up approach:
● Starts with the smallest subproblems and gradually builds up to the final solution.
● Fills a table with solutions to subproblems in a bottom-up manner.
● Suitable when the number of subproblems is small and the optimal solution can be
directly computed from the solutions to smaller subproblems.

5. Advantages of DP

Dynamic programming has a wide range of advantages, including:


● Avoids recomputing the same subproblems multiple times, leading to significant time
savings.
● Ensures that the optimal solution is found by considering all possible combinations.
● Breaks down complex problems into smaller, more manageable subproblems.

6. Applications of DP

Dynamic programming has a wide range of applications, including:


● Optimization: Knapsack problem, shortest path problem, maximum subarray problem
● Computer Science: Longest common subsequence, edit distance, string matching
● Operations Research: Inventory management, scheduling, resource allocation

7. Difference between DP and Recursion

● In dynamic programming, problems are solved by breaking them down into smaller subproblems and combining their solutions, while recursion is when a function calls and executes itself. Dynamic programming can work without recursion, but since its purpose is to optimize and accelerate the computation, programmers usually pair it with recursion to make the process efficient.
● A function that can accomplish a specific task by calling itself is called a recursive function. In order to perform and accomplish the work, this function calls itself when it has to be executed.
● Using dynamic programming, you can break a problem into smaller parts, called subproblems, to solve it. Dynamic programming involves solving each subproblem the first time it is encountered, then using memoization to store the solution.
● Therefore, the main difference between the two techniques is their intended use: recursion is a way of structuring a function’s execution, whereas dynamic programming is an optimization technique used to solve problems.
● A recursive function identifies the moment it is needed, calls itself, and is executed; this is called the recursive case. The function must stop once the task is completed; this is known as the base case.
● By establishing states, dynamic programming recognizes the problem and divides it into sub-problems in order to solve the whole. After solving these sub-problems, the programmer establishes a mathematical relationship between them. Finally, these solutions and results are stored, so they can be accessed in the future without having to solve the whole problem again.

8. Difference between Tabulation and Memoization

● State: In tabulation, the state transition relation is difficult to think of; in memoization, it is easy to think of.
● Code: Tabulation code gets complicated when a lot of conditions are required; memoization code is easy and less complicated.
● Speed: Tabulation is fast, as we directly access previous states from the table; memoization is slow due to a lot of recursive calls and return statements.
● Subproblem solving: If all subproblems must be solved at least once, a bottom-up dynamic programming algorithm usually outperforms a top-down memoized algorithm by a constant factor. If some subproblems in the subproblem space need not be solved at all, the memoized solution has the advantage of solving only those subproblems that are definitely required.
● Table entries: In the tabulated version, starting from the first entry, all entries are filled one by one. In the memoized version, not all entries of the lookup table are necessarily filled; the table is filled on demand.
● Approach: Generally, tabulation (dynamic programming) is an iterative approach, while memoization is a recursive approach.

9. Implementation of DP in C++

A. Memoization
Code:
#include <iostream>
#include <unordered_map>

int fibonacci(int n, std::unordered_map<int, int>& cache) {


if (cache.find(n) != cache.end()) {
return cache[n];
}
int result;
if (n == 0) {
result = 0;
} else if (n == 1) {
result = 1;
} else {
result = fibonacci(n-1, cache) + fibonacci(n-2, cache);
}
cache[n] = result;
return result;
}
B. Tabulation
Code:
#include <iostream>
#include <vector>

int fibonacci(int n)
{
if (n == 0) {
return 0;
}
else if (n == 1) {
return 1;
}
else {
std::vector<int> table(n + 1, 0);
table[0] = 0;
table[1] = 1;
for (int i = 2; i <= n; i++) {
table[i] = table[i - 1] + table[i - 2];
}
return table[n];
}
}

10. Bitmasking and DP: Count ways to assign unique cap to every person

Suppose we have a collection of elements which are numbered from 1 to N. If we want to


represent a subset of this set then it can be encoded by a sequence of N bits (we usually call this
sequence a “mask”). In our chosen subset the i-th element belongs to it if and only if the i-th bit of
the mask is set i.e., it equals to 1. For example, the mask 10000101 means that the subset of the
set [1…8] consists of elements 1, 3 and 8. We know that for a set of N elements there are in total
2^N subsets, thus 2^N masks are possible, one representing each subset. Each mask is, in fact, an
integer number written in binary notation.
Our main methodology is to assign a value to each mask (and, therefore, to each subset) and thus
calculate the values for new masks using values of the already computed masks. Usually our
main target is to calculate the value/solution for the complete set i.e., for mask 11111111.
Normally, to find the value for a subset X we remove an element in every possible way and use
values for obtained subsets X’1, X’2… ,X’k to compute the value/solution for X. This means that
the values for X’i must have been computed already, so we need to establish an ordering in which
masks will be considered. It’s easy to see that the natural ordering will do: go over masks in
increasing order of corresponding numbers. Also, we sometimes start with the empty subset X
and add elements in every possible way, using the values of the obtained subsets X’1, X’2, …, X’k
to compute the value/solution for X. We mostly use the following notations/operations on masks:
● bit(i, mask) – the i-th bit of mask
● count(mask) – the number of non-zero bits in the mask
● first(mask) – the number of the lowest non-zero bit in the mask
● set(i, mask) – set the i-th bit in the mask
● check(i, mask) – check the i-th bit in the mask
How is this problem solved using Bitmasking + DP? The idea is to use the fact that there are up
to 10 persons. So we can use an integer variable as a bitmask to store which person is wearing a
cap and which is not.
If we draw the complete recursion tree, we can observe that many subproblems are solved again
and again. So we use Dynamic Programming. A table dp[][] is used such that in every entry
dp[i][j], i is mask and j is cap number. Since we want to access all persons that can wear a given
cap, we use an array of vectors, capList[101]. A value capList[i] indicates the list of persons that
can wear cap i.
Below is the implementation of the above idea.
// C++ program to find number of ways to wear hats
#include<bits/stdc++.h>
#define MOD 1000000007
using namespace std;

// capList[i]'th vector contains the list of persons having a cap with id i


// id is between 1 to 100 so we declared an array of 101 vectors as indexing
// starts from 0.
vector<int> capList[101];

// dp[2^10][101] .. in dp[i][j], i denotes the mask i.e., it tells that


// how many and which persons are wearing cap. j denotes the first j caps
// used. So, dp[i][j] tells the number ways we assign j caps to mask i
// such that none of them wears the same cap
int dp[1025][101];

// This is used for base case, it has all the N bits set
// so, it tells whether all N persons are wearing a cap.
int allmask;

// Mask is the set of persons, i is cap-id (OR the


// number of caps processed starting from first cap).
long long int countWaysUtil(int mask, int i)
{
// If all persons are wearing a cap so we
// are done and this is one way so return 1
if (mask == allmask) return 1;

// If not everyone is wearing a cap and also there are no more


// caps left to process, so there is no way, thus return 0;
if (i > 100) return 0;

// If we already have solved this subproblem, return the answer.


if (dp[mask][i] != -1) return dp[mask][i];

// Ways, when we don't include this cap in our arrangement


// or solution set.
long long int ways = countWaysUtil(mask, i+1);

// size is the total number of persons having cap with id i.


int size = capList[i].size();

// So, assign one by one ith cap to all the possible persons
// and recur for remaining caps.
for (int j = 0; j < size; j++)

{
// if person capList[i][j] is already wearing a cap so continue as
// we cannot assign him this cap
if (mask & (1 << capList[i][j])) continue;

// Else assign him this cap and recur for remaining caps with
// new updated mask vector
else ways += countWaysUtil(mask | (1 << capList[i][j]), i+1);
ways %= MOD;
}

// Save the result and return it.


return dp[mask][i] = ways;
}

// Reads n lines from standard input for current test case


void countWays(int n)
{
//----------- READ INPUT --------------------------
string temp, str;
int x;
getline(cin, str); // to get rid of newline character
for (int i=0; i<n; i++)
{
getline(cin, str);
stringstream ss(str);

// while there are words in the stream object ss


while (ss >> temp)
{
stringstream s;
s << temp;
s >> x;

// add the ith person to the list of the cap with id x


capList[x].push_back(i);
}
}
//----------------------------------------------------

// All mask is used to check whether all persons


// are included or not, set all n bits as 1
allmask = (1 << n) - 1;

// Initialize all entries in dp as -1


memset(dp, -1, sizeof dp);

// Call recursive function count ways


cout << countWaysUtil(0, 1) << endl;
}

// Driver Program
int main()
{
int n; // number of persons in every test case
cin >> n;
countWays(n);
return 0;
}

11. Sum over Subsets

Consider the following problem where we will use Sum over subset Dynamic Programming to
solve it.
Given an array A[] of 2^n integers, we need to calculate the function F(x) = sum of A[i] over all i
such that x&i == i, i.e., i is a bitwise subset of x. i is a bitwise subset of mask x if x&i == i.
Examples:
Input: A[] = {7, 12, 14, 16} , n = 2
Output: 7, 19, 21, 49
Explanation: There will be 4 values of x: 0,1,2,3
So, we need to calculate F(0),F(1),F(2),F(3).
Now, F(0) = A0 = 7
F(1) = A0 + A1 = 19
F(2) = A0 + A2 = 21
F(3) = A0 + A1 + A2 + A3 = 49

Input: A[] = {7, 11, 13, 16} , n = 2


Output: 7, 18, 20, 47
Explanation: There will be 4 values of x: 0,1,2,3
So, we need to calculate F(0),F(1),F(2),F(3).
Now, F(0) = A0 = 7
F(1) = A0 + A1 = 18
F(2) = A0 + A2 = 20
F(3) = A0 + A1 + A2 + A3 = 47
1. Brute Force Approach:
Iterate over all x from 0 to (2^n – 1). Calculate the bitwise subsets of each x and sum them up for
every x.
Time Complexity: O(4^n)
Below is the implementation of above idea:
// CPP program for brute force
// approach of SumOverSubsets DP
#include <bits/stdc++.h>
using namespace std;

// function to print the sum over subsets value


void SumOverSubsets(int a[], int n) {

// array to store the SumOverSubsets


int sos[1 << n] = {0};

// iterate for all possible x


for (int x = 0; x < (1 << n); x++) {

// iterate for all possible bitwise subsets


for (int i = 0; i < (1 << n); i++) {

// if i is a bitwise subset of x
if ((x & i) == i)
sos[x] += a[i];
}
}

// print all the subsets


for (int i = 0; i < (1 << n); i++)
cout << sos[i] << " ";
}

// Driver Code
int main() {
int a[] = {7, 12, 14, 16};

int n = 2;
SumOverSubsets(a, n);
return 0;
}
2. Sub Optimal Approach:
The brute-force algorithm can be easily improved by iterating over the bitwise subsets only. Instead
of iterating over every i, iterating backward with i = (i-1)&x gives us every bitwise subset, where i
starts from x and ends at 1. If the mask x has k set bits, we do 2^k iterations, since a mask with k
set bits has 2^k bitwise subsets. The total number of masks x with k set bits is C(n, k), so the total
number of iterations is Σ C(n, k) · 2^k = 3^n.
Time Complexity: O(3^n)
Below is the implementation of above idea:
// CPP program for sub-optimal
// approach of SumOverSubsets DP
#include <bits/stdc++.h>
using namespace std;

// function to print the sum over subsets value


void SumOverSubsets(int a[], int n) {

// array to store the SumOverSubsets


int sos[1 << n] = {0};

// iterate for all possible x


for (int x = 0; x < (1 << n); x++) {
sos[x] = a[0];

// iterate for the bitwise subsets only


for (int i = x; i > 0; i = (i - 1) & x)
sos[x] += a[i];
}

// print all the subsets


for (int i = 0; i < (1 << n); i++)
cout << sos[i] << " ";
}

// Driver Code
int main() {
int a[] = {7, 12, 14, 16};
int n = 2;
SumOverSubsets(a, n);
return 0;
}

12. DP Problem 1: Kadane’s Algorithm (Largest Sum Contiguous Subarray)

Given an array arr[] of size N, the task is to find the sum of the contiguous subarray within
arr[] that has the largest sum.
Example:
Input: arr = {-2,-3,4,-1,-2,1,5,-3}
Output: 7
Explanation: The subarray {4,-1, -2, 1, 5} has the largest sum 7.
Input: arr = {2}
Output: 2
Explanation: The subarray {2} has the largest sum 2.
Input: arr = {5,4,1,7,8}
Output: 25
Explanation: The subarray {5,4,1,7,8} has the largest sum 25.
The idea of Kadane’s algorithm is to maintain a variable max_ending_here that stores the
maximum sum of a contiguous subarray ending at the current index, and a variable max_so_far that
stores the maximum sum of any contiguous subarray found so far. Every time max_ending_here holds a
positive sum, compare it with max_so_far and update max_so_far if max_ending_here is greater.
// C++ program to print largest contiguous array sum
#include <bits/stdc++.h>
using namespace std;

int maxSubArraySum(int a[], int size)


{
int max_so_far = INT_MIN, max_ending_here = 0;

for (int i = 0; i < size; i++) {


max_ending_here = max_ending_here + a[i];
if (max_so_far < max_ending_here)
max_so_far = max_ending_here;

if (max_ending_here < 0)
max_ending_here = 0;
}
return max_so_far;
}

// Driver Code
int main()
{
int a[] = { -2, -3, 4, -1, -2, 1, 5, -3 };
int n = sizeof(a) / sizeof(a[0]);

// Function Call
int max_sum = maxSubArraySum(a, n);
cout << "Maximum contiguous sum is " << max_sum;
return 0;
}
To print the subarray with the maximum sum, the idea is to also track the start index of
max_ending_here at the current index, so that whenever max_so_far is updated from
max_ending_here, the start and end indices of the subarray can be updated from that start index
and the current index.
// C++ program to print largest contiguous array sum

#include <climits>
#include <iostream>
using namespace std;

void maxSubArraySum(int a[], int size)


{
int max_so_far = INT_MIN, max_ending_here = 0,
start = 0, end = 0, s = 0;

for (int i = 0; i < size; i++) {


max_ending_here += a[i];

if (max_so_far < max_ending_here) {


max_so_far = max_ending_here;
start = s;
end = i;
}

if (max_ending_here < 0) {
max_ending_here = 0;
s = i + 1;

}
}
cout << "Maximum contiguous sum is " << max_so_far
<< endl;
cout << "Starting index " << start << endl
<< "Ending index " << end << endl;
}

/*Driver program to test maxSubArraySum*/


int main()
{
int a[] = { -2, -3, 4, -1, -2, 1, 5, -3 };
int n = sizeof(a) / sizeof(a[0]);
maxSubArraySum(a, n);
return 0;
}
Largest Sum Contiguous Subarray using Dynamic Programming:
// C++ program to print largest contiguous array sum
#include <bits/stdc++.h>
using namespace std;

void maxSubArraySum(int a[], int size)


{
vector<int> dp(size, 0);
dp[0] = a[0];
int ans = dp[0];
for (int i = 1; i < size; i++) {
dp[i] = max(a[i], a[i] + dp[i - 1]);
ans = max(ans, dp[i]);
}
cout << ans;
}

/*Driver program to test maxSubArraySum*/


int main()
{
int a[] = { -2, -3, 4, -1, -2, 1, 5, -3 };
int n = sizeof(a) / sizeof(a[0]);
maxSubArraySum(a, n);
return 0;
}

13. DP Problem 2: Assembly Line Scheduling Problem

A car factory has two assembly lines, each with n stations. A station is denoted by Si,j where i is
either 1 or 2 and indicates the assembly line the station is on, and j indicates the number of the
station. The time taken per station is denoted by ai,j. Each station is dedicated to some sort of
work like engine fitting, body fitting, painting, and so on. So, a car chassis must pass through
each of the n stations in order before exiting the factory. The parallel stations of the two assembly
lines perform the same task. After it passes through station Si,j, it will continue to station Si,j+1
unless it decides to transfer to the other line. Continuing on the same line incurs no extra cost,
but transferring from line i at station j – 1 to station j on the other line takes time ti,j. Each
assembly line takes an entry time ei and exit time xi which may be different for the two lines.
Give an algorithm for computing the minimum time it will take to build a car chassis.
The below figure presents the problem in a clear picture:
The following information can be extracted from the problem statement to make it simpler:
● Two assembly lines, 1 and 2, each with stations from 1 to n.
● A car chassis must pass through all stations from 1 to n in order(in any of the two assembly
lines). i.e. it cannot jump from station i to station j if they are not at one move distance.

● The car chassis can move one station forward in the same line, or one station diagonally in
the other line. It incurs an extra cost ti, j to move to station j from line i. No cost is incurred
for movement in the same line.
● The time taken in station j on line i is ai, j.
● Si, j represents a station j on line i.
Breaking the problem into smaller sub-problems:
We can easily find the i-th factorial if the (i-1)-th factorial is known. Can we apply a similar
idea here?
If the minimum time taken by the chassis to leave station Si, j-1 is known, the minimum time
taken to leave station Si, j can be calculated quickly by combining ai, j and ti, j.
T1(j) indicates the minimum time taken by the car chassis to leave station j on assembly line 1.
T2(j) indicates the minimum time taken by the car chassis to leave station j on assembly line 2.
Base cases:
The entry time ei comes into the picture only when the car chassis enters the car factory.
Time taken to leave the first station in line 1 is given by:
T1(1) = Entry time in Line 1 + Time spent in station S1,1
T1(1) = e1 + a1,1
Similarly, time taken to leave the first station in line 2 is given by:
T2(1) = e2 + a2,1
Recursive Relations:
If we look at the problem statement, it quickly boils down to the below observations:
The car chassis at station S1,j can come either from station S1, j-1 or station S2, j-1.
Case #1: Its previous station is S1, j-1
The minimum time to leave station S1,j is given by:
T1(j) = Minimum time taken to leave station S1, j-1 + Time spent in station S1, j
T1(j) = T1(j-1) + a1, j
Case #2: Its previous station is S2, j-1
The minimum time to leave station S1, j is given by:
T1(j) = Minimum time taken to leave station S2, j-1 + Extra cost incurred to change the assembly
line + Time spent in station S1, j
T1(j) = T2(j-1) + t2, j + a1, j
The minimum time T1(j) is given by the minimum of the two obtained in cases #1 and #2.
T1(j) = min((T1(j-1) + a1, j), (T2(j-1) + t2, j + a1, j))
Similarly, the minimum time to reach station S2, j is given by:
T2(j) = min((T2(j-1) + a2, j), (T1(j-1) + t1, j + a2, j))
The total minimum time taken by the car chassis to come out of the factory is given by:
Tmin = min(Time taken to leave station Si,n + Time taken to exit the car factory)
Tmin = min(T1(n) + x1, T2(n) + x2)
Code:
#include <iostream>
#include <vector>
using namespace std;

int fun(vector<vector<int> > a, vector<vector<int> > t,


int cl, int cs, int x1, int x2, int n)
{
// base case
if (cs == n - 1) {
if (cl == 0) { // exiting from (current) line =0
return x1;
}
else // exiting from line 2
return x2;
}
// continue on same line
int same
= fun(a, t, cl, cs + 1, x1, x2, n) + a[cl][cs + 1];
// continue on different line

int diff = fun(a, t, !cl, cs + 1, x1, x2, n)
+ a[!cl][cs + 1] + t[cl][cs + 1];

return min(same, diff);


}
int main()
{
int n = 4; // number of stations
vector<vector<int> > a
= { { 4, 5, 3, 2 }, { 2, 10, 1, 4 } };
vector<vector<int> > t
= { { 0, 7, 4, 5 }, { 0, 9, 2, 8 } };

int e1 = 10;
int e2 = 12;
int x1 = 18;
int x2 = 7;
// entry from 1st line
int x = fun(a, t, 0, 0, x1, x2, n) + e1 + a[0][0];
// entry from 2nd line
int y = fun(a, t, 1, 0, x1, x2, n) + e2 + a[1][0];
cout << min(x, y) << endl;
}

6.2 Greedy Algorithm


1. Introduction

A greedy algorithm is a type of optimization algorithm that makes locally optimal choices at each
step to find a globally optimal solution. It operates on the principle of “taking the best option
now” without considering the long-term consequences.

2. Steps for implementing a Greedy Algorithm

The steps to define a greedy algorithm are:


● Define the problem: Clearly state the problem to be solved and the objective to be
optimized.
● Identify the greedy choice: Determine the locally optimal choice at each step based on the
current state.
● Make the greedy choice: Select the greedy choice and update the current state.
● Repeat: Continue making greedy choices until a solution is reached.

3. Characteristics of Greedy Algorithms

● There is an ordered list of resources (profit, cost, value, etc.).
● The maximum of all the resources (max profit, max value, etc.) is taken.
● For example, in the fractional knapsack problem, the item with the maximum value/weight ratio is
taken first according to the available capacity.
Characteristic components of Greedy Algorithm
● The feasible solution: A subset of given inputs that satisfies all specified constraints of a
problem is known as a “feasible solution”.
● Optimal solution: The feasible solution that achieves the desired extremum is called an
“optimal solution”. In other words, the feasible solution that either minimizes or maximizes
the objective function specified in a problem is known as an “optimal solution”.
● Feasibility check: It investigates whether the selected input fulfills all constraints mentioned
in a problem or not. If it fulfills all the constraints then it is added to a set of feasible
solutions; otherwise, it is rejected.
● Optimality check: It investigates whether a selected input produces either a minimum or
maximum value of the objective function while fulfilling all the specified constraints. If an
element in a solution set produces the desired extremum, then it is added to a set of optimal
solutions.
● Optimal substructure property: The globally optimal solution to a problem includes the
optimal sub solutions within it.
● Greedy choice property: The globally optimal solution is assembled by selecting locally
optimal choices. The greedy approach applies some locally optimal criteria to obtain a partial
solution that seems to be the best at that moment and then find out the solution for the
remaining sub-problem.
The local decisions (or choices) must possess three characteristics as mentioned below:
● Feasibility: The selected choice must fulfill local constraints.
● Optimality: The selected choice must be the best at that stage (locally optimal choice).
● Irrevocability: The selected choice cannot be changed once it is made.

4. Difference between Greedy Algorithm and Dynamic Programming

● Optimality: A greedy algorithm may not always provide an optimal solution. Dynamic programming guarantees an optimal solution if the problem exhibits the principle of optimality.
● Subproblem reuse: A greedy algorithm does not reuse solutions to subproblems. Dynamic programming reuses solutions to overlapping subproblems.
● Backtracking: A greedy algorithm does not involve backtracking. Dynamic programming may involve backtracking, especially in top-down implementations.
● Complexity: Greedy algorithms are typically simpler and faster to implement. Dynamic programming may be more complex and slower to implement.
● Application: Greedy algorithms suit problems where local optimization leads to global optimization. Dynamic programming suits problems with overlapping subproblems and optimal substructure.
● Examples: Greedy: Minimum Spanning Tree, shortest path algorithms. Dynamic programming: Fibonacci sequence, Longest Common Subsequence.

5. Greedy Algorithm Problem 1: Fractional Knapsack Problem

Given the weights and profits of N items, in the form of {profit, weight} put these items in a
knapsack of capacity W to get the maximum total profit in the knapsack. In Fractional
Knapsack, we can break items for maximizing the total value of the knapsack.
An efficient solution is to use the Greedy approach.

The basic idea of the greedy approach is to calculate the ratio profit/weight for each item and sort
the item on the basis of this ratio. Then take the item with the highest ratio and add them as much
as we can (can be the whole element or a fraction of it).
This will always give the maximum profit because, in each step it adds an element such that this
is the maximum possible profit for that much weight.
Follow the given steps to solve the problem using the above approach:
● Calculate the ratio (profit/weight) for each item.
● Sort all the items in decreasing order of the ratio.
● Initialize res = 0, curr_cap = given_cap.
● Do the following for every item i in the sorted order:
○ If the weight of the current item is less than or equal to the remaining capacity then add
the value of that item into the result
○ Else add the current item as much as we can and break out of the loop.
● Return res.
Below is the implementation of the above approach:
// C++ program to solve fractional Knapsack Problem

#include <bits/stdc++.h>
using namespace std;

// Structure for an item which stores weight and


// corresponding value of Item
struct Item {
int profit, weight;

// Constructor
Item(int profit, int weight)
{
this->profit = profit;
this->weight = weight;
}
};

// Comparison function to sort Item


// according to profit/weight ratio
static bool cmp(struct Item a, struct Item b)
{
double r1 = (double)a.profit / (double)a.weight;
double r2 = (double)b.profit / (double)b.weight;
return r1 > r2;
}

// Main greedy function to solve problem


double fractionalKnapsack(int W, struct Item arr[], int N)
{
// Sorting Item on basis of ratio
sort(arr, arr + N, cmp);

double finalvalue = 0.0;

// Looping through all items


for (int i = 0; i < N; i++) {

// If adding Item won't overflow,


// add it completely
if (arr[i].weight <= W) {
W -= arr[i].weight;
finalvalue += arr[i].profit;
}

// If we can't add current Item,

// add fractional part of it
else {
finalvalue
+= arr[i].profit
* ((double)W / (double)arr[i].weight);
break;
}
}

// Returning final value


return finalvalue;
}

// Driver code
int main()
{
int W = 50;
Item arr[] = { { 60, 10 }, { 100, 20 }, { 120, 30 } };
int N = sizeof(arr) / sizeof(arr[0]);

// Function call
cout << fractionalKnapsack(W, arr, N);
return 0;
}

6. Greedy Algorithm Problem 2: Huffman Coding

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length
codes to input characters; lengths of the assigned codes are based on the frequencies of
corresponding characters.
The variable-length codes assigned to input characters are Prefix Codes, meaning the codes (bit
sequences) are assigned in such a way that the code assigned to one character is not the prefix of
code assigned to any other character. This is how Huffman Coding makes sure that there is no
ambiguity when decoding the generated bitstream.
Let us understand prefix codes with a counter example. Let there be four characters a, b, c and d,
and their corresponding variable length codes be 00, 01, 0 and 1. This coding leads to ambiguity
because code assigned to c is the prefix of codes assigned to a and b. If the compressed bit stream
is 0001, the de-compressed output may be “cccd” or “ccb” or “acd” or “ab”.
There are mainly two major parts in Huffman Coding
● Build a Huffman Tree from input characters.
● Traverse the Huffman Tree and assign codes to characters.
Algorithm of Huffman Coding
Algorithm Huffman(c)
{
    n = |c|
    Q = c
    for i <- 1 to n-1
    do
    {
        temp <- get_node()
        left[temp] <- Get_min(Q)
        right[temp] <- Get_min(Q)
        a = left[temp]
        b = right[temp]
        f[temp] <- f[a] + f[b]
        insert(Q, temp)
    }
    return Get_min(Q)
}
Steps to build Huffman Tree
Input is an array of unique characters along with their frequency of occurrences and output is
Huffman Tree.
1. Create a leaf node for each unique character and build a min heap of all leaf nodes (Min Heap
is used as a priority queue. The value of the frequency field is used to compare two nodes in
the min heap. Initially, the least frequent character is at root)
2. Extract two nodes with the minimum frequency from the min heap.
3. Create a new internal node with a frequency equal to the sum of the two nodes frequencies.
Make the first extracted node as its left child and the other extracted node as its right child.
Add this node to the min heap.
4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root
node and the tree is complete.
Code in C++
// C++ program for Huffman Coding
#include <cstdlib>
#include <iostream>
using namespace std;

// This constant can be avoided by explicitly calculating height of Huffman Tree


#define MAX_TREE_HT 100

// A Huffman tree node


struct MinHeapNode {

// One of the input characters


char data;

// Frequency of the character


unsigned freq;

// Left and right child of this node


struct MinHeapNode *left, *right;
};

// A Min Heap: Collection of


// min-heap (or Huffman tree) nodes
struct MinHeap {

// Current size of min heap


unsigned size;

// capacity of min heap


unsigned capacity;

// Array of minheap node pointers


struct MinHeapNode** array;
};

// A utility function allocate a new


// min heap node with given character
// and frequency of the character
struct MinHeapNode* newNode(char data, unsigned freq)
{
struct MinHeapNode* temp = (struct MinHeapNode*)malloc(
sizeof(struct MinHeapNode));

temp->left = temp->right = NULL;
temp->data = data;
temp->freq = freq;

return temp;
}

// A utility function to create


// a min heap of given capacity
struct MinHeap* createMinHeap(unsigned capacity)
{
struct MinHeap* minHeap


= (struct MinHeap*)malloc(sizeof(struct MinHeap));

// current size is 0
minHeap->size = 0;

minHeap->capacity = capacity;

minHeap->array = (struct MinHeapNode**)malloc(


minHeap->capacity * sizeof(struct MinHeapNode*));
return minHeap;
}

// A utility function to
// swap two min heap nodes
void swapMinHeapNode(struct MinHeapNode** a,
struct MinHeapNode** b)
{
struct MinHeapNode* t = *a;


*a = *b;
*b = t;
}

// The standard minHeapify function.


void minHeapify(struct MinHeap* minHeap, int idx)
{
int smallest = idx;


int left = 2 * idx + 1;
int right = 2 * idx + 2;

if (left < minHeap->size


&& minHeap->array[left]->freq
< minHeap->array[smallest]->freq)
smallest = left;

if (right < minHeap->size


&& minHeap->array[right]->freq
< minHeap->array[smallest]->freq)
smallest = right;

if (smallest != idx) {
swapMinHeapNode(&minHeap->array[smallest],
&minHeap->array[idx]);
minHeapify(minHeap, smallest);
}
}

// A utility function to check
// if size of heap is 1 or not
int isSizeOne(struct MinHeap* minHeap)
{

return (minHeap->size == 1);


}

// A standard function to extract


// minimum value node from heap
struct MinHeapNode* extractMin(struct MinHeap* minHeap)
{
struct MinHeapNode* temp = minHeap->array[0];


minHeap->array[0] = minHeap->array[minHeap->size - 1];

--minHeap->size;
minHeapify(minHeap, 0);

return temp;
}

// A utility function to insert


// a new node to Min Heap
void insertMinHeap(struct MinHeap* minHeap,
struct MinHeapNode* minHeapNode)
{
++minHeap->size;
int i = minHeap->size - 1;

while (i
&& minHeapNode->freq
< minHeap->array[(i - 1) / 2]->freq) {

minHeap->array[i] = minHeap->array[(i - 1) / 2];


i = (i - 1) / 2;
}

minHeap->array[i] = minHeapNode;
}

// A standard function to build min heap


void buildMinHeap(struct MinHeap* minHeap)
{
int n = minHeap->size - 1;
int i;

for (i = (n - 1) / 2; i >= 0; --i)


minHeapify(minHeap, i);
}

// A utility function to print an array of size n


void printArr(int arr[], int n)
{
int i;
for (i = 0; i < n; ++i)
cout << arr[i];

cout << "\n";
}

// Utility function to check if this node is leaf


int isLeaf(struct MinHeapNode* root)
{
return !(root->left) && !(root->right);


}

// Creates a min heap of capacity


// equal to size and inserts all character of
// data[] in min heap. Initially size of
// min heap is equal to capacity
struct MinHeap* createAndBuildMinHeap(char data[],
int freq[], int size)
{
struct MinHeap* minHeap = createMinHeap(size);

for (int i = 0; i < size; ++i)


minHeap->array[i] = newNode(data[i], freq[i]);

minHeap->size = size;
buildMinHeap(minHeap);

return minHeap;
}

// The main function that builds Huffman tree


struct MinHeapNode* buildHuffmanTree(char data[],
int freq[], int size)

{
struct MinHeapNode *left, *right, *top;

// Step 1: Create a min heap of capacity


// equal to size. Initially, there are
// modes equal to size.
struct MinHeap* minHeap
= createAndBuildMinHeap(data, freq, size);

// Iterate while size of heap doesn't become 1


while (!isSizeOne(minHeap)) {

// Step 2: Extract the two minimum


// freq items from min heap
left = extractMin(minHeap);
right = extractMin(minHeap);

// Step 3: Create a new internal


// node with frequency equal to the
// sum of the two nodes frequencies.
// Make the two extracted node as
// left and right children of this new node.
// Add this node to the min heap
// '$' is a special value for internal nodes, not
// used
top = newNode('$', left->freq + right->freq);

89
top->left = left;
top->right = right;

insertMinHeap(minHeap, top);
}

// Step 4: The remaining node is the


// root node and the tree is complete.
return extractMin(minHeap);
}

// Prints huffman codes from the root of Huffman Tree.


// It uses arr[] to store codes
void printCodes(struct MinHeapNode* root, int arr[],
int top)

// Assign 0 to left edge and recur


if (root->left) {

arr[top] = 0;
printCodes(root->left, arr, top + 1);
}

// Assign 1 to right edge and recur


if (root->right) {

arr[top] = 1;
printCodes(root->right, arr, top + 1);
}

// If this is a leaf node, then


// it contains one of the input
// characters, print the character
// and its code from arr[]
if (isLeaf(root)) {

cout << root->data << ": ";


printArr(arr, top);
}
}

// The main function that builds a


// Huffman Tree and print codes by traversing
// the built Huffman Tree
void HuffmanCodes(char data[], int freq[], int size)

{
// Construct Huffman Tree
struct MinHeapNode* root
= buildHuffmanTree(data, freq, size);

// Print Huffman codes using


// the Huffman tree built above
int arr[MAX_TREE_HT], top = 0;

printCodes(root, arr, top);


}

// Driver code
int main()
{

90
char arr[] = { 'a', 'b', 'c', 'd', 'e', 'f' };
int freq[] = { 5, 9, 12, 13, 16, 45 };

int size = sizeof(arr) / sizeof(arr[0]);

HuffmanCodes(arr, freq, size);

return 0;
}
Output
f: 0
c: 100
d: 101
a: 1100
b: 1101
e: 111

Chapter 7
Graphs
7.1 Introduction
Graph Theory is a fundamental area of computer science and mathematics that deals with graphs:
mathematical structures used to model pairwise relationships between objects. Graphs consist of
vertices (nodes) and edges (connections between nodes). In this overview, we'll cover the major
topics in Graph Theory, including basic definitions, types of graphs, representations, traversal
algorithms, shortest path algorithms, and applications.

A graph G is defined as a pair G = (V, E), where:

● V is a set of vertices (nodes).


● E is a set of edges (connections between nodes), which can be directed or undirected.

Graphs can represent various real-world scenarios such as social networks, road networks,
computer networks, etc.

7.2 Types of Graphs


1. Directed and Undirected Graphs:

● Undirected Graph: Edges have no direction. If there's an edge between vertices u and v,
it can be traversed in both directions.
● Directed Graph (Digraph): Edges have a direction. If there's a directed edge from
vertex u to vertex v, it can only be traversed from u to v.

2. Weighted Graphs:

● Edges have weights or costs associated with them, which represent the cost, distance, or
capacity between vertices.

3. Acyclic and Cyclic Graphs:

● Acyclic Graph: Contains no cycles (a path that starts and ends at the same vertex).
● Cyclic Graph: Contains at least one cycle.

7.3 Graph Representation


1. Adjacency Matrix:

● A 2D array A of size |V| * |V|, where A[i][j] is non-zero if there's an edge from vertex i to
vertex j.
● Suitable for dense graphs but consumes O(V²) space.

2. Adjacency List:

● An array of lists (or a hashmap of lists), where element i contains a list of the vertices
adjacent to vertex i.
● Suitable for sparse graphs and consumes O(V + E) space.

7.4 Graph Traversal Algorithms
1. Depth-First Search (DFS):

● Explores as far as possible along each branch before backtracking.


● Uses the call stack (recursive implementation) or an explicit stack (iterative implementation).

2. Breadth-First Search (BFS):

● Explores all neighbors at the present depth level before moving on to nodes at the next
depth level.
● Uses a queue.

7.5 Shortest Path Algorithms


1. Dijkstra's Algorithm:

● Finds the shortest paths from a source vertex to all other vertices in a graph with
non-negative edge weights.
● Uses a priority queue (min-heap).

2. Bellman-Ford Algorithm:

● Finds the shortest paths from a single source vertex to all other vertices, even in the
presence of negative edge weights.
● Uses relaxation technique over all edges |𝑉| − 1 times.

3. Floyd-Warshall Algorithm:

● Finds shortest paths between all pairs of vertices in a graph, including negative edge
weights.
● Uses dynamic programming.

7.6 Minimum Spanning Tree (MST)


1. Kruskal's Algorithm:

● Finds an MST by sorting all edges and adding the smallest edge that doesn’t form a cycle
until |𝑉| − 1 edges are added.
● Uses Union-Find data structure for cycle detection.

2. Prim's Algorithm:

● Finds an MST by starting from an arbitrary vertex and greedily adding the minimum
weight edge that connects the tree to an outside vertex.
● Uses a priority queue (min-heap).

7.7 Applications of Graph Theory


Graph theory finds applications in various fields:

● Social Networks: Modeling relationships between individuals.


● Transport Networks: Optimizing routes and schedules.
● Computer Networks: Routing algorithms.
● Bioinformatics: Modeling genetic networks.
● Operations Research: Optimization problems.

7.8 Advanced Topics
1. Strongly Connected Components (SCC):

● Maximal subgraphs where each vertex is reachable from every other vertex in the
subgraph.

2. Eulerian and Hamiltonian Paths and Cycles:

● Eulerian Path/Cycle: Visit each edge exactly once (Cycle returns to the starting vertex).
● Hamiltonian Path/Cycle: Visit each vertex exactly once (Cycle returns to the starting
vertex).

7.9 Complexity Analysis


Graph algorithms vary in complexity depending on the representation and the specific problem.
Common complexities include O(V + E) for traversal algorithms, O(V · log(V) + E) for MST
algorithms, and O(V³) for all-pairs shortest path algorithms.

7.10 Graphs Problem 1: Implementation of Kruskal’s


Algorithm
In Kruskal’s algorithm, all edges of the given graph are first sorted in increasing order of weight.
Edges are then added to the MST one by one, as long as the newly added edge does not form a
cycle. The minimum-weight edge is picked first and the maximum-weight edge last, so the
algorithm makes a locally optimal choice at each step in order to find the optimal solution. Hence
this is an example of a Greedy Algorithm.
How to find MST using Kruskal’s Algorithm?
Below are the steps for finding MST using Kruskal’s algorithm:
1. Sort all the edges in non-decreasing order of their weight.
2. Pick the smallest edge. Check if it forms a cycle with the spanning tree formed so far. If the
cycle is not formed, include this edge. Else, discard it.
3. Repeat step#2 until there are (V-1) edges in the spanning tree.

Code in C++
// C++ program for the above approach

#include <bits/stdc++.h>
using namespace std;

// DSU data structure
// path compression + union by rank
class DSU {
int* parent;
int* rank;

public:
DSU(int n)
{
parent = new int[n];
rank = new int[n];

for (int i = 0; i < n; i++) {


parent[i] = -1;
rank[i] = 1;
}
}

// Find function
int find(int i)
{
if (parent[i] == -1)
return i;

return parent[i] = find(parent[i]);


}

// Union function
void unite(int x, int y)
{
int s1 = find(x);
int s2 = find(y);

if (s1 != s2) {
if (rank[s1] < rank[s2]) {
parent[s1] = s2;
}
else if (rank[s1] > rank[s2]) {
parent[s2] = s1;
}
else {
parent[s2] = s1;
rank[s1] += 1;
}
}
}
};

class Graph {
vector<vector<int> > edgelist;
int V;

public:
Graph(int V) { this->V = V; }

// Function to add edge in a graph
void addEdge(int x, int y, int w)
{
edgelist.push_back({ w, x, y });
}

void kruskals_mst()
{
// Sort all edges
sort(edgelist.begin(), edgelist.end());

// Initialize the DSU
DSU s(V);
int ans = 0;
cout << "Following are the edges in the "
"constructed MST"
<< endl;
for (auto edge : edgelist) {
int w = edge[0];
int x = edge[1];
int y = edge[2];

// Take this edge in the MST if it does
// not form a cycle
if (s.find(x) != s.find(y)) {

s.unite(x, y);
ans += w;
cout << x << " -- " << y << " == " << w
<< endl;
}
}
cout << "Minimum Cost Spanning Tree: " << ans;
}
};

// Driver code
int main()
{
Graph g(4);
g.addEdge(0, 1, 10);
g.addEdge(1, 3, 15);
g.addEdge(2, 3, 4);
g.addEdge(2, 0, 6);
g.addEdge(0, 3, 5);

// Function call
g.kruskals_mst();

return 0;
}
7.11 Graphs Problem 2: Minimize the cash flow among a given
set of friends who have borrowed money from each other
Given a number of friends who have to give or take some amount of money from one another,
design an algorithm by which the total cash flow among all the friends is minimized.
The idea is to use a Greedy Algorithm where, at every step, we settle all amounts of one person and
recur for the remaining n-1 persons.
How to pick the first person? To pick the first person, calculate the net amount for every person,
where the net amount is obtained by subtracting all debts (amounts to pay) from all credits (amounts
to be received). Once the net amount for every person is evaluated, find the two persons with the
maximum and minimum net amounts: the maximum creditor and the maximum debtor. The person
with the smaller of the two amounts is our first person to be settled and removed from the list. Let
the minimum of the two amounts be x. We pay amount x from the maximum debtor to the maximum
creditor and settle one person. If x is equal to the maximum debit, then the maximum debtor is
settled; otherwise the maximum creditor is settled.
The following is a detailed algorithm.
Do the following for every person Pi where i is from 0 to n-1.
1. Compute the net amount for every person. The net amount for person ‘i’ can be computed by
subtracting sum of all debts from sum of all credits.
2. Find the two persons that are maximum creditor and maximum debtor. Let the maximum
amount to be credited maximum creditor be maxCredit and maximum amount to be debited
from maximum debtor be maxDebit. Let the maximum debtor be Pd and maximum creditor
be Pc.
3. Find the minimum of maxDebit and maxCredit. Let a minimum of two be x. Debit ‘x’ from Pd
and credit this amount to Pc
4. If x is equal to maxCredit, then remove Pc from the set of persons and recur for remaining
(n-1) persons.
5. If x is equal to maxDebit, then remove Pd from the set of persons and recur for remaining
(n-1) persons.

Code in C++14
// C++ program to find minimum cash flow among a set of persons
#include <bits/stdc++.h>
using namespace std;

// Number of persons (or vertices in the graph)
#define N 3

// A utility function that returns index of minimum value in arr[]
int getMin(int arr[])
{
    int minInd = 0;
    for (int i = 1; i < N; i++)
        if (arr[i] < arr[minInd])
            minInd = i;
    return minInd;
}

// A utility function that returns index of maximum value in arr[]
int getMax(int arr[])
{
    int maxInd = 0;
    for (int i = 1; i < N; i++)
        if (arr[i] > arr[maxInd])
            maxInd = i;
    return maxInd;
}

// A utility function to return minimum of 2 values
int minOf2(int x, int y) { return (x < y) ? x : y; }

// amount[p] indicates the net amount to be credited/debited
// to/from person 'p'.
// If amount[p] is positive, then person 'p' will receive amount[p].
// If amount[p] is negative, then person 'p' will give -amount[p].
void minCashFlowRec(int amount[])
{
    // Find the indexes of minimum and maximum values in
    // amount[]. amount[mxCredit] indicates the maximum
    // amount to be given (or credited) to any person,
    // and amount[mxDebit] indicates the maximum amount to
    // be taken (or debited) from any person.
    // So if there is a positive value in amount[], then
    // there must be a negative value
    int mxCredit = getMax(amount), mxDebit = getMin(amount);

    // If both amounts are 0, then all amounts are settled
    if (amount[mxCredit] == 0 && amount[mxDebit] == 0)
        return;

    // Find the minimum of the two amounts
    int min = minOf2(-amount[mxDebit], amount[mxCredit]);
    amount[mxCredit] -= min;
    amount[mxDebit] += min;

    // Print this settlement
    cout << "Person " << mxDebit << " pays " << min
         << " to "
         << "Person " << mxCredit << endl;

    // Recur for the amount array. Note that it is
    // guaranteed that the recursion terminates because
    // either amount[mxCredit] or amount[mxDebit] becomes 0
    minCashFlowRec(amount);
}

// Given a set of persons as graph[] where graph[i][j]
// indicates the amount that person i needs to pay person j,
// this function finds and prints the minimum cash flow to
// settle all debts.
void minCashFlow(int graph[][N])
{
    // Create an array amount[], initialize all values in it
    // as 0.
    int amount[N] = { 0 };

    // Calculate the net amount to be paid to person 'p',
    // and store it in amount[p]. The value of amount[p]
    // can be calculated by subtracting debts of 'p' from
    // credits of 'p'
    for (int p = 0; p < N; p++)
        for (int i = 0; i < N; i++)
            amount[p] += (graph[i][p] - graph[p][i]);

    minCashFlowRec(amount);
}

// Driver program to test above function
int main()
{
    // graph[i][j] indicates the amount that person i needs
    // to pay person j
    int graph[N][N] = {
        { 0, 1000, 2000 },
        { 0, 0, 5000 },
        { 0, 0, 0 },
    };

    // Print the solution
    minCashFlow(graph);
    return 0;
}
