DAA-DA-output

NAME: MANIPRIYAA R

REG NO: 23BAI0214

1. Rabin-Karp Algorithm (C Implementation)


#include <stdio.h>
#include <string.h>

#define d 256   /* size of the input alphabet */

void rabinKarp(char pattern[], char text[], int q) {
    int M = strlen(pattern);
    int N = strlen(text);
    int p = 0;      /* hash value of the pattern */
    int t = 0;      /* hash value of the current text window */
    int h = 1;      /* d^(M-1) % q, used to drop the leading character */
    int i, j;

    /* Precompute h = d^(M-1) % q */
    for (i = 0; i < M - 1; i++)
        h = (h * d) % q;

    /* Initial hash values of the pattern and the first text window */
    for (i = 0; i < M; i++) {
        p = (d * p + pattern[i]) % q;
        t = (d * t + text[i]) % q;
    }

    /* Slide the pattern over the text one position at a time */
    for (i = 0; i <= N - M; i++) {
        /* If the hash values match, confirm character by character */
        if (p == t) {
            for (j = 0; j < M; j++)
                if (text[i + j] != pattern[j])
                    break;
            if (j == M)
                printf("Pattern found at index %d\n", i);
        }
        /* Roll the hash: remove text[i], append text[i + M] */
        if (i < N - M) {
            t = (d * (t - text[i] * h) + text[i + M]) % q;
            if (t < 0)
                t += q;   /* keep the hash non-negative */
        }
    }
}

int main() {
    char text[] = "ABCCDEABCDABCD";
    char pattern[] = "ABCD";
    int q = 101;   /* a prime modulus */

    rabinKarp(pattern, text, q);
    return 0;
}

1. Understanding the Algorithm

• The Rabin-Karp algorithm is a pattern-searching algorithm that uses hashing to find occurrences of a pattern in a given text efficiently.

• It calculates the hash value of the pattern and compares it with the hash values of equal-length substrings of the text.

2. Defining Key Parameters

• The lengths of the text and the pattern are calculated.

• Variables are initialized to store the hash values of the pattern and of the current substring (window) of the text.

• h is a precomputed factor used in the rolling-hash calculation.

3. Precomputing the Hash Factor (h)

• A helper value h = (d^(M-1)) mod q is computed.

• This lets the rolling hash be updated efficiently: the contribution of the first character of the window can be removed and a new character appended without rehashing the whole window.
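
For example, with the values used in main() below (pattern length M = 4, d = 256, q = 101), the precomputed factor works out as follows:

    h = d^(M-1) mod q = 256^3 mod 101 = (54 * 54 * 54) mod 101 = 5   (since 256 mod 101 = 54)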

4. Calculating Initial Hash Values

• The hash values of the pattern and of the first window of text (of the same length as the pattern) are computed.

• This avoids comparing the pattern character by character up front.
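
As a concrete illustration (using ASCII codes with d = 256 and q = 101), the initial values for the sample inputs are:

    p = hash("ABCD") = 22
    t = hash("ABCC") = 21   (the first four-character window of "ABCCDEABCDABCD")

Since 22 != 21, the first window is rejected without any character-by-character comparison.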

5. Sliding the Pattern Over the Text

• The algorithm moves the window one character at a time over the text.

• If the hash values match, a character-by-character comparison is performed to confirm the match.

6. Rolling Hash Technique

• Instead of recalculating the hash from scratch for each new substring, the hash is updated using the rolling-hash formula:

o Remove the first character's contribution.

o Add the new character at the end.

• This reduces redundant computation and makes the algorithm efficient.
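
Continuing the example above (h = 5), sliding the window from "ABCC" to "BCCD" updates the hash in constant time:

    t = (256 * (21 - 'A' * 5) + 'D') mod 101
      = (256 * (21 - 325) + 68) mod 101
      = 14   (after the negative-value adjustment described in step 7)

which agrees with the hash of "BCCD" computed from scratch.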

7. Handling Negative Hash Values

• If the updated hash value becomes negative, the modulus q is added to bring it back into the valid (non-negative) range.

8. Output the Match Index

• If a pattern match is confirmed, the index of the occurrence is printed.

9. Main Function Execution

• A sample text ("ABCCDEABCDABCD") and pattern ("ABCD") are given.

• The Rabin-Karp function is called and finds all occurrences of "ABCD" in the text.
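
For these sample inputs the pattern "ABCD" occurs at indices 6 and 10, so the expected output of the program is:

    Pattern found at index 6
    Pattern found at index 10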

10. Time Complexity

• Best case: O(N) (few hash collisions).

• Worst case: O(NM) (many hash collisions force repeated character-by-character comparisons).

• Average case: O(N) (efficient due to hashing).

2. Knuth-Morris-Pratt (KMP) Algorithm


#include <stdio.h>
#include <string.h>

/* Build the LPS (longest proper prefix which is also a suffix) array */
void computeLPSArray(char* pattern, int M, int* lps) {
    int len = 0;   /* length of the previous longest prefix-suffix */
    int i = 1;

    lps[0] = 0;    /* lps[0] is always 0 */
    while (i < M) {
        if (pattern[i] == pattern[len]) {
            len++;
            lps[i] = len;
            i++;
        } else {
            if (len != 0)
                len = lps[len - 1];   /* fall back to a shorter prefix */
            else {
                lps[i] = 0;
                i++;
            }
        }
    }
}

void KMPSearch(char* pattern, char* text) {
    int M = strlen(pattern);
    int N = strlen(text);
    int lps[M];

    computeLPSArray(pattern, M, lps);

    int i = 0;   /* index into text */
    int j = 0;   /* index into pattern */
    while (i < N) {
        if (pattern[j] == text[i]) {
            i++;
            j++;
        }
        if (j == M) {
            /* Full match: report it, then continue using the LPS array */
            printf("Pattern found at index %d\n", i - j);
            j = lps[j - 1];
        } else if (i < N && pattern[j] != text[i]) {
            if (j != 0)
                j = lps[j - 1];   /* shift the pattern; never move i backwards */
            else
                i++;
        }
    }
}

int main() {
    char text[] = "ABABDABACDABABCABAB";
    char pattern[] = "ABABCABAB";

    KMPSearch(pattern, text);
    return 0;
}

1. Compute the LPS (Longest Prefix Suffix) Array

• The LPS array determines how far the pattern can be shifted when a mismatch occurs.

• For each prefix of the pattern, it stores the length of the longest proper prefix that is also a suffix.

• When a mismatch happens, the LPS array tells the search where to resume the comparison instead of restarting from the beginning of the pattern.

2. Constructing the LPS Array

• Start with the first character (lps[0] = 0).

• Compare the current character with the character just after the last matched prefix:

o If they match, extend the prefix length and store it in lps[i].

o If they do not match, fall back to shorter prefix lengths using previously computed LPS values until a match is found or the length resets to zero.

• Continue until the entire pattern has been processed.
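
For the sample pattern "ABABCABAB" used in main(), this procedure produces the following LPS array:

    index:   0  1  2  3  4  5  6  7  8
    char:    A  B  A  B  C  A  B  A  B
    lps[i]:  0  0  1  2  0  1  2  3  4

For instance, lps[8] = 4 because "ABAB" is both a proper prefix and a suffix of "ABABCABAB".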

3. Searching the Pattern in the Text

• Traverse the text and compare it with the pattern.

• If the characters match, move both pointers forward.

• If a complete match occurs, print the starting index and use lps[j - 1] to decide how far to shift the pattern.

• If a mismatch occurs:

o Use the LPS array to avoid re-checking characters that are already known to match.

o Shift the pattern according to lps[j - 1] instead of moving it all the way back to the start.
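
With the sample text "ABABDABACDABABCABAB" and pattern "ABABCABAB" from main(), the only match starts at index 10, so the expected output is:

    Pattern found at index 10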

4. Time Complexity

• Preprocessing (LPS calculation): O(M)

• Pattern matching: O(N)

• Overall complexity: O(N + M), making KMP much faster than the naive string-matching algorithm.

5. Key Advantage

• Unlike brute-force methods, KMP never moves backwards in the text, making it highly efficient for searching in large strings.

3. Automata-Based String Matching Algorithm


#include <stdio.h>
#include <string.h>

#define NO_OF_CHARS 256   /* size of the input alphabet */

/* Build the transition function table: TF[state][character] = next state */
void computeTF(char* pattern, int M, int TF[][NO_OF_CHARS]) {
    int i, x;
    int lps = 0;   /* fallback state: longest proper prefix that is also a suffix */

    /* State 0: stay at 0 for every character except the first pattern character */
    for (x = 0; x < NO_OF_CHARS; x++)
        TF[0][x] = 0;
    TF[0][(unsigned char)pattern[0]] = 1;

    for (i = 1; i < M; i++) {
        /* Copy the transitions of the fallback (LPS) state */
        for (x = 0; x < NO_OF_CHARS; x++)
            TF[i][x] = TF[lps][x];

        /* On the next pattern character, advance to the next state */
        TF[i][(unsigned char)pattern[i]] = i + 1;

        /* Update the fallback state for the next iteration */
        lps = TF[lps][(unsigned char)pattern[i]];
    }

    /* Final (accepting) state: its transitions are those of its fallback state */
    for (x = 0; x < NO_OF_CHARS; x++)
        TF[M][x] = TF[lps][x];
}

void search(char* text, char* pattern) {
    int M = strlen(pattern);
    int N = strlen(text);
    int TF[M + 1][NO_OF_CHARS];

    computeTF(pattern, M, TF);

    int state = 0, i;
    for (i = 0; i < N; i++) {
        state = TF[state][(unsigned char)text[i]];
        if (state == M)
            printf("Pattern found at index %d\n", i - M + 1);
    }
}

int main() {
    char text[] = "AABAACAADAABAABA";
    char pattern[] = "AABA";

    search(text, pattern);
    return 0;
}

1. Transition Function (TF) Table Construction

• The Transition Function (TF) table determines the next state from the current state and the incoming character.

• It ensures that when a mismatch occurs, the automaton jumps to the state corresponding to the longest prefix of the pattern that is also a suffix of the text matched so far.

Steps for Building the TF Table:

1. Initialize the table: set every transition from state 0 to 0, except the transition on the first pattern character, which goes to state 1.

2. Iterate through the pattern:

o Copy the transitions from the longest-prefix-suffix (LPS) state.

o Update the transition for the current pattern character so that it advances to the next state.

o Update the LPS state using the transitions computed so far.

3. Final state transitions: the last row is copied from its LPS state so that matching can continue correctly after a full match.
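
As a small worked example, for the pattern "AABA" used in main(), the TF table restricted to the characters 'A' and 'B' looks like this (every other character leads back to state 0):

    state | on 'A' | on 'B'
      0   |   1    |   0
      1   |   2    |   0
      2   |   2    |   3
      3   |   4    |   0
      4   |   2    |   0

State 4 is the accepting state; its row is copied from state 1, because "A" is the longest proper prefix of "AABA" that is also a suffix.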

2. Pattern Matching Process

• Start from state 0.

• For each character in the text:

o Move to the next state using the TF table.

o If the final state (equal to the pattern length) is reached, a match is found.
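
For example, the first four characters of the sample text "AABAACAADAABAABA" drive the automaton through the states

    0 --'A'--> 1 --'A'--> 2 --'B'--> 3 --'A'--> 4

and since 4 equals the pattern length M, a match is reported at index i - M + 1 = 3 - 4 + 1 = 0.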

3. Time Complexity

• Preprocessing (TF table construction): O(M × NO_OF_CHARS), where M is the pattern length.

• Pattern matching: O(N), where N is the text length.

• Overall complexity: O(M × NO_OF_CHARS + N).

4. Key Advantages
• Efficient searching: precomputed transitions avoid unnecessary character comparisons.

• No backtracking: unlike brute-force methods, the finite automaton reads each text character exactly once and never re-examines earlier positions.

5. Example Execution

• Text: "AABAACAADAABAABA"

• Pattern: "AABA"

• The algorithm identifies matches at indices 0, 9, and 12 in a single pass over the text.
