0% found this document useful (0 votes)
5 views

M3-string_matching

The document discusses the Naive String-Matching Algorithm, which identifies valid shifts of a pattern within a text using a loop. The algorithm checks for matching conditions across all possible shifts, resulting in a time complexity of O((n-m+1)*m), which can be θ(n²) when the pattern length is half of the text length. Various tables illustrate the matching process through examples.

Uploaded by

ssanjayreg
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views

M3-string_matching

The document discusses the Naive String-Matching Algorithm, which identifies valid shifts of a pattern within a text using a loop. The algorithm checks for matching conditions across all possible shifts, resulting in a time complexity of O((n-m+1)*m), which can be θ(n²) when the pattern length is half of the text length. Various tables illustrate the matching process through examples.

Uploaded by

ssanjayreg
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 74

String Matching Algorithms

Naive String-Matching Algorithm

• Finds all valid shifts using a loop that checks the condition

P [1..m] = T[s+1..s+m] for each of the n - m + 1 possible

values of s

• takes time O((n-m+1)*m), which is θ(n2) if m = ⌊n/2⌋


Naive String-Matching Algorithm
Naive String-Matching Algorithm
Naive String-Matching Algorithm 1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 4
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 5
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 6
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 7
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 8
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 9
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 10
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 11
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 12
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 13
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 14
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 15
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 16
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 17
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 18
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 19
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 20
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 21
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 22
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 23
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 24
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 25
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 27
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 28
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 29
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 31
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm
Rabin-Karp algorithm

• Performs well in practice and that also generalizes to other

algorithms for related problems, such as two-dimensional

pattern matching

• uses θ(m) preprocessing time, and its worst-case running

time is θ((n-m+1)*m)
Rabin-Karp algorithm

• Makes use of elementary number-theoretic notions such as


the equivalence of two numbers modulo a third number

• Assume that Σ= {0, 1, 2,...,9}, so that each character is a


decimal digit

• Can then view a string of k consecutive characters as


representing a length-k decimal number
Rabin-Karp algorithm

• Character string 31415 thus corresponds to the decimal


number 31,415

• Given a pattern P [1..m], let p denote its corresponding


decimal value.

• In a similar manner, given a text T[1..n], let t s denote


decimal value of the length-m substring T [s+1..s + m]
Rabin-Karp algorithm

• Can compute p in time θ(m) using Horner’s rule:

• Similarly compute t0 from T [1..m] in time θ(m)

• t1 is computed from [2..m+1] and so on


Rabin-Karp algorithm
• Observe that we can compute ts+ 1from ts in constant time

• For example, if m = 5 and t s = 31415, then we wish to


remove the high-order digit T [s + 1] = 3 and bring in the
new low-order digit (suppose it is T [s + 5 + 1] = 2) to
obtain
• ts+1 = 10(31415 - 10000 * 3) + 2 = 14152
Rabin-Karp algorithm
• We can find all occurrences of the pattern P[1..m] in the text
T[1..n] with θ(m) preprocessing time and θ(n - m + 1)
matching time
• One problem: p and t s may be too large to work with
conveniently
• If P contains m characters, then we cannot reasonably assume
that each arithmetic operation on p (which is m digits long)
takes “constant time.”
Rabin-Karp algorithm
• Fortunately, we can solve this problem easily, as Figure 32.5
shows: compute p and the ts values modulo a suitable modulus
q
• Choose modulus q as a prime such that 10q just fits within
one computer word
• then we can perform all the necessary computations with
single-precision arithmetic
Rabin-Karp algorithm
Rabin-Karp algorithm
Rabin-Karp algorithm
• If q is large enough, then we hope that spurious hits occur
infrequently enough that the cost of the extra checking is low.

• The inputs to the procedure are the text T , the pattern P , the
radix d to use (which is typically taken to be |Σ|), and the
prime q to use
Rabin-Karp algorithm
The Knuth-Morris-Pratt algorithm
• Linear-time string-matching algorithm

• using just an auxiliary function π, which we precompute


from the pattern in time θ(m) and store in an array π[1..m]

• prefix function π for a pattern encapsulates knowledge about


how the pattern matches against shifts of itself
The Knuth-Morris-Pratt algorithm
• avoid testing useless shifts in the naive pattern-matching
algorithm

• For this example, q = 5 of the characters have matched


successfully, but the 6th pattern character fails to match the
corresponding text character
The Knuth-Morris-Pratt algorithm
• Knowing these q text characters allows us to determine
immediately that certain shifts are invalid

• In this example, shift s + 1 is necessarily invalid, since first


character (a) would be aligned with a text character that we
know does not match the first pattern character, but does
match the second pattern character (b).
The Knuth-Morris-Pratt algorithm
• Given a pattern P [1..m], the prefix function for the pattern P
is the function π: {1, 2, ... , m} -> {0, 1, ..., m-1} such that

• π[q] = max {k : k < q and Pk > Pq }


KMP algorithm
KMP algorithm
KMP algorithm
Trace
• T - bacbababaababacac

• P - ababaca
Example
Example
Example
Example
Example
Example
Example
Example
Example
Example
Example
Trace
• P - ababca

• T - abgababcababcabdababebcedaababcababcabdababe

You might also like