Robin Karp Algorithm For String Matching
Robin Karp Algorithm For String Matching
Karp Algorithm
String Matching Problem
3 1 4 1 5 9 2 6 5 3 5
31 mod 11 = 9 not equal to 4
3 1 4 1 5 9 2 6 5 3 5
14 mod 11 = 3 not equal to 4
3 1 4 1 5 9 2 6 5 3 5
41 mod 11 = 8 not equal to 4
Rabin-Karp example continued
3 1 4 1 5 9 2 6 5 3 5
15 mod 11 = 4 equal to 4 -> spurious hit
3 1 4 1 5 9 2 6 5 3 5
59 mod 11 = 4 equal to 4 -> spurious hit
3 1 4 1 5 9 2 6 5 3 5
92 mod 11 = 4 equal to 4 -> spurious hit
3 1 4 1 5 9 2 6 5 3 5
26 mod 11 = 4 equal to 4 -> an exact match!!
3 1 4 1 5 9 2 6 5 3 5
65 mod 11 = 10 not equal to 4
Rabin-Karp example continued
3 1 4 1 5 9 2 6 5 3 5
53 mod 11 = 9 not equal to 4
3 1 4 1 5 9 2 6 5 3 5
35 mod 11 = 2 not equal to 4
As we can see, when a match is found, further testing is
done to insure that a match has indeed been found.
Complexity
• The running time of the Rabin-Karp algorithm in the
worst-case scenario is O(n-m+1)m but it has a good
average-case running time.
• If the expected number of valid shifts is small O(1)
and the prime q is chosen to be quite large, then the
Rabin-Karp algorithm can be expected to run in time
O(n+m) plus the time to required to process spurious
hits.
Applications
• Bioinformatics
– Used in looking for similarities of two or more
proteins; i.e. high sequence similarity usually implies
significant structural or functional similarity.
Example:
Hb A_human
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL
G+ +VK+HGKKV A++++++AH+ D++ ++ +++LS+LH KL
Hb B_human
GNPKVKAHGKKVLGAFSDGLAH LDNLKGTF ATLSELH CDKL
+ similar amino acids
Applications continued
• Alpha hemoglobin and beta hemoglobin are subunits that
make up a protein called hemoglobin in red blood cells.
Notice the similarities between the two sequences, which
probably signify functional similarity.
• Many distantly related proteins have domains that are similar
to each other, such as the DNA binding domain or cation
binding domain. To find regions of high similarity within
multiple sequences of proteins, local alignment must be
performed. The local alignment of sequences may provide
information of similar functional domains present among
distantly related proteins.