diff --git a/src/index.md b/src/index.md
index 0c7b05de4..a55487182 100644
--- a/src/index.md
+++ b/src/index.md
@@ -23,6 +23,7 @@ especially popular in field of competitive programming.*
 
 ### String Processing
 - [String Hashing](./string/string-hashing.html)
+- [Rabin-Karp Algorithm for String Matching](./string/rabin-karp.html)
 - [Suffix Array](./string/suffix-array.html)
 - [Z-function](./string/z-function.html)
 
diff --git a/src/string/rabin-karp.md b/src/string/rabin-karp.md
new file mode 100644
index 000000000..7e1534004
--- /dev/null
+++ b/src/string/rabin-karp.md
@@ -0,0 +1,66 @@
+
+
+# Rabin-Karp Algorithm for string matching in O(|S| + |T|)
+
+This algorithm is based on the concept of hashing, so if you are not familiar with string hashing, refer to the [String Hashing](./string/string-hashing.html) article.
+
+This algorithm was proposed by Rabin and Karp in 1987.
+
+Problem: Given two strings, a pattern $S$ and a text $T$, determine whether the pattern appears in the text and, if it does, enumerate all its
+occurrences in $O(|S| + |T|)$ time.
+
+Algorithm: Calculate the hash of the pattern $S$ and the hash values of all prefixes of the text $T$. With the prefix hashes, the hash of any substring of $T$ is available in constant time, so each substring of length $|S|$ can be compared with the pattern in $O(1)$, and comparing all $|T| - |S| + 1$ of them takes $O(|T|)$ time. The total complexity is therefore $O(|S| + |T|)$: $O(|S|)$ to hash the pattern and $O(|T|)$ to compute the powers of $p$ and the prefix hashes and to compare every substring of length $|S|$ with the pattern. Note that equal hashes do not guarantee equal strings: a hash collision can report a false occurrence, which can be ruled out by comparing the characters of each candidate directly, at a cost of $O(|S|)$ per candidate.
+
+## Implementation
+
+    #include <iostream>
+    #include <string>
+    #include <vector>
+    using namespace std;
+
+    int main()
+    {
+        string s, t; // input: pattern S and text T (lowercase Latin letters)
+        cin >> s >> t;
+        if (s.empty() || s.length() > t.length())
+            return 0; // empty or too long pattern: nothing to report
+
+        // calculate all powers of p (all arithmetic is modulo 2^64 via unsigned overflow)
+        const unsigned long long p = 31;
+        vector<unsigned long long> p_pow(t.length());
+        p_pow[0] = 1;
+        for (size_t i = 1; i < p_pow.size(); ++i)
+            p_pow[i] = p_pow[i - 1] * p;
+
+        // calculate hashes of all prefixes of text T
+        vector<unsigned long long> h(t.length());
+        for (size_t i = 0; i < t.length(); i++)
+        {
+            h[i] = (t[i] - 'a' + 1) * p_pow[i];
+            if (i) h[i] += h[i - 1];
+        }
+
+        // calculate the hash of the pattern S
+        unsigned long long h_s = 0;
+        for (size_t i = 0; i < s.length(); i++)
+            h_s += (s[i] - 'a' + 1) * p_pow[i];
+
+        // iterate over all substrings of T having length |S| and compare them
+        // with S
+        for (size_t i = 0; i + s.length() <= t.length(); i++)
+        {
+            unsigned long long cur_h = h[i + s.length() - 1];
+            if (i) cur_h -= h[i - 1];
+
+            // cur_h is the hash of T[i..i+|S|-1] multiplied by p^i, so the
+            // hash of S must be multiplied by p^i too before comparing
+            if (cur_h == h_s * p_pow[i])
+                cout << i << ' ';
+        }
+        cout << '\n';
+    }
+
+## Practice Problems
+
+* [Pattern Find - SPOJ](http://www.spoj.com/problems/NAJPF/)
+
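The same approach can be sketched as a reusable function that returns the positions instead of printing them. The name `rabin_karp` and its signature are hypothetical (they are not part of the article). Like the code above, the sketch assumes lowercase Latin letters and hashes modulo $2^{64}$ via unsigned overflow. Unlike that code, it confirms each hash match with a direct character comparison, so a collision cannot produce a false occurrence.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns every position i such that t.substr(i, s.size()) == s.
// Assumes s and t consist of lowercase Latin letters; an empty pattern yields no positions.
// All hash arithmetic is modulo 2^64 through unsigned overflow.
std::vector<std::size_t> rabin_karp(const std::string& s, const std::string& t) {
    std::vector<std::size_t> occurrences;
    if (s.empty() || s.size() > t.size())
        return occurrences;

    // powers of p, as in the article's code
    const unsigned long long p = 31;
    std::vector<unsigned long long> p_pow(t.size());
    p_pow[0] = 1;
    for (std::size_t i = 1; i < t.size(); i++)
        p_pow[i] = p_pow[i - 1] * p;

    // h[i + 1] is the hash of the prefix t[0..i]; h[0] = 0 stands for the empty prefix,
    // which removes the special case for i == 0
    std::vector<unsigned long long> h(t.size() + 1, 0);
    for (std::size_t i = 0; i < t.size(); i++)
        h[i + 1] = h[i] + (t[i] - 'a' + 1) * p_pow[i];

    // hash of the pattern
    unsigned long long h_s = 0;
    for (std::size_t i = 0; i < s.size(); i++)
        h_s += (s[i] - 'a' + 1) * p_pow[i];

    for (std::size_t i = 0; i + s.size() <= t.size(); i++) {
        // hash of t[i..i+|s|-1], multiplied by p^i
        unsigned long long cur_h = h[i + s.size()] - h[i];
        // compare hashes first, then confirm character by character
        // so that a hash collision cannot produce a false match
        if (cur_h == h_s * p_pow[i] && t.compare(i, s.size(), s) == 0)
            occurrences.push_back(i);
    }
    return occurrences;
}
```

For example, `rabin_karp("aba", "abacaba")` returns `{0, 4}`. The character check costs $O(|S|)$ per hash match. On inputs with many genuine occurrences, such as a pattern made only of `a` searched in a long run of `a`, the running time therefore grows to $O(|S| \cdot |T|)$. Dropping the check restores the linear bound at the price of possible false positives.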