Skip to content

Translate Rabin Karp Algorithm #57

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 5, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ especially popular in field of competitive programming.*

### String Processing
- [String Hashing](./string/string-hashing.html)
- [Rabin-Karp Algorithm for String Matching](./string/rabin-karp.html)
- [Suffix Array](./string/suffix-array.html)
- [Z-function](./string/z-function.html)

Expand Down
56 changes: 56 additions & 0 deletions src/string/rabin-karp.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
<!--?title Rabin-Karp Algorithm-->

# Rabin-Karp Algorithm for string matching in O(|S| + |T|)


This algorithm is based on the concept of hashing, so if you are not familiar with string hashing, kindly refer to the [String Hashing](./string/string-hashing.html) article.


This algorithm was authored by Rabin and Karp in 1987.

Problem: Given two strings - a pattern $S$ and a text $T$, determine if the pattern appears in the text and if it does, enumerate all its
occurences in $O(|S| + |T|)$ time.

Algorithm: Calculate the hash for the pattern $S$. Calculate hash values for all the prefixes of the text $T$. Now, we can compare a substring of length $|S|$ with $S$ in constant time using the calculated hashes. So, compare each substring of length $|S|$ with the pattern. This will take a total of $O(|T|)$ time. Hence the final complexity of the algorithm is $O(|T| + |S|)$ where $O(|S|)$ is required for calculating the hash of the pattern and $O(|T|)$ for comparing each substring of length $|S|$ with the pattern.


## Implementation

string s, t; // input

// calculate all powers of p
const int p = 31;
vector<unsigned long long> p_pow(max(s.length(), t.length()));
p_pow[0] = 1;
for (size_t i = 1; i < p_pow.size(); ++i)
p_pow[i] = p_pow[i-1] * p;

// calculate hashes of all prefixes of text T
vector<unsigned long long> h(t.length());
for (size_t i = 0; i < t.length(); i++)
{
h[i] = (t[i] - 'a' + 1) * p_pow[i];
if (i) h[i] + = h[i - 1];
}

// calculate the hash of the pattern S
unsigned long long h_s = 0;
for (size_t i = 0; i < s.length(); i++)
h_s += (s[i] - 'a' + 1) * p_pow[i];

// iterate over all substrings of T having length |S| and compare them
// with S
for (size_t i = 0; i + s.length() - 1 < t.length(); i++)
{
unsigned long long cur_h = h[i + s.length () - 1];
if (i) cur_h -= h [i - 1];

// get the hashes multiplied to the same degree of p and compare them
if (cur_h == H_s * p_pow[i])
cout << i << '';
}

## Practice Problems

* [Pattern Find - SPOJ](http://www.spoj.com/problems/NAJPF/)