Add Sparse-Table article (#168)

jakobkogler · tcNickolas · commit dd1bead907a8 · 2017-10-15T14:42:09.000-07:00
diff --git a/src/data_structures/sparse-table.md b/src/data_structures/sparse-table.md
@@ -0,0 +1,137 @@
+<!--?title Sparse Table-->
+
+# Sparse Table
+
+Sparse Table is a data structure that allows answering range queries.
+It can answer most range queries in $O(\log n)$, but its true power is answering range minimum queries (or, equivalently, range maximum queries).
+For those queries it can compute the answer in $O(1)$ time.
+
+The only drawback of this data structure is that it can only be used on _immutable_ arrays.
+This means that the array cannot be changed between two queries.
+If any element in the array changes, the complete data structure has to be recomputed. 
+
+### Intuition
+
+Any non-negative number can be uniquely represented as a sum of distinct powers of two, written in decreasing order.
+This is just a variant of the binary representation of a number.
+E.g. $13 = (1101)_2 = 8 + 4 + 1$.
+For a number $x \ge 1$ there can be at most $\lfloor \log_2 x \rfloor + 1$ summands.
+
+By the same reasoning any interval can be uniquely represented as a union of disjoint intervals whose lengths are decreasing powers of two.
+E.g. $\[2, 14\] = \[2, 9\] \cup \[10, 13\] \cup \[14, 14\]$, where the complete interval has length 13, and the individual intervals have the lengths 8, 4 and 1 respectively.
+Here, too, the union consists of at most $\lfloor \log_2(\text{length of interval}) \rfloor + 1$ intervals.
+
+The main idea behind Sparse Tables is to precompute the answers for all range queries whose length is a power of two.
+Afterwards any other range query can be answered by splitting the range into ranges whose lengths are powers of two, looking up the precomputed answers, and combining them to obtain the complete answer.
+
+### Precomputation
+
+We will use a 2-dimensional array for storing the answers to the precomputed queries. 
+$\text{st}\[i\]\[j\]$ will store the answer for the range $[i, i + 2^j - 1]$ of length $2^j$. 
+The size of the 2-dimensional array will be $\text{MAXN} \times (K + 1)$, where $\text{MAXN}$ is the biggest possible array length. 
+$K$ has to satisfy $K \ge \lfloor \log_2 \text{MAXN} \rfloor$, because $2^{\lfloor \log_2 \text{MAXN} \rfloor}$ is the biggest power of two range that we have to support, and the second index $j$ runs from $0$ to $K$.
+For arrays with reasonable length ($\le 10^7$ elements), $K = 25$ is a good value, since $2^{25} > 10^7$.
+
+```cpp
+int st[MAXN][K + 1];
+```
+
+Because the range $\[i, i + 2^j - 1\]$ of length $2^j$ splits nicely into the ranges $\[i, i + 2^{j - 1} - 1\]$ and $\[i + 2^{j - 1}, i + 2^j - 1\]$, both of length $2^{j - 1}$, we can generate the table efficiently using dynamic programming:
+
+```cpp
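+// ranges of length 1: the answer is the element itself
+for (int i = 0; i < N; i++)
+    st[i][0] = array[i];
+
+// ranges of length 2^j: combine the two halves of length 2^(j-1)
+for (int j = 1; j <= K; j++)
+    for (int i = 0; i + (1 << j) <= N; i++)
+        st[i][j] = f(st[i][j - 1], st[i + (1 << (j - 1))][j - 1]);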
+```
+
+The function $f$ will depend on the type of query.
+For range sum queries it will compute the sum, for range minimum queries it will compute the minimum. 
+
+The time complexity of the precomputation is $O(\text{N} \log \text{N})$. 
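+
+To see where this bound comes from, one can count the table entries that the two loops fill. This is only a sketch, assuming that a single call to $f$ takes constant time: level $j$ has $N - 2^j + 1$ valid starting positions, and levels with $2^j > N$ have none, so
+
+$$\sum_{j=0}^{\lfloor \log_2 N \rfloor} \left(N - 2^j + 1\right) \le N \cdot \left(\lfloor \log_2 N \rfloor + 1\right) = O(N \log N)$$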
+
+### Range Sum Queries
+
+For this type of query, we want to find the sum of all values in a range.
+Therefore the natural definition of the function $f$ is $f(x, y) = x + y$. 
+We can construct the data structure with:
+
+```cpp
+long long st[MAXN][K + 1];  // j runs from 0 to K, so K + 1 columns are needed
+
+for (int i = 0; i < N; i++) 
+    st[i][0] = array[i];
+
+for (int j = 1; j <= K; j++) 
+    for (int i = 0; i + (1 << j) <= N; i++) 
+        st[i][j] = st[i][j-1] + st[i + (1 << (j - 1))][j - 1];
+```
+
+To answer the sum query for the range $\[L, R\]$, we iterate over all powers of two, starting from the biggest one.
+As soon as a power of two $2^j$ is less than or equal to the length of the range ($= R - L + 1$), we process the first part of the range, $\[L, L + 2^j - 1\]$, and continue with the remaining range $\[L + 2^j, R\]$.
+
+```cpp
+long long sum = 0;
+for (int j = K; j >= 0; j--) {
+    if ((1 << j) <= R - L + 1) {
+        sum += st[L][j];
+        L += 1 << j;
+    }
+}
+```
+
+Time complexity for a Range Sum Query is $O(K) = O(\log \text{MAXN})$.
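+
+As a worked example, consider the query $\[2, 14\]$ from the intuition section, assuming $K \ge 3$ and an array with at least 15 elements. The range has length 13, so the loop takes $2^3 = 8$ first (leaving $\[10, 14\]$ of length 5), then $2^2 = 4$ (leaving $\[14, 14\]$ of length 1), skips $2^1$, and finally takes $2^0 = 1$:
+
+$$\text{sum}(2, 14) = \text{st}\[2\]\[3\] + \text{st}\[10\]\[2\] + \text{st}\[14\]\[0\]$$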
+
+### Range Minimum Queries (RMQ)
+
+These are the queries where the Sparse Table shines. 
+When computing the minimum of a range, it doesn't matter if we process a value in the range once or twice. 
+Therefore instead of splitting a range into multiple ranges, we can also split the range into only two overlapping ranges whose length is a power of two.
+E.g. we can split the range $\[1, 6\]$ into the ranges $\[1, 4\]$ and $\[3, 6\]$.
+The range minimum of $\[1, 6\]$ is clearly the same as the minimum of the range minimum of $\[1, 4\]$ and the range minimum of $\[3, 6\]$.
+So we can compute the minimum of the range $\[L, R\]$ with:
+
+$$\min(\text{st}\[L\]\[j\], \text{st}\[R - 2^j + 1\]\[j\]) \quad \text{ where } j = \lfloor \log_2(R - L + 1) \rfloor$$
+
+This requires that we are able to compute $\lfloor \log_2(R - L + 1) \rfloor$ fast.
+You can accomplish that by precomputing all logarithms: 
+
+```cpp
+int lg[MAXN+1];  // lg[i] = floor(log2(i)); named lg to avoid a clash with log() from <cmath>
+lg[1] = 0;
+for (int i = 2; i <= MAXN; i++)
+    lg[i] = lg[i/2] + 1;
+```
+
+Afterwards we need to precompute the Sparse Table structure. This time we define $f$ with $f(x, y) = \min(x, y)$.
+
+```cpp
+int st[MAXN][K + 1];  // j runs from 0 to K, so K + 1 columns are needed
+
+for (int i = 0; i < N; i++)
+    st[i][0] = array[i];
+
+for (int j = 1; j <= K; j++)
+    for (int i = 0; i + (1 << j) <= N; i++)
+        st[i][j] = min(st[i][j-1], st[i + (1 << (j - 1))][j - 1]);
+```
+
+And the minimum of a range $\[L, R\]$ can be computed with:
+
+```cpp
+int j = lg[R - L + 1];
+int minimum = min(st[L][j], st[R - (1 << j) + 1][j]);
+```
+ 
+Time complexity for a Range Minimum Query is $O(1)$.
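+
+The pieces of this section can be combined into one self-contained type. The following is only a minimal sketch, not a definitive implementation: the name `SparseTableMin`, its members and the use of `std::vector` instead of fixed-size arrays are assumptions made for illustration, and the number of levels is derived from the array length instead of a constant $K$.
+
+```cpp
+#include <algorithm>
+#include <vector>
+
+// Hypothetical helper type (not from the article): a sparse table for
+// range minimum queries over a fixed, immutable array.
+struct SparseTableMin {
+    std::vector<int> lg;               // lg[i] = floor(log2(i)) for i >= 1
+    std::vector<std::vector<int>> st;  // st[i][j] = min of a[i .. i + 2^j - 1]
+
+    explicit SparseTableMin(const std::vector<int>& a) {
+        int n = static_cast<int>(a.size());
+        lg.assign(n + 1, 0);
+        for (int i = 2; i <= n; i++)
+            lg[i] = lg[i / 2] + 1;
+
+        int levels = lg[n] + 1;  // j ranges over 0 .. floor(log2(n))
+        st.assign(n, std::vector<int>(levels));
+        for (int i = 0; i < n; i++)
+            st[i][0] = a[i];
+        for (int j = 1; j < levels; j++)
+            for (int i = 0; i + (1 << j) <= n; i++)
+                st[i][j] = std::min(st[i][j - 1], st[i + (1 << (j - 1))][j - 1]);
+    }
+
+    // Minimum of a[l..r], inclusive and 0-indexed; requires 0 <= l <= r < n.
+    int query(int l, int r) const {
+        int j = lg[r - l + 1];
+        return std::min(st[l][j], st[r - (1 << j) + 1][j]);
+    }
+};
+```
+
+For the array `{5, 2, 4, 7, 6, 3, 1, 2}`, a call `query(2, 4)` looks at the two overlapping ranges $\[2, 3\]$ and $\[3, 4\]$ and returns 4.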
+
+## Practice Problems
+
+* [SPOJ - RMQSQ](http://www.spoj.com/problems/RMQSQ/)
+* [SPOJ - THRBL](http://www.spoj.com/problems/THRBL/)
+* [Codechef - MSTICK](https://www.codechef.com/problems/MSTICK)
+* [Codechef - SEAD](https://www.codechef.com/problems/SEAD)
+* [Codeforces - CGCDSSQ](http://codeforces.com/contest/475/problem/D)
+
diff --git a/src/index.md b/src/index.md
@@ -28,6 +28,7 @@ especially popular in field of competitive programming.*
 
 ### Data Structures
 - [Fenwick Tree](./data_structures/fenwick.html)
+- [Sparse Table](./data_structures/sparse-table.html)
 - [Treap](./data_structures/treap.html)
 - [Sqrt Decomposition](./data_structures/sqrt_decomposition.html)
 
diff --git a/src/sequences/rmq.md b/src/sequences/rmq.md
@@ -18,10 +18,12 @@ From the list of data structures described on this site, you can choose:
   Pros: good runtime complexity. Cons: larger amount of code compared to the other data structures.
 - [Fenwick tree](../data_structures/fenwick.html) - answers each query in $O(\log N)$, preprocessing done in $O(N \log N)$.
   Pros: the shortest code, good runtime complexity. Cons: Fenwick tree can only be used for queries with $L = 1$, so it is not applicable to many problems.
+- [Sparse Table](../data_structures/sparse-table.html) - answers each query in $O(1)$, preprocessing done in $O(N \log N)$.
+  Pros: simple data structure, excellent runtime complexity. Cons: doesn't allow modifications on the array between queries.
 
 Note: Preprocessing is the preliminary processing of the given array by building corresponding data structure for it.
 
-If the array $A$ might change during the runtime (i.e. there will also be queries to change values in some interval), the problem can only be solved by [Sqrt-decomposition]() or [Segment tree]().
+If the array $A$ might change during the runtime (i.e. there will also be queries to change values in some interval), the problem can only be solved by [Sqrt-decomposition](), [Segment tree]() or [Fenwick tree](../data_structures/fenwick.html).
 
 ## Practice Problems
 - [SPOJ: Range Minimum Query](http://www.spoj.com/problems/RMQSQ/)