|
| 1 | +<!--?title Sparse Table--> |
| 2 | + |
| 3 | +# Sparse Table |
| 4 | + |
| 5 | +Sparse Table is a data structure, that allows answering range queries. |
| 6 | +It can answer most range queries in $O(\log n)$, but its true power is answering range minimum queries (or equivalent range maximum queries). |
| 7 | +For those queries it can compute the answer in $O(1)$ time. |
| 8 | + |
| 9 | +The only drawback of this data structure is, that it can only be used on _immutable_ arrays. |
| 10 | +This means, that the array cannot be changed between two queries. |
| 11 | +If any element in the array changes, the complete data structure has to be recomputed. |
| 12 | + |
| 13 | +### Intuition |
| 14 | + |
| 15 | +Any non-negative number can be uniquely represented as a sum of decreasing powers of two. |
| 16 | +This is just a variant of the binary representation of a number. |
| 17 | +E.g. $13 = (1101)_2 = 8 + 4 + 1$. |
| 18 | +For a number $x$ there can be at most $\lceil \log_2 x \rceil$ summands. |
| 19 | + |
| 20 | +By the same reasoning any interval can be uniquely represented as a union of intervals with lengths that are decreasing powers of two. |
| 21 | +E.g. $\[2, 14\] = \[2, 9\] \cup \[10, 13\] \cup \[14, 14\]$, where the complete interval has length 13, and the individual intervals have the lengths 8, 4 and 1 respectably. |
| 22 | +And also here the union consists of at most $\lceil \log_2(\text{length of interval}) \rceil$ many intervals. |
| 23 | + |
| 24 | +The main idea behind Sparse Tables is to precompute all answers for range queries with power of two length. |
| 25 | +Afterwards a different range query can be answered by splitting the range into ranges with power of two lengths, looking up the precomputed answers, and combining them to receive a complete answer. |
| 26 | + |
| 27 | +### Precomputation |
| 28 | + |
| 29 | +We will use a 2-dimensional array for storing the answers to the precomputed queries. |
| 30 | +$\text{st}\[i\]\[j\]$ will store the answer for the range $[i, i + 2^j - 1]$ of length $2^j$. |
| 31 | +The size of the 2-dimensional array will be $\text{MAXN} \times (K + 1)$, where $\text{MAXN}$ is the biggest possible array length. |
| 32 | +$\text{K}$ has to satisfy $\text{K} \ge \lfloor \log_2 \text{MAXN} \rfloor + 1$, because $2^{\lfloor \log_2 \text{MAXN} \rfloor}$ is the biggest power of two range, that we have to support. |
| 33 | +For arrays with reasonable length ($\le 10^7$ elements), $K = 25$ is a good value. |
| 34 | + |
| 35 | +```cpp |
| 36 | +int st[MAXN][K + 1]; |
| 37 | +``` |
| 38 | + |
| 39 | +Because the range $\[i, i + 2^j - 1\]$ of length $2^j$ splits nicely into the ranges $\[i, i + 2^{j - 1} - 1\]$ and $\[i + 2^{j - 1}, i + 2^j - 1\]$, both of length $2^{j - 1}$, we can generate the table efficiently using dynamic programming: |
| 40 | + |
| 41 | +```cpp |
| 42 | +for (int i = 0; i < N; i++) |
| 43 | + st[i][0] = f(array[i]); |
| 44 | + |
| 45 | +for (int j = 1; j <= K; j++) |
| 46 | + for (int i = 0; i + (1 << j) <= N; i++) |
| 47 | + st[i][j] = f(st[i][j-1], st[i + (1 << (j - 1))][j - 1]); |
| 48 | +``` |
| 49 | + |
| 50 | +The function $f$ will depend on the type of query. |
| 51 | +For range sum queries it will compute the sum, for range minimum queries it will compute the minimum. |
| 52 | + |
| 53 | +The time complexity of the precomputation is $O(\text{N} \log \text{N})$. |
| 54 | + |
| 55 | +### Range Sum Queries |
| 56 | + |
| 57 | +For this type of queries, we want to find the sum of all values in a range. |
| 58 | +Therefore the natural definition of the function $f$ is $f(x, y) = x + y$. |
| 59 | +We can construct the data structure with: |
| 60 | + |
| 61 | +```cpp |
| 62 | +long long st[MAXN][K]; |
| 63 | + |
| 64 | +for (int i = 0; i < N; i++) |
| 65 | + st[i][0] = array[i]; |
| 66 | + |
| 67 | +for (int j = 1; j <= K; j++) |
| 68 | + for (int i = 0; i + (1 << j) <= N; i++) |
| 69 | + st[i][j] = st[i][j-1] + st[i + (1 << (j - 1))][j - 1]; |
| 70 | +``` |
| 71 | + |
| 72 | +To answer the sum query for the range $\[L, R\]$, we iterate over all powers of two, starting from the biggest one. |
| 73 | +As soon as a power of two $2^j$ is smaller or equal to the length of the range ($= R - L + 1$), we process the first the first part of range $\[L, L + 2^j - 1\]$, and continue with the remaining range $\[L + 2^j, R\]$. |
| 74 | + |
| 75 | +```cpp |
| 76 | +long long sum = 0; |
| 77 | +for (int j = K; j >= 0; j--) { |
| 78 | + if ((1 << j) <= R - L + 1) { |
| 79 | + sum += st[L][j]; |
| 80 | + L += 1 << j; |
| 81 | + } |
| 82 | +} |
| 83 | +``` |
| 84 | + |
| 85 | +Time complexity for a Range Sum Query is $O(K) = O(\log \text{MAXN})$. |
| 86 | + |
| 87 | +### Range Minimum Queries (RMQ) |
| 88 | + |
| 89 | +These are the queries where the Sparse Table shines. |
| 90 | +When computing the minimum of a range, it doesn't matter if we process a value in the range once or twice. |
| 91 | +Therefore instead of splitting a range into multiple ranges, we can also split the range into only two overlapping ranges with power of two length. |
| 92 | +E.g. we can split the range $\[1, 6\]$ into the ranges $\[1, 4\]$ and $\[3, 6\]$. |
| 93 | +The range minimum of $\[1, 6\]$ is clearly the same as the minumum of the range minimum of $\[1, 4\]$ and the range minimum of $\[3, 6\]$. |
| 94 | +So we can compute the minimum of the range $\[L, R\]$ with: |
| 95 | + |
| 96 | +$$\min(\text{st}\[L\]\[j\], \text{st}\[R - 2^j + 1][j]) \quad \text{ where } j = \log_2(R - L + 1)$$ |
| 97 | + |
| 98 | +This requires that we are able to compute $\log_2(R - L + 1)$ fast. |
| 99 | +You can accomplish that by precomputing all logarithms: |
| 100 | + |
| 101 | +```cpp |
| 102 | +int log[MAXN+1]; |
| 103 | +log[1] = 0; |
| 104 | +for (int i = 2; i <= MAXN; i++) |
| 105 | + log[i] = log[i/2] + 1; |
| 106 | +``` |
| 107 | + |
| 108 | +Afterwards we need to precompute the Sparse Table structure. This time we define $f$ with $f(x, y) = \min(x, y)$. |
| 109 | + |
| 110 | +```cpp |
| 111 | +int st[MAXN][K]; |
| 112 | + |
| 113 | +for (int i = 0; i < N; i++) |
| 114 | + st[i][0] = array[i]; |
| 115 | + |
| 116 | +for (int j = 1; j <= K; j++) |
| 117 | + for (int i = 0; i + (1 << j) <= N; i++) |
| 118 | + st[i][j] = min(st[i][j-1], st[i + (1 << (j - 1))][j - 1]); |
| 119 | +``` |
| 120 | + |
| 121 | +And the minimum of a range $\[L, R\]$ can be computed with: |
| 122 | + |
| 123 | +```cpp |
| 124 | +int j = log[R - L + 1]; |
| 125 | +int minimum = min(st[L][j], st[R - (1 << j) + 1][j]); |
| 126 | +``` |
| 127 | + |
| 128 | +Time complexity for a Range Minimum Query is $O(1)$. |
| 129 | + |
| 130 | +## Practice Problems |
| 131 | + |
| 132 | +* [SPOJ - RMQSQ](http://www.spoj.com/problems/RMQSQ/) |
| 133 | +* [SPOJ - THRBL](http://www.spoj.com/problems/THRBL/) |
| 134 | +* [Codechef - MSTICK](https://www.codechef.com/problems/MSTICK) |
| 135 | +* [Codechef - SEAD](https://www.codechef.com/problems/SEAD) |
| 136 | +* [Codeforces - CGCDSSQ](http://codeforces.com/contest/475/problem/D) |
| 137 | + |
0 commit comments