Skip to content

Commit dd1bead

Browse files
jakobkoglertcNickolas
authored andcommitted
Add Sparse-Table article (#168)
1 parent 7fb45e7 commit dd1bead

File tree

3 files changed

+141
-1
lines changed

3 files changed

+141
-1
lines changed

src/data_structures/sparse-table.md

Lines changed: 137 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,137 @@
1+
<!--?title Sparse Table-->
2+
3+
# Sparse Table
4+
5+
Sparse Table is a data structure, that allows answering range queries.
6+
It can answer most range queries in $O(\log n)$, but its true power is answering range minimum queries (or equivalent range maximum queries).
7+
For those queries it can compute the answer in $O(1)$ time.
8+
9+
The only drawback of this data structure is, that it can only be used on _immutable_ arrays.
10+
This means, that the array cannot be changed between two queries.
11+
If any element in the array changes, the complete data structure has to be recomputed.
12+
13+
### Intuition
14+
15+
Any non-negative number can be uniquely represented as a sum of decreasing powers of two.
16+
This is just a variant of the binary representation of a number.
17+
E.g. $13 = (1101)_2 = 8 + 4 + 1$.
18+
For a number $x$ there can be at most $\lceil \log_2 x \rceil$ summands.
19+
20+
By the same reasoning any interval can be uniquely represented as a union of intervals with lengths that are decreasing powers of two.
21+
E.g. $\[2, 14\] = \[2, 9\] \cup \[10, 13\] \cup \[14, 14\]$, where the complete interval has length 13, and the individual intervals have the lengths 8, 4 and 1 respectably.
22+
And also here the union consists of at most $\lceil \log_2(\text{length of interval}) \rceil$ many intervals.
23+
24+
The main idea behind Sparse Tables is to precompute all answers for range queries with power of two length.
25+
Afterwards a different range query can be answered by splitting the range into ranges with power of two lengths, looking up the precomputed answers, and combining them to receive a complete answer.
26+
27+
### Precomputation
28+
29+
We will use a 2-dimensional array for storing the answers to the precomputed queries.
30+
$\text{st}\[i\]\[j\]$ will store the answer for the range $[i, i + 2^j - 1]$ of length $2^j$.
31+
The size of the 2-dimensional array will be $\text{MAXN} \times (K + 1)$, where $\text{MAXN}$ is the biggest possible array length.
32+
$\text{K}$ has to satisfy $\text{K} \ge \lfloor \log_2 \text{MAXN} \rfloor + 1$, because $2^{\lfloor \log_2 \text{MAXN} \rfloor}$ is the biggest power of two range, that we have to support.
33+
For arrays with reasonable length ($\le 10^7$ elements), $K = 25$ is a good value.
34+
35+
```cpp
36+
int st[MAXN][K + 1];
37+
```
38+
39+
Because the range $\[i, i + 2^j - 1\]$ of length $2^j$ splits nicely into the ranges $\[i, i + 2^{j - 1} - 1\]$ and $\[i + 2^{j - 1}, i + 2^j - 1\]$, both of length $2^{j - 1}$, we can generate the table efficiently using dynamic programming:
40+
41+
```cpp
42+
for (int i = 0; i < N; i++)
43+
st[i][0] = f(array[i]);
44+
45+
for (int j = 1; j <= K; j++)
46+
for (int i = 0; i + (1 << j) <= N; i++)
47+
st[i][j] = f(st[i][j-1], st[i + (1 << (j - 1))][j - 1]);
48+
```
49+
50+
The function $f$ will depend on the type of query.
51+
For range sum queries it will compute the sum, for range minimum queries it will compute the minimum.
52+
53+
The time complexity of the precomputation is $O(\text{N} \log \text{N})$.
54+
55+
### Range Sum Queries
56+
57+
For this type of queries, we want to find the sum of all values in a range.
58+
Therefore the natural definition of the function $f$ is $f(x, y) = x + y$.
59+
We can construct the data structure with:
60+
61+
```cpp
62+
long long st[MAXN][K];
63+
64+
for (int i = 0; i < N; i++)
65+
st[i][0] = array[i];
66+
67+
for (int j = 1; j <= K; j++)
68+
for (int i = 0; i + (1 << j) <= N; i++)
69+
st[i][j] = st[i][j-1] + st[i + (1 << (j - 1))][j - 1];
70+
```
71+
72+
To answer the sum query for the range $\[L, R\]$, we iterate over all powers of two, starting from the biggest one.
73+
As soon as a power of two $2^j$ is smaller or equal to the length of the range ($= R - L + 1$), we process the first the first part of range $\[L, L + 2^j - 1\]$, and continue with the remaining range $\[L + 2^j, R\]$.
74+
75+
```cpp
76+
long long sum = 0;
77+
for (int j = K; j >= 0; j--) {
78+
if ((1 << j) <= R - L + 1) {
79+
sum += st[L][j];
80+
L += 1 << j;
81+
}
82+
}
83+
```
84+
85+
Time complexity for a Range Sum Query is $O(K) = O(\log \text{MAXN})$.
86+
87+
### Range Minimum Queries (RMQ)
88+
89+
These are the queries where the Sparse Table shines.
90+
When computing the minimum of a range, it doesn't matter if we process a value in the range once or twice.
91+
Therefore instead of splitting a range into multiple ranges, we can also split the range into only two overlapping ranges with power of two length.
92+
E.g. we can split the range $\[1, 6\]$ into the ranges $\[1, 4\]$ and $\[3, 6\]$.
93+
The range minimum of $\[1, 6\]$ is clearly the same as the minumum of the range minimum of $\[1, 4\]$ and the range minimum of $\[3, 6\]$.
94+
So we can compute the minimum of the range $\[L, R\]$ with:
95+
96+
$$\min(\text{st}\[L\]\[j\], \text{st}\[R - 2^j + 1][j]) \quad \text{ where } j = \log_2(R - L + 1)$$
97+
98+
This requires that we are able to compute $\log_2(R - L + 1)$ fast.
99+
You can accomplish that by precomputing all logarithms:
100+
101+
```cpp
102+
int log[MAXN+1];
103+
log[1] = 0;
104+
for (int i = 2; i <= MAXN; i++)
105+
log[i] = log[i/2] + 1;
106+
```
107+
108+
Afterwards we need to precompute the Sparse Table structure. This time we define $f$ with $f(x, y) = \min(x, y)$.
109+
110+
```cpp
111+
int st[MAXN][K];
112+
113+
for (int i = 0; i < N; i++)
114+
st[i][0] = array[i];
115+
116+
for (int j = 1; j <= K; j++)
117+
for (int i = 0; i + (1 << j) <= N; i++)
118+
st[i][j] = min(st[i][j-1], st[i + (1 << (j - 1))][j - 1]);
119+
```
120+
121+
And the minimum of a range $\[L, R\]$ can be computed with:
122+
123+
```cpp
124+
int j = log[R - L + 1];
125+
int minimum = min(st[L][j], st[R - (1 << j) + 1][j]);
126+
```
127+
128+
Time complexity for a Range Minimum Query is $O(1)$.
129+
130+
## Practice Problems
131+
132+
* [SPOJ - RMQSQ](http://www.spoj.com/problems/RMQSQ/)
133+
* [SPOJ - THRBL](http://www.spoj.com/problems/THRBL/)
134+
* [Codechef - MSTICK](https://www.codechef.com/problems/MSTICK)
135+
* [Codechef - SEAD](https://www.codechef.com/problems/SEAD)
136+
* [Codeforces - CGCDSSQ](http://codeforces.com/contest/475/problem/D)
137+

src/index.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@ especially popular in field of competitive programming.*
2828

2929
### Data Structures
3030
- [Fenwick Tree](./data_structures/fenwick.html)
31+
- [Sparse Table](./data_structures/sparse-table.html)
3132
- [Treap](./data_structures/treap.html)
3233
- [Sqrt Decomposition](./data_structures/sqrt_decomposition.html)
3334

src/sequences/rmq.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,10 +18,12 @@ From the list of data structures described on this site, you can choose:
1818
Pros: good runtime complexity. Cons: larger amount of code compared to the other data structures.
1919
- [Fenwick tree](../data_structures/fenwick.html) - answers each query in $O(\log N)$, preprocessing done in $O(N \log N)$.
2020
Pros: the shortest code, good runtime complexity. Cons: Fenwick tree can only be used for queries with $L = 1$, so it is not applicable to many problems.
21+
- [Sparse Table](../data_structures/sparse-table.html) - answers each query in $O(1)$, preprocessing done in $O(N \log N)$.
22+
Pros: simple data structure, excellent runtime complexity. Cons: doesn't allow modifications on the array between queries.
2123

2224
Note: Preprocessing is the preliminary processing of the given array by building corresponding data structure for it.
2325

24-
If the array $A$ might change during the runtime (i.e. there will also be queries to change values in some interval), the problem can only be solved by [Sqrt-decomposition]() or [Segment tree]().
26+
If the array $A$ might change during the runtime (i.e. there will also be queries to change values in some interval), the problem can only be solved by [Sqrt-decomposition](), [Segment tree]() or [Fenwick tree](../data_structures/fenwick.html).
2527

2628
## Practice Problems
2729
- [SPOJ: Range Minimum Query](http://www.spoj.com/problems/RMQSQ/)

0 commit comments

Comments
 (0)