Skip to content

Commit d890b87

Browse files
Fixed spelling mistake in suffix-automaton.md (cp-algorithms#1391)
* Fixed spelling mistake in suffix-automaton.md * Fixed another mistake in the algorithm section * More fixes in suffix-automaton.md
1 parent efecea8 commit d890b87

File tree

1 file changed

+15
-15
lines changed

1 file changed

+15
-15
lines changed

src/string/suffix-automaton.md

Lines changed: 15 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -221,11 +221,11 @@ Let us describe this process:
221221
(Initially we set $last = 0$, and we will change $last$ in the last step of the algorithm accordingly.)
222222
- Create a new state $cur$, and assign it with $len(cur) = len(last) + 1$.
223223
The value $link(cur)$ is not known at the time.
224-
- Now we to the following procedure:
224+
- Now we do the following procedure:
225225
We start at the state $last$.
226226
While there isn't a transition through the letter $c$, we will add a transition to the state $cur$, and follow the suffix link.
227227
If at some point there already exists a transition through the letter $c$, then we will stop and denote this state with $p$.
228-
- If it haven't found such a state $p$, then we reached the fictitious state $-1$, then we can just assign $link(cur) = 0$ and leave.
228+
- If we haven't found such a state $p$, then we reached the fictitious state $-1$, then we can just assign $link(cur) = 0$ and leave.
229229
- Suppose now that we have found a state $p$, from which there exists a transition through the letter $c$.
230230
We will denote the state, to which the transition leads, with $q$.
231231
- Now we have two cases. Either $len(p) + 1 = len(q)$, or not.
@@ -241,7 +241,7 @@ Let us describe this process:
241241

242242
- In any of the three cases, after completing the procedure, we update the value $last$ with the state $cur$.
243243

244-
If we also want to know which states are **terminal** and which are not, the we can find all terminal states after constructing the complete suffix automaton for the entire string $s$.
244+
If we also want to know which states are **terminal** and which are not, we can find all terminal states after constructing the complete suffix automaton for the entire string $s$.
245245
To do this, we take the state corresponding to the entire string (stored in the variable $last$), and follow its suffix links until we reach the initial state.
246246
We will mark all visited states as terminal.
247247
It is easy to understand that by doing so we will mark exactly the states corresponding to all the suffixes of the string $s$, which are exactly the terminal states.
@@ -280,7 +280,7 @@ The linearity of the number of transitions, and in general the linearity of the
280280

281281
- In the second case we came across an existing transition $(p, q)$.
282282
This means that we tried to add a string $x + c$ (where $x$ is a suffix of $s$) to the machine that **already exists** in the machine (the string $x + c$ already appears as a substring of $s$).
283-
Since we assume that the automaton for the string $s$ is build correctly, we should not add a new transition here.
283+
Since we assume that the automaton for the string $s$ is built correctly, we should not add a new transition here.
284284

285285
However there is a difficulty.
286286
To which state should the suffix link from the state $cur$ lead?
@@ -324,7 +324,7 @@ If we consider all parts of the algorithm, then it contains three places in the
324324
- The second place is the copying of transitions when the state $q$ is cloned into a new state $clone$.
325325
- Third place is changing the transition leading to $q$, redirecting them to $clone$.
326326

327-
We use the fact that the size of the suffix automaton (both in number of states and in the number of transitions) is **linear**.
327+
We use the fact that the size of the suffix automaton (both in the number of states and in the number of transitions) is **linear**.
328328
(The proof of the linearity of the number of states is the algorithm itself, and the proof of linearity of the number of states is given below, after the implementation of the algorithm).
329329

330330
Thus the total complexity of the **first and second places** is obvious, after all each operation adds only one amortized new transition to the automaton.
@@ -334,7 +334,7 @@ We denote $v = longest(p)$.
334334
This is a suffix of the string $s$, and with each iteration its length decreases - and therefore the position $v$ as the suffix of the string $s$ increases monotonically with each iteration.
335335
In this case, if before the first iteration of the loop, the corresponding string $v$ was at the depth $k$ ($k \ge 2$) from $last$ (by counting the depth as the number of suffix links), then after the last iteration the string $v + c$ will be a $2$-th suffix link on the path from $cur$ (which will become the new value $last$).
336336

337-
Thus, each iteration of this loop leads to the fact that the position of the string $longest(link(link(last))$ as suffix of the current string will monotonically increase.
337+
Thus, each iteration of this loop leads to the fact that the position of the string $longest(link(link(last))$ as a suffix of the current string will monotonically increase.
338338
Therefore this cycle cannot be executed more than $n$ iterations, which was required to prove.
339339

340340
### Implementation
@@ -444,7 +444,7 @@ Let the current non-continuous transition be $(p, q)$ with the character $c$.
444444
We take the correspondent string $u + c + w$, where the string $u$ corresponds to the longest path from the initial state to $p$, and $w$ to the longest path from $q$ to any terminal state.
445445
On one hand, each such string $u + c + w$ for each incomplete strings will be different (since the strings $u$ and $w$ are formed only by complete transitions).
446446
On the other hand each such string $u + c + w$, by the definition of the terminal states, will be a suffix of the entire string $s$.
447-
Since there are only $n$ non-empty suffixes of $s$, and non of the strings $u + c + w$ can contain $s$ (because the entire string only contains complete transitions), the total number of incomplete transitions does not exceed $n - 1$.
447+
Since there are only $n$ non-empty suffixes of $s$, and none of the strings $u + c + w$ can contain $s$ (because the entire string only contains complete transitions), the total number of incomplete transitions does not exceed $n - 1$.
448448
449449
Combining these two estimates gives us the bound $3n - 3$.
450450
However, since the maximum number of states can only be achieved with the test case $\text{"abbb\dots bbb"}$ and this case has clearly less than $3n - 3$ transitions, we get the tighter bound of $3n - 4$ for the number of transitions in a suffix automaton.
@@ -460,7 +460,7 @@ For the simplicity we assume that the alphabet size $k$ is constant, which allow
460460
461461
### Check for occurrence
462462
463-
Given a text $T$, and multiple patters $P$.
463+
Given a text $T$, and multiple patterns $P$.
464464
We have to check whether or not the strings $P$ appear as a substring of $T$.
465465
466466
We build a suffix automaton of the text $T$ in $O(length(T))$ time.
@@ -525,7 +525,7 @@ The value $ans[v]$ can be computed using the recursion:
525525

526526
$$ans[v] = \sum_{w : (v, w, c) \in DAWG} d[w] + ans[w]$$
527527

528-
We take the answer of each adjacent vertex $w$, and add to it $d[w]$ (since every substrings is one character longer when starting from the state $v$).
528+
We take the answer of each adjacent vertex $w$, and add to it $d[w]$ (since every substring is one character longer when starting from the state $v$).
529529

530530
Again this task can be computed in $O(length(S))$ time.
531531

@@ -547,15 +547,15 @@ long long get_tot_len_diff_substings() {
547547
}
548548
```
549549

550-
This approaches runs in $O(length(S))$ time, but experimentally runs 20x faster than the memoized dynamic programming version on randomized strings. It requires no extra space and no recursion.
550+
This approach runs in $O(length(S))$ time, but experimentally runs 20x faster than the memoized dynamic programming version on randomized strings. It requires no extra space and no recursion.
551551

552552
### Lexicographically $k$-th substring {data-toc-label="Lexicographically k-th substring"}
553553

554554
Given a string $S$.
555555
We have to answer multiple queries.
556556
For each given number $K_i$ we have to find the $K_i$-th string in the lexicographically ordered list of all substrings.
557557

558-
The solution of this problem is based on the idea of the previous two problems.
558+
The solution to this problem is based on the idea of the previous two problems.
559559
The lexicographically $k$-th substring corresponds to the lexicographically $k$-th path in the suffix automaton.
560560
Therefore after counting the number of paths from each state, we can easily search for the $k$-th path starting from the root of the automaton.
561561

@@ -577,7 +577,7 @@ Total time complexity is $O(length(S))$.
577577

578578
For a given text $T$.
579579
We have to answer multiple queries.
580-
For each given pattern $P$ we have to find out how many times the string $P$ appears in the string $T$ as substring.
580+
For each given pattern $P$ we have to find out how many times the string $P$ appears in the string $T$ as a substring.
581581

582582
We construct the suffix automaton for the text $T$.
583583

@@ -603,7 +603,7 @@ Therefore initially we have $cnt = 1$ for each such state, and $cnt = 0$ for all
603603
Then we apply the following operation for each $v$: $cnt[link(v)] \text{ += } cnt[v]$.
604604
The meaning behind this is, that if a string $v$ appears $cnt[v]$ times, then also all its suffixes appear at the exact same end positions, therefore also $cnt[v]$ times.
605605

606-
Why don't we overcount in this procedure (i.e. don't count some position twice)?
606+
Why don't we overcount in this procedure (i.e. don't count some positions twice)?
607607
Because we add the positions of a state to only one other state, so it can not happen that one state directs its positions to another state twice in two different ways.
608608

609609
Thus we can compute the quantities $cnt$ for all states in the automaton in $O(length(T))$ time.
@@ -690,7 +690,7 @@ void output_all_occurrences(int v, int P_length) {
690690
### Shortest non-appearing string
691691
692692
Given a string $S$ and a certain alphabet.
693-
We have to find a string of smallest length, that doesn't appear in $S$.
693+
We have to find a string of the smallest length, that doesn't appear in $S$.
694694
695695
We will apply dynamic programming on the suffix automaton built for the string $S$.
696696
@@ -706,7 +706,7 @@ The answer to the problem will be $d[t_0]$, and the actual string can be restore
706706
### Longest common substring of two strings
707707
708708
Given two strings $S$ and $T$.
709-
We have to find the longest common substring, i.e. such a string $X$ that appears as substring in $S$ and also in $T$.
709+
We have to find the longest common substring, i.e. such a string $X$ that appears as a substring in $S$ and also in $T$.
710710
711711
We construct a suffix automaton for the string $S$.
712712

0 commit comments

Comments
 (0)