Skip to content

Commit 947d39b

Browse files
committed
update notes
1 parent b6df675 commit 947d39b

File tree

5 files changed

+361
-1
lines changed

5 files changed

+361
-1
lines changed

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
- [Notes of Algorithms, Part I](part-I/readme.md)
44
- [Notes of Algorithms, Part II](part-II/readme.md)
55

6-
The notes are taken from the *Algorithms* course on Coursera ([Part I](https://www.coursera.org/learn/algorithms-part1), [Part II](https://www.coursera.org/learn/algorithms-part2)). This repository is intended to track my own learning process as well as for future reference.
6+
The notes are taken from the *Algorithms* course on Coursera ([Part I](https://www.coursera.org/learn/algorithms-part1), [Part II](https://www.coursera.org/learn/algorithms-part2)).
77

88
> This course covers the essential information that every serious programmer needs to know about algorithms and data structures, with emphasis on applications and scientific performance analysis of Java implementations. Part I covers elementary data structures, sorting, and searching algorithms. Part II focuses on graph- and string-processing algorithms.
99

part-II/readme.md

Lines changed: 360 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -109,6 +109,15 @@ Notes are taken from the course [Algorithms, Part II](https://www.coursera.org/l
109109
- [6.7.3. Suffix sorting: worst-case input](#673-suffix-sorting-worst-case-input)
110110
- [6.7.4. Manber-Myers MSD algorithm](#674-manber-myers-msd-algorithm)
111111
- [6.8. String sorting summary](#68-string-sorting-summary)
112+
- [7. Trie](#7-trie)
113+
- [7.1. R-way tries](#71-r-way-tries)
114+
- [7.1.1. Trie performance](#711-trie-performance)
115+
- [7.2. Ternary search trie](#72-ternary-search-trie)
116+
- [7.2.1. TST vs. hashing](#721-tst-vs-hashing)
117+
- [7.2.2. String symbol table implementation cost summary](#722-string-symbol-table-implementation-cost-summary)
118+
- [7.3. Patricia trie](#73-patricia-trie)
119+
- [7.4. Suffix tree](#74-suffix-tree)
120+
- [7.5. String symbol tables summary](#75-string-symbol-tables-summary)
112121

113122

114123
## 1. Undirected Graph
@@ -3442,3 +3451,354 @@ See also:
34423451
- [American flag sort](https://en.wikipedia.org/wiki/American_flag_sort)
34433452

34443453

3454+
3455+
<br/>
3456+
<div align="right">
3457+
<b><a href="#top">↥ back to top</a></b>
3458+
</div>
3459+
<br/>
3460+
3461+
3462+
## 7. Trie
3463+
3464+
Performance of symbol-table implementations
3465+
3466+
Order of growth of the frequency of operations.
3467+
3468+
| implementation | typical case<br/>search | typical case<br/>insert | typical case<br/>delete | ordered operations | operations on keys |
3469+
| :--: | :--: | :--: | :--: | :--: | :--: |
3470+
| red-black BST | log N | log N | log N | yes | compareTo() |
3471+
| hash table | 1 <sup></sup> | 1 <sup></sup> | 1 <sup></sup> | no | equals()<br/>hashCode() |
3472+
3473+
† under uniform hashing assumption
3474+
3475+
As with sorting, we can take advantage of properties of strings to develop search methods (symbol-table implementations) that can be more efficient than the general-purpose methods (i.e., BST, hash tables) for typical applications where search keys are strings.
3476+
3477+
API for a symbol table with string keys
3478+
3479+
| `public class StringST< Value> `| |
3480+
| :--: | :--: |
3481+
| `StringST()` | create a symbol table |
3482+
| `void put(String key, Value val)` | put key-value pair into the table (remove key if value is null) |
3483+
| `Value get(String key)` | value paired with key (null if key is absent) |
3484+
| `void delete(String key)` | remove key (and its value) |
3485+
| `boolean contains(String key) `| is there a value paired with key? |
3486+
| `boolean isEmpty()` | is the table empty? |
3487+
| `String longestPrefixOf(String s)` | the longest key that is a prefix of s |
3488+
| `Iterable<String> keysWithPrefix(String s)` | all the keys having s as a prefix |
3489+
| `Iterable<String> keysThatMatch(String s)` | all the keys that match s (where . matches any character) |
3490+
| `int size()` | number of key-value pairs |
3491+
| `Iterable<String> keys()` | all the keys in the table |
3492+
3493+
3494+
### 7.1. R-way tries
3495+
3496+
Tries. [from re**trie**val, but pronounced "try"]
3497+
- Store characters in nodes (not keys).
3498+
- Each node has R children, one for each possible character.
3499+
3500+
<img src="res/trie.png" alt="Trie" width="500">
3501+
3502+
ALGORITHM: Trie symbol table
3503+
3504+
3505+
```java
3506+
public class TrieST<Value>
3507+
{
3508+
private static int R = 256; // radix, extended ASCII
3509+
private Node root; // root of trie
3510+
3511+
private static class Node
3512+
{
3513+
private Object val;
3514+
private Node[] next = new Node[R];
3515+
}
3516+
3517+
public Value get(String key)
3518+
{
3519+
Node x = get(root, key, 0);
3520+
if (x == null) return null;
3521+
return (Value) x.val; // cast needed
3522+
}
3523+
3524+
private Node get(Node x, String key, int d)
3525+
{ // Return value associated with key in the subtrie rooted at x.
3526+
if (x == null) return null;
3527+
if (d == key.length()) return x;
3528+
char c = key.charAt(d); // Use dth key char to identify subtrie.
3529+
return get(x.next[c], key, d+1);
3530+
}
3531+
3532+
public void put(String key, Value val)
3533+
{ root = put(root, key, val, 0); }
3534+
3535+
private Node put(Node x, String key, Value val, int d)
3536+
{ // Change value associated with key if in subtrie rooted at x.
3537+
if (x == null) x = new Node();
3538+
if (d == key.length()) { x.val = val; return x; }
3539+
char c = key.charAt(d); // Use dth key char to identify subtrie.
3540+
x.next[c] = put(x.next[c], key, val, d+1);
3541+
return x;
3542+
}
3543+
3544+
public Iterable<String> keys()
3545+
{ return keysWithPrefix(""); }
3546+
3547+
public Iterable<String> keysWithPrefix(String pre)
3548+
{
3549+
Queue<String> q = new Queue<String>();
3550+
collect(get(root, pre, 0), pre, q);
3551+
return q;
3552+
}
3553+
3554+
private void collect(Node x, String pre, Queue<String> q)
3555+
{
3556+
if (x == null) return;
3557+
if (x.val != null) q.enqueue(pre);
3558+
for (char c = 0; c < R; c++)
3559+
collect(x.next[c], pre + c, q);
3560+
}
3561+
}
3562+
```
3563+
3564+
NOTE: The value in Node has to be an Object because Java does not support arrays of generics; cast values back to Value in `get()`.
3565+
3566+
3567+
**Deletion in an R-way trie**
3568+
3569+
To delete a key-value pair:
3570+
- Find the node corresponding to key and set value to null.
3571+
- If node has null value and all null links, remove that node (and recur).
3572+
3573+
3574+
**Collecting keys**
3575+
3576+
To iterate through all keys in sorted order:
3577+
- Do inorder traversal of trie; add keys encountered to a queue.
3578+
- Maintain sequence of characters *on path from root to node*.
3579+
- Use `collect()` method to collect keys for both the `keys()` and the `keysWithPrefix()` methods in the API.
3580+
3581+
<img src="res/trie-collecting.png" alt="collecting keys" width="250">
3582+
3583+
3584+
3585+
**Prefix matches**
3586+
3587+
Find all keys in a symbol table starting with a given prefix.
3588+
3589+
Ex. Autocomplete in a cell phone, search bar, text editor, or shell.
3590+
- User types characters one at a time.
3591+
- System reports all matching strings.
3592+
3593+
3594+
***Interview question***. Design a data structure to perform efficient spell checking.
3595+
3596+
<details>
3597+
<summary>Solution.</summary>
3598+
3599+
Build a 26-way trie, key = word, value = bit).
3600+
3601+
</details>
3602+
3603+
3604+
3605+
**Longest prefix**
3606+
3607+
Find longest key in symbol table that is a prefix of query string.
3608+
- Search for query string.
3609+
- Keep track of longest key encountered.
3610+
3611+
Ex. To send packet toward destination IP address, router chooses IP address in routing table that is longest prefix match.
3612+
3613+
```java
3614+
public String longestPrefixOf(String query)
3615+
{
3616+
int length = search(root, query, 0, 0);
3617+
return query.substring(0, length);
3618+
}
3619+
private int search(Node x, String query, int d, int length)
3620+
{
3621+
if (x == null) return length;
3622+
if (x.val != null) length = d;
3623+
if (d == query.length()) return length;
3624+
char c = query.charAt(d);
3625+
return search(x.next[c], query, d+1, length);
3626+
}
3627+
```
3628+
3629+
3630+
3631+
3632+
#### 7.1.1. Trie performance
3633+
3634+
**Search hit**. Need to examine all `L` characters for equality.
3635+
3636+
**Search miss**.
3637+
- Could have mismatch on first character.
3638+
- Typical case: examine only a few characters (sublinear).
3639+
3640+
**Space**. `R` null links at each leaf.
3641+
(but sublinear space possible if many short strings share common prefixes)
3642+
3643+
**Bottom line**. Fast search hit and even faster search miss, but **wastes space**.
3644+
3645+
3646+
3647+
3648+
### 7.2. Ternary search trie
3649+
3650+
3651+
Ternary search tries: another data structure that responds to the problem of too much memory space used by tries.
3652+
3653+
- Store characters and values in nodes (not keys).
3654+
- Each node has 3 children: smaller (left), equal (middle), larger (right).
3655+
3656+
**Search hit**. Node where search ends has a non-null value.
3657+
3658+
**Search miss**. Reach a null link or node where search ends has null value.
3659+
3660+
3661+
<img src="res/trie-TST.png" alt="Ternary search trie" width="200">
3662+
3663+
ALGORITHM: TST symbol table
3664+
3665+
```java
3666+
public class TST<Value>
3667+
{
3668+
private Node root; // root of trie
3669+
3670+
private class Node
3671+
{
3672+
char c; // character
3673+
Node left, mid, right; // left, middle, and right subtries
3674+
Value val; // value associated with string
3675+
}
3676+
3677+
public Value get(String key) // same as for tries (See page 737).
3678+
3679+
private Node get(Node x, String key, int d)
3680+
{
3681+
if (x == null) return null;
3682+
char c = key.charAt(d);
3683+
if (c < x.c) return get(x.left, key, d);
3684+
else if (c > x.c) return get(x.right, key, d);
3685+
else if (d < key.length() - 1)
3686+
return get(x.mid, key, d+1);
3687+
else return x;
3688+
}
3689+
3690+
public void put(String key, Value val)
3691+
{ root = put(root, key, val, 0); }
3692+
3693+
private Node put(Node x, String key, Value val, int d)
3694+
{
3695+
char c = key.charAt(d);
3696+
if (x == null) { x = new Node(); x.c = c; }
3697+
if (c < x.c) x.left = put(x.left, key, val, d);
3698+
else if (c > x.c) x.right = put(x.right, key, val, d);
3699+
else if (d < key.length() - 1)
3700+
x.mid = put(x.mid, key, val, d+1);
3701+
else x.val = val;
3702+
return x;
3703+
}
3704+
}
3705+
```
3706+
3707+
> - **TST** correspond to **3-way string quicksort** in the same way that
3708+
> - **BST** correspond to **quicksort** and
3709+
> - **trie** correspond to **MSD sorting**.
3710+
3711+
3712+
#### 7.2.1. TST vs. hashing
3713+
3714+
| algorithm | property |
3715+
| :--: | :-- |
3716+
| Hashing | <ul><li>Need to examine entire key.</li><li>Search hits and misses cost about the same.</li><li>Performance relies on hash function.</li><li>Does not support ordered symbol table operations.</li></ul> |
3717+
| TST | <ul><li>Works only for strings (or digital keys).</li><li>Only examines just enough key characters.</li><li>Search miss may involve only a few characters.</li><li>Supports ordered symbol table operations (plus others!).</li></ul> |
3718+
3719+
**Bottom line**. TSTs are:
3720+
- Faster than hashing (especially for search misses).
3721+
- More flexible than red-black BSTs.
3722+
3723+
#### 7.2.2. String symbol table implementation cost summary
3724+
3725+
| implementation | character accesses<br/>(typical case)<br/>**search hit** | character accesses<br/>(typical case)<br/>**search miss** | character accesses<br/>(typical case)<br/>**insert** | character accesses<br/>(typical case)<br/>**space (references)** | moby.txt | actors.txt |
3726+
| :--: | :--: | :--: | :--: | :--: | :--: | :--: |
3727+
| red-black BST | L + c lg <sup>2</sup> N | c lg <sup>2</sup> N | c lg <sup>2</sup> N | 4 N | 1.40 | 97.4 |
3728+
| hash table | L | L | L | 4 N to 16 N | 0.76 | 40.6 |
3729+
| R-way trie | L | log <sub>R</sub> N | L | (R + 1) N | 1.12 | out of memory |
3730+
| TST | L + ln N | ln N | L + ln N | 4 N | 0.72 | 38.7 |
3731+
| TST with R<sup>2</sup> | L + ln N | ln N | L + ln N | 4 N + R<sup>2</sup> | 0.51 | 32.7 |
3732+
3733+
3734+
3735+
- R-way trie
3736+
- Method of choice for small *R*.
3737+
- Too much memory for large *R*.
3738+
- Challenge. Use less memory, e.g., 65,536-way trie for Unicode!
3739+
- TST
3740+
- Can build balanced TSTs via rotations to achieve *L + log N* worst-case guarantees.
3741+
- TST is as fast as hashing (for string keys), space efficient.
3742+
- TST with R<sup>2</sup> branching at root. (Hybrid of R-way trie and TST.)
3743+
- Do R<sup>2</sup>-way branching at root.
3744+
- Each of R<sup>2</sup> root nodes points to a TST.
3745+
3746+
3747+
> If space is available, **R-way tries** provide the fastest search, essentially completing the job with a constant number of character compares. For *large alphabets*, where space may not be available for R-way tries, **TSTs** are preferable, since they use a logarithmic number of character compares, while **BSTs** use a logarithmic number of key compares. **Hashing** can be competitive, but, as usual, cannot support ordered symbol-table operations or extended character-based API operations such as *prefix* or *wildcard match*.
3748+
3749+
3750+
### 7.3. Patricia trie
3751+
3752+
[Practical Algorithm to Retrieve Information Coded in Alphanumeric]
3753+
- Remove one-way branching.
3754+
- Each node represents a sequence of characters.
3755+
- Implementation: one step beyond this course.
3756+
3757+
> **Applications**.
3758+
> - Database search.
3759+
> - P2P network search.
3760+
> - IP routing tables: find longest prefix match.
3761+
> - Compressed quad-tree for N-body simulation.
3762+
> - Efficiently storing and querying XML documents.
3763+
3764+
3765+
Also known as: crit-bit tree, radix tree.
3766+
3767+
3768+
### 7.4. Suffix tree
3769+
3770+
Suffix tree.
3771+
- Patricia trie of suffixes of a string.
3772+
- Linear-time construction: beyond this course.
3773+
3774+
> **Applications**.
3775+
> - Linear-time: longest repeated substring, longest common substring, longest palindromic substring, substring search, tandem repeats, ….
3776+
> - Computational biology databases (BLAST, FASTA).
3777+
3778+
3779+
3780+
### 7.5. String symbol tables summary
3781+
3782+
| algorithm | property |
3783+
| :--: | :-- |
3784+
| Red-black BST | <ul><li>Performance guarantee: log N key compares.</li><li>Supports ordered symbol table API.</li></ul> |
3785+
| Hash tables | <ul><li>Performance guarantee: constant number of probes.</li><li>Requires good hash function for key type.</li></ul> |
3786+
| Tries. R-way, TST | <ul><li>Performance guarantee: log N characters accessed.</li><li>Supports character-based operations.</li></ul> |
3787+
3788+
3789+
3790+
See also:
3791+
- [Trie](https://en.wikipedia.org/wiki/Trie)
3792+
- [Patricia trie](https://en.wikipedia.org/wiki/Trie#:~:text=memory.%5B8%5D-,Patricia%20trees,-%5Bedit%5D)
3793+
- [Suffix tree](https://en.wikipedia.org/wiki/Suffix_tree)
3794+
3795+
3796+
<br/>
3797+
<div align="right">
3798+
<b><a href="#top">↥ back to top</a></b>
3799+
</div>
3800+
<br/>
3801+
3802+
3803+
3804+

part-II/res/trie-TST.png

71 KB
Loading

part-II/res/trie-collecting.png

75.1 KB
Loading

part-II/res/trie.png

61.7 KB
Loading

0 commit comments

Comments
 (0)