Commit eb752f2

fix images, titles, notion link

1 parent: 3e84332

1 file changed: +7 −9 lines


content/news/208-claim-trie-memory-reduction.md (+7 −9)
```diff
@@ -5,11 +5,9 @@ date: '2019-03-26 15:00:00'
 cover: 'database2.jpg'
 category: technical
 ---
-
-# The claim trie memory reduction
 Here follows a little writeup on some work that we have done to make the data structure in LBRYcrd less naive (and less RAM-hungry). We anticipate that it will ship in a LBRYcrd release in late spring.
 
-## Section 1: the requirements
+## Section 1: The Requirements
 
 LBRY is a collection of names to bids. You might outline the data like so:
 
```
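The requirement in this hunk — a collection of names to bids — could be outlined in C++ roughly as below. This is a hypothetical sketch, not the post's actual outline: the `Bid` type, its field names, and the highest-bid-wins rule are assumptions for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of the requirement (not LBRYcrd's actual types):
// each name maps to the set of bids competing for it.
struct Bid {
    std::string claimId;  // assumed field name, for illustration only
    long long amount = 0; // bid amount in base units
};

// The naive outline: an ordered map from name to its bids.
using ClaimMap = std::map<std::string, std::vector<Bid>>;

// Assumed rule for illustration: the controlling bid is the largest amount.
long long bestBid(const ClaimMap& claims, const std::string& name) {
    auto it = claims.find(name);
    if (it == claims.end()) return 0;
    long long best = 0;
    for (const auto& bid : it->second)
        if (bid.amount > best) best = bid.amount;
    return best;
}
```

A flat `std::map` like this is the "naive" baseline the rest of the post improves on.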

```diff
@@ -44,7 +42,7 @@ It's decided then: we're going to move to a custom container with a tree structure.
 
 With this kind of structure, I can walk from the root node to the leaf node for any search in O(len(name)) time. I can keep a set of nodes that need their hashes updated as I go. It works okay, but now consider the general inefficiencies of this approach. Example keys: superfluous, stupendous, and stupified. How does that look?
 
-![](208-orig-trie.png)
+![original trie](https://spee.ch/0/208-compressed-trie.png)
 
 In other words, we're now using 25 nodes to hold three data points. All but two of those nodes have one or no child. 22 of the 25 nodes have an empty data member. This is RAM intensive and very wasteful.
 
```
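The character-per-node trie this hunk describes can be sketched as follows. A minimal illustration, not LBRYcrd's actual code — `TrieNode` and `hasData` are names assumed here. With the post's three example keys it produces exactly the 25 non-root nodes counted above:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Minimal character-per-node trie (illustrative sketch, assumed names).
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    bool hasData = false; // most nodes end up with an empty data member
};

void insert(TrieNode* node, const std::string& name) {
    for (char c : name) {
        auto& child = node->children[c];
        if (!child) child = std::make_unique<TrieNode>();
        node = child.get();
    }
    node->hasData = true;
}

// Walking root-to-leaf visits one node per character: O(len(name)).
bool contains(const TrieNode* node, const std::string& name) {
    for (char c : name) {
        auto it = node->children.find(c);
        if (it == node->children.end()) return false;
        node = it->second.get();
    }
    return node->hasData;
}

// Count all nodes below the root, to make the waste visible.
size_t countBelowRoot(const TrieNode* node) {
    size_t total = 0;
    for (const auto& [c, child] : node->children)
        total += 1 + countBelowRoot(child.get());
    return total;
}
```

Inserting `superfluous`, `stupendous`, and `stupified` yields 11 + 9 + 5 = 25 nodes below the root, matching the count in the hunk.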

```diff
@@ -58,11 +56,11 @@ Over the years there have been many proposals to improve this structure. I'm goi
 
 It ends up that idea #1 makes all the difference. You have to combine the nodes as much as possible. That turns the above trie into 5 nodes down from 25 becoming:
 
-![](208-compressed-trie.png)
+![compressed trie](https://spee.ch/4/208-compressed-trie-blog.png)
 
-## Section 2: the experiments
+## Section 2: The Experiments
 
-[ Timed experiments for 1 million insertions of random data [a-zA-Z0-9]{1, 60}](https://www.notion.so/adecf55e97fb4c8080e5288bb44cd65d)
+[ Timed experiments for 1 million insertions of random data [a-zA-Z0-9]{1, 60}](https://www.notion.so/lbry/adecf55e97fb4c8080e5288bb44cd65d?v=187bbb545577449489d12bc87a1892bb)
 
 A few notes about the table:
 
```
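Combining nodes as much as possible amounts to a radix-style compressed trie: each edge carries a whole substring, and an edge is split only where two keys diverge mid-label. The sketch below is a hedged illustration under that interpretation, not LBRYcrd's actual implementation; all names are assumptions.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Radix-style node: a whole substring per edge, so chains of
// single-child character nodes collapse into one edge (illustrative).
struct RadixNode {
    std::map<std::string, std::unique_ptr<RadixNode>> children; // edge label -> child
    bool hasData = false;
};

static size_t commonPrefixLen(const std::string& a, const std::string& b) {
    size_t i = 0;
    while (i < a.size() && i < b.size() && a[i] == b[i]) ++i;
    return i;
}

void radixInsert(RadixNode* node, const std::string& key) {
    if (key.empty()) { node->hasData = true; return; }
    for (auto it = node->children.begin(); it != node->children.end(); ++it) {
        const std::string label = it->first; // copy: the edge may be erased below
        size_t n = commonPrefixLen(label, key);
        if (n == 0) continue;
        if (n == label.size()) {             // key continues below this child
            radixInsert(it->second.get(), key.substr(n));
            return;
        }
        // Split the edge: "stupendous" vs "stupified" -> "stup" + {"endous", "ified"}
        auto mid = std::make_unique<RadixNode>();
        mid->children[label.substr(n)] = std::move(it->second);
        node->children.erase(it);
        RadixNode* midp = mid.get();
        node->children[key.substr(0, n)] = std::move(mid);
        radixInsert(midp, key.substr(n));
        return;
    }
    auto leaf = std::make_unique<RadixNode>();
    leaf->hasData = true;
    node->children[key] = std::move(leaf);
}

size_t nodesBelowRoot(const RadixNode* node) {
    size_t total = 0;
    for (const auto& [label, child] : node->children)
        total += 1 + nodesBelowRoot(child.get());
    return total;
}
```

With the three example keys this collapses to 5 nodes below the root (`s`, `uperfluous`, `tup`, `endous`, `ified`), down from 25 — the reduction the post describes.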

```diff
@@ -78,7 +76,7 @@ We also experimented with a memory-mapped backing allocator. Namely: `boost::int
 
 We experimented with using [LevelDB](https://github.com/google/leveldb) as the backing store for a custom trie. This has an interesting advantage in that we can keep trie history forever; we can index by hash as well as by name. It could be handy for querying the trie data from a snapshot of ancient days. We had trouble making this performant, though. It's at least an order of magnitude slower; it's not in the same league as the options in the chart. And for the in-RAM trie, rolling back just a few frames for a recent historical snapshot is usually not a big deal. LevelDB has a nice LRU cache feature. We saw that it used about 830MB of RAM with 100MB of LRU configured (for our test of 1M insertions). Whenever we run out of RAM again, this approach may again come into play.
 
-## Section 3: how it works
+## Section 3: How it Works
 
 A trie is made of nodes. We'll start with a simple definition for that:
 
```

```diff
@@ -99,7 +97,7 @@ An illustration of `lower_bound`, assuming `set = std::set<std::string> { "B", "
 
 The general find algorithm:
 
-![](208-flowchart.png)
+![flow chart](https://spee.ch/9/208-flowchart.png)
 
 The general find algorithm in pseudo-code:
 
```
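The hunk header above references the post's illustration of `lower_bound`, whose set contents are truncated here; `{"B", "D", "F"}` below is a stand-in, not the post's actual set. As a reminder of the semantics the find algorithm relies on: `lower_bound(k)` returns the first element not less than `k` in sort order, which is how a find locates the candidate child edge among sorted children.

```cpp
#include <cassert>
#include <set>
#include <string>

// Returns the element at or after `key` in sort order, or "" if none.
// (std::set::lower_bound: first element that does not compare less than key.)
std::string atOrAfter(const std::set<std::string>& s, const std::string& key) {
    auto it = s.lower_bound(key);
    return it == s.end() ? std::string() : *it;
}
```

In a compressed trie, the edge that could share a prefix with a key is either this element or the one just before it, so a single `lower_bound` call narrows the search to at most two candidates.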

0 commit comments