date: '2019-03-26 15:00:00'
cover: 'database2.jpg'
category: technical
---
# The claim trie memory reduction
Here follows a little writeup on some work that we have done to make the data structure in LBRYcrd less naive (and less RAM-hungry). We anticipate that it will ship in a LBRYcrd release in late spring.
## Section 1: The Requirements
LBRY is a mapping of names to bids. You might outline the data like so:
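Something along these lines, sketched here in C++ terms; the field names are assumptions of mine, not the post's own outline:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical shape of the data: each name carries the bids placed on it,
// and the bid with the highest effective amount controls the name.
struct Bid {
    std::string claim_id;  // illustrative fields only
    uint64_t amount = 0;   // LBC backing this bid
};

using NamesToBids = std::map<std::string, std::vector<Bid>>;
```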
It's decided then: we're going to move to a custom container with a tree structure.
With this kind of structure, I can walk from the root node to the leaf node for any search in O(len(name)) time. I can keep a set of nodes that need their hashes updated as I go. It works okay, but now consider the general inefficiencies of this approach. Example keys: superfluous, stupendous, and stupified. How does that look?
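A rough sketch of that character-per-node layout (type and member names are mine, not the actual LBRYcrd code):

```cpp
#include <map>
#include <memory>
#include <string>

// Illustrative only: one node per character, each carrying a data slot and a
// hash member even when it needs neither.
struct NaiveNode {
    std::map<char, std::unique_ptr<NaiveNode>> children;
    std::unique_ptr<std::string> data;  // set only on the three leaf nodes
    std::string hash;                   // stand-in for a 256-bit hash type
};

// Node count for the three example keys:
//   "s" (shared, 1) + "uperfluous" (10) + "tup" (shared, 3) + "endous" (6) + "ified" (5)
//   = 25 nodes, only 3 of which hold any data.
```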
In other words, we're now using 25 nodes to hold three data points. All but two of those nodes have one or no child. 22 of the 25 nodes have an empty data member. This is RAM intensive and very wasteful.
Over the years there have been many proposals to improve this structure.
It turns out that idea #1 makes all the difference: you have to combine the nodes as much as possible. That shrinks the example trie above from 25 nodes down to just 5.
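For those three keys, the combined trie looks roughly like this (root omitted):

```
s
├── uperfluous
└── tup
    ├── endous
    └── ified
```

Only the three leaf nodes ("uperfluous", "endous", "ified") carry data; "s" and "tup" are purely structural.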
[Timed experiments for 1 million insertions of random data [a-zA-Z0-9]{1, 60}](https://www.notion.so/lbry/adecf55e97fb4c8080e5288bb44cd65d?v=187bbb545577449489d12bc87a1892bb)
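As a sketch of the kind of harness such a timing implies (the container below is just a stand-in, not one of the structures actually compared in the table):

```cpp
#include <chrono>
#include <iostream>
#include <map>
#include <random>
#include <string>

// Generate a random key of length 1..60 drawn from [a-zA-Z0-9].
std::string randomKey(std::mt19937& rng) {
    static const std::string alphabet =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    std::uniform_int_distribution<std::size_t> lenDist(1, 60);
    std::uniform_int_distribution<std::size_t> charDist(0, alphabet.size() - 1);
    std::string key(lenDist(rng), '\0');
    for (auto& c : key) c = alphabet[charDist(rng)];
    return key;
}

int main() {
    std::mt19937 rng(42);
    std::map<std::string, int> container;  // swap in the structure under test

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < 1000000; ++i)
        container[randomKey(rng)] = i;
    auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);

    std::cout << "1M insertions took " << elapsed.count() << " ms\n";
}
```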
A few notes about the table:
We also experimented with a memory-mapped backing allocator.
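A minimal sketch of that idea, assuming `boost::interprocess` as the memory-mapped allocator (the file name, sizes, and value types here are made up):

```cpp
#include <functional>
#include <string>
#include <utility>

#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/containers/map.hpp>
#include <boost/interprocess/containers/string.hpp>

namespace bip = boost::interprocess;

// Everything below allocates out of a file-backed segment instead of the heap.
using Segment   = bip::managed_mapped_file;
using CharAlloc = bip::allocator<char, Segment::segment_manager>;
using Name      = bip::basic_string<char, std::char_traits<char>, CharAlloc>;
using NodeAlloc = bip::allocator<std::pair<const Name, unsigned>, Segment::segment_manager>;
using NameMap   = bip::map<Name, unsigned, std::less<Name>, NodeAlloc>;

int main() {
    // Hypothetical file name and size; the OS pages this in and out as needed.
    Segment segment(bip::open_or_create, "claimtrie.bin", 256u * 1024 * 1024);

    NameMap* names = segment.find_or_construct<NameMap>("names")(
        std::less<Name>(), NodeAlloc(segment.get_segment_manager()));

    Name key("superfluous", CharAlloc(segment.get_segment_manager()));
    names->emplace(key, 1u);  // the mapped value is a stand-in for real claim data
    return 0;
}
```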
We experimented with using [LevelDB](https://github.com/google/leveldb) as the backing store for a custom trie. This has an interesting advantage in that we can keep trie history forever; we can index by hash as well as by name. It could be handy for querying the trie data from a snapshot of ancient days. We had trouble making this performant, though. It's at least an order of magnitude slower; it's not in the same league as the options in the chart. And for the in-RAM trie, rolling back just a few frames for a recent historical snapshot is usually not a big deal. LevelDB has a nice LRU cache feature. We saw that it used about 830MB of RAM with 100MB of LRU configured (for our test of 1M insertions). If we run out of RAM again, this approach may come back into play.
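For reference, a minimal sketch of opening LevelDB with a 100MB LRU block cache, roughly matching the configuration quoted above (the path and key scheme are made up):

```cpp
#include <cassert>
#include <string>

#include "leveldb/cache.h"
#include "leveldb/db.h"

int main() {
    leveldb::Options options;
    options.create_if_missing = true;
    // Roughly the 100MB LRU block cache mentioned above.
    options.block_cache = leveldb::NewLRUCache(100 * 1048576);

    leveldb::DB* db = nullptr;
    leveldb::Status status = leveldb::DB::Open(options, "/tmp/claimtrie-test", &db);
    assert(status.ok());

    // Hypothetical key scheme: index nodes by name (a hash-keyed index could
    // live alongside it for historical snapshots).
    status = db->Put(leveldb::WriteOptions(), "name/superfluous", "serialized node");
    assert(status.ok());

    delete db;                   // close the database first
    delete options.block_cache;  // then release the cache we created
    return 0;
}
```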
## Section 3: How it Works
A trie is made of nodes. We'll start with a simple definition for that:
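A minimal sketch of such a node, with member names that are assumptions rather than the actual LBRYcrd definitions:

```cpp
#include <map>
#include <memory>
#include <string>

// Illustrative only. Children are keyed by name fragments (possibly several
// characters), so chains of single-child nodes collapse into one edge.
struct Node {
    std::map<std::string, std::unique_ptr<Node>> children;
    std::unique_ptr<std::string> data;  // null for purely structural nodes
    std::string hash;                   // stand-in for the real 256-bit hash type
};
```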
An illustration of `lower_bound`, assuming `set = std::set<std::string> { "B", ... }`:
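A small, self-contained illustration of what `lower_bound` returns (the set contents here are hypothetical):

```cpp
#include <iostream>
#include <set>
#include <string>

int main() {
    std::set<std::string> set{"B", "D", "F"};  // hypothetical contents

    // lower_bound returns an iterator to the first element that is
    // not less than the key.
    std::cout << *set.lower_bound("C") << "\n";               // "D": no exact match, next element
    std::cout << *set.lower_bound("D") << "\n";               // "D": exact matches come back directly
    std::cout << (set.lower_bound("G") == set.end()) << "\n"; // 1: past the last element
}
```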