-
Notifications
You must be signed in to change notification settings - Fork 387
UPDATE request implementation (ropes)
The current UPDATE request implementation has a few logical defects. The old algorithm recalculates tuple indexes at the last moment, and thus it can’t operate with inserted fields in the same request which inserts them, and when a request containing several operations, e.g. inserts, is split in two or more requests, the end result may be different from the case when all these operations are put together. Some operation sequences, such as push_back(field)
(it equals to insert(field, -1)
) pop_back(field)
(it equals to delete(-1)
) have non-obvious behavior.
That is why we should develop a new algorithm which doesn’t have the described defect. The algorithm should satisfy the following criteria:
- Any sequence of operations, executed either in a single request or as two or more smaller requests (with the same ordering) should have the same effect;
- The algorithm should have computational complexity O(n*log(n)) or lower, where n is the number of operations in the request packet (we assume that n << m, where m is the number of fields in the tuple);
- The algorithm should be optimized to minimize memory allocation.
A simple implementation which satisfies the criteria is:
- move all tuple fields to a vector,
- apply all operations to the fields of the vector,
- write the vector to the new tuple.
It has computational complexity O(mm), where m -- the number of fields in the tuple. However m >> n, therefore O(mm) > O(n*log(n)).
The algorithm from the previous section can be modified in the following way:
- create a list which initially contains only one element -- a blob with all tuple fields?
- when there is an operation on field i, iterate through the list until an element which "contains" the required field is found. If the element spans more than one field, spit it in 3 parts: a head (all fields before the required field), the required field, and a tail (all fields after the required field).
Pseudo code:
struct list {
struct list *next;
struct field *first_field;
int field_count;
};
field *
get_field_from_list(list *head, int n)
{
for (list *l = head; l != NULL; ++l) {
if (l->field_count < n)
if (l->field_count == 1)
/* this node contains only one field, and we can return it right away */
return l->first_field;
else
/* We should cut out the requested field from a blob with more than one field */
return split_list(list *l, n);
n -= l->field_count;
}
return NULL;
}
struct field *
split_list(list *head, int n)
{
/* create a list node with the required field only (field_count == 1) */
list *cut_node = create_list(get_field(l->first_field, n), 1);
/* create list node with tail after required field */
list *cut_ tail = create_list(get_field(l->first_field, n + 1), l->field_count - n - 1);
/* decrease fields count in head, now it have only fields before required field */
head->field_count = n;
/* save pointer to tail */
list *tail = head->next;
/* linking cut_node and cut_tail to list */
head->next = cut_node;
cut_node->next = cut_tail;
cut_tail->next = tail;
return cut_node->first_field;
}
Since the algorithm uses lazy node allocation, it is easy to show that the list will have fewer nodes than n + 2, therefore computational complexity of the algorithm is O(n*n).
Linear data structures like vector, list or queue have linear computational complexity of search or insertion. We can try to use a tree-like data structure instead, such as 'rope'.
Rope is a data structure which allows to access sequential data (like elements of a tree or a list) using a tree representation. Computational complexity of a search in a rope is O(log(n)) (it's faster than list) and computational complexity of an insert into a rope is O(log(n)) (it's faster than vector). A re-balancing procedure helps to keep rope depth under O(log(n)).
Operations Rope List Vector
index O(log(n)) O(n) O(1)
split O(log(n)) O(1) O(n)
concatenate O(log(n)) O(1) O(n)
insert O(log(n)) O(1) O(n)
delete O(log(n)) O(1) O(n)
Rope has two main operations: split and concatenate. Other operations like search, insertion, deletion are implemented by means of split and concatenate.
A tuple can be represented as a string, where tuple fields are characters of the string. Leaf nodes of the rope will therefore contain pointers either to fields of the old tuple, or to the new fields.
struct rope {
/* number of fields in the rope */
int size;
/* the depth of the rope */
int depth;
/* pointer to left child (NULL for leaf) */
struct rope *left;
/* pointer to right child (NULL for leaf) */
struct rope *right;
/* pointer on fields data (NULL for internal nodes) */
void *fields;
};
In the beginning, the rope has only one leaf node which contains all tuple fields:
t = {f0, f1, ... , fn-1}
^----------------+
r |
└── {size = n, data = &t[0]}
When we'd like to apply an operation to a field, we should first extract it to a separate node which contains only this field. The new field value should be stored in a temporary buffer. When all operations are applied, we should do rope traversal and calculate the resulting tuple length. In the next rope traversal values of new fields are stored in a new tuple.
When we'd like to modify i-th field of a tuple, we should first extract it to a rope node which contains only this field. The extract algorithm is:
- Find leaf node which has i-th field. If the node has only i-th field
(node->size == 1)
then return the node; - Split the rope in 3 parts: the first one
r1
which has fields from 0 to i-1, the second oner2
which has only i-th field and the third oner3
which has fields from i+1 to n-1; - Concatenate
r1
withr2
, and then concatenate the result withr3
; - Return the node which has i-th field.
Pseudocode:
rope
extract_field_from_rope(rope root, int i)
{
rope *r = find_rope_node(root, i);
if (r->size == 1)
return r;
/* The node constans mode than one field. Split it’s on tree parts and concat it. */
rope cut_node = rope_split(root, i - 1);
rope cut_tail = rope_split(cut_node, 1);
rope_concat(root, cut_node);
rope_concat(root, cut_tail);
return cut_node;
}
For example we have a rope:
t = {0, 1, 'a', 'b'}
^
+----------------+
r |
└── {size = 4, data = &t[0]}
-
Split rope
r
in three:t = {0, 1, 'a', 'b'} ^ ^ ^ | | +------------------+ | +--------------------+ | +----------------+ | | r1 | | | | | | | └── {size = 1, data = &t[0]} | | r2 +------+ | | | | └── {size = 1, data = &t[1]} | r3 +--------+ | | └── {size = 2, data = &t[2]}
-
Concatenate the ropes:
r └── {size = 4} ├── {size = 1, data = &t[0]} └── {size = 3} ├── {size = 1, data = &t[1]} └── {size = 2, data = &t[2]}
Not all operations work under existed fields. The INSERT operation creates new field. We should create new leaf node which contain one inserted field:
op = {opcode = INSERT, field_no = i, value = new_field}
ri = {size = 1, data = &op.value}
Then we should insert it to the rope by following algorithm:
- Split the rope on two part. The first one
r1
contains all fields from 0 to j - 1 and another oner2
all fields from j to n - 1; - Concatenate
r1
withri
then concatenate result withr2
.
Pseudocode:
rope
insert_field_to_rope(rope root, int i, void *field)
{
ri = rope_new(1, field);
r1 = root;
r2 = rope_split(root, i);
rope_concat(r1, ri);
rope_concat(r1, r2);
return ri;
}
For example, we have rope r
and inserting field f
which we should insert before the first element:
r
└── {size = 4}
├── {size = 1, data = &t[0]}
└── {size = 3}
├── {size = 1, data = &t[1]}
└── {size = 2, data = &t[2]}
ri
└── {size = 1, data = &f}
-
Split
r
onr1
andr2
:r1 └── {size = 1, data = &t[0]} r2 └── {size = 3} ├── {size = 1, data = &t[1]} └── {size = 2, data = &t[2]}
-
Concatenate
r1
andri
r1 ├── {size = 1, data = &t[0]} └── {size = 1, data = &f}
-
Concatenate result with
r2
:r └── {size = 5} ├── {size = 2} │ ├── {size = 1, data = &t[0]} │ └── {size = 1, data = f} └── {size = 3} ├── {size = 1, data = &t[1]} └── {size = 2, data = &t[2]}
Also the UPDATE command can delete existed fields, that's why we should support delete operations from ropes. The algorithm is pretty much like extract algorithm:
- Split the rope in 3 part: the first one
r1
which has fields from 0 to i-1, the second oner2
which has only i-th field and the third oner3
which has fields from i+1 to n-1; - Concatenate
r1
withr3
; - Delete
r2
.
Pseudocode:
void
delete_field_from_rope(rope root, int i)
{
rope cut_node = rope_split(root, i - 1);
rope cut_tail = rope_split(cut_node, 1);
rope_concat(root, cut_tail);
rope_delete(cut_node);
}
For example, we have rope r
and we'd like to delete field 2:
r
└── {size = 5}
├── {size = 2}
│ ├── {size = 1, data = &t[0]}
│ └── {size = 1, data = f}
└── {size = 3}
├── {size = 1, data = &t[1]}
└── {size = 2, data = &t[2]}
-
Split
r
onr1
,r2
andr3
:r1 └── {size = 2} ├── {size = 1, data = &t[0]} └── {size = 1, data = f} r2 └── {size = 1, data = &t[1]} r3 └── {size = 2} └── {size = 2, data = &t[2]}
-
Concatenate
r1
andr3
:r └── {size = 4} ├── {size = 2} │ ├── {size = 1, data = &t[0]} │ └── {size = 1, data = f} └── {size = 2} └── {size = 2, data = &t[2]}
-
Delete
r2
.
If we'd like to perform update operations in place, we should allocate additional buffers for intermediate results, since we can't modify the original tuple. However, if we allocate new tuple space in advance and then we do operations right in the new tuple, we can reduce additional memory allocation to few cases where it is really needed. For example, if we set field to a huge value and then, with the next operation, we reduce it to a small value, the intermediate result (after SET) can't overflow the new tuple buffer.
For memory optimization we make some changes in the algorithms. We don’t do operations in place in the rope; instead, we save the operation to the rope node. Output of the algorithm is a sorted sequence of operations which produces the same result as the input sequence.
struct rope {
/* number of fields in the rope */
int size;
/* the depth of the rope */
int depth;
/* pointer to parent (NULL for root) */
struct rope *parent;
/* pointer to left child (NULL for leaf) */
struct rope *left;
/* pointer to right child (NULL for leaf) */
struct rope *right;
/* pointer on fields data (NULL for interior nodes) */
void *fields;
/* fields length */
int fields_length;
/* The list of operations which should be applied to this field */
list *ops;
};
On the first stage we apply only INSERT and DELETE operations, another operations we just push back to node operations list and update fields_length (from leaf to parent). On the next stage, when all operations were grouped by field number and we result tuple length was calculated, we can apply operations to fields.
Pseudocode:
void
do_update(tuple, op_list)
{
/* put all operations to the rope */
rope root = rope_new(&tuple.value[0], tuple.cardinality, tuple.length);
foreach (op = op_list)
switch (op.opcode) {
case INSERT:
rope ri = insert_field_to_rope(root, op.field_no, op.value);
break;
case DELETE:
delete_field_from_rope(root, op.field_no);
break;
default:
rope r = extract_field_from_rope(root, op.field_no);
update_fields_length(r, op);
list_push_back(r.ops, op);
break;
}
tuple = tuple_new(root.field_length);
/* Apply ops to each field and write result to new tuple */
foreach_leaf (l = root)
apply_ops_to_field(tuple, l->field, l->ops);
}
- C coding guidelines ↗
- Lua coding guidelines ↗
- Python coding guidelines ↗
- Maintainer's guide
- Debugging
Architecture
- Server architecture
- R tree index quick start and usage
- LuaJIT
- Vinyl
- Vinyl Architecture
- Vinyl Disk Layout
- Vinyl math
- Vinyl Cookbook
- Bullet1
- SQL
- Appserver modules
- Testing
- Performance
- Privileges and Access control
How To ...?
- ... configure build system
- ... add new fuzzers
- ... build RPM or Deb package using packpack
- ... calculate memory size
- ... debug core dump of stripped tarantool
- ... debug core from different OS
- ... debug fuzzer
- ... generate new bootstrap snapshot
- ... use Address Sanitizer
- ... collect a coredump
- ... generate luacov report for builtin module
- ... verify modified lua files via luacheck
- ... verify Lua files in third_party?
- ... rerun failed jobs
- ... update a third party repository
- Fix wrong decimal indexing after upgrade to 2.10.1
- Caveats when upgrading a cluster on Tarantool 1.6
- Fix illegal field type in a space format when upgrading to 2.10.4
Useful links