-
Notifications
You must be signed in to change notification settings - Fork 387
Threaded interpretation and instruction merging
Student: Anastasia Neklepaeva
Mentors: Igor Munkin, Nikita Pettik, Timur Safin
Description: Tarantool uses SQL VM with conventional interpretation technique: loop over instruction, inside which dispatching switch-case executes logic of current opcode. I’ve worked on threaded-code technique. Threaded code is a programming technique where the code has a form that essentially consists entirely of calls to subroutines. It helps to avoid the second jump in code processing by jumping directly to the code handling the next instruction.
I started working on this project with learning about internals of Sqlite VDBE. I have read several articles about virtual machines and their differences and analyzed a Tarantool VDBE source code.
First steps were defining DISPATCH()
macro and replacing case OP_value
with Exec_OP_value
label. This action required
some changes for vdbe.c file parser to generate opcodes numeric values. Some compilers provide computed goto feature
(e.g. gcc ).
But there are some compilers not providing computed goto feature. That's why threaded code should be supported
by Tarantool as well as swith-technique.
For this purpose switch-based interpreter was rewritten with START_EVAL()
, EXECUTE()
macros. After that
tarantool supports both techniques: switch-based interpreter and threaded code.
Also, some macros for debug were defined. Script for generation of array of goto addresses was created.
Performance have been measured.
- Define macro
DISPATCH()
that hides internal implementation of dispatching: if flag-DWITHOUT_COMPUTED_GOTO
was specified thenDISPATCH()
implements jump to switch otherwise jump directly to the code handling the next instruction - Define macro
START_EVAL()
to hide switch switch-based interpreter and jump to the label of first instruction - Define macro
EXECUTE()
to hide case OP_value for switch-based interpreter and labelExec_OP_value
for threaded code - Replace break and case keywords with mentioned macro
- Define some macros for debug
- Generate an array of goto addresses in the same order as corresponding
OP_values
are defined - Benchmark results using recursive query that calculates sum from 1 to 1000000
Query:
WITH RECURSIVE temp (n, sum, dif) AS
(SELECT 0, 0, 0
UNION ALL
SELECT n+1, (n+1)+sum, dif+(n-1) FROM temp
WHERE n < 1000000)
SELECT * FROM temp;
Results:
CPU usage
Time of query execution
Compiler: gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0
CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Tarantool Version: 2.9.0, Linux-x86_64-RelWithDebInfo
Perf command to measure performance: sudo perf record -g -- ./src/tarantool ./src/test.lua
./src/test.lua contains recursive query
The computed goto version is faster because of two reasons:
-
the switch does a bit more per iteration because of bounds checking;
-
the effects of hardware branch prediction.
-
Add instruction merging as another optimization
-
Fix reviewed code
- C coding guidelines ↗
- Lua coding guidelines ↗
- Python coding guidelines ↗
- Maintainer's guide
- Debugging
Architecture
- Server architecture
- R tree index quick start and usage
- LuaJIT
- Vinyl
- Vinyl Architecture
- Vinyl Disk Layout
- Vinyl math
- Vinyl Cookbook
- Bullet1
- SQL
- Appserver modules
- Testing
- Performance
- Privileges and Access control
How To ...?
- ... configure build system
- ... add new fuzzers
- ... build RPM or Deb package using packpack
- ... calculate memory size
- ... debug core dump of stripped tarantool
- ... debug core from different OS
- ... debug fuzzer
- ... generate new bootstrap snapshot
- ... use Address Sanitizer
- ... collect a coredump
- ... generate luacov report for builtin module
- ... verify modified lua files via luacheck
- ... verify Lua files in third_party?
- ... rerun failed jobs
- ... update a third party repository
- Fix wrong decimal indexing after upgrade to 2.10.1
- Caveats when upgrading a cluster on Tarantool 1.6
- Fix illegal field type in a space format when upgrading to 2.10.4
Useful links