-
Notifications
You must be signed in to change notification settings - Fork 387
Numerical Computing Performance Guide
WARNING: Tarantool's developers disclaimer: This wiki page is a slightly modified copy of the missing LuaJIT wiki page taken from web archive. It may contain outdated content, use it with caution.
This is a mailing list post by Mike Pall.
The following things are needed to get the best speed out of numerical computations with LuaJIT (in order of importance):
-
Reduce the number of unbiased/unpredictable branches.
- Heavily biased branches (>95% in one direction) are fine.
- Prefer branch-free algorithms.
- Use
math.min()
andmath.max()
. - Use
bit.*
, e.g., for conditional index computations.
- Use
-
Using FFI data structures:
- Use
int32_t
, avoiduint32_t
data types. - Use
double
, avoidfloat
data types. - Metamethods are fine, but don't overuse them.
- Use
-
Calling C functions via the FFI:
- Avoid calling trivial functions, better rewrite them in Lua.
- Avoid callbacks -- use pull-style APIs (read/write) and iterators instead.
-
Use plain
for i = start, stop, step do ... end
loops.- Prefer plain array indexing, e.g.,
a[i+2]
. - Avoid pointer arithmetic.
- Prefer plain array indexing, e.g.,
-
Find the right balance for unrolling.
- Avoid inner loops with a low iteration count (< 10).
- Only unroll loops if the loop body doesn't have too many instructions.
- Consider using templates instead of hand-unrolling (see GSL Shell).
- You may have to experiment a bit.
-
Define and call only
local
(!) functions within a module. -
Cache often-used functions from other modules in upvalues.
- E.g.
local sin = math.sin ... local function foo() return 2 * sin(x) end
-
Don't do this for FFI C functions, cache the namespace instead, e.g.,
local lib = ffi.load("lib")
.
- E.g.
-
Avoid inventing your own dispatch mechanisms.
- Prefer to use built-in mechanisms, e.g., metamethods.
-
Do not try to second-guess the JIT compiler.
- It's perfectly ok to write
z = x[a+b] + y[a+b]
. - Do not try CSE (Common Subexpression Elimination) by hand, e.g.,
local c = a + b
. - It may become detrimental if the lifetime of the temporary is longer than needed. If the compiler cannot deduce that it's dead, then the useless temporary will block a register or stack slot and/or it needs to be stored on the Lua stack.
- Duplicate expressions involving basic arithmetic operators that are relatively close to each other (and likely in the same trace) should not be manually CSEd. Loads only need to be manually hoisted, if alias analysis is likely to fail.
- It's perfectly ok to write
a[i][j] = a[i][j] * a[i][j+1]
. - Do not try to cache partial FFI struct/array references (e.g.,
a[i]
) unless they are long-lived (e.g., in a big loop). - There are quite a few "easy" optimizations where the compiler is in a better position to perform them. Better focus on the difficult things, like algorithmic improvements.
- It's perfectly ok to write
-
Be careful with aliasing, especially when using multiple arrays.
- LuaJIT uses strict type-based disambiguation, but there are limits to this due to C99 conformance.
- E.g. in
x[i] = a[i] + c[i]; y[i] = a[i] + d[i]
the load ofa[i]
needs to be done twice becausex
could aliasa
. It does make sense to usedo local t = a[i] ... end
here.
-
Reduce the number of live temporary variables.
- Best to initialize on the definition, e.g.,
local y = f(x)
- Yes, this means you should interleave this with other code.
- Do not hoist variable definitions to the start of a function -- Lua is neither JavaScript nor K&R C.
- Use
do local y = f(x) ... end
to bound variable lifetimes.
- Best to initialize on the definition, e.g.,
-
Do not intersperse expensive or uncompiled operations.
-
print()
is not compiled, useio.write()
. - E.g. avoid
assert(type(x) == "number", "x is a "..mytype(x)")
- The problem is not the
assert()
or the condition (basically free). The problem is the string concatenation, which has to be executed every time, even if the assertion never fails!
- The problem is not the
- Watch the output of
-jv
and-jdump
.
-
You need to take all of these factors into account before deciding on a certain algorithm. An advanced algorithm that's fast in theory may be slower than a simpler algorithm if the simpler algorithm has much fewer unbiased branches.
- C coding guidelines ↗
- Lua coding guidelines ↗
- Python coding guidelines ↗
- Maintainer's guide
- Debugging
Architecture
- Server architecture
- R tree index quick start and usage
- LuaJIT
- Vinyl
- Vinyl Architecture
- Vinyl Disk Layout
- Vinyl math
- Vinyl Cookbook
- Bullet1
- SQL
- Appserver modules
- Testing
- Performance
- Privileges and Access control
How To ...?
- ... configure build system
- ... add new fuzzers
- ... build RPM or Deb package using packpack
- ... calculate memory size
- ... debug core dump of stripped tarantool
- ... debug core from different OS
- ... debug fuzzer
- ... generate new bootstrap snapshot
- ... use Address Sanitizer
- ... collect a coredump
- ... generate luacov report for builtin module
- ... verify modified lua files via luacheck
- ... verify Lua files in third_party?
- ... rerun failed jobs
- ... update a third party repository
- Fix wrong decimal indexing after upgrade to 2.10.1
- Caveats when upgrading a cluster on Tarantool 1.6
- Fix illegal field type in a space format when upgrading to 2.10.4
Useful links