Native Shader Compilation with LLVM
Mark Leone
SIMD interpreter
For each instruction in shader:
    Decode and dispatch instruction.
    For each point in batch:
        If runflag is on:
            Load operands.
            Compute.
            Store result.
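The loop above can be sketched in C++ as follows. This is only an illustration: the two-opcode instruction set, the `Instr` layout, and the names `Register`, `ForEachActive` and `Interpret` are assumptions, not taken from any renderer. Dispatch happens once per instruction and the inner loop runs over the batch, skipping points whose runflag is off.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical three-address instruction set, for illustration only.
enum class Op { Add, Mul };

struct Instr {
    Op op;
    int dst, lhs, rhs;  // register indices
};

// One register holds one float per point in the batch.
using Register = std::vector<float>;

// Inner loop: load operands, compute, store result, for active points only.
template <typename Compute>
void ForEachActive(Register& dst, const Register& a, const Register& b,
                   const std::vector<bool>& runflags, Compute compute)
{
    for (std::size_t i = 0; i < runflags.size(); ++i) {
        if (runflags[i]) {
            dst[i] = compute(a[i], b[i]);
        }
    }
}

// Outer loop: decode and dispatch each instruction once for the whole batch.
void Interpret(const std::vector<Instr>& shader,
               std::vector<Register>& regs,
               const std::vector<bool>& runflags)
{
    for (const Instr& instr : shader) {
        Register& dst = regs[instr.dst];
        const Register& a = regs[instr.lhs];
        const Register& b = regs[instr.rhs];
        assert(dst.size() == runflags.size());
        switch (instr.op) {
        case Op::Add:
            ForEachActive(dst, a, b, runflags,
                          [](float x, float y) { return x + y; });
            break;
        case Op::Mul:
            ForEachActive(dst, a, b, runflags,
                          [](float x, float y) { return x * y; });
            break;
        }
    }
}
```

The dispatch cost is amortized over the batch, but every instruction still loads its operands from memory and stores its result back.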
Why vectorize?
Consider batch execution of a compiled shader:
Why vectorize?
Consider batch execution of a vectorized shader:
load4 r1, [v1]
load4 r2, [v2]
mult4 r3, r1, r2
move  r0, r3.x
add   r0, r3.y
add   r0, r3.z
[Figure: vector utilization]
Shader vectorization
To vectorize, first scalarize:
float dot(vector v1, vector v2)
{
    vector v0 = v1 * v2;
    return v0.x + v0.y + v0.z;
}
load r1, [v1.x]
load r2, [v2.x]
mult r0, r1, r2
load r1, [v1.y]
load r2, [v2.y]
mult r3, r1, r2
load r1, [v1.z]
load r2, [v2.z]
mult r4, r1, r2
add  r0, r0, r3
add  r0, r0, r4
load4 r1, [v1.x]
load4 r2, [v2.x]
mult4 r0, r1, r2
load4 r1, [v1.y]
load4 r2, [v2.y]
mult4 r3, r1, r2
load4 r1, [v1.z]
load4 r2, [v2.z]
mult4 r4, r1, r2
add4  r0, r0, r3
add4  r0, r0, r4
[Figure: vector utilization]
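A minimal C++ sketch of the vectorized dot product, assuming a structure-of-arrays layout in which each component of a batch of four vectors is stored contiguously. `Float4`, `Vector4x3`, `Mult4`, `Add4` and `Dot4` are hypothetical names; `std::array<float, 4>` stands in for a 4-wide SIMD register so the example stays portable.

```cpp
#include <array>
#include <cstddef>

// Four lanes, one per shading point; stands in for a 4-wide SIMD register.
using Float4 = std::array<float, 4>;

// A batch of four 3-vectors stored component-wise (structure of arrays).
struct Vector4x3 {
    Float4 x, y, z;
};

// Lane-wise multiply, the counterpart of mult4.
Float4 Mult4(const Float4& a, const Float4& b)
{
    Float4 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] * b[i];
    return r;
}

// Lane-wise add, the counterpart of add4.
Float4 Add4(const Float4& a, const Float4& b)
{
    Float4 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] + b[i];
    return r;
}

// Scalarized dot product applied to four points at once:
// every operation uses all four lanes.
Float4 Dot4(const Vector4x3& v1, const Vector4x3& v2)
{
    Float4 r0 = Mult4(v1.x, v2.x);
    r0 = Add4(r0, Mult4(v1.y, v2.y));
    r0 = Add4(r0, Mult4(v1.z, v2.z));
    return r0;
}
```

Each lane computes the dot product for a different point, so no lane sits idle the way it does when a single 3-vector occupies a 4-wide register.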
Masking / blending
Use a mask to avoid clobbering components of registers used
by the other branch.
No masking in SSE.
blend(a, b, mask)
{
    return (a & mask) | (b & ~mask);
}
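As a sketch, a blend is a bitwise select: it takes the bits of `a` where the mask is set and the bits of `b` elsewhere. The version below works on 32-bit lanes held in a `std::array`. The names `Blend`, `Blend4` and `Lanes4` are assumptions for illustration, and each mask lane is expected to be all ones or all zeros.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Bitwise select on one lane: bits of a where mask is 1, bits of b where it is 0.
std::uint32_t Blend(std::uint32_t a, std::uint32_t b, std::uint32_t mask)
{
    return (a & mask) | (b & ~mask);
}

// Four 32-bit lanes, standing in for one SIMD register.
using Lanes4 = std::array<std::uint32_t, 4>;

// Lane-wise blend: a result lane comes from a if its mask lane is all ones,
// and from b if its mask lane is all zeros.
Lanes4 Blend4(const Lanes4& a, const Lanes4& b, const Lanes4& mask)
{
    Lanes4 r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = Blend(a[i], b[i], mask[i]);
    }
    return r;
}
```

After both sides of a branch have executed, blending the new value with the old one under the branch mask keeps the lanes that belong to the other branch intact.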
Partitioning
Nf = faceforward(normalize(N), I);
Ci = Os * Cs * ( Ka*ambient() + Kd*diffuse(Nf) );
[Figure: dataflow graph of the shader above. Nodes: normalize, faceforward, ambient, diffuse, scale (x3), mult, add. Values: Ka, Kd, Cs, Os, Ci.]
Issues: summary
CPU code generation (perhaps JIT)
Vectorization
GPU code generation
Multi-pass partitioning
Introduction to LLVM
Mid-level intermediate representation (IR)
High-level types: structs, arrays, vectors, functions.
Control-flow graph: basic blocks with branches
Many modular analysis and optimization passes.
Code generation for x86, x64, ARM, ...
Just-in-time (JIT) compiler too.
    virtual llvm::Value*
    Codegen(llvm::IRBuilder* builder);
};

switch (m_operation) {
    case '+': return builder->CreateFAdd(L, R);
    case '-': return builder->CreateFSub(L, R);
    case '*': return builder->CreateFMul(L, R);
    ...
}
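To show the shape of this recursive code generation without linking against LLVM, here is a self-contained sketch. `TextBuilder` is a hypothetical stand-in for `llvm::IRBuilder` that appends textual IR lines and returns result names instead of `llvm::Value*`. The `Expr`, `Variable` and `BinaryExpr` classes are likewise assumptions for illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::IRBuilder: records textual IR instructions
// and returns the name of each result.
class TextBuilder {
public:
    std::string CreateFAdd(const std::string& l, const std::string& r) { return Emit("fadd", l, r); }
    std::string CreateFSub(const std::string& l, const std::string& r) { return Emit("fsub", l, r); }
    std::string CreateFMul(const std::string& l, const std::string& r) { return Emit("fmul", l, r); }
    const std::vector<std::string>& Lines() const { return m_lines; }

private:
    std::string Emit(const char* opcode, const std::string& l, const std::string& r)
    {
        std::string result = "%t" + std::to_string(m_lines.size());
        m_lines.push_back(result + " = " + opcode + " float " + l + ", " + r);
        return result;
    }
    std::vector<std::string> m_lines;
};

struct Expr {
    virtual ~Expr() = default;
    virtual std::string Codegen(TextBuilder* builder) const = 0;
};

// A named float value, such as a shader parameter.
struct Variable : Expr {
    explicit Variable(std::string name) : m_name(std::move(name)) {}
    std::string Codegen(TextBuilder*) const override { return "%" + m_name; }
    std::string m_name;
};

struct BinaryExpr : Expr {
    BinaryExpr(char op, const Expr* left, const Expr* right)
        : m_operation(op), m_left(left), m_right(right) {}

    // Generate code for both operands first, then one instruction for the operator.
    std::string Codegen(TextBuilder* builder) const override
    {
        std::string L = m_left->Codegen(builder);
        std::string R = m_right->Codegen(builder);
        switch (m_operation) {
            case '+': return builder->CreateFAdd(L, R);
            case '-': return builder->CreateFSub(L, R);
            case '*': return builder->CreateFMul(L, R);
        }
        return "";  // unknown operator
    }

    char m_operation;
    const Expr* m_left;
    const Expr* m_right;
};
```

For `a*b + c`, this sketch records an `fmul` followed by an `fadd` that consumes the multiply's result.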
Advantages of LLVM
Well designed intermediate representation (IR).
Wide range of optimizations (configurable).
JIT code generation.
Interoperability.
Interoperability
Shaders can call out to renderer via C ABI.
We can inline library code into compiled shaders.
Compile C++ to LLVM IR with Clang.
This greatly simplifies code generation.
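A small sketch of the C ABI call-out, under stated assumptions: `RendererClamp01` is a hypothetical renderer entry point, and the symbol table with `Resolve` is a simplified stand-in for however a JIT maps external function names to addresses. Declaring the function `extern "C"` gives it an unmangled name and C calling convention, which is what lets generated code refer to it by name.

```cpp
#include <cstring>

extern "C" {
// Hypothetical renderer entry point exposed with C linkage.
float RendererClamp01(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

// Pointer to a C-linkage function taking and returning a float.
typedef float (*UnaryFn)(float);
}

// Simplified symbol table: external name to function address.
struct Symbol {
    const char* name;
    UnaryFn address;
};

static const Symbol kRendererSymbols[] = {
    {"RendererClamp01", &RendererClamp01},
};

// Look up a renderer function by its unmangled name; nullptr if unknown.
UnaryFn Resolve(const char* name)
{
    for (const Symbol& s : kRendererSymbols) {
        if (std::strcmp(s.name, name) == 0) {
            return s.address;
        }
    }
    return nullptr;
}
```

A compiled shader would then call through the resolved address exactly as it would call any other C function.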
Weaknesses of LLVM
No automatic vectorization.
Poor support for vector-oriented code generation.
No predication.
Few vector instructions; must resort to SSE/AVX intrinsics.
LLVM resources
www.llvm.org/docs
Language Reference Manual
Getting Started Guide
LLVM Tutorial (section 3)
Relevant open source projects
ispc.github.com
github.com/MarkLeone/PostHaste
Questions?
Mark Leone
mleone@wetafx.co.nz