Description
The Linux perf profiler is a very powerful tool, but unfortunately it cannot see Python calls (only the C stack). As a result, neither perf nor its very complete ecosystem can be used to profile Python applications and extensions.
It turns out that Node and the JVM have developed a way to leverage the perf profiler for Java and JavaScript frames. They use their JIT compilers to generate a unique area in memory where they place assembly code that in turn calls the frame evaluator function. These JIT-compiled areas are unique per function/code object. They then use perf maps: perf allows a process to place a map file at /tmp/perf-PID.map with information mapping each JIT-ed area to a string that identifies it. This allows perf to map Java/JavaScript names to the JIT-ed areas, basically showing the non-native function names on the stack.
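To make the lookup concrete, here is a minimal sketch of how a perf map can be used to turn an instruction address into a name. It assumes the usual perf map layout of one `START SIZE NAME` entry per line, with START and SIZE in hexadecimal. The sample addresses and symbol names are made up for illustration, and the helper names `parse_perf_map` and `resolve` are hypothetical; this is not perf's actual implementation.

```python
from bisect import bisect_right


def parse_perf_map(text):
    """Parse 'START SIZE NAME' lines (START and SIZE in hex) into sorted tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The name is everything after the second space, so it may contain spaces.
        start_hex, size_hex, name = line.split(" ", 2)
        entries.append((int(start_hex, 16), int(size_hex, 16), name))
    entries.sort()
    return entries


def resolve(entries, address):
    """Return the name of the region containing `address`, or None."""
    starts = [start for start, _, _ in entries]
    # Find the last region that starts at or before the address.
    index = bisect_right(starts, address) - 1
    if index < 0:
        return None
    start, size, name = entries[index]
    return name if address < start + size else None


# Hypothetical map contents: two adjacent 0x40-byte JIT-ed regions.
SAMPLE_MAP = (
    "7f0000001000 40 java_method_a\n"
    "7f0000001040 40 js_function_b\n"
)
```

Because each JIT-ed area is unique per function, a sampled address that falls inside a region identifies exactly one non-native function name.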
We can do a simple version of this idea in Python by using a very simple JIT compiler that compiles an assembly template that is then used to jump to PyEval_EvalFrameDefault, and we can place the code names and filenames in the special perf map file. This allows perf to see Python calls as well.
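As a rough illustration of the bookkeeping side, the sketch below appends one perf map entry per Python code object. The `py::name:filename` label, the helper names `format_entry` and `write_perf_map_entry`, and the `map_dir` parameter are assumptions made for this example, not necessarily what the change itself does. The start address and size would come from the JIT-ed area generated for that code object, which pure Python cannot produce, so they are plain arguments here.

```python
import os


def format_entry(code, start, size):
    """Build one perf map line: hex start, hex size, then a readable label."""
    # Assumed label scheme: the code object's name and its filename.
    return f"{start:x} {size:x} py::{code.co_name}:{code.co_filename}\n"


def write_perf_map_entry(code, start, size, map_dir="/tmp", pid=None):
    """Append an entry for `code` to perf-<pid>.map and return the file path."""
    pid = os.getpid() if pid is None else pid
    path = os.path.join(map_dir, f"perf-{pid}.map")
    # Append mode: a process adds entries as new code objects get their area.
    with open(path, "a", encoding="utf-8") as map_file:
        map_file.write(format_entry(code, start, size))
    return path
```

For example, `write_perf_map_entry(some_function.__code__, start, size)` would add a line for `some_function` to the current process's map file, given a hypothetical `start` and `size` for its generated area.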
The approach also works with all the tools in the perf ecosystem, such as flame graphs.
See also:
https://www.brendangregg.com/Slides/KernelRecipes_Perf_Events.pdf