Commit a667800

gh-136459: Add perf trampoline support for macOS (#136461)
1 parent b6d3242 commit a667800

10 files changed: +351 -27 lines changed

Doc/c-api/perfmaps.rst

Lines changed: 5 additions & 4 deletions
@@ -5,11 +5,12 @@
 Support for Perf Maps
 ----------------------
 
-On supported platforms (as of this writing, only Linux), the runtime can take
+On supported platforms (Linux and macOS), the runtime can take
 advantage of *perf map files* to make Python functions visible to an external
-profiling tool (such as `perf <https://perf.wiki.kernel.org/index.php/Main_Page>`_).
-A running process may create a file in the ``/tmp`` directory, which contains entries
-that can map a section of executable code to a name. This interface is described in the
+profiling tool (such as `perf <https://perf.wiki.kernel.org/index.php/Main_Page>`_ or
+`samply <https://github.com/mstange/samply/>`_). A running process may create a
+file in the ``/tmp`` directory, which contains entries that can map a section
+of executable code to a name. This interface is described in the
 `documentation of the Linux Perf tool <https://git.kernel.org/pub/scm/linux/
 kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/jit-interface.txt>`_.
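
Reviewer sketch (not part of the commit): the perf map interface referenced in this hunk uses one entry per line, with a hexadecimal start address, a hexadecimal size, and a symbol name. The parser below is a minimal illustration under that assumption. `PerfMapEntry`, `parse_perf_map`, and the sample addresses are hypothetical names and values chosen for the example.

```python
from typing import NamedTuple


class PerfMapEntry(NamedTuple):
    start: int  # start address of the code region
    size: int   # size of the region in bytes
    name: str   # symbol name, e.g. "py::foo:/tmp/perftest.py"


def parse_perf_map(text: str) -> list[PerfMapEntry]:
    """Parse perf map text where each line is 'START SIZE name' (hex, hex, str)."""
    entries = []
    for line in text.splitlines():
        # The symbol name may contain spaces, so split at most twice.
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size, name = parts
        entries.append(PerfMapEntry(int(start, 16), int(size, 16), name))
    return entries


# Made-up sample content; a real file would live at /tmp/perf-<pid>.map.
sample = (
    "7f3a1c000000 40 py::foo:/tmp/perftest.py\n"
    "7f3a1c000040 40 py::bar:/tmp/perftest.py\n"
)
entries = parse_perf_map(sample)
```

A profiler resolves a sampled instruction pointer by finding the entry whose `[start, start + size)` range contains it, which is how the trampoline addresses get mapped back to Python function names.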

Doc/howto/perf_profiling.rst

Lines changed: 41 additions & 15 deletions
@@ -2,34 +2,35 @@
 
 .. _perf_profiling:
 
-==============================================
-Python support for the Linux ``perf`` profiler
-==============================================
+========================================================
+Python support for the ``perf map`` compatible profilers
+========================================================
 
 :author: Pablo Galindo
 
-`The Linux perf profiler <https://perf.wiki.kernel.org>`_
-is a very powerful tool that allows you to profile and obtain
-information about the performance of your application.
-``perf`` also has a very vibrant ecosystem of tools
-that aid with the analysis of the data that it produces.
+`The Linux perf profiler <https://perf.wiki.kernel.org>`_ and
+`samply <https://github.com/mstange/samply>`_ are powerful tools that allow you to
+profile and obtain information about the performance of your application.
+Both tools have vibrant ecosystems that aid with the analysis of the data they produce.
 
-The main problem with using the ``perf`` profiler with Python applications is that
-``perf`` only gets information about native symbols, that is, the names of
+The main problem with using these profilers with Python applications is that
+they only get information about native symbols, that is, the names of
 functions and procedures written in C. This means that the names and file names
-of Python functions in your code will not appear in the output of ``perf``.
+of Python functions in your code will not appear in the profiler output.
 
 Since Python 3.12, the interpreter can run in a special mode that allows Python
-functions to appear in the output of the ``perf`` profiler. When this mode is
+functions to appear in the output of compatible profilers. When this mode is
 enabled, the interpreter will interpose a small piece of code compiled on the
-fly before the execution of every Python function and it will teach ``perf`` the
+fly before the execution of every Python function and it will teach the profiler the
 relationship between this piece of code and the associated Python function using
 :doc:`perf map files <../c-api/perfmaps>`.
 
 .. note::
 
-    Support for the ``perf`` profiler is currently only available for Linux on
-    select architectures. Check the output of the ``configure`` build step or
+    Support for profiling is available on Linux and macOS on select architectures.
+    Perf is available on Linux, while samply can be used on both Linux and macOS.
+    samply support on macOS is available starting from Python 3.15.
+    Check the output of the ``configure`` build step or
     check the output of ``python -m sysconfig | grep HAVE_PERF_TRAMPOLINE``
     to see if your system is supported.
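
Reviewer sketch (not part of the commit): the support check described in the note can also be done from Python. The snippet below reads the `PY_HAVE_PERF_TRAMPOLINE` configuration variable, the same variable that the test helpers added in this commit consult. Treat it as an illustration, since the variable may be absent on some builds, in which case the function reports no support.

```python
import sysconfig


def supports_trampoline_profiling() -> bool:
    """Return True if this interpreter was built with perf trampoline support."""
    value = sysconfig.get_config_var("PY_HAVE_PERF_TRAMPOLINE")
    if not value:
        # None, 0, or an empty string all mean "not supported".
        return False
    return int(value) == 1


supported = supports_trampoline_profiling()
print("perf trampoline supported:", supported)
```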

@@ -148,6 +149,31 @@ Instead, if we run the same experiment with ``perf`` support enabled we get:
 
 
 
+Using the samply profiler
+-------------------------
+
+samply is a modern profiler that can be used as an alternative to perf.
+It uses the same perf map files that Python generates, making it compatible
+with Python's profiling support. samply is particularly useful on macOS
+where perf is not available.
+
+To use samply with Python, first install it following the instructions at
+https://github.com/mstange/samply, then run::
+
+    $ samply record PYTHONPERFSUPPORT=1 python my_script.py
+
+This will open a web interface where you can analyze the profiling data
+interactively. The advantage of samply is that it provides a modern
+web-based interface for analyzing profiling data and works on both Linux
+and macOS.
+
+On macOS, samply support requires Python 3.15 or later. Also on macOS, samply
+can't profile signed Python executables due to restrictions by macOS. You can
+profile with Python binaries that you've compiled yourself, or which are
+unsigned or locally-signed (such as anything installed by Homebrew). In
+order to attach to running processes on macOS, run ``samply setup`` once (and
+every time samply is updated) to self-sign the samply binary.
+
 How to enable ``perf`` profiling support
 ----------------------------------------
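
Reviewer sketch (not part of the commit): for scripted runs, the samply invocation can be assembled in Python. The helper below follows the command shape used by the tests in this commit (`samply record --save-only -o <file> <python> -Xperf <script>`). `build_samply_command` is a hypothetical name, and the snippet only builds and prints the command rather than running it, so it works even when samply is not installed.

```python
import shutil
import sys


def build_samply_command(script, output_file, activate_trampoline=True):
    """Build a 'samply record' command line that saves a profile without opening the UI."""
    cmd = ["samply", "record", "--save-only", "-o", output_file, sys.executable]
    if activate_trampoline:
        # -X perf enables the perf trampoline for this interpreter run.
        cmd.append("-Xperf")
    cmd.append(script)
    return cmd


cmd = build_samply_command("my_script.py", "profile.json.gz")
if shutil.which("samply") is None:
    print("samply not found on PATH; would run:", " ".join(cmd))
else:
    print("ready to run:", " ".join(cmd))
```

Passing the resulting list to `subprocess.run` would record the profile. With `--save-only`, the output is written to the given file instead of being served in the web interface.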

Lib/test/test_perfmaps.py

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,16 +1,20 @@
11
import os
2-
import sys
2+
import sysconfig
33
import unittest
44

55
try:
66
from _testinternalcapi import perf_map_state_teardown, write_perf_map_entry
77
except ImportError:
88
raise unittest.SkipTest("requires _testinternalcapi")
99

10+
def supports_trampoline_profiling():
11+
perf_trampoline = sysconfig.get_config_var("PY_HAVE_PERF_TRAMPOLINE")
12+
if not perf_trampoline:
13+
return False
14+
return int(perf_trampoline) == 1
1015

11-
if sys.platform != 'linux':
12-
raise unittest.SkipTest('Linux only')
13-
16+
if not supports_trampoline_profiling():
17+
raise unittest.SkipTest("perf trampoline profiling not supported")
1418

1519
class TestPerfMapWriting(unittest.TestCase):
1620
def test_write_perf_map_entry(self):

Lib/test/test_samply_profiler.py

Lines changed: 244 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,244 @@
+import unittest
+import subprocess
+import sys
+import sysconfig
+import os
+import pathlib
+from test import support
+from test.support.script_helper import (
+    make_script,
+)
+from test.support.os_helper import temp_dir
+
+
+if not support.has_subprocess_support:
+    raise unittest.SkipTest("test module requires subprocess")
+
+if support.check_sanitizer(address=True, memory=True, ub=True, function=True):
+    # gh-109580: Skip the test because it does crash randomly if Python is
+    # built with ASAN.
+    raise unittest.SkipTest("test crash randomly on ASAN/MSAN/UBSAN build")
+
+
+def supports_trampoline_profiling():
+    perf_trampoline = sysconfig.get_config_var("PY_HAVE_PERF_TRAMPOLINE")
+    if not perf_trampoline:
+        return False
+    return int(perf_trampoline) == 1
+
+
+if not supports_trampoline_profiling():
+    raise unittest.SkipTest("perf trampoline profiling not supported")
+
+
+def samply_command_works():
+    try:
+        subprocess.check_output(["samply", "--help"], text=True)
+    except (subprocess.SubprocessError, OSError):
+        return False
+
+    # Check that we can run a simple samply run
+    with temp_dir() as script_dir:
+        try:
+            output_file = script_dir + "/profile.json.gz"
+            cmd = (
+                "samply",
+                "record",
+                "--save-only",
+                "--output",
+                output_file,
+                sys.executable,
+                "-c",
+                'print("hello")',
+            )
+            env = {**os.environ, "PYTHON_JIT": "0"}
+            stdout = subprocess.check_output(
+                cmd, cwd=script_dir, text=True, stderr=subprocess.STDOUT, env=env
+            )
+        except (subprocess.SubprocessError, OSError):
+            return False
+
+        if "hello" not in stdout:
+            return False
+
+    return True
+
+
+def run_samply(cwd, *args, **env_vars):
+    env = os.environ.copy()
+    if env_vars:
+        env.update(env_vars)
+    env["PYTHON_JIT"] = "0"
+    output_file = cwd + "/profile.json.gz"
+    base_cmd = (
+        "samply",
+        "record",
+        "--save-only",
+        "-o", output_file,
+    )
+    proc = subprocess.run(
+        base_cmd + args,
+        stdout=subprocess.PIPE,
+        stderr=subprocess.PIPE,
+        env=env,
+    )
+    if proc.returncode:
+        print(proc.stderr, file=sys.stderr)
+        raise ValueError(f"Samply failed with return code {proc.returncode}")
+
+    import gzip
+    with gzip.open(output_file, mode="rt", encoding="utf-8") as f:
+        return f.read()
+
+
+@unittest.skipUnless(samply_command_works(), "samply command doesn't work")
+class TestSamplyProfilerMixin:
+    def run_samply(self, script_dir, script, activate_trampoline=True):
+        raise NotImplementedError()
+
+    def test_python_calls_appear_in_the_stack_if_perf_activated(self):
+        with temp_dir() as script_dir:
+            code = """if 1:
+                def foo(n):
+                    x = 0
+                    for i in range(n):
+                        x += i
+
+                def bar(n):
+                    foo(n)
+
+                def baz(n):
+                    bar(n)
+
+                baz(10000000)
+                """
+            script = make_script(script_dir, "perftest", code)
+            output = self.run_samply(script_dir, script)
+
+            self.assertIn(f"py::foo:{script}", output)
+            self.assertIn(f"py::bar:{script}", output)
+            self.assertIn(f"py::baz:{script}", output)
+
+    def test_python_calls_do_not_appear_in_the_stack_if_perf_deactivated(self):
+        with temp_dir() as script_dir:
+            code = """if 1:
+                def foo(n):
+                    x = 0
+                    for i in range(n):
+                        x += i
+
+                def bar(n):
+                    foo(n)
+
+                def baz(n):
+                    bar(n)
+
+                baz(10000000)
+                """
+            script = make_script(script_dir, "perftest", code)
+            output = self.run_samply(
+                script_dir, script, activate_trampoline=False
+            )
+
+            self.assertNotIn(f"py::foo:{script}", output)
+            self.assertNotIn(f"py::bar:{script}", output)
+            self.assertNotIn(f"py::baz:{script}", output)
+
+
+@unittest.skipUnless(samply_command_works(), "samply command doesn't work")
+class TestSamplyProfiler(unittest.TestCase, TestSamplyProfilerMixin):
+    def run_samply(self, script_dir, script, activate_trampoline=True):
+        if activate_trampoline:
+            return run_samply(script_dir, sys.executable, "-Xperf", script)
+        return run_samply(script_dir, sys.executable, script)
+
+    def setUp(self):
+        super().setUp()
+        self.perf_files = set(pathlib.Path("/tmp/").glob("perf-*.map"))
+
+    def tearDown(self) -> None:
+        super().tearDown()
+        files_to_delete = (
+            set(pathlib.Path("/tmp/").glob("perf-*.map")) - self.perf_files
+        )
+        for file in files_to_delete:
+            file.unlink()
+
+    def test_pre_fork_compile(self):
+        code = """if 1:
+                import sys
+                import os
+                import sysconfig
+                from _testinternalcapi import (
+                    compile_perf_trampoline_entry,
+                    perf_trampoline_set_persist_after_fork,
+                )
+
+                def foo_fork():
+                    pass
+
+                def bar_fork():
+                    foo_fork()
+
+                def foo():
+                    import time; time.sleep(1)
+
+                def bar():
+                    foo()
+
+                def compile_trampolines_for_all_functions():
+                    perf_trampoline_set_persist_after_fork(1)
+                    for _, obj in globals().items():
+                        if callable(obj) and hasattr(obj, '__code__'):
+                            compile_perf_trampoline_entry(obj.__code__)
+
+                if __name__ == "__main__":
+                    compile_trampolines_for_all_functions()
+                    pid = os.fork()
+                    if pid == 0:
+                        print(os.getpid())
+                        bar_fork()
+                    else:
+                        bar()
+                """
+
+        with temp_dir() as script_dir:
+            script = make_script(script_dir, "perftest", code)
+            env = {**os.environ, "PYTHON_JIT": "0"}
+            with subprocess.Popen(
+                [sys.executable, "-Xperf", script],
+                universal_newlines=True,
+                stderr=subprocess.PIPE,
+                stdout=subprocess.PIPE,
+                env=env,
+            ) as process:
+                stdout, stderr = process.communicate()
+
+        self.assertEqual(process.returncode, 0)
+        self.assertNotIn("Error:", stderr)
+        child_pid = int(stdout.strip())
+        perf_file = pathlib.Path(f"/tmp/perf-{process.pid}.map")
+        perf_child_file = pathlib.Path(f"/tmp/perf-{child_pid}.map")
+        self.assertTrue(perf_file.exists())
+        self.assertTrue(perf_child_file.exists())
+
+        perf_file_contents = perf_file.read_text()
+        self.assertIn(f"py::foo:{script}", perf_file_contents)
+        self.assertIn(f"py::bar:{script}", perf_file_contents)
+        self.assertIn(f"py::foo_fork:{script}", perf_file_contents)
+        self.assertIn(f"py::bar_fork:{script}", perf_file_contents)
+
+        child_perf_file_contents = perf_child_file.read_text()
+        self.assertIn(f"py::foo_fork:{script}", child_perf_file_contents)
+        self.assertIn(f"py::bar_fork:{script}", child_perf_file_contents)
+
+        # Pre-compiled perf-map entries of a forked process must be
+        # identical in both the parent and child perf-map files.
+        perf_file_lines = perf_file_contents.split("\n")
+        for line in perf_file_lines:
+            if f"py::foo_fork:{script}" in line or f"py::bar_fork:{script}" in line:
+                self.assertIn(line, child_perf_file_contents)
+
+
+if __name__ == "__main__":
+    unittest.main()
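
Reviewer sketch (not part of the commit): the last check in `test_pre_fork_compile` states that pre-compiled perf-map entries must be identical in the parent and child files. The standalone helper below illustrates that invariant on in-memory text. `entries_for`, `missing_in_child`, and the sample addresses are hypothetical, and the comparison here is by whole line, which is slightly stricter than the substring check the test performs.

```python
def entries_for(perf_map_text, symbols):
    """Return perf-map lines that mention any of the given symbol names."""
    return [
        line
        for line in perf_map_text.splitlines()
        if any(symbol in line for symbol in symbols)
    ]


def missing_in_child(parent_text, child_text, symbols):
    """Return parent entries for `symbols` that have no identical line in the child."""
    child_lines = set(child_text.splitlines())
    return [
        line
        for line in entries_for(parent_text, symbols)
        if line not in child_lines
    ]


# Made-up perf-map contents for a parent and a forked child.
parent = (
    "1000 40 py::foo:/tmp/t.py\n"
    "1040 40 py::foo_fork:/tmp/t.py\n"
    "1080 40 py::bar_fork:/tmp/t.py\n"
)
child = (
    "1040 40 py::foo_fork:/tmp/t.py\n"
    "1080 40 py::bar_fork:/tmp/t.py\n"
)
symbols = ["py::foo_fork:", "py::bar_fork:"]
missing = missing_in_child(parent, child, symbols)
```

An empty result means every pre-compiled trampoline kept the same address and size across the fork, which is what `perf_trampoline_set_persist_after_fork(1)` is meant to guarantee in the test script.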

Misc/ACKS

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@ Ray Allen
 Billy G. Allie
 Jamiel Almeida
 Kevin Altis
+Nazım Can Altınova
 Samy Lahfa
 Skyler Leigh Amador
 Joe Amenta
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+Add support for perf trampoline on macOS, to allow profilers with JIT map
+support to read Python calls. While profiling, ``PYTHONPERFSUPPORT=1`` can
+be set to enable the trampoline.
