Open
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage), topic-JIT, type-feature (A feature request or enhancement)
Description
Feature or enhancement
Proposal:
This is a follow-up to #115802, focused on improving the AArch64 code generated for the JIT.
This has been discussed with @brandtbucher during PyCon 2024.
There is a series of incremental improvements that we could implement when generating AArch64 code:
- Remove the duplicated trampoline section (the movk sequence) emitted at the end of every micro-op's assembly code.
// 0000000000000140: R_AARCH64_MOVW_UABS_G0_NC PyObject_Free
// 144: f2a00008 movk x8, #0x0, lsl #16
// 0000000000000144: R_AARCH64_MOVW_UABS_G1_NC PyObject_Free
// 148: f2c00008 movk x8, #0x0, lsl #32
// 0000000000000148: R_AARCH64_MOVW_UABS_G2_NC PyObject_Free
// 14c: f2e00008 movk x8, #0x0, lsl #48
// 000000000000014c: R_AARCH64_MOVW_UABS_G3 PyObject_Free
// 150: d61f0100 br x8
// 154: 00 00 00 00
// 158: d2800008 mov x8, #0x0
// 0000000000000158: R_AARCH64_MOVW_UABS_G0_NC PyObject_Free
// 15c: f2a00008 movk x8, #0x0, lsl #16
// 000000000000015c: R_AARCH64_MOVW_UABS_G1_NC PyObject_Free
// 160: f2c00008 movk x8, #0x0, lsl #32
// 0000000000000160: R_AARCH64_MOVW_UABS_G2_NC PyObject_Free
// 164: f2e00008 movk x8, #0x0, lsl #48
// 0000000000000164: R_AARCH64_MOVW_UABS_G3 PyObject_Free
// 168: d61f0100 br x8
- Implement the trampoline with an LDR of a PC-relative literal (instead of movk). This saves 8 bytes of code size.
- Move the trampolines from the "code" section of a micro-op to the "data" section, so they are out of line.
- Emit all of the trampolines at the end of every trace, so that each opcode doesn't need its own copy of the trampolines it uses. Also write a function to generate the trampoline.
- Once we have a slab allocator (from "JIT: improve memory allocation", #119730), a follow-up PR could use one set of trampolines per slab rather than per trace.
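
To make the size argument concrete, here is a minimal Python sketch of the two trampoline shapes and of per-trace deduplication. It is an illustration under stated assumptions, not the JIT's actual build code: the function names (`movk_trampoline`, `ldr_trampoline`, `emit_trace_trampolines`) and the example addresses are hypothetical. The `mov`/`movk`/`br x8` encodings follow the listing above; the `ldr x8` literal encoding and the 8-byte padding (inferred from the listing's 0x140 to 0x158 stride) are my own reading.

```python
import struct

BR_X8 = 0xD61F0100       # br x8, as in the listing above
LDR_X8_PC8 = 0x58000048  # ldr x8, <literal 8 bytes ahead> (assumed encoding)


def movk_trampoline(target: int) -> bytes:
    """mov + 3x movk + br, padded to a multiple of 8 bytes."""
    words = []
    for hw in range(4):
        imm16 = (target >> (16 * hw)) & 0xFFFF
        # hw == 0 uses mov (movz); the other halfwords use movk.
        base = 0xD2800008 if hw == 0 else 0xF2800008
        words.append(base | (hw << 21) | (imm16 << 5))
    words.append(BR_X8)
    code = struct.pack("<5I", *words)
    return code + b"\x00" * (-len(code) % 8)


def ldr_trampoline(target: int) -> bytes:
    """ldr x8, [literal]; br x8; followed by the 64-bit target address."""
    return struct.pack("<IIQ", LDR_X8_PC8, BR_X8, target)


def emit_trace_trampolines(calls):
    """Emit one trampoline per distinct symbol used in a trace.

    `calls` is an iterable of (symbol, address) pairs, one per call site.
    Returns the trampoline blob and a symbol -> offset map.
    """
    offsets = {}
    blob = bytearray()
    for symbol, address in calls:
        if symbol not in offsets:
            offsets[symbol] = len(blob)
            blob += ldr_trampoline(address)
    return bytes(blob), offsets


# Hypothetical trace: three call sites, two distinct targets.
calls = [
    ("PyObject_Free", 0x0000_7F00_1234_5678),
    ("_Py_Dealloc", 0x0000_7F00_1234_9ABC),
    ("PyObject_Free", 0x0000_7F00_1234_5678),
]
blob, offsets = emit_trace_trampolines(calls)
print(len(movk_trampoline(0)), len(ldr_trampoline(0)), len(blob), offsets)
```

Under these assumptions, the movk form occupies 24 bytes per trampoline and the LDR-literal form 16, which matches the 8-byte saving mentioned above. Emitting trampolines once per trace (or later once per slab) means a repeated target such as `PyObject_Free` costs 16 bytes in total rather than 16 bytes per micro-op that calls it.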
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
This has been discussed broadly at PyCon 2024 in person.