JIT: improve AArch64 code generation #119726

@diegorusso

Feature or enhancement

Proposal:

This is a follow-up to #115802, focused specifically on improving the AArch64 code that the JIT generates.
It was discussed with @brandtbucher at PyCon 2024.

There are a series of incremental improvements that we could implement when generating AArch64 code:

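  • Remove the duplicated trampoline sequence (mov/movk followed by br) emitted at the end of every micro-op's assembly code. In the listing below, the trampoline for PyObject_Free appears twice: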
    // 140: d2800008      mov     x8, #0x0
    // 0000000000000140:  R_AARCH64_MOVW_UABS_G0_NC    PyObject_Free
    // 144: f2a00008      movk    x8, #0x0, lsl #16
    // 0000000000000144:  R_AARCH64_MOVW_UABS_G1_NC    PyObject_Free
    // 148: f2c00008      movk    x8, #0x0, lsl #32
    // 0000000000000148:  R_AARCH64_MOVW_UABS_G2_NC    PyObject_Free
    // 14c: f2e00008      movk    x8, #0x0, lsl #48
    // 000000000000014c:  R_AARCH64_MOVW_UABS_G3       PyObject_Free
    // 150: d61f0100      br      x8
    // 154: 00 00 00 00
    // 158: d2800008      mov     x8, #0x0
    // 0000000000000158:  R_AARCH64_MOVW_UABS_G0_NC    PyObject_Free
    // 15c: f2a00008      movk    x8, #0x0, lsl #16
    // 000000000000015c:  R_AARCH64_MOVW_UABS_G1_NC    PyObject_Free
    // 160: f2c00008      movk    x8, #0x0, lsl #32
    // 0000000000000160:  R_AARCH64_MOVW_UABS_G2_NC    PyObject_Free
    // 164: f2e00008      movk    x8, #0x0, lsl #48
    // 0000000000000164:  R_AARCH64_MOVW_UABS_G3       PyObject_Free
    // 168: d61f0100      br      x8
  • Implement the trampoline with an LDR of a PC-relative literal (instead of the mov/movk sequence). This saves 8 bytes of code size.
  • Move the trampolines from the "code" section of a micro-op to the "data" section, so they are out of line.
  • Emit all of the trampolines at the end of every trace, so that each opcode doesn't need its own copy of the trampolines it uses. Also write a function that generates the trampoline.
  • Once we have a slab allocator (see "JIT: improve memory allocation", #119730), use one set of trampolines per slab rather than per trace.
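
To make the size argument concrete, here is a rough Python sketch that encodes both trampoline shapes and shows how a trace could share one trampoline per distinct target. It is an illustration only, not the JIT's actual code: the helper names (`movk_trampoline`, `ldr_trampoline`, `emit_trace_trampolines`) and the example address are made up, and it assumes the four zero bytes at 0x154 in the listing above are padding to an 8-byte boundary.

```python
import struct

X8 = 8  # register number of x8, the scratch register used in the listing


def movz(rd, imm16, hw):
    """MOVZ Xd, #imm16, LSL #(16 * hw) (printed as `mov` by the disassembler)."""
    return 0xD2800000 | (hw << 21) | (imm16 << 5) | rd


def movk(rd, imm16, hw):
    """MOVK Xd, #imm16, LSL #(16 * hw)."""
    return 0xF2800000 | (hw << 21) | (imm16 << 5) | rd


def br(rn):
    """BR Xn."""
    return 0xD61F0000 | (rn << 5)


def ldr_literal(rt, byte_offset):
    """LDR Xt, <pc + byte_offset> (64-bit load from a PC-relative literal)."""
    assert byte_offset % 4 == 0
    imm19 = (byte_offset // 4) & 0x7FFFF
    return 0x58000000 | (imm19 << 5) | rt


def movk_trampoline(target):
    """Current shape: mov + 3 x movk + br, padded to 8 bytes (assumption)."""
    chunks = [(target >> (16 * hw)) & 0xFFFF for hw in range(4)]
    words = [movz(X8, chunks[0], 0)]
    words += [movk(X8, chunks[hw], hw) for hw in (1, 2, 3)]
    words.append(br(X8))
    code = struct.pack("<5I", *words)
    return code + b"\x00" * (-len(code) % 8)


def ldr_trampoline(target):
    """Proposed shape: ldr x8, [pc + 8]; br x8; then the 64-bit address."""
    return struct.pack("<IIQ", ldr_literal(X8, 8), br(X8), target)


def emit_trace_trampolines(call_targets):
    """Emit one trampoline per distinct target; return (bytes, offsets)."""
    offsets = {}
    blob = bytearray()
    for target in call_targets:
        if target not in offsets:
            offsets[target] = len(blob)
            blob += ldr_trampoline(target)
    return bytes(blob), offsets


if __name__ == "__main__":
    free = 0x00007FFF12345678  # made-up address standing in for PyObject_Free
    print(len(movk_trampoline(free)), len(ldr_trampoline(free)))
    blob, offsets = emit_trace_trampolines([free, free, free + 0x100])
    print(len(blob), offsets)
```

With a target of 0, `movk_trampoline` reproduces the instruction words shown in the listing (`d2800008`, `f2a00008`, `f2c00008`, `f2e00008`, `d61f0100`). Under the padding assumption, the two shapes come to 24 and 16 bytes, which is consistent with the 8-byte saving mentioned above.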

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

This has been discussed broadly at PyCon 2024 in person.

Labels: interpreter-core, performance, topic-JIT, type-feature
