-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[BOLT][AArch64] Enabling Inlining for Memcpy for AArch64 in BOLT #154929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-bolt Author: (yafet-a) ChangesOverviewThe pass for inlining memcpy in BOLT was currently X86-specific. It was using the instruction This patch implements a static size analysis system for AArch64 memcpy inlining that extracts copy sizes from preceding instructions to then use it to generate the optimal width-specific load/store sequences. Testing Coverage (
|
…arly return check
Thanks for fixing that. I have created this example: https://godbolt.org/z/rnaYKarfe What you see there is that X9 is saved to the stack before the call to memcpy, and after the call it is reloaded because it is used by function Can you add this test-case to your positive tests please? You can keep the other little examples that you have, but I think it would be good to have a bigger test case where you match the whole assembly sequence that includes this caller-saved register behaviour. It would be good if you then add one more, that similarly test the whole sequence but using the FP temp register, that would then cover everything I think. |
Overview
The pass for inlining memcpy in BOLT was currently X86-specific. It was using the instruction
rep movsb
which currently has no equivalent in ARM v8-A.This patch implements a static size analysis system for AArch64 memcpy inlining that extracts copy sizes from preceding instructions to then use it to generate the optimal width-specific load/store sequences.
Testing Coverage (
inline-memcpy.s
)Positive Tests:
Negative Tests: