-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[BOLT][AArch64] Enabling Inlining for Memcpy for AArch64 in BOLT #154929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-bolt Author: (yafet-a) ChangesOverviewThe pass for inlining memcpy in BOLT was currently X86-specific. It was using the instruction This patch implements a static size analysis system for AArch64 memcpy inlining that extracts copy sizes from preceding instructions to then use it to generate the optimal width-specific load/store sequences. Testing Coverage (
|
@@ -1866,8 +1866,35 @@ Error InlineMemcpy::runOnFunctions(BinaryContext &BC) { | |||
const bool IsMemcpy8 = (CalleeSymbol->getName() == "_memcpy8"); | |||
const bool IsTailCall = BC.MIB->isTailCall(Inst); | |||
|
|||
// Extract size from preceding instructions (AArch64 only). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you abstract this all away? The isAArch64 check isn't really pretty and generic. I think we need a findMemcpySizeInBytes
virtual target function or something along those lines that is a no-op for all targets, except AArch64 and this implementation should be moved to that function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The isAArch64 check isn't needed at all because of the fallback with the virtual method in MCPlusBuilder.h. I omitted it initally in the earlier commits (see db353b7)
I then added the check post review because I had taken the discussion (#154929 (comment)) on it being AArch64 specific to mean that I should have some self-documenting code to make the AArch64 specific logic in BinaryPasses clearer. I think I misunderstood what you were getting at.
To clarify: would you prefer I create the findMemcpySizeInBytes virtual method with the implementation moved to AArch64MCPlusBuilder.cpp, or keep the current logic in BinaryPasses.cpp and just drop the unnecessary isAArch64 check? (currently gone with the former in the latest commits)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have added to the confusion. But yeah, I like this new implementation, because before we had a large AArch64 specific blob of code inlined in code that was shared with X86. My only remaining question is the ++II
below. Ideally we fold that the target hook (and don't think we need the isAArch64 check), but I am actually unsure why we need it in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ++II is now removed. It isn't needed now that instructions are being handled separately in findMemcpySizeInBytes()
The isAArch64()
check is still needed though since when memcpy's size comes from a register rather than an immediate, the size extraction returns nullopt. Without our check, createInlineMemcpy()
would crash when dereferencing *KnownSize on the null optional. With the check, we skip inlining and the memcpy call remains un-inlined safely (I've added a test to show this clearer).
I have also kept the check in BinaryPasses as opposed to putting it into InlineMemcpy()
so as to get a slightly earlier return since we know we can stop proceeding before even having to call InlineMemcpy
.
@@ -1866,8 +1866,35 @@ Error InlineMemcpy::runOnFunctions(BinaryContext &BC) { | |||
const bool IsMemcpy8 = (CalleeSymbol->getName() == "_memcpy8"); | |||
const bool IsTailCall = BC.MIB->isTailCall(Inst); | |||
|
|||
// Extract size from preceding instructions (AArch64 only). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have added to the confusion. But yeah, I like this new implementation, because before we had a large AArch64 specific blob of code inlined in code that was shared with X86. My only remaining question is the ++II
below. Ideally we fold that the target hook (and don't think we need the isAArch64 check), but I am actually unsure why we need it in the first place.
…arly return check
Overview
The pass for inlining memcpy in BOLT was currently X86-specific. It was using the instruction
rep movsb
which currently has no equivalent in ARM v8-A.This patch implements a static size analysis system for AArch64 memcpy inlining that extracts copy sizes from preceding instructions to then use it to generate the optimal width-specific load/store sequences.
Testing Coverage (
inline-memcpy.s
)Positive Tests:
Negative Tests: