-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[X86] Fix incorrect NOP insertion between fused instructions that breaks macro fusion #155316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-x86 Author: Aleksandr Popov (aleks-tmb) ChangesFull diff: https://github.com/llvm/llvm-project/pull/155316.diff 2 Files Affected:
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp b/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
index 3cee53a886ebf..4b0adabf57c86 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
@@ -520,7 +520,19 @@ void X86AsmBackend::emitInstructionBegin(MCObjectStreamer &OS,
if (!canPadInst(Inst, OS))
return;
- if (PendingBA && PendingBA->getNext() == OS.getCurrentFragment()) {
+ if (PendingBA) {
+ auto *NextFragment = PendingBA->getNext();
+ assert(NextFragment && "NextFragment should not be null");
+
+ if (NextFragment == OS.getCurrentFragment())
+ return;
+
+ // We eagerly create an empty fragment when inserting a fragment
+ // with a variable-size tail.
+ if (NextFragment->getKind() == MCFragment::FT_Relaxable &&
+ NextFragment->getNext() == OS.getCurrentFragment())
+ return;
+
// Macro fusion actually happens and there is no other fragment inserted
// after the previous instruction.
//
diff --git a/llvm/test/MC/X86/align-branch-fuse-2.s b/llvm/test/MC/X86/align-branch-fuse-2.s
new file mode 100644
index 0000000000000..cfa6bfd0f6f94
--- /dev/null
+++ b/llvm/test/MC/X86/align-branch-fuse-2.s
@@ -0,0 +1,21 @@
+# RUN: llvm-mc -filetype=obj -triple x86_64 -mattr=+prfchw -x86-pad-max-prefix-size=15 --x86-align-branch-boundary=32 --x86-align-branch=fused+jcc %s | llvm-objdump -d --no-show-raw-insn - | FileCheck %s
+
+ .globl f
+f:
+ .p2align 5
+ xchgw %ax, %ax
+ pushq %rbp
+ pushq %r14
+ pushq %rbx
+ subq $16, %rsp
+ movq %rdx, %r14
+ movq %rsi, %rbx
+ cmpl $0, %gs:104
+ jne .L_EXIT
+.LBB0_1:
+# CHECK: 20: testq %rcx, %rcx
+# CHECK: 23: je
+ testq %rcx, %rcx
+ je .L_EXIT
+.L_EXIT:
+ ret
\ No newline at end of file
|
4a959a9
to
4f53f72
Compare
Please update the description about what this is changing. It's about a lost macro fusion for
? |
4f53f72
to
ab2ebb8
Compare
Thanks! Updated, PTAL |
|
||
// We eagerly create an empty fragment when inserting a fragment | ||
// with a variable-size tail. | ||
if (NextFragment->getKind() == MCFragment::FT_Relaxable && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fragment types other than Relaxable (FT_Align) might have a variable-size tail as well.
Is the condition NextFragment->getKind() == MCFragment::FT_Relaxable
required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it’s a conservative check, since I haven’t verified whether it can be applied to all fragment types. I might relax the check in a follow-up patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not required, best to remove it in this patch. The getKind API is usually not supposed to be called by backends.
7505067
to
d18a854
Compare
…aks macro fusion Fixes llvm#155045
d18a854
to
fd9e484
Compare
…structions that breaks macro fusion" (#155780) Reverts llvm/llvm-project#155316
…sion (#155784) In the 39c8cfb patch, getOrCreateDataFragment was optimized by eagerly allocating an empty fragment when adding a fragment with a variable-size tail. This means that in this case the current MC fragment is no longer the one where the instruction was inserted, and the check `PendingBA && PendingBA->getNext() == OS.getCurrentFragment()` fails, since CurrentFragment is now the empty fragment instead of the fragment containing the instruction. `PendingBA -> Fragment with a variable-size tail (contains previous instruction) -> CurrentFragment (newly allocated empty fragment)` This breaks the macro-fusion logic because it incorrectly assumes another fragment has been inserted between the fused instructions. Fixes #155045 #155316 Reland
In the 39c8cfb patch, getOrCreateDataFragment was optimized by eagerly allocating an empty fragment when adding a fragment with a variable-size tail. This means that in this case the current MC fragment is no longer the one where the instruction was inserted, and the check
PendingBA && PendingBA->getNext() == OS.getCurrentFragment()
fails, since CurrentFragment is now the empty fragment instead of the fragment containing the instruction.PendingBA -> Fragment with a variable-size tail (contains previous instruction) -> CurrentFragment (newly allocated empty fragment)
This breaks the macro-fusion logic because it incorrectly assumes another fragment has been inserted between the fused instructions.
Fixes #155045