Skip to content

Conversation

aleks-tmb
Copy link
Contributor

@aleks-tmb aleks-tmb commented Aug 25, 2025

In the 39c8cfb patch, getOrCreateDataFragment was optimized by eagerly allocating an empty fragment when adding a fragment with a variable-size tail. This means that in this case the current MC fragment is no longer the one where the instruction was inserted, and the check
PendingBA && PendingBA->getNext() == OS.getCurrentFragment() fails, since CurrentFragment is now the empty fragment instead of the fragment containing the instruction.

PendingBA -> Fragment with a variable-size tail (contains previous instruction) -> CurrentFragment (newly allocated empty fragment)

This breaks the macro-fusion logic because it incorrectly assumes another fragment has been inserted between the fused instructions.

Fixes #155045

@llvmbot
Copy link
Member

llvmbot commented Aug 25, 2025

@llvm/pr-subscribers-backend-x86

Author: Aleksandr Popov (aleks-tmb)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/155316.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp (+13-1)
  • (added) llvm/test/MC/X86/align-branch-fuse-2.s (+21)
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp b/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
index 3cee53a886ebf..4b0adabf57c86 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
@@ -520,7 +520,19 @@ void X86AsmBackend::emitInstructionBegin(MCObjectStreamer &OS,
   if (!canPadInst(Inst, OS))
     return;
 
-  if (PendingBA && PendingBA->getNext() == OS.getCurrentFragment()) {
+  if (PendingBA) {
+    auto *NextFragment = PendingBA->getNext();
+    assert(NextFragment && "NextFragment should not be null");
+
+    if (NextFragment == OS.getCurrentFragment())
+      return;
+
+    // We eagerly create an empty fragment when inserting a fragment
+    // with a variable-size tail.
+    if (NextFragment->getKind() == MCFragment::FT_Relaxable &&
+        NextFragment->getNext() == OS.getCurrentFragment())
+      return;
+
     // Macro fusion actually happens and there is no other fragment inserted
     // after the previous instruction.
     //
diff --git a/llvm/test/MC/X86/align-branch-fuse-2.s b/llvm/test/MC/X86/align-branch-fuse-2.s
new file mode 100644
index 0000000000000..cfa6bfd0f6f94
--- /dev/null
+++ b/llvm/test/MC/X86/align-branch-fuse-2.s
@@ -0,0 +1,21 @@
+# RUN: llvm-mc -filetype=obj -triple x86_64 -mattr=+prfchw -x86-pad-max-prefix-size=15 --x86-align-branch-boundary=32 --x86-align-branch=fused+jcc %s | llvm-objdump -d --no-show-raw-insn - | FileCheck %s
+
+	.globl	f
+f:
+  	.p2align  5
+	xchgw	%ax, %ax
+	pushq	%rbp
+	pushq	%r14
+	pushq	%rbx
+	subq	$16, %rsp
+	movq	%rdx, %r14
+	movq	%rsi, %rbx
+	cmpl	$0, %gs:104
+	jne	.L_EXIT
+.LBB0_1:
+# CHECK:      20:       testq   %rcx, %rcx
+# CHECK:      23:       je
+	testq	%rcx, %rcx
+	je	.L_EXIT
+.L_EXIT:
+	ret
\ No newline at end of file

@aleks-tmb aleks-tmb force-pushed the apopov/x86-macro-fusion branch from 4a959a9 to 4f53f72 Compare August 25, 2025 22:00
@aleks-tmb aleks-tmb requested a review from aengelke August 25, 2025 22:02
@MaskRay
Copy link
Member

MaskRay commented Aug 26, 2025

Please update the description about what this is changing. It's about a lost macro fusion for

clang -c -malign-branch-boundary=32 -malign-branch=fused,jcc x.s

?

@aleks-tmb aleks-tmb force-pushed the apopov/x86-macro-fusion branch from 4f53f72 to ab2ebb8 Compare August 26, 2025 12:30
@aleks-tmb
Copy link
Contributor Author

Please update the description about what this is changing. It's about a lost macro fusion for

clang -c -malign-branch-boundary=32 -malign-branch=fused,jcc x.s

?

Thanks! Updated, PTAL

@aleks-tmb aleks-tmb requested a review from MaskRay August 26, 2025 13:45
@aleks-tmb aleks-tmb added the bug Indicates an unexpected problem or unintended behavior label Aug 26, 2025
@aleks-tmb aleks-tmb self-assigned this Aug 26, 2025

// We eagerly create an empty fragment when inserting a fragment
// with a variable-size tail.
if (NextFragment->getKind() == MCFragment::FT_Relaxable &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fragment types other than Relaxable (FT_Align) might have a variable-size tail as well.
Is the condition NextFragment->getKind() == MCFragment::FT_Relaxable required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it’s a conservative check, since I haven’t verified whether it can be applied to all fragment types. I might relax the check in a follow-up patch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not required, best to remove it in this patch. The getKind API is usually not supposed to be called by backends.

@aleks-tmb aleks-tmb force-pushed the apopov/x86-macro-fusion branch 4 times, most recently from 7505067 to d18a854 Compare August 27, 2025 09:12
@aleks-tmb aleks-tmb requested a review from MaskRay August 27, 2025 09:15
@aleks-tmb aleks-tmb force-pushed the apopov/x86-macro-fusion branch from d18a854 to fd9e484 Compare August 28, 2025 06:08
@aleks-tmb aleks-tmb merged commit 59befff into llvm:main Aug 28, 2025
9 checks passed
aleks-tmb added a commit that referenced this pull request Aug 28, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 28, 2025
aleks-tmb added a commit that referenced this pull request Aug 28, 2025
…sion (#155784)

In the
39c8cfb
patch, getOrCreateDataFragment was optimized by eagerly allocating an
empty fragment when adding a fragment with a variable-size tail. This
means that in this case the current MC fragment is no longer the one
where the instruction was inserted, and the check
`PendingBA && PendingBA->getNext() == OS.getCurrentFragment()` fails,
since CurrentFragment is now the empty fragment instead of the fragment
containing the instruction.

`PendingBA -> Fragment with a variable-size tail (contains previous
instruction) -> CurrentFragment (newly allocated empty fragment)`

This breaks the macro-fusion logic because it incorrectly assumes
another fragment has been inserted between the fused instructions.

Fixes #155045

#155316 Reland
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[X86] emitInstructionBegin macro-fusion check broken after getOrCreateDataFragment optimization
3 participants