Skip to content

Conversation

yozhu
Copy link
Contributor

@yozhu yozhu commented Aug 23, 2025

This time it is inline assembly code and function symbol does have non-zero size in symbol table. The function contains a constant island referenced by another function, which is discovered during relocation processing.

Also added an assertion in addEntryPointAtOffset() that extra entry point candidate does not point to constant data.

This time it is inline assembly code and function symbol does have
non-zero size in symbol table. The function has a constant island
that is referenced by another function, which is discovered during
relocation processing.

Also added an assertion in `addEntryPointAtOffset()` that extra entry
point candidate does not point to constant data.
@llvmbot
Copy link
Member

llvmbot commented Aug 23, 2025

@llvm/pr-subscribers-bolt

Author: YongKang Zhu (yozhu)

Changes

This time it is inline assembly code and function symbol does have non-zero size in symbol table. The function contains a constant island referenced by another function, which is discovered during relocation processing.

Also added an assertion in addEntryPointAtOffset() that extra entry point candidate does not point to constant data.


Full diff: https://github.com/llvm/llvm-project/pull/155055.diff

3 Files Affected:

  • (modified) bolt/lib/Core/BinaryFunction.cpp (+2)
  • (modified) bolt/lib/Rewrite/RewriteInstance.cpp (+2-1)
  • (modified) bolt/test/AArch64/validate-secondary-entry-point.s (+41-3)
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index 8f494f105fbba..4b261ef7e6d7a 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -3771,6 +3771,8 @@ MCSymbol *BinaryFunction::addEntryPointAtOffset(uint64_t Offset) {
   assert(Offset && "cannot add primary entry point");
 
   const uint64_t EntryPointAddress = getAddress() + Offset;
+  assert(!isInConstantIsland(EntryPointAddress) &&
+         "cannot add entry point that points to constant data");
   MCSymbol *LocalSymbol = getOrCreateLocalLabel(EntryPointAddress);
 
   MCSymbol *EntrySymbol = getSecondaryEntryPointSymbol(LocalSymbol);
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index 4f5a75b770ce6..a6e4dbc9c192f 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -2935,7 +2935,8 @@ void RewriteInstance::handleRelocation(const SectionRef &RelocatedSection,
         ReferencedSymbol = nullptr;
         ExtractedValue = Address;
       } else if (RefFunctionOffset) {
-        if (ContainingBF && ContainingBF != ReferencedBF) {
+        if (ContainingBF && ContainingBF != ReferencedBF &&
+            !ReferencedBF->isInConstantIsland(Address)) {
           ReferencedSymbol =
               ReferencedBF->addEntryPointAtOffset(RefFunctionOffset);
         } else {
diff --git a/bolt/test/AArch64/validate-secondary-entry-point.s b/bolt/test/AArch64/validate-secondary-entry-point.s
index 0099a0ee4fe99..3ad69467fe54f 100644
--- a/bolt/test/AArch64/validate-secondary-entry-point.s
+++ b/bolt/test/AArch64/validate-secondary-entry-point.s
@@ -1,13 +1,23 @@
 # This test is to verify that BOLT won't take a label pointing to constant
-# island as a secondary entry point (function `_start` doesn't have ELF size
-# set originally) and the function won't otherwise be mistaken as non-simple.
+# island as a secondary entry point. This could happen when function doesn't
+# have ELF size set if it is from assembly code, or a constant island is
+# referenced by another function discovered during relocation processing.
 
-# RUN: %clang %cflags -pie %s -o %t.so -Wl,-q -Wl,--init=_foo -Wl,--fini=_foo
+# RUN: split-file %s %t
+
+# RUN: %clang %cflags -pie %t/tt.asm -o %t.so \
+# RUN:   -Wl,-q -Wl,--init=_foo -Wl,--fini=_foo
 # RUN: llvm-bolt %t.so -o %t.bolt.so --print-cfg 2>&1 | FileCheck %s
 # CHECK-NOT: BOLT-WARNING: reference in the middle of instruction detected \
 # CHECK-NOT:   function _start at offset 0x{{[0-9a-f]+}}
 # CHECK: Binary Function "_start" after building cfg
 
+# RUN: %clang %cflags -ffunction-sections -shared %t/tt.c %t/ss.c -o %tt.so \
+# RUN:   -Wl,-q -Wl,--init=_start -Wl,--fini=_start \
+# RUN:   -Wl,--version-script=%t/linker_script
+# RUN: llvm-bolt %tt.so -o %tt.bolted.so
+
+;--- tt.asm
   .text
 
   .global _foo
@@ -32,3 +42,31 @@ _bar:
 
   # Dummy relocation to force relocation mode
   .reloc 0, R_AARCH64_NONE
+
+;--- tt.c
+void _start() {}
+
+__attribute__((naked)) void foo() {
+  asm("ldr x16, .L_fnptr\n"
+      "blr x16\n"
+      "ret\n"
+
+      "_rodatx:"
+      ".global _rodatx;"
+      ".quad 0;"
+      ".L_fnptr:"
+      ".quad 0;");
+}
+
+;--- ss.c
+__attribute__((visibility("hidden"))) extern void* _rodatx;
+void* bar() { return &_rodatx; }
+
+;--- linker_script
+{
+global:
+  _start;
+  foo;
+  bar;
+local: *;
+};

@maksfb maksfb changed the title [BOLT] Fix another cause of extra entry point misidentification [BOLT][AArch64] Fix another cause of extra entry point misidentification Aug 27, 2025
Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yozhu yozhu merged commit a9c1ae8 into llvm:main Aug 27, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants