Skip to content

Conversation

drodriguez
Copy link
Contributor

If an export trie is encoded incorrectly, and one of the children offsets points back to one of the nodes earlier in the serialization, the current code will end up in an infinite recursion, and eventually fail exhausting the available memory.

The failure can be avoided if, before recursing, one checks that the offset is valid, that is, that the offset is beyond the current position. This is similar to a check done by llvm-objdump which reports the trie being corrupted.

@llvmbot
Copy link
Member

llvmbot commented Aug 7, 2025

@llvm/pr-subscribers-lld

Author: Daniel Rodríguez Troitiño (drodriguez)

Changes

If an export trie is encoded incorrectly, and one of the children offsets points back to one of the nodes earlier in the serialization, the current code will end up in an infinite recursion, and eventually fail exhausting the available memory.

The failure can be avoided if, before recursing, one checks that the offset is valid, that is, that the offset is beyond the current position. This is similar to a check done by llvm-objdump which reports the trie being corrupted.


Full diff: https://github.com/llvm/llvm-project/pull/152569.diff

5 Files Affected:

  • (modified) lld/MachO/ExportTrie.cpp (+9-5)
  • (modified) lld/MachO/ExportTrie.h (+1-1)
  • (modified) lld/MachO/InputFiles.cpp (+1-1)
  • (added) lld/test/MachO/invalid/Inputs/macho-trie-node-loop ()
  • (added) lld/test/MachO/invalid/export-trie-node-loop.s (+9)
diff --git a/lld/MachO/ExportTrie.cpp b/lld/MachO/ExportTrie.cpp
index 303eda416c235..8ba48ebc14a72 100644
--- a/lld/MachO/ExportTrie.cpp
+++ b/lld/MachO/ExportTrie.cpp
@@ -296,13 +296,14 @@ namespace {
 // Parse a serialized trie and invoke a callback for each entry.
 class TrieParser {
 public:
-  TrieParser(const uint8_t *buf, size_t size, const TrieEntryCallback &callback)
-      : start(buf), end(start + size), callback(callback) {}
+  TrieParser(const std::string &fileName, const uint8_t *buf, size_t size, const TrieEntryCallback &callback)
+      : fileName(fileName), start(buf), end(start + size), callback(callback) {}
 
   void parse(const uint8_t *buf, const Twine &cumulativeString);
 
   void parse() { parse(start, ""); }
 
+  const std::string fileName;
   const uint8_t *start;
   const uint8_t *end;
   const TrieEntryCallback &callback;
@@ -312,7 +313,7 @@ class TrieParser {
 
 void TrieParser::parse(const uint8_t *buf, const Twine &cumulativeString) {
   if (buf >= end)
-    fatal("Node offset points outside export section");
+    fatal(fileName + ": export trie node offset points outside export section");
 
   unsigned ulebSize;
   uint64_t terminalSize = decodeULEB128(buf, &ulebSize);
@@ -331,14 +332,17 @@ void TrieParser::parse(const uint8_t *buf, const Twine &cumulativeString) {
     buf += substring.size() + 1;
     offset = decodeULEB128(buf, &ulebSize);
     buf += ulebSize;
+    if (start + offset < buf)
+      fatal(fileName + ": export trie child node offset points before parent node");
     parse(start + offset, cumulativeString + substring);
   }
 }
 
-void macho::parseTrie(const uint8_t *buf, size_t size,
+void macho::parseTrie(const std::string &fileName,
+                      const uint8_t *buf, size_t size,
                       const TrieEntryCallback &callback) {
   if (size == 0)
     return;
 
-  TrieParser(buf, size, callback).parse();
+  TrieParser(fileName, buf, size, callback).parse();
 }
diff --git a/lld/MachO/ExportTrie.h b/lld/MachO/ExportTrie.h
index aa7e3b0d4a14b..5a5565d52bcc8 100644
--- a/lld/MachO/ExportTrie.h
+++ b/lld/MachO/ExportTrie.h
@@ -41,7 +41,7 @@ class TrieBuilder {
 using TrieEntryCallback =
     llvm::function_ref<void(const llvm::Twine & /*name*/, uint64_t /*flags*/)>;
 
-void parseTrie(const uint8_t *buf, size_t size, const TrieEntryCallback &);
+void parseTrie(const std::string &fileName, const uint8_t *buf, size_t size, const TrieEntryCallback &);
 
 } // namespace lld::macho
 
diff --git a/lld/MachO/InputFiles.cpp b/lld/MachO/InputFiles.cpp
index 3b3023a94166f..607867325be46 100644
--- a/lld/MachO/InputFiles.cpp
+++ b/lld/MachO/InputFiles.cpp
@@ -1789,7 +1789,7 @@ void DylibFile::parseExportedSymbols(uint32_t offset, uint32_t size) {
   auto *buf = reinterpret_cast<const uint8_t *>(mb.getBufferStart());
   std::vector<TrieEntry> entries;
   // Find all the $ld$* symbols to process first.
-  parseTrie(buf + offset, size, [&](const Twine &name, uint64_t flags) {
+  parseTrie(toString(this), buf + offset, size, [&](const Twine &name, uint64_t flags) {
     StringRef savedName = saver().save(name);
     if (handleLDSymbol(savedName))
       return;
diff --git a/lld/test/MachO/invalid/Inputs/macho-trie-node-loop b/lld/test/MachO/invalid/Inputs/macho-trie-node-loop
new file mode 100755
index 0000000000000..b94dfa2610e9f
Binary files /dev/null and b/lld/test/MachO/invalid/Inputs/macho-trie-node-loop differ
diff --git a/lld/test/MachO/invalid/export-trie-node-loop.s b/lld/test/MachO/invalid/export-trie-node-loop.s
new file mode 100644
index 0000000000000..d871709971dd4
--- /dev/null
+++ b/lld/test/MachO/invalid/export-trie-node-loop.s
@@ -0,0 +1,9 @@
+# REQUIRES: x86
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %s -o %t.o
+# RUN: not %lld -o %t %t.o %S/Inputs/macho-trie-node-loop 2>&1 | FileCheck %s
+# CHECK: error:
+# CHECK-SAME: /Inputs/macho-trie-node-loop: export trie child node offset points before parent node
+
+.globl _main
+_main:
+  ret

@llvmbot
Copy link
Member

llvmbot commented Aug 7, 2025

@llvm/pr-subscribers-lld-macho

Author: Daniel Rodríguez Troitiño (drodriguez)

Changes

If an export trie is encoded incorrectly, and one of the children offsets points back to one of the nodes earlier in the serialization, the current code will end up in an infinite recursion, and eventually fail exhausting the available memory.

The failure can be avoided if, before recursing, one checks that the offset is valid, that is, that the offset is beyond the current position. This is similar to a check done by llvm-objdump which reports the trie being corrupted.


Full diff: https://github.com/llvm/llvm-project/pull/152569.diff

5 Files Affected:

  • (modified) lld/MachO/ExportTrie.cpp (+9-5)
  • (modified) lld/MachO/ExportTrie.h (+1-1)
  • (modified) lld/MachO/InputFiles.cpp (+1-1)
  • (added) lld/test/MachO/invalid/Inputs/macho-trie-node-loop ()
  • (added) lld/test/MachO/invalid/export-trie-node-loop.s (+9)
diff --git a/lld/MachO/ExportTrie.cpp b/lld/MachO/ExportTrie.cpp
index 303eda416c235..8ba48ebc14a72 100644
--- a/lld/MachO/ExportTrie.cpp
+++ b/lld/MachO/ExportTrie.cpp
@@ -296,13 +296,14 @@ namespace {
 // Parse a serialized trie and invoke a callback for each entry.
 class TrieParser {
 public:
-  TrieParser(const uint8_t *buf, size_t size, const TrieEntryCallback &callback)
-      : start(buf), end(start + size), callback(callback) {}
+  TrieParser(const std::string &fileName, const uint8_t *buf, size_t size, const TrieEntryCallback &callback)
+      : fileName(fileName), start(buf), end(start + size), callback(callback) {}
 
   void parse(const uint8_t *buf, const Twine &cumulativeString);
 
   void parse() { parse(start, ""); }
 
+  const std::string fileName;
   const uint8_t *start;
   const uint8_t *end;
   const TrieEntryCallback &callback;
@@ -312,7 +313,7 @@ class TrieParser {
 
 void TrieParser::parse(const uint8_t *buf, const Twine &cumulativeString) {
   if (buf >= end)
-    fatal("Node offset points outside export section");
+    fatal(fileName + ": export trie node offset points outside export section");
 
   unsigned ulebSize;
   uint64_t terminalSize = decodeULEB128(buf, &ulebSize);
@@ -331,14 +332,17 @@ void TrieParser::parse(const uint8_t *buf, const Twine &cumulativeString) {
     buf += substring.size() + 1;
     offset = decodeULEB128(buf, &ulebSize);
     buf += ulebSize;
+    if (start + offset < buf)
+      fatal(fileName + ": export trie child node offset points before parent node");
     parse(start + offset, cumulativeString + substring);
   }
 }
 
-void macho::parseTrie(const uint8_t *buf, size_t size,
+void macho::parseTrie(const std::string &fileName,
+                      const uint8_t *buf, size_t size,
                       const TrieEntryCallback &callback) {
   if (size == 0)
     return;
 
-  TrieParser(buf, size, callback).parse();
+  TrieParser(fileName, buf, size, callback).parse();
 }
diff --git a/lld/MachO/ExportTrie.h b/lld/MachO/ExportTrie.h
index aa7e3b0d4a14b..5a5565d52bcc8 100644
--- a/lld/MachO/ExportTrie.h
+++ b/lld/MachO/ExportTrie.h
@@ -41,7 +41,7 @@ class TrieBuilder {
 using TrieEntryCallback =
     llvm::function_ref<void(const llvm::Twine & /*name*/, uint64_t /*flags*/)>;
 
-void parseTrie(const uint8_t *buf, size_t size, const TrieEntryCallback &);
+void parseTrie(const std::string &fileName, const uint8_t *buf, size_t size, const TrieEntryCallback &);
 
 } // namespace lld::macho
 
diff --git a/lld/MachO/InputFiles.cpp b/lld/MachO/InputFiles.cpp
index 3b3023a94166f..607867325be46 100644
--- a/lld/MachO/InputFiles.cpp
+++ b/lld/MachO/InputFiles.cpp
@@ -1789,7 +1789,7 @@ void DylibFile::parseExportedSymbols(uint32_t offset, uint32_t size) {
   auto *buf = reinterpret_cast<const uint8_t *>(mb.getBufferStart());
   std::vector<TrieEntry> entries;
   // Find all the $ld$* symbols to process first.
-  parseTrie(buf + offset, size, [&](const Twine &name, uint64_t flags) {
+  parseTrie(toString(this), buf + offset, size, [&](const Twine &name, uint64_t flags) {
     StringRef savedName = saver().save(name);
     if (handleLDSymbol(savedName))
       return;
diff --git a/lld/test/MachO/invalid/Inputs/macho-trie-node-loop b/lld/test/MachO/invalid/Inputs/macho-trie-node-loop
new file mode 100755
index 0000000000000..b94dfa2610e9f
Binary files /dev/null and b/lld/test/MachO/invalid/Inputs/macho-trie-node-loop differ
diff --git a/lld/test/MachO/invalid/export-trie-node-loop.s b/lld/test/MachO/invalid/export-trie-node-loop.s
new file mode 100644
index 0000000000000..d871709971dd4
--- /dev/null
+++ b/lld/test/MachO/invalid/export-trie-node-loop.s
@@ -0,0 +1,9 @@
+# REQUIRES: x86
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %s -o %t.o
+# RUN: not %lld -o %t %t.o %S/Inputs/macho-trie-node-loop 2>&1 | FileCheck %s
+# CHECK: error:
+# CHECK-SAME: /Inputs/macho-trie-node-loop: export trie child node offset points before parent node
+
+.globl _main
+_main:
+  ret

Copy link

github-actions bot commented Aug 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@drodriguez
Copy link
Contributor Author

This is incorrect in some non-toy cases because it assumed that the children of a node were just after their parent. A possible layout, but not one that the export trie format enforces.

Please do not review until my next change.

@smeenai
Copy link
Collaborator

smeenai commented Aug 18, 2025

This is incorrect in some non-toy cases because it assumed that the children of a node were just after their parent. A possible layout, but not one that the export trie format enforces.

Please do not review until my next change.

"Convert to draft" is standard to signal this, I believe.

@drodriguez drodriguez marked this pull request as draft August 19, 2025 01:08
@drodriguez
Copy link
Contributor Author

"Convert to draft" is standard to signal this, I believe.

Couldn't find a button where the "Comment" button lives. It is a link in the sidebar 🤦‍♂️. Thanks for the clue what to search for.

@drodriguez drodriguez force-pushed the lld-error-trie-corrupted branch from 4285358 to 8df157a Compare August 20, 2025 01:19
If an export trie is encoded incorrectly, and one of the children
offsets points back to one of the nodes earlier in the serialization,
the current code will end up in an infinite recursion, and eventually
fail exhausting the available memory.

The failure can be avoided if, before recursing, one checks that the
offset is valid, that is, that the child offset is not one of the
previously visited offsets.. This is similar to a check done by llvm-objdump
which reports the trie being corrupted.
@drodriguez drodriguez force-pushed the lld-error-trie-corrupted branch from 8df157a to 6abf565 Compare August 20, 2025 01:20
@drodriguez drodriguez marked this pull request as ready for review August 22, 2025 17:28
@drodriguez drodriguez merged commit cbf10bc into llvm:main Aug 30, 2025
9 checks passed
@drodriguez drodriguez deleted the lld-error-trie-corrupted branch August 30, 2025 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants