[lldb] Add utility to create Mach-O corefile from YAML desc #153911

jasonmolenda · 2025-08-16T01:37:11Z

I've wanted a utility to create a corefile for test purposes, given a bit of memory and registers, for a while. I've written a few API tests over the years that needed exactly this capability -- we have several one-off Mach-O corefile creator utilities in the API testsuite to do this. But it's a lot of boilerplate when you only want to specify some register contents and memory contents to create an API test.

This adds yaml2macho-core, a tool that should build on any system. It takes a yaml description of register values for one or more threads and, optionally, memory values for one or more memory regions, and can take a list of UUIDs that will be added as LC_NOTE "load binary" metadata to the corefile so binaries can be loaded into virtual address space in a test scenario.

The format of the yaml file looks like

cpu: armv7m
endian: little
threads:
  - regsets:
      - flavor: gpr
        registers: [{name: sp, value: 0x2000fe70}, {name: r7, value: 0x2000fe80},
                    {name: pc, value: 0x0020392c}, {name: lr, value: 0x0020392d}]

memory-regions:
  # stack memory
  - addr: 0x2000fe70
    UInt32: [ 0x0000002a, 0x20010e58, 0x00203923,
              0x00000001, 0x2000fe88, 0x00203911,
              0x2000ffdc, 0xfffffff9 ]
  # instructions of a function
  - addr: 0x203910
    UInt8: [ 0xf8, 0xb5, 0x04, 0xaf, 0x06, 0x4c, 0x07, 0x49,
             0x74, 0xf0, 0x2e, 0xf8, 0x01, 0xac, 0x74, 0xf0 ]

and that's all that is needed to specify a corefile where four register values are specified (the others will be set to 0), and two memory regions will be emitted.
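To illustrate the zero-fill behaviour, here is a minimal Python sketch. It is not the tool's code. It assumes the armv7 gpr register order r0-r12, sp, lr, pc, cpsr (the order used by the removed create-arm-corefiles.cpp), and `ARM_GPR_ORDER` and `fill_regset` are hypothetical names made up for this example.

```python
import struct

# Assumed armv7 gpr layout: r0-r12, sp, lr, pc, cpsr (17 32-bit words).
ARM_GPR_ORDER = [f"r{i}" for i in range(13)] + ["sp", "lr", "pc", "cpsr"]


def fill_regset(specified, order=ARM_GPR_ORDER):
    """Return a value for every register in `order`, using 0 when unspecified."""
    unknown = set(specified) - set(order)
    if unknown:
        raise ValueError(f"unknown register(s): {sorted(unknown)}")
    return [specified.get(name, 0) for name in order]


# The four registers from the example description above.
regs = {"sp": 0x2000FE70, "r7": 0x2000FE80, "pc": 0x0020392C, "lr": 0x0020392D}
values = fill_regset(regs)

# Little-endian byte image of the whole register set.
payload = struct.pack(f"<{len(values)}I", *values)
```

With the four registers given, `values` holds 17 entries and every entry not named in `regs` is 0.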

The memory can be specified as an array of UInt8, UInt32, or UInt64 values. I anticipate that some of these corefiles may have stack values constructed manually, and it may be simpler for a human to write them in a particular grouping of values.
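As a rough illustration of why the grouping is only a convenience, the Python sketch below packs the same eight bytes once as UInt32 words and once as UInt8 values. It assumes that a little-endian description stores each value least-significant byte first; `pack_region` is a hypothetical helper for this example, not a function from the tool.

```python
import struct

# struct format characters for the three element types the YAML accepts.
_FORMATS = {"UInt8": "B", "UInt32": "I", "UInt64": "Q"}


def pack_region(kind, values, endian="little"):
    """Pack a list of integers into the raw bytes of a memory region."""
    prefix = "<" if endian == "little" else ">"
    return struct.pack(f"{prefix}{len(values)}{_FORMATS[kind]}", *values)


# First two words of the example stack region, written both ways.
as_words = pack_region("UInt32", [0x0000002A, 0x20010E58])
as_bytes = pack_region("UInt8", [0x2A, 0x00, 0x00, 0x00, 0x58, 0x0E, 0x01, 0x20])
```

Under that assumption both calls produce identical bytes, so the choice of element width only affects how readable the description is.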

I needed this utility for an upcoming patch for ARM Cortex-M processors, to create a test for the change. I took the opportunity to remove two of the "trivial mach-o corefile" creator utilities I've written in the past, which also restricted the tests to only run on Darwin systems because I was using the system headers for Mach-O constant values.

rdar://110663219


llvmbot · 2025-08-16T01:37:43Z

@llvm/pr-subscribers-lldb

@llvm/pr-subscribers-backend-risc-v

Author: Jason Molenda (jasonmolenda)

Changes

I've wanted a utility to create a corefile for test purposes, given a bit of memory and registers, for a while. I've written a few API tests over the years that needed exactly this capability -- we have several one-off Mach-O corefile creator utilities in the API testsuite to do this. But it's a lot of boilerplate when you only want to specify some register contents and memory contents to create an API test.

This adds yaml2macho-core, a tool that should build on any system. It takes a yaml description of register values for one or more threads and, optionally, memory values for one or more memory regions, and can take a list of UUIDs that will be added as LC_NOTE "load binary" metadata to the corefile so binaries can be loaded into virtual address space in a test scenario.

The format of the yaml file looks like

cpu: armv7m
endian: little
threads:
  - regsets:
      - flavor: gpr
        registers: [{name: sp, value: 0x2000fe70}, {name: r7, value: 0x2000fe80},
                    {name: pc, value: 0x0020392c}, {name: lr, value: 0x0020392d}]

memory-regions:
  - addr: 0x2000fe70
    UInt32: [ 0x0000002a, 0x20010e58, 0x00203923,
              0x00000001, 0x2000fe88, 0x00203911,
              0x2000ffdc, 0xfffffff9 ]
  - addr: 0x203910
    UInt8: [ 0xf8, 0xb5, 0x04, 0xaf, 0x06, 0x4c, 0x07, 0x49,
             0x74, 0xf0, 0x2e, 0xf8, 0x01, 0xac, 0x74, 0xf0 ]

and that's all that is needed to specify a corefile where four register values are specified (the others will be set to 0), and two memory regions will be emitted.

The memory can be specified as an array of UInt8, UInt32, or UInt64 values. I anticipate that some of these corefiles may have stack values constructed manually, and it may be simpler for a human to write them in a particular grouping of values.

Accepting "endian" is probably a boondoggle that won't ever come to any use, and honestly I don't 100% know what the correct byte layout would be for a big endian Mach-O file any more. In a RISC-V discussion a month ago, it was noted that register byte layout will be little endian even when there is a big endian defined format for RV, so memory would be byteswapped but registers would not. It may have been better not to pretend to support this, but on the other hand it might be neat to be able to generate a big endian test case simply.

I needed this utility for an upcoming patch for ARM Cortex-M processors, to create a test for the change. I took the opportunity to remove two of the "trivial mach-o corefile" creator utilities I've written in the past, which also restricted the tests to only run on Darwin systems because I was using the system headers for Mach-O constant values.

rdar://110663219

Patch is 64.83 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/153911.diff

28 Files Affected:

(modified) lldb/packages/Python/lldbsuite/test/configuration.py (+10)
(modified) lldb/packages/Python/lldbsuite/test/dotest.py (+1)
(modified) lldb/packages/Python/lldbsuite/test/lldbtest.py (+15)
(modified) lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.cpp (+36)
(modified) lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.h (+3)
(removed) lldb/test/API/macosx/arm-corefile-regctx/Makefile (-6)
(modified) lldb/test/API/macosx/arm-corefile-regctx/TestArmMachoCorefileRegctx.py (+6-11)
(added) lldb/test/API/macosx/arm-corefile-regctx/arm64.yaml (+31)
(added) lldb/test/API/macosx/arm-corefile-regctx/armv7m.yaml (+37)
(removed) lldb/test/API/macosx/arm-corefile-regctx/create-arm-corefiles.cpp (-266)
(removed) lldb/test/API/macosx/riscv32-corefile/Makefile (-7)
(modified) lldb/test/API/macosx/riscv32-corefile/TestRV32MachOCorefile.py (+13-5)
(removed) lldb/test/API/macosx/riscv32-corefile/create-empty-riscv-corefile.cpp (-116)
(added) lldb/test/API/macosx/riscv32-corefile/riscv32-registers.yaml (+47)
(modified) lldb/tools/CMakeLists.txt (+2)
(added) lldb/tools/yaml2macho-core/CMakeLists.txt (+13)
(added) lldb/tools/yaml2macho-core/CoreSpec.h (+56)
(added) lldb/tools/yaml2macho-core/LCNoteWriter.cpp (+68)
(added) lldb/tools/yaml2macho-core/LCNoteWriter.h (+23)
(added) lldb/tools/yaml2macho-core/MemoryWriter.cpp (+57)
(added) lldb/tools/yaml2macho-core/MemoryWriter.h (+22)
(added) lldb/tools/yaml2macho-core/ThreadWriter.cpp (+190)
(added) lldb/tools/yaml2macho-core/ThreadWriter.h (+19)
(added) lldb/tools/yaml2macho-core/Utility.cpp (+57)
(added) lldb/tools/yaml2macho-core/Utility.h (+23)
(added) lldb/tools/yaml2macho-core/main.cpp (+223)
(added) lldb/tools/yaml2macho-core/yaml2corespec.cpp (+131)
(added) lldb/tools/yaml2macho-core/yaml2corespec.h (+16)

diff --git a/lldb/packages/Python/lldbsuite/test/configuration.py b/lldb/packages/Python/lldbsuite/test/configuration.py
index 5e3810992d172..1a9f25d66843a 100644
--- a/lldb/packages/Python/lldbsuite/test/configuration.py
+++ b/lldb/packages/Python/lldbsuite/test/configuration.py
@@ -64,6 +64,9 @@
 # Path to the yaml2obj tool. Not optional.
 yaml2obj = None
 
+# Path to the yaml2macho-core tool. Not optional.
+yaml2macho_core = None
+
 # The arch might dictate some specific CFLAGS to be passed to the toolchain to build
 # the inferior programs.  The global variable cflags_extras provides a hook to do
 # just that.
@@ -174,3 +177,10 @@ def get_yaml2obj_path():
     """
     if yaml2obj and os.path.lexists(yaml2obj):
         return yaml2obj
+
+def get_yaml2macho_core_path():
+    """
+    Get the path to the yaml2macho-core tool.
+    """
+    if yaml2macho_core and os.path.lexists(yaml2macho_core):
+        return yaml2macho_core
diff --git a/lldb/packages/Python/lldbsuite/test/dotest.py b/lldb/packages/Python/lldbsuite/test/dotest.py
index 47a3c2ed2fc9d..89b6807b41075 100644
--- a/lldb/packages/Python/lldbsuite/test/dotest.py
+++ b/lldb/packages/Python/lldbsuite/test/dotest.py
@@ -280,6 +280,7 @@ def parseOptionsAndInitTestdirs():
         configuration.llvm_tools_dir = args.llvm_tools_dir
         configuration.filecheck = shutil.which("FileCheck", path=args.llvm_tools_dir)
         configuration.yaml2obj = shutil.which("yaml2obj", path=args.llvm_tools_dir)
+        configuration.yaml2macho_core = shutil.which("yaml2macho-core", path=args.llvm_tools_dir)
 
     if not configuration.get_filecheck_path():
         logging.warning("No valid FileCheck executable; some tests may fail...")
diff --git a/lldb/packages/Python/lldbsuite/test/lldbtest.py b/lldb/packages/Python/lldbsuite/test/lldbtest.py
index 0fc85fcc4d2d6..599b019f0df8c 100644
--- a/lldb/packages/Python/lldbsuite/test/lldbtest.py
+++ b/lldb/packages/Python/lldbsuite/test/lldbtest.py
@@ -1702,6 +1702,21 @@ def yaml2obj(self, yaml_path, obj_path, max_size=None):
             command += ["--max-size=%d" % max_size]
         self.runBuildCommand(command)
 
+    def yaml2macho_core(self, yaml_path, obj_path, uuids=None):
+        """
+        Create a Mach-O corefile at the given path from a yaml file.
+
+        Throws subprocess.CalledProcessError if the object could not be created.
+        """
+        yaml2macho_core_bin = configuration.get_yaml2macho_core_path()
+        if not yaml2macho_core_bin:
+            self.assertTrue(False, "No valid yaml2macho-core executable specified")
+        if uuids is not None:
+            command = [yaml2macho_core_bin, "-i", yaml_path, "-o", obj_path, "-u", uuids]
+        else:
+            command = [yaml2macho_core_bin, "-i", yaml_path, "-o", obj_path]
+        self.runBuildCommand(command)
+
     def cleanup(self, dictionary=None):
         """Platform specific way to do cleanup after build."""
         module = builder_module()
diff --git a/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.cpp b/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.cpp
index cb8ba05d461d4..0aff98078120e 100644
--- a/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.cpp
+++ b/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.cpp
@@ -12,6 +12,7 @@
 #include "lldb/Core/PluginManager.h"
 #include "lldb/Core/Section.h"
 #include "lldb/Symbol/Symbol.h"
+#include "lldb/Target/Target.h"
 #include "lldb/Utility/LLDBLog.h"
 #include "lldb/Utility/Log.h"
 #include "llvm/ADT/DenseSet.h"
@@ -233,6 +234,41 @@ void ObjectFileJSON::CreateSections(SectionList &unified_section_list) {
   }
 }
 
+bool ObjectFileJSON::SetLoadAddress(Target &target, lldb::addr_t value,
+                                    bool value_is_offset) {
+  Log *log(GetLog(LLDBLog::DynamicLoader));
+  if (!m_sections_up)
+    return true;
+
+  const bool warn_multiple = true;
+
+  addr_t slide = value;
+  if (!value_is_offset) {
+    addr_t lowest_addr = LLDB_INVALID_ADDRESS;
+    for (const SectionSP &section_sp : *m_sections_up) {
+      addr_t section_load_addr = section_sp->GetFileAddress();
+      lowest_addr = std::min(lowest_addr, section_load_addr);
+    }
+    if (lowest_addr == LLDB_INVALID_ADDRESS)
+      return false;
+    slide = value - lowest_addr;
+  }
+
+  // Apply slide to each section's file address.
+  for (const SectionSP &section_sp : *m_sections_up) {
+    addr_t section_load_addr = section_sp->GetFileAddress();
+    if (section_load_addr != LLDB_INVALID_ADDRESS) {
+      LLDB_LOGF(
+          log,
+          "ObjectFileJSON::SetLoadAddress section %s to load addr 0x%" PRIx64,
+          section_sp->GetName().AsCString(), section_load_addr + slide);
+      target.SetSectionLoadAddress(section_sp, section_load_addr + slide,
+                                   warn_multiple);
+    }
+  }
+  return true;
+}
+
 bool ObjectFileJSON::MagicBytesMatch(DataBufferSP data_sp,
                                      lldb::addr_t data_offset,
                                      lldb::addr_t data_length) {
diff --git a/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.h b/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.h
index b72565f468862..029c8ff188934 100644
--- a/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.h
+++ b/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.h
@@ -86,6 +86,9 @@ class ObjectFileJSON : public ObjectFile {
 
   Strata CalculateStrata() override { return eStrataUser; }
 
+  bool SetLoadAddress(Target &target, lldb::addr_t value,
+                      bool value_is_offset) override;
+
   static bool MagicBytesMatch(lldb::DataBufferSP data_sp, lldb::addr_t offset,
                               lldb::addr_t length);
 
diff --git a/lldb/test/API/macosx/arm-corefile-regctx/Makefile b/lldb/test/API/macosx/arm-corefile-regctx/Makefile
deleted file mode 100644
index e1d0354441cd4..0000000000000
--- a/lldb/test/API/macosx/arm-corefile-regctx/Makefile
+++ /dev/null
@@ -1,6 +0,0 @@
-MAKE_DSYM := NO
-
-CXX_SOURCES := create-arm-corefiles.cpp
-
-include Makefile.rules
-
diff --git a/lldb/test/API/macosx/arm-corefile-regctx/TestArmMachoCorefileRegctx.py b/lldb/test/API/macosx/arm-corefile-regctx/TestArmMachoCorefileRegctx.py
index 6754288a65e1a..a2890cdfeaa44 100644
--- a/lldb/test/API/macosx/arm-corefile-regctx/TestArmMachoCorefileRegctx.py
+++ b/lldb/test/API/macosx/arm-corefile-regctx/TestArmMachoCorefileRegctx.py
@@ -13,20 +13,14 @@
 class TestArmMachoCorefileRegctx(TestBase):
     NO_DEBUG_INFO_TESTCASE = True
 
-    @skipUnlessDarwin
-    def setUp(self):
-        TestBase.setUp(self)
-        self.build()
-        self.create_corefile = self.getBuildArtifact("a.out")
-        self.corefile = self.getBuildArtifact("core")
-
     def test_armv7_corefile(self):
         ### Create corefile
-        retcode = call(self.create_corefile + " armv7 " + self.corefile, shell=True)
+        corefile = self.getBuildArtifact("core")
+        self.yaml2macho_core("armv7m.yaml", corefile)
 
         target = self.dbg.CreateTarget("")
         err = lldb.SBError()
-        process = target.LoadCore(self.corefile)
+        process = target.LoadCore(corefile)
         self.assertTrue(process.IsValid())
         thread = process.GetSelectedThread()
         frame = thread.GetSelectedFrame()
@@ -50,11 +44,12 @@ def test_armv7_corefile(self):
 
     def test_arm64_corefile(self):
         ### Create corefile
-        retcode = call(self.create_corefile + " arm64 " + self.corefile, shell=True)
+        corefile = self.getBuildArtifact("core")
+        self.yaml2macho_core("arm64.yaml", corefile)
 
         target = self.dbg.CreateTarget("")
         err = lldb.SBError()
-        process = target.LoadCore(self.corefile)
+        process = target.LoadCore(corefile)
         self.assertTrue(process.IsValid())
         thread = process.GetSelectedThread()
         frame = thread.GetSelectedFrame()
diff --git a/lldb/test/API/macosx/arm-corefile-regctx/arm64.yaml b/lldb/test/API/macosx/arm-corefile-regctx/arm64.yaml
new file mode 100644
index 0000000000000..4c23b69302a02
--- /dev/null
+++ b/lldb/test/API/macosx/arm-corefile-regctx/arm64.yaml
@@ -0,0 +1,31 @@
+cpu: arm64
+endian: little
+threads:
+  # (lldb) reg read
+  # % pbpaste | grep = | sed 's, ,,g' | awk -F= '{print "{name: " $1 ", value: " $2 "},"}'
+  - regsets:
+      - flavor: gpr
+        registers: [
+           {name: x0, value: 0x0000000000000001}, {name: x1, value: 0x000000016fdff3c0},
+           {name: x2, value: 0x000000016fdff3d0}, {name: x3, value: 0x000000016fdff510},
+           {name: x4, value: 0x0000000000000000}, {name: x5, value: 0x0000000000000000},
+           {name: x6, value: 0x0000000000000000}, {name: x7, value: 0x0000000000000000},
+           {name: x8, value: 0x000000010000d910}, {name: x9, value: 0x0000000000000001},
+           {name: x10, value: 0xe1e88de000000000}, {name: x11, value: 0x0000000000000003},
+           {name: x12, value: 0x0000000000000148}, {name: x13, value: 0x0000000000004000},
+           {name: x14, value: 0x0000000000000008}, {name: x15, value: 0x0000000000000000},
+           {name: x16, value: 0x0000000000000000}, {name: x17, value: 0x0000000100003f5c},
+           {name: x18, value: 0x0000000000000000}, {name: x19, value: 0x0000000100003f5c},
+           {name: x20, value: 0x000000010000c000}, {name: x21, value: 0x000000010000d910},
+           {name: x22, value: 0x000000016fdff250}, {name: x23, value: 0x000000018ce12366},
+           {name: x24, value: 0x000000016fdff1d0}, {name: x25, value: 0x0000000000000001},
+           {name: x26, value: 0x0000000000000000}, {name: x27, value: 0x0000000000000000},
+           {name: x28, value: 0x0000000000000000}, {name: fp, value: 0x000000016fdff3a0},
+           {name: lr, value: 0x000000018cd97f28}, {name: sp, value: 0x000000016fdff140},
+           {name: pc, value: 0x0000000100003f5c}, {name: cpsr, value: 0x80001000}
+        ]
+      - flavor: exc
+        registers: [ {name: far, value: 0x0000000100003f5c}, 
+                     {name: esr, value: 0xf2000000}, 
+                     {name: exception, value: 0x00000000}
+                   ]
diff --git a/lldb/test/API/macosx/arm-corefile-regctx/armv7m.yaml b/lldb/test/API/macosx/arm-corefile-regctx/armv7m.yaml
new file mode 100644
index 0000000000000..1351056ed0999
--- /dev/null
+++ b/lldb/test/API/macosx/arm-corefile-regctx/armv7m.yaml
@@ -0,0 +1,37 @@
+cpu: armv7m
+endian: little
+threads:
+  # (lldb) reg read
+  # % pbpaste | grep = | sed 's, ,,g' | awk -F= '{print "{name: " $1 ", value: " $2 "},"}'
+  - regsets:
+      - flavor: gpr
+        registers: [
+          {name: r0, value: 0x00010000}, {name: r1, value: 0x00020000},
+          {name: r2, value: 0x00030000}, {name: r3, value: 0x00040000},
+          {name: r4, value: 0x00050000}, {name: r5, value: 0x00060000},
+          {name: r6, value: 0x00070000}, {name: r7, value: 0x00080000},
+          {name: r8, value: 0x00090000}, {name: r9, value: 0x000a0000},
+          {name: r10, value: 0x000b0000}, {name: r11, value: 0x000c0000},
+          {name: r12, value: 0x000d0000}, {name: sp, value: 0x000e0000},
+          {name: lr, value: 0x000f0000}, {name: pc, value: 0x00100000},
+          {name: cpsr, value: 0x00110000}
+        ]
+      - flavor: exc
+        registers: [ {name: far, value: 0x00003f5c},
+                     {name: esr, value: 0xf2000000},
+                     {name: exception, value: 0x00000000}
+                   ]
+
+memory-regions:
+  # $sp is 0x000e0000, have bytes surrounding that address
+  - addr: 0x000dffe0
+    UInt8: [
+            0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
+            0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10, 0x11,
+            0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a,
+            0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20, 0x21, 0x22, 0x23,
+            0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c,
+            0x2d, 0x2e, 0x2f, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35,
+            0x36, 0x37, 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 
+            0x3f
+           ]
diff --git a/lldb/test/API/macosx/arm-corefile-regctx/create-arm-corefiles.cpp b/lldb/test/API/macosx/arm-corefile-regctx/create-arm-corefiles.cpp
deleted file mode 100644
index db39f12ecfb7e..0000000000000
--- a/lldb/test/API/macosx/arm-corefile-regctx/create-arm-corefiles.cpp
+++ /dev/null
@@ -1,266 +0,0 @@
-#include <mach-o/loader.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string>
-#include <vector>
-
-
-// Normally these are picked up by including <mach/thread_status.h>
-// but that does a compile time check for the build host arch and
-// only defines the ARM register context constants when building on
-// an arm system.  We're creating fake corefiles, and might be
-// creating them on an intel system.
-#ifndef ARM_THREAD_STATE
-#define ARM_THREAD_STATE 1
-#endif
-#ifndef ARM_THREAD_STATE_COUNT
-#define ARM_THREAD_STATE_COUNT 17
-#endif
-#ifndef ARM_EXCEPTION_STATE
-#define ARM_EXCEPTION_STATE 3
-#endif
-#ifndef ARM_EXCEPTION_STATE_COUNT
-#define ARM_EXCEPTION_STATE_COUNT 3
-#endif
-#ifndef ARM_THREAD_STATE64
-#define ARM_THREAD_STATE64 6
-#endif
-#ifndef ARM_THREAD_STATE64_COUNT
-#define ARM_THREAD_STATE64_COUNT 68
-#endif
-#ifndef ARM_EXCEPTION_STATE64
-#define ARM_EXCEPTION_STATE64 7
-#endif
-#ifndef ARM_EXCEPTION_STATE64_COUNT
-#define ARM_EXCEPTION_STATE64_COUNT 4
-#endif
-
-union uint32_buf {
-  uint8_t bytebuf[4];
-  uint32_t val;
-};
-
-union uint64_buf {
-  uint8_t bytebuf[8];
-  uint64_t val;
-};
-
-void add_uint64(std::vector<uint8_t> &buf, uint64_t val) {
-  uint64_buf conv;
-  conv.val = val;
-  for (int i = 0; i < 8; i++)
-    buf.push_back(conv.bytebuf[i]);
-}
-
-void add_uint32(std::vector<uint8_t> &buf, uint32_t val) {
-  uint32_buf conv;
-  conv.val = val;
-  for (int i = 0; i < 4; i++)
-    buf.push_back(conv.bytebuf[i]);
-}
-
-std::vector<uint8_t> armv7_lc_thread_load_command() {
-  std::vector<uint8_t> data;
-  add_uint32(data, LC_THREAD);              // thread_command.cmd
-  add_uint32(data, 104);                    // thread_command.cmdsize
-  add_uint32(data, ARM_THREAD_STATE);       // thread_command.flavor
-  add_uint32(data, ARM_THREAD_STATE_COUNT); // thread_command.count
-  add_uint32(data, 0x00010000);             // r0
-  add_uint32(data, 0x00020000);             // r1
-  add_uint32(data, 0x00030000);             // r2
-  add_uint32(data, 0x00040000);             // r3
-  add_uint32(data, 0x00050000);             // r4
-  add_uint32(data, 0x00060000);             // r5
-  add_uint32(data, 0x00070000);             // r6
-  add_uint32(data, 0x00080000);             // r7
-  add_uint32(data, 0x00090000);             // r8
-  add_uint32(data, 0x000a0000);             // r9
-  add_uint32(data, 0x000b0000);             // r10
-  add_uint32(data, 0x000c0000);             // r11
-  add_uint32(data, 0x000d0000);             // r12
-  add_uint32(data, 0x000e0000);             // sp
-  add_uint32(data, 0x000f0000);             // lr
-  add_uint32(data, 0x00100000);             // pc
-  add_uint32(data, 0x00110000);             // cpsr
-
-  add_uint32(data, ARM_EXCEPTION_STATE);       // thread_command.flavor
-  add_uint32(data, ARM_EXCEPTION_STATE_COUNT); // thread_command.count
-  add_uint32(data, 0x00003f5c);                // far
-  add_uint32(data, 0xf2000000);                // esr
-  add_uint32(data, 0x00000000);                // exception
-
-  return data;
-}
-
-std::vector<uint8_t> arm64_lc_thread_load_command() {
-  std::vector<uint8_t> data;
-  add_uint32(data, LC_THREAD);                // thread_command.cmd
-  add_uint32(data, 312);                      // thread_command.cmdsize
-  add_uint32(data, ARM_THREAD_STATE64);       // thread_command.flavor
-  add_uint32(data, ARM_THREAD_STATE64_COUNT); // thread_command.count
-  add_uint64(data, 0x0000000000000001);       // x0
-  add_uint64(data, 0x000000016fdff3c0);       // x1
-  add_uint64(data, 0x000000016fdff3d0);       // x2
-  add_uint64(data, 0x000000016fdff510);       // x3
-  add_uint64(data, 0x0000000000000000);       // x4
-  add_uint64(data, 0x0000000000000000);       // x5
-  add_uint64(data, 0x0000000000000000);       // x6
-  add_uint64(data, 0x0000000000000000);       // x7
-  add_uint64(data, 0x000000010000d910);       // x8
-  add_uint64(data, 0x0000000000000001);       // x9
-  add_uint64(data, 0xe1e88de000000000);       // x10
-  add_uint64(data, 0x0000000000000003);       // x11
-  add_uint64(data, 0x0000000000000148);       // x12
-  add_uint64(data, 0x0000000000004000);       // x13
-  add_uint64(data, 0x0000000000000008);       // x14
-  add_uint64(data, 0x0000000000000000);       // x15
-  add_uint64(data, 0x0000000000000000);       // x16
-  add_uint64(data, 0x0000000100003f5c);       // x17
-  add_uint64(data, 0x0000000000000000);       // x18
-  add_uint64(data, 0x0000000100003f5c);       // x19
-  add_uint64(data, 0x000000010000c000);       // x20
-  add_uint64(data, 0x000000010000d910);       // x21
-  add_uint64(data, 0x000000016fdff250);       // x22
-  add_uint64(data, 0x000000018ce12366);       // x23
-  add_uint64(data, 0x000000016fdff1d0);       // x24
-  add_uint64(data, 0x0000000000000001);       // x25
-  add_uint64(data, 0x0000000000000000);       // x26
-  add_uint64(data, 0x0000000000000000);       // x27
-  add_uint64(data, 0x0000000000000000);       // x28
-  add_uint64(data, 0x000000016fdff3a0);       // fp
-  add_uint64(data, 0x000000018cd97f28);       // lr
-  add_uint64(data, 0x000000016fdff140);       // sp
-  add_uint64(data, 0x0000000100003f5c);       // pc
-  add_uint32(data, 0x80001000);               // cpsr
-
-  add_uint32(data, 0x00000000); // padding
-
-  add_uint32(data, ARM_EXCEPTION_STATE64);       // thread_command.flavor
-  add_uint32(data, ARM_EXCEPTION_STATE64_COUNT); // thread_command.count
-  add_uint64(data, 0x0000000100003f5c);          // far
-  add_uint32(data, 0xf2000000);                  // esr
-  add_uint32(data, 0x00000000);                  // exception
-
-  return data;
-}
-
-std::vector<uint8_t> lc_segment(uint32_t fileoff,
-                                uint32_t lc_segment_data_size) {
-  std::vector<uint8_t> data;
-  // 0x000e0000 is the value of $sp in the armv7 LC_THREAD
-  uint32_t start_vmaddr = 0x000e0000 - (lc_segment_data_size / 2);
-  add_uint32(data, LC_SEGMENT);                     // segment_command.cmd
-  add_uint32(data, sizeof(struct segment_command)); // segment_command.cmdsize
-  for (int i = 0; i < 16; i++)
-    data.push_back(0);                    // segment_command.segname[16]
-  add_uint32(data, start_vmaddr);         // segment_command.vmaddr
-  add_uint32(data, lc_segment_data_size); // segment_command.vmsize
-  add_uint32(data, fileoff);              // segment_command.fileoff
-  add_uint32(data, lc_segment_data_size); // segment_command.filesize
-  add_uint32(data, 3);                    // segment_command.maxprot
-  add_uint32(data, 3);                    // segment_command.initprot
-  add_uint32(data, 0);                    // segment_command.nsects
-  add_uint32(data, 0);                    // segment_command.flags
-
-  return data;
-}
-
-enum arch { unspecified, armv7, arm64 };
-
-int main(int argc, char **argv) {
-  if (argc != 3) {
-    fprintf(stderr,
-            "usage: create-arm-corefiles [armv7|arm64] <output-core-name>\n");
-    exit(1);
-  }
-
-  arch arch = unspecified;
-
-  if (strcmp(argv[1], "armv7") == 0)
-    arch = armv7;
-  else if (strcmp(argv[1], "arm64") == 0)
-    arch = arm64;
-  else {
-    fprintf(stderr, "unrecognized architecture %s\n", argv[1]);
-    exit(1);
-  }
-
-  // An array of load commands (in the form of byte arrays)
-  std::vector<std::vector<uint8_t>> load_commands;
-
-  // An array of corefile contents (page data, lc_note data, etc)
-  std::vector<uint8_t> payload;
-
-  // First add all the load commands / payload so we can figure out how large
-  // the load commands will actually be.
-  if (arch == armv7) {
-    load_commands.push_back(armv7_lc_thread_load_command());
-    load_commands.push_back(lc_segment(0, 0));
-  } else if (arch == arm64) {
-    load_commands.push_back(arm64_lc_thread_load_command());
-  }
-
-  int size_of_load_commands = 0;
-  for (const auto &lc : load_commands)
-    size_of_load_commands += lc.size();
-
-  int header_and_load_cmd_room =
-      sizeof(struct mach_header_64) + size_of_load_commands;
-
-  // Erase the load commands / payload now that we know how much space...
[truncated]

github-actions · 2025-08-16T01:40:02Z

✅ With the latest revision this PR passed the Python code formatter.

jasonmolenda · 2025-08-16T02:05:43Z

One thing I wasn't thrilled about with llvm's yaml MappingTraits parser was that I need to define register values like

       registers: [
           {name: x0, value: 0x0000000000000001}, {name: x1, value: 0x000000016fdff3c0},
           {name: x2, value: 0x000000016fdff3d0}, {name: x3, value: 0x000000016fdff510},

instead of a more natural style of registers = { "x0": 0x1, "x1": 0x16fdff3c0, "x2": 0x16fdff3d0} or so.
At least I couldn't figure out how to do this. It makes the yaml descriptions noisier than they really need to be.
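One way to live with the noisier form is to generate it. The Python sketch below turns the more natural mapping into the list-of-maps text that the description files use. `registers_to_yaml` and its parameters are hypothetical names for illustration only; nothing like this is part of the PR.

```python
def registers_to_yaml(regs, width=16, per_line=2, indent=8):
    """Render {"x0": 0x1, ...} as a `registers: [...]` block of name/value maps."""
    # "%0*x" takes the zero-padded field width from the argument tuple.
    items = [
        "{name: %s, value: 0x%0*x}" % (name, width, value)
        for name, value in regs.items()
    ]
    pad = " " * indent
    rows = [
        pad + "  " + ", ".join(items[i:i + per_line])
        for i in range(0, len(items), per_line)
    ]
    return "\n".join([pad + "registers: [", ",\n".join(rows), pad + "]"])


text = registers_to_yaml({"x0": 0x1, "x1": 0x16FDFF3C0, "x2": 0x16FDFF3D0})
print(text)
```

The default `indent` is chosen to line up with a `- flavor:` entry nested as in the test descriptions; adjust it if the surrounding YAML is indented differently.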

jasonmolenda · 2025-08-16T03:15:45Z

The Linux PR pre-merge testing is failing because lldb/tools/yaml2macho-core is not being built. I think I need to add a dependency, maybe in test/CMakeLists.txt? I was doing all of my development with simply ninja to build everything, but I probably need to have this added to the lldb or check-lldb-api targets.

When a processor faults, is interrupted, or takes an exception, it stops running code and jumps to an exception catcher routine. Most processors store the pc that was executing in a system register, and the catcher functions have special instructions to retrieve that and possibly other registers. The catcher may then save those values to the stack, and the author can add .cfi directives to tell lldb's unwinder where to find those saved values.

ARM Cortex-M (microcontroller) processors have a simpler mechanism: a fixed set of registers is saved to the stack on an exception, and a unique value is put in the link register to indicate to the caller that this has taken place. No special handling needs to be written into the exception catcher unless it wants to inspect these preserved values, and a general stack walker can walk the stack with no special knowledge about what the catch function does.

This patch adds an Architecture plugin method to allow an Architecture to override/augment the UnwindPlan that lldb would use for a stack frame, given the contents of the return address register. It resembles a feature where the LanguageRuntime can replace/augment the unwind plan for a function, but it is offset by one level. The LanguageRuntime looks at the local register context and/or symbol name to decide whether it will override the unwind rules. For the Cortex-M exception unwinds, we need to modify THIS frame's unwind plan if the CALLER's LR had a specific value. RegisterContextUnwind has to retrieve the caller's LR value before it has completely decided on the UnwindPlan it will use for THIS stack frame.

This does mean that we will need one more read of stack memory than we currently use when unwinding. The unwinder walks the stack lazily, as stack frames are requested, so now if you ask for 2 stack frames, we will read enough stack to walk 2 frames, plus one extra word of memory: the spilled RA value from the stack (see RegisterContextUnwind::AdoptArchitectureUnwindPlan()). In practice, with 512-byte memory cache reads, this is unlikely to be a problem, but I'm wondering if I should add an Architecture method answering "does this Architecture implement `GetArchitectureUnwindPlan`" and only do the memory read if it does. Then the performance impact would be limited to armv7/Cortex-M debug sessions.

This PR includes a test with a yaml corefile description and a JSON ObjectFile, incorporating all of the necessary stack memory and symbol names from a real debug session I worked on. The architectural default unwind plans are used for all stack frames except the 0th because there are no instructions for the functions, and no unwind info. I may need to add an encoding of unwind rules to ObjectFileJSON in the future as we create more test cases like this.

This PR depends on the yaml2macho-core utility from llvm#153911

rdar://110663219
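As a rough illustration of the Cortex-M mechanism described above, here is a minimal sketch of how a stack walker might recognize the special LR value and locate the hardware-saved registers. This is not the code in either PR: the helper names are hypothetical, and the sketch assumes the ARMv7-M basic exception frame (eight words, no floating-point state) and ignores the extended FP frame and any stack-alignment padding.

```cpp
#include <cassert>
#include <cstdint>

// Registers pushed by the hardware on exception entry (ARMv7-M basic frame),
// in stack order; index * 4 is the byte offset from the post-push SP.
enum StackedReg : uint32_t { kR0, kR1, kR2, kR3, kR12, kLR, kPC, kXPSR, kFrameWords };

// Hypothetical helper (not an lldb API): true if `lr` is one of the three
// ARMv7-M EXC_RETURN values used with the basic frame, rather than a real
// return address. FP-extended values (0xFFFFFFE1/E9/ED) are NOT handled here.
bool is_basic_exc_return(uint32_t lr) {
  return lr == 0xFFFFFFF1u || lr == 0xFFFFFFF9u || lr == 0xFFFFFFFDu;
}

// Hypothetical helper: address where `reg` was saved, given the SP value at
// exception entry (after the hardware push).
uint32_t stacked_reg_addr(uint32_t frame_sp, StackedReg reg) {
  assert(reg < kFrameWords);
  return frame_sp + 4u * static_cast<uint32_t>(reg);
}
```

A walker that sees such a value in the caller's LR can read the interrupted pc from `stacked_reg_addr(sp, kPC)` instead of treating LR as a return address. The value 0xfffffff9 that appears in the example stack memory in the PR description is one of these EXC_RETURN values.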

jasonmolenda · 2025-08-16T06:52:57Z

I think @labath would point out that I'm doing an end-run around making a sufficient Mock Process capability, with memory and threads and symbols, to write unit tests. @medismailben would point out that we could write a Scripted Process python script that would ingest this same information and vend a Process, just as well as using the corefile container for the information. For that matter, a little gdb remote serial protocol stub written in python could present this same information as if it were a live process with threads, registers, and memory.

Because I already had several mach-o corefile creator tools (and needed a new one each time I needed to test another part of the mach-o corefile reader part of lldb), it seemed most natural to go that route, to me.

The most important part for me is the simplicity of taking a real world debug problem situation, live or corefile, which may involve giant binaries/corefile and cannot be used in a test for size or confidentiality reasons, but we can extract the core bits of registers and memory that are sufficient to show the issue being fixed. We can't test issues dealing with debug info with this mechanism -- say something specific to firmware debugging that can't be replicated in a userland process -- but for a lot of memory-and-stack-and-register type bugs, I think this could be a handy tool.


bulbazord

This is a really cool utility! I'm no expert, but I made a few suggestions inline. Thanks for sharing. :)

bulbazord · 2025-08-18T17:26:20Z

lldb/tools/yaml2macho-core/CoreSpec.h

+
+enum Endian { Big = 0, Little = 1 };
+
+enum MemoryType { UInt8 = 0, UInt32, UInt64 };


suggestion: MemoryType -> WordSize? MemoryType seems a little vague.

lldb/tools/yaml2macho-core/LCNoteWriter.cpp

lldb/tools/yaml2macho-core/Utility.cpp


JDevlieghere

I know this is a simple tool, but it seems like adding a bit of organization could go a long way. Essentially, the tool consists of 3 parts:

1. A reader that takes YAML and creates an in-memory/intermediate representation (CoreSpec).
2. A writer that takes a CoreSpec and emits a binary.
3. The glue that holds (1) and (2) together as well as command-line parsing and I/O.

If it were up to me, that's how I would structure this tool. I think that will make it a lot easier to understand and extend in the future.

lldb/tools/yaml2macho-core/CoreSpec.h

JDevlieghere · 2025-08-22T16:33:05Z

lldb/tools/yaml2macho-core/CMakeLists.txt

+    main.cpp
+    yaml2corespec.cpp


Nit: filename suggestions.

Suggested change

-    main.cpp
-    yaml2corespec.cpp
+    yaml2macho.cpp
+    CoreSpec.cpp

JDevlieghere · 2025-08-22T16:35:25Z

lldb/tools/yaml2macho-core/CoreSpec.h

+  std::vector<uint8_t> bytes;
+  std::vector<uint32_t> words;
+  std::vector<uint64_t> doublewords;


If they're mutually exclusive, you could do:

Suggested change

-  std::vector<uint8_t> bytes;
-  std::vector<uint32_t> words;
-  std::vector<uint64_t> doublewords;
+  using Bytes = std::vector<uint8_t>;
+  using Words = std::vector<uint32_t>;
+  using Doublewords = std::vector<uint64_t>;
+  std::variant<Bytes, Words, Doublewords> data;

but this might complicate the YAML traits. We do this for protocol messages in DAP and MCP.

yeah, I wasn't super thrilled with having these three, but wanted to maintain the input formatting for endian switching (which I then later abandoned lol). At one point it was an anonymous union. I ended up bailing on all of that and just having three members, only one of which is active; they're not accessed directly in many places.
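To make the trade-off in this thread concrete, here is a minimal sketch of how a consumer could handle the suggested `std::variant` member with `std::visit`, flattening whichever alternative is active into little-endian bytes. The `MemoryData` alias and the `to_little_endian_bytes` function are hypothetical names for illustration; the tool's actual writer code is not shown in this conversation and may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <variant>
#include <vector>

using Bytes = std::vector<uint8_t>;
using Words = std::vector<uint32_t>;
using Doublewords = std::vector<uint64_t>;
// Hypothetical alias for the `data` member proposed in the suggestion.
using MemoryData = std::variant<Bytes, Words, Doublewords>;

// Flatten the active alternative into bytes, least-significant byte first.
std::vector<uint8_t> to_little_endian_bytes(const MemoryData &data) {
  std::vector<uint8_t> out;
  std::visit(
      [&out](const auto &values) {
        // Element type of whichever vector is held: uint8_t/uint32_t/uint64_t.
        using Elem = typename std::decay_t<decltype(values)>::value_type;
        for (Elem v : values)
          for (std::size_t i = 0; i < sizeof(Elem); ++i)
            out.push_back(
                static_cast<uint8_t>(static_cast<uint64_t>(v) >> (8 * i)));
        assert(out.size() == values.size() * sizeof(Elem));
      },
      data);
  return out;
}
```

One generic lambda covers all three element widths, so callers never need to check which of the three vectors is in use. The cost, as noted above, is that the YAML mapping code has to construct the right alternative.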

JDevlieghere · 2025-08-22T16:35:48Z

lldb/tools/yaml2macho-core/LCNoteWriter.cpp

+//===----------------------------------------------------------------------===//
+
+#include "lldb/Utility/UUID.h"
+


Nit: no newline between header includes, the order is handled by clang-format.

Suggested change

JDevlieghere · 2025-08-22T16:40:04Z

lldb/tools/yaml2macho-core/main.cpp

+int main(int argc, char **argv) {
+
+  const char *const short_opts = "i:o:u:h";
+  const option long_opts[] = {{"input", required_argument, nullptr, 'i'},


This should use cl::opt (https://llvm.org/docs/CommandLine.html). Even if the current implementation is slightly simpler, it makes it harder to extend in the future. It's used by every llvm tool (that doesn't use tablegen).

Hm, I took a stab at this but it's a little confusing how it works. I tried looking at how lldb-test uses this but it's also not very straightforward. Maybe I'll let that sit for now, getopt is very simple...

JDevlieghere · 2025-08-22T16:42:39Z

lldb/tools/yaml2macho-core/LCNoteWriter.cpp

+
+#include "llvm/BinaryFormat/MachO.h"
+
+void create_lc_note_binary_load_cmd(const CoreSpec &spec,


Surprised this lives in its own file? Why not merge this with utility?

It's a tiny one method file today, but I can see me adding additional LC_NOTEs in the future. So I wanted the "thread writer" file which emits LC_THREAD load commands, the "memory writer" file which emits LC_SEGMENTs and the blocks of memory, and the "LC_NOTE writer" which emits LC_NOTE load commands and bytes of the note command payloads. It makes sense in my head for these three to be in their own files.

JDevlieghere · 2025-08-22T16:43:34Z

lldb/tools/yaml2macho-core/Utility.cpp

+#include "Utility.h"
+#include "CoreSpec.h"
+
+void add_uint64(const CoreSpec &spec, std::vector<uint8_t> &buf, uint64_t val) {


This takes a CoreSpec but doesn't actually use it? Same for add_uint32 below.

Originally you could specify the Endianness in the YAML and I needed to pass that to the add_uint methods to fix byte ordering in the output file. I eventually lost confidence that I was actually creating correctly-formatted Big Endian mach-o files and I removed that feature, forgot to clean up and remove this argument.
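For reference, a minimal sketch of what the helpers could look like once the unused `CoreSpec` parameter is dropped and the output is always little-endian. This is an assumption about the shape of the cleanup, not the code that landed: `add_le` is a hypothetical shared helper, and the real `add_uint32`/`add_uint64` bodies are not shown in this conversation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical shared helper: append the low `nbytes` bytes of `val` to
// `buf`, least-significant byte first, independent of host byte order.
void add_le(std::vector<uint8_t> &buf, uint64_t val, unsigned nbytes) {
  assert(nbytes <= sizeof(uint64_t));
  for (unsigned i = 0; i < nbytes; ++i)
    buf.push_back(static_cast<uint8_t>(val >> (8 * i)));
}

// Same names as in the review, minus the CoreSpec argument.
void add_uint32(std::vector<uint8_t> &buf, uint32_t val) { add_le(buf, val, 4); }
void add_uint64(std::vector<uint8_t> &buf, uint64_t val) { add_le(buf, val, 8); }
```

Shifting instead of copying raw memory keeps the output byte order fixed even if the tool is built on a big-endian host.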

jasonmolenda · 2025-08-26T01:50:45Z

> I know this is a simple tool, but it seems like adding a bit of organization could go a long way. Essentially, the tool consists of 3 parts:
>
> 1. A reader that takes YAML and creates an in-memory/intermediate representation (CoreSpec).
> 2. A writer that takes a CoreSpec and emit a binary.
> 3. The glue that holds (1) and (2) together as well as command-line parsing and I/O.
>
> If it were up to me, that's how I would structure this tool. I think that will make it a lot easier to understand and extend in the future.

Thanks for all the comments, addressing them now.

Yeah, before I wrote this I didn't have a clear idea of what it would look like when finished (for some reason, it seems obvious now, but in the beginning there were some poor choices made and fixed along the way). You could imagine someone making an ELF corefile output capability, for instance. The YAML files I'm using as input are just a way of specifying an architecture, some registers for threads, and some memory.

I don't know if I want to structure it more generally yet, with subdirectories or whatever, for the YAML to intermediate representation and for the Mach-O corefile writing. If anyone does want to restructure it for additional input/output methods, I think it will be easy to restructure it at that point. It may end up never growing beyond this simple set of features (I know, unlikely).

github-actions · 2025-08-26T02:25:13Z

✅ With the latest revision this PR passed the C/C++ code formatter.


jasonmolenda requested a review from JDevlieghere as a code owner August 16, 2025 01:37

llvmbot added lldb backend:RISC-V labels Aug 16, 2025

jasonmolenda added 2 commits August 15, 2025 18:58

Whitespace fix.

11896d8

more ws fix

6a272fc

Revert the ObjectFileJSON changes, will land separately.

9b28021

jasonmolenda mentioned this pull request Aug 16, 2025

[lldb] Unwind through ARM Cortex-M exceptions automatically #153922

Open

jasonmolenda added 2 commits August 17, 2025 12:32

Correct my cmake details so the check-lldb-api target builds the

46c1e91

new yaml2macho-core tool. Thanks to Felipe for the guidance.

Seems like I need <cstdint> for

129fb3f

uint32_t et al, for some reason I thought those were built-in with C++.

bulbazord reviewed Aug 18, 2025


jasonmolenda added 2 commits August 18, 2025 18:56

Remove the endian specifier.

02d6408

I'm not convinced that it was emitting big-endian Mach-O files correctly, and until this is actually needed, there's no point in carrying around dubious code.

Use lldb's UUID class.

0dbcf22

JDevlieghere reviewed Aug 22, 2025


Integrate Jonas' suggestions.

57e807c

jasonmolenda added 5 commits August 25, 2025 20:14

ws fix

28c3d9a

Add separate methods for writing 32-bit and 64-bit registers,

2822c1d

I think it's a little easier to follow the register context writing methods when it's more explicit what is being written.

Add the ability to specify a list of UUID & virtual addresses.

5e924d0

And the ability to specify the number of bits used in addressing on this cpu.

Merge branch 'main' into add-yaml2macho-core-test-utility

3080ddb

ws fix

f636fec


