[lldb] Add utility to create Mach-O corefile from YAML desc #153911
Conversation
I've wanted a utility to create a corefile for test purposes, given a bit of memory and registers, for a while. I've written a few API tests over the years that needed exactly this capability -- we have several one-off Mach-O corefile creator utilities in the API testsuite to do this. But it's a lot of boilerplate when you only want to specify some register contents and memory contents to create an API test.

This adds yaml2macho-core, a tool that should build on any system. It takes a yaml description of register values for one or more threads, optionally memory values for one or more memory regions, and can take a list of UUIDs that will be added as LC_NOTE "load binary" metadata to the corefile so binaries can be loaded into virtual address space in a test scenario. The format of the yaml file looks like

cpu: armv7m
endian: little
threads:
  - regsets:
      - flavor: gpr
        registers: [{name: sp, value: 0x2000fe70}, {name: r7, value: 0x2000fe80},
                    {name: pc, value: 0x0020392c}, {name: lr, value: 0x0020392d}]

memory-regions:
  - addr: 0x2000fe70
    UInt32: [ 0x0000002a, 0x20010e58, 0x00203923, 0x00000001,
              0x2000fe88, 0x00203911, 0x2000ffdc, 0xfffffff9 ]
  - addr: 0x203910
    UInt8: [ 0xf8, 0xb5, 0x04, 0xaf, 0x06, 0x4c, 0x07, 0x49,
             0x74, 0xf0, 0x2e, 0xf8, 0x01, 0xac, 0x74, 0xf0 ]

and that's all that is needed to specify a corefile where four register values are specified (the others will be set to 0) and two memory regions will be emitted. The memory can be specified as an array of UInt8, UInt32, or UInt64; I anticipate that some of these corefiles may have stack values constructed manually, and it may be simpler for a human to write them in a particular grouping of values.

Accepting "endian" is probably a boondoggle that won't ever come to any use, and honestly I don't 100% know what the correct byte layout would be for a big endian Mach-O file any more.
In a RISC-V discussion a month ago, it was noted that register byte layout will be little endian even when there is a big endian defined format for RV, so memory would be byteswapped but registers would not. It may have been better not to pretend to support this, but on the other hand it might be neat to be able to generate a big endian test case simply.

I needed this utility for an upcoming patch for ARM Cortex-M processors, to create a test for the change. I took the opportunity to remove two of the "trivial mach-o corefile" creator utilities I've written in the past, which also restricted the tests to only run on Darwin systems because I was using the system headers for Mach-O constant values.

rdar://110663219
@llvm/pr-subscribers-lldb @llvm/pr-subscribers-backend-risc-v

Author: Jason Molenda (jasonmolenda)

Patch is 64.83 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/153911.diff

28 Files Affected:
diff --git a/lldb/packages/Python/lldbsuite/test/configuration.py b/lldb/packages/Python/lldbsuite/test/configuration.py
index 5e3810992d172..1a9f25d66843a 100644
--- a/lldb/packages/Python/lldbsuite/test/configuration.py
+++ b/lldb/packages/Python/lldbsuite/test/configuration.py
@@ -64,6 +64,9 @@
# Path to the yaml2obj tool. Not optional.
yaml2obj = None
+# Path to the yaml2macho-core tool. Not optional.
+yaml2macho_core = None
+
# The arch might dictate some specific CFLAGS to be passed to the toolchain to build
# the inferior programs. The global variable cflags_extras provides a hook to do
# just that.
@@ -174,3 +177,10 @@ def get_yaml2obj_path():
"""
if yaml2obj and os.path.lexists(yaml2obj):
return yaml2obj
+
+def get_yaml2macho_core_path():
+ """
+ Get the path to the yaml2macho-core tool.
+ """
+ if yaml2macho_core and os.path.lexists(yaml2macho_core):
+ return yaml2macho_core
diff --git a/lldb/packages/Python/lldbsuite/test/dotest.py b/lldb/packages/Python/lldbsuite/test/dotest.py
index 47a3c2ed2fc9d..89b6807b41075 100644
--- a/lldb/packages/Python/lldbsuite/test/dotest.py
+++ b/lldb/packages/Python/lldbsuite/test/dotest.py
@@ -280,6 +280,7 @@ def parseOptionsAndInitTestdirs():
configuration.llvm_tools_dir = args.llvm_tools_dir
configuration.filecheck = shutil.which("FileCheck", path=args.llvm_tools_dir)
configuration.yaml2obj = shutil.which("yaml2obj", path=args.llvm_tools_dir)
+ configuration.yaml2macho_core = shutil.which("yaml2macho-core", path=args.llvm_tools_dir)
if not configuration.get_filecheck_path():
logging.warning("No valid FileCheck executable; some tests may fail...")
diff --git a/lldb/packages/Python/lldbsuite/test/lldbtest.py b/lldb/packages/Python/lldbsuite/test/lldbtest.py
index 0fc85fcc4d2d6..599b019f0df8c 100644
--- a/lldb/packages/Python/lldbsuite/test/lldbtest.py
+++ b/lldb/packages/Python/lldbsuite/test/lldbtest.py
@@ -1702,6 +1702,21 @@ def yaml2obj(self, yaml_path, obj_path, max_size=None):
command += ["--max-size=%d" % max_size]
self.runBuildCommand(command)
+ def yaml2macho_core(self, yaml_path, obj_path, uuids=None):
+ """
+ Create a Mach-O corefile at the given path from a yaml file.
+
+ Throws subprocess.CalledProcessError if the object could not be created.
+ """
+ yaml2macho_core_bin = configuration.get_yaml2macho_core_path()
+ if not yaml2macho_core_bin:
+ self.assertTrue(False, "No valid yaml2macho-core executable specified")
+ if uuids != None:
+ command = [yaml2macho_core_bin, "-i", yaml_path, "-o", obj_path, "-u", uuids]
+ else:
+ command = [yaml2macho_core_bin, "-i", yaml_path, "-o", obj_path]
+ self.runBuildCommand(command)
+
def cleanup(self, dictionary=None):
"""Platform specific way to do cleanup after build."""
module = builder_module()
diff --git a/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.cpp b/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.cpp
index cb8ba05d461d4..0aff98078120e 100644
--- a/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.cpp
+++ b/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.cpp
@@ -12,6 +12,7 @@
#include "lldb/Core/PluginManager.h"
#include "lldb/Core/Section.h"
#include "lldb/Symbol/Symbol.h"
+#include "lldb/Target/Target.h"
#include "lldb/Utility/LLDBLog.h"
#include "lldb/Utility/Log.h"
#include "llvm/ADT/DenseSet.h"
@@ -233,6 +234,41 @@ void ObjectFileJSON::CreateSections(SectionList &unified_section_list) {
}
}
+bool ObjectFileJSON::SetLoadAddress(Target &target, lldb::addr_t value,
+ bool value_is_offset) {
+ Log *log(GetLog(LLDBLog::DynamicLoader));
+ if (!m_sections_up)
+ return true;
+
+ const bool warn_multiple = true;
+
+ addr_t slide = value;
+ if (!value_is_offset) {
+ addr_t lowest_addr = LLDB_INVALID_ADDRESS;
+ for (const SectionSP &section_sp : *m_sections_up) {
+ addr_t section_load_addr = section_sp->GetFileAddress();
+ lowest_addr = std::min(lowest_addr, section_load_addr);
+ }
+ if (lowest_addr == LLDB_INVALID_ADDRESS)
+ return false;
+ slide = value - lowest_addr;
+ }
+
+ // Apply slide to each section's file address.
+ for (const SectionSP &section_sp : *m_sections_up) {
+ addr_t section_load_addr = section_sp->GetFileAddress();
+ if (section_load_addr != LLDB_INVALID_ADDRESS) {
+ LLDB_LOGF(
+ log,
+ "ObjectFileJSON::SetLoadAddress section %s to load addr 0x%" PRIx64,
+ section_sp->GetName().AsCString(), section_load_addr + slide);
+ target.SetSectionLoadAddress(section_sp, section_load_addr + slide,
+ warn_multiple);
+ }
+ }
+ return true;
+}
+
bool ObjectFileJSON::MagicBytesMatch(DataBufferSP data_sp,
lldb::addr_t data_offset,
lldb::addr_t data_length) {
diff --git a/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.h b/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.h
index b72565f468862..029c8ff188934 100644
--- a/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.h
+++ b/lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.h
@@ -86,6 +86,9 @@ class ObjectFileJSON : public ObjectFile {
Strata CalculateStrata() override { return eStrataUser; }
+ bool SetLoadAddress(Target &target, lldb::addr_t value,
+ bool value_is_offset) override;
+
static bool MagicBytesMatch(lldb::DataBufferSP data_sp, lldb::addr_t offset,
lldb::addr_t length);
diff --git a/lldb/test/API/macosx/arm-corefile-regctx/Makefile b/lldb/test/API/macosx/arm-corefile-regctx/Makefile
deleted file mode 100644
index e1d0354441cd4..0000000000000
--- a/lldb/test/API/macosx/arm-corefile-regctx/Makefile
+++ /dev/null
@@ -1,6 +0,0 @@
-MAKE_DSYM := NO
-
-CXX_SOURCES := create-arm-corefiles.cpp
-
-include Makefile.rules
-
diff --git a/lldb/test/API/macosx/arm-corefile-regctx/TestArmMachoCorefileRegctx.py b/lldb/test/API/macosx/arm-corefile-regctx/TestArmMachoCorefileRegctx.py
index 6754288a65e1a..a2890cdfeaa44 100644
--- a/lldb/test/API/macosx/arm-corefile-regctx/TestArmMachoCorefileRegctx.py
+++ b/lldb/test/API/macosx/arm-corefile-regctx/TestArmMachoCorefileRegctx.py
@@ -13,20 +13,14 @@
class TestArmMachoCorefileRegctx(TestBase):
NO_DEBUG_INFO_TESTCASE = True
- @skipUnlessDarwin
- def setUp(self):
- TestBase.setUp(self)
- self.build()
- self.create_corefile = self.getBuildArtifact("a.out")
- self.corefile = self.getBuildArtifact("core")
-
def test_armv7_corefile(self):
### Create corefile
- retcode = call(self.create_corefile + " armv7 " + self.corefile, shell=True)
+ corefile = self.getBuildArtifact("core")
+ self.yaml2macho_core("armv7m.yaml", corefile)
target = self.dbg.CreateTarget("")
err = lldb.SBError()
- process = target.LoadCore(self.corefile)
+ process = target.LoadCore(corefile)
self.assertTrue(process.IsValid())
thread = process.GetSelectedThread()
frame = thread.GetSelectedFrame()
@@ -50,11 +44,12 @@ def test_armv7_corefile(self):
def test_arm64_corefile(self):
### Create corefile
- retcode = call(self.create_corefile + " arm64 " + self.corefile, shell=True)
+ corefile = self.getBuildArtifact("core")
+ self.yaml2macho_core("arm64.yaml", corefile)
target = self.dbg.CreateTarget("")
err = lldb.SBError()
- process = target.LoadCore(self.corefile)
+ process = target.LoadCore(corefile)
self.assertTrue(process.IsValid())
thread = process.GetSelectedThread()
frame = thread.GetSelectedFrame()
diff --git a/lldb/test/API/macosx/arm-corefile-regctx/arm64.yaml b/lldb/test/API/macosx/arm-corefile-regctx/arm64.yaml
new file mode 100644
index 0000000000000..4c23b69302a02
--- /dev/null
+++ b/lldb/test/API/macosx/arm-corefile-regctx/arm64.yaml
@@ -0,0 +1,31 @@
+cpu: arm64
+endian: little
+threads:
+ # (lldb) reg read
+ # % pbpaste | grep = | sed 's, ,,g' | awk -F= '{print "{name: " $1 ", value: " $2 "},"}'
+ - regsets:
+ - flavor: gpr
+ registers: [
+ {name: x0, value: 0x0000000000000001}, {name: x1, value: 0x000000016fdff3c0},
+ {name: x2, value: 0x000000016fdff3d0}, {name: x3, value: 0x000000016fdff510},
+ {name: x4, value: 0x0000000000000000}, {name: x5, value: 0x0000000000000000},
+ {name: x6, value: 0x0000000000000000}, {name: x7, value: 0x0000000000000000},
+ {name: x8, value: 0x000000010000d910}, {name: x9, value: 0x0000000000000001},
+ {name: x10, value: 0xe1e88de000000000}, {name: x11, value: 0x0000000000000003},
+ {name: x12, value: 0x0000000000000148}, {name: x13, value: 0x0000000000004000},
+ {name: x14, value: 0x0000000000000008}, {name: x15, value: 0x0000000000000000},
+ {name: x16, value: 0x0000000000000000}, {name: x17, value: 0x0000000100003f5c},
+ {name: x18, value: 0x0000000000000000}, {name: x19, value: 0x0000000100003f5c},
+ {name: x20, value: 0x000000010000c000}, {name: x21, value: 0x000000010000d910},
+ {name: x22, value: 0x000000016fdff250}, {name: x23, value: 0x000000018ce12366},
+ {name: x24, value: 0x000000016fdff1d0}, {name: x25, value: 0x0000000000000001},
+ {name: x26, value: 0x0000000000000000}, {name: x27, value: 0x0000000000000000},
+ {name: x28, value: 0x0000000000000000}, {name: fp, value: 0x000000016fdff3a0},
+ {name: lr, value: 0x000000018cd97f28}, {name: sp, value: 0x000000016fdff140},
+ {name: pc, value: 0x0000000100003f5c}, {name: cpsr, value: 0x80001000}
+ ]
+ - flavor: exc
+ registers: [ {name: far, value: 0x0000000100003f5c},
+ {name: esr, value: 0xf2000000},
+ {name: exception, value: 0x00000000}
+ ]
diff --git a/lldb/test/API/macosx/arm-corefile-regctx/armv7m.yaml b/lldb/test/API/macosx/arm-corefile-regctx/armv7m.yaml
new file mode 100644
index 0000000000000..1351056ed0999
--- /dev/null
+++ b/lldb/test/API/macosx/arm-corefile-regctx/armv7m.yaml
@@ -0,0 +1,37 @@
+cpu: armv7m
+endian: little
+threads:
+ # (lldb) reg read
+ # % pbpaste | grep = | sed 's, ,,g' | awk -F= '{print "{name: " $1 ", value: " $2 "},"}'
+ - regsets:
+ - flavor: gpr
+ registers: [
+ {name: r0, value: 0x00010000}, {name: r1, value: 0x00020000},
+ {name: r2, value: 0x00030000}, {name: r3, value: 0x00040000},
+ {name: r4, value: 0x00050000}, {name: r5, value: 0x00060000},
+ {name: r6, value: 0x00070000}, {name: r7, value: 0x00080000},
+ {name: r8, value: 0x00090000}, {name: r9, value: 0x000a0000},
+ {name: r10, value: 0x000b0000}, {name: r11, value: 0x000c0000},
+ {name: r12, value: 0x000d0000}, {name: sp, value: 0x000e0000},
+ {name: lr, value: 0x000f0000}, {name: pc, value: 0x00100000},
+ {name: cpsr, value: 0x00110000}
+ ]
+ - flavor: exc
+ registers: [ {name: far, value: 0x00003f5c},
+ {name: esr, value: 0xf2000000},
+ {name: exception, value: 0x00000000}
+ ]
+
+memory-regions:
+ # $sp is 0x000e0000, have bytes surrounding that address
+ - addr: 0x000dffe0
+ UInt8: [
+ 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
+ 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10, 0x11,
+ 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a,
+ 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20, 0x21, 0x22, 0x23,
+ 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c,
+ 0x2d, 0x2e, 0x2f, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35,
+ 0x36, 0x37, 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e,
+ 0x3f
+ ]
diff --git a/lldb/test/API/macosx/arm-corefile-regctx/create-arm-corefiles.cpp b/lldb/test/API/macosx/arm-corefile-regctx/create-arm-corefiles.cpp
deleted file mode 100644
index db39f12ecfb7e..0000000000000
--- a/lldb/test/API/macosx/arm-corefile-regctx/create-arm-corefiles.cpp
+++ /dev/null
@@ -1,266 +0,0 @@
-#include <mach-o/loader.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string>
-#include <vector>
-
-
-// Normally these are picked up by including <mach/thread_status.h>
-// but that does a compile time check for the build host arch and
-// only defines the ARM register context constants when building on
-// an arm system. We're creating fake corefiles, and might be
-// creating them on an intel system.
-#ifndef ARM_THREAD_STATE
-#define ARM_THREAD_STATE 1
-#endif
-#ifndef ARM_THREAD_STATE_COUNT
-#define ARM_THREAD_STATE_COUNT 17
-#endif
-#ifndef ARM_EXCEPTION_STATE
-#define ARM_EXCEPTION_STATE 3
-#endif
-#ifndef ARM_EXCEPTION_STATE_COUNT
-#define ARM_EXCEPTION_STATE_COUNT 3
-#endif
-#ifndef ARM_THREAD_STATE64
-#define ARM_THREAD_STATE64 6
-#endif
-#ifndef ARM_THREAD_STATE64_COUNT
-#define ARM_THREAD_STATE64_COUNT 68
-#endif
-#ifndef ARM_EXCEPTION_STATE64
-#define ARM_EXCEPTION_STATE64 7
-#endif
-#ifndef ARM_EXCEPTION_STATE64_COUNT
-#define ARM_EXCEPTION_STATE64_COUNT 4
-#endif
-
-union uint32_buf {
- uint8_t bytebuf[4];
- uint32_t val;
-};
-
-union uint64_buf {
- uint8_t bytebuf[8];
- uint64_t val;
-};
-
-void add_uint64(std::vector<uint8_t> &buf, uint64_t val) {
- uint64_buf conv;
- conv.val = val;
- for (int i = 0; i < 8; i++)
- buf.push_back(conv.bytebuf[i]);
-}
-
-void add_uint32(std::vector<uint8_t> &buf, uint32_t val) {
- uint32_buf conv;
- conv.val = val;
- for (int i = 0; i < 4; i++)
- buf.push_back(conv.bytebuf[i]);
-}
-
-std::vector<uint8_t> armv7_lc_thread_load_command() {
- std::vector<uint8_t> data;
- add_uint32(data, LC_THREAD); // thread_command.cmd
- add_uint32(data, 104); // thread_command.cmdsize
- add_uint32(data, ARM_THREAD_STATE); // thread_command.flavor
- add_uint32(data, ARM_THREAD_STATE_COUNT); // thread_command.count
- add_uint32(data, 0x00010000); // r0
- add_uint32(data, 0x00020000); // r1
- add_uint32(data, 0x00030000); // r2
- add_uint32(data, 0x00040000); // r3
- add_uint32(data, 0x00050000); // r4
- add_uint32(data, 0x00060000); // r5
- add_uint32(data, 0x00070000); // r6
- add_uint32(data, 0x00080000); // r7
- add_uint32(data, 0x00090000); // r8
- add_uint32(data, 0x000a0000); // r9
- add_uint32(data, 0x000b0000); // r10
- add_uint32(data, 0x000c0000); // r11
- add_uint32(data, 0x000d0000); // r12
- add_uint32(data, 0x000e0000); // sp
- add_uint32(data, 0x000f0000); // lr
- add_uint32(data, 0x00100000); // pc
- add_uint32(data, 0x00110000); // cpsr
-
- add_uint32(data, ARM_EXCEPTION_STATE); // thread_command.flavor
- add_uint32(data, ARM_EXCEPTION_STATE_COUNT); // thread_command.count
- add_uint32(data, 0x00003f5c); // far
- add_uint32(data, 0xf2000000); // esr
- add_uint32(data, 0x00000000); // exception
-
- return data;
-}
-
-std::vector<uint8_t> arm64_lc_thread_load_command() {
- std::vector<uint8_t> data;
- add_uint32(data, LC_THREAD); // thread_command.cmd
- add_uint32(data, 312); // thread_command.cmdsize
- add_uint32(data, ARM_THREAD_STATE64); // thread_command.flavor
- add_uint32(data, ARM_THREAD_STATE64_COUNT); // thread_command.count
- add_uint64(data, 0x0000000000000001); // x0
- add_uint64(data, 0x000000016fdff3c0); // x1
- add_uint64(data, 0x000000016fdff3d0); // x2
- add_uint64(data, 0x000000016fdff510); // x3
- add_uint64(data, 0x0000000000000000); // x4
- add_uint64(data, 0x0000000000000000); // x5
- add_uint64(data, 0x0000000000000000); // x6
- add_uint64(data, 0x0000000000000000); // x7
- add_uint64(data, 0x000000010000d910); // x8
- add_uint64(data, 0x0000000000000001); // x9
- add_uint64(data, 0xe1e88de000000000); // x10
- add_uint64(data, 0x0000000000000003); // x11
- add_uint64(data, 0x0000000000000148); // x12
- add_uint64(data, 0x0000000000004000); // x13
- add_uint64(data, 0x0000000000000008); // x14
- add_uint64(data, 0x0000000000000000); // x15
- add_uint64(data, 0x0000000000000000); // x16
- add_uint64(data, 0x0000000100003f5c); // x17
- add_uint64(data, 0x0000000000000000); // x18
- add_uint64(data, 0x0000000100003f5c); // x19
- add_uint64(data, 0x000000010000c000); // x20
- add_uint64(data, 0x000000010000d910); // x21
- add_uint64(data, 0x000000016fdff250); // x22
- add_uint64(data, 0x000000018ce12366); // x23
- add_uint64(data, 0x000000016fdff1d0); // x24
- add_uint64(data, 0x0000000000000001); // x25
- add_uint64(data, 0x0000000000000000); // x26
- add_uint64(data, 0x0000000000000000); // x27
- add_uint64(data, 0x0000000000000000); // x28
- add_uint64(data, 0x000000016fdff3a0); // fp
- add_uint64(data, 0x000000018cd97f28); // lr
- add_uint64(data, 0x000000016fdff140); // sp
- add_uint64(data, 0x0000000100003f5c); // pc
- add_uint32(data, 0x80001000); // cpsr
-
- add_uint32(data, 0x00000000); // padding
-
- add_uint32(data, ARM_EXCEPTION_STATE64); // thread_command.flavor
- add_uint32(data, ARM_EXCEPTION_STATE64_COUNT); // thread_command.count
- add_uint64(data, 0x0000000100003f5c); // far
- add_uint32(data, 0xf2000000); // esr
- add_uint32(data, 0x00000000); // exception
-
- return data;
-}
-
-std::vector<uint8_t> lc_segment(uint32_t fileoff,
- uint32_t lc_segment_data_size) {
- std::vector<uint8_t> data;
- // 0x000e0000 is the value of $sp in the armv7 LC_THREAD
- uint32_t start_vmaddr = 0x000e0000 - (lc_segment_data_size / 2);
- add_uint32(data, LC_SEGMENT); // segment_command.cmd
- add_uint32(data, sizeof(struct segment_command)); // segment_command.cmdsize
- for (int i = 0; i < 16; i++)
- data.push_back(0); // segment_command.segname[16]
- add_uint32(data, start_vmaddr); // segment_command.vmaddr
- add_uint32(data, lc_segment_data_size); // segment_command.vmsize
- add_uint32(data, fileoff); // segment_command.fileoff
- add_uint32(data, lc_segment_data_size); // segment_command.filesize
- add_uint32(data, 3); // segment_command.maxprot
- add_uint32(data, 3); // segment_command.initprot
- add_uint32(data, 0); // segment_command.nsects
- add_uint32(data, 0); // segment_command.flags
-
- return data;
-}
-
-enum arch { unspecified, armv7, arm64 };
-
-int main(int argc, char **argv) {
- if (argc != 3) {
- fprintf(stderr,
- "usage: create-arm-corefiles [armv7|arm64] <output-core-name>\n");
- exit(1);
- }
-
- arch arch = unspecified;
-
- if (strcmp(argv[1], "armv7") == 0)
- arch = armv7;
- else if (strcmp(argv[1], "arm64") == 0)
- arch = arm64;
- else {
- fprintf(stderr, "unrecognized architecture %s\n", argv[1]);
- exit(1);
- }
-
- // An array of load commands (in the form of byte arrays)
- std::vector<std::vector<uint8_t>> load_commands;
-
- // An array of corefile contents (page data, lc_note data, etc)
- std::vector<uint8_t> payload;
-
- // First add all the load commands / payload so we can figure out how large
- // the load commands will actually be.
- if (arch == armv7) {
- load_commands.push_back(armv7_lc_thread_load_command());
- load_commands.push_back(lc_segment(0, 0));
- } else if (arch == arm64) {
- load_commands.push_back(arm64_lc_thread_load_command());
- }
-
- int size_of_load_commands = 0;
- for (const auto &lc : load_commands)
- size_of_load_commands += lc.size();
-
- int header_and_load_cmd_room =
- sizeof(struct mach_header_64) + size_of_load_commands;
-
- // Erase the load commands / payload now that we know how much space...
[truncated]
✅ With the latest revision this PR passed the Python code formatter.
One thing I wasn't thrilled about with llvm's yaml MappingTraits parser was that I needed to define register values like
instead of a more natural style of
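For illustration, the MappingTraits-driven form that the PR's test yaml files use, next to a hypothetical flatter style (the flatter form is a guess at what "more natural" means here, not anything the PR implements):

```yaml
# Current MappingTraits-driven format (from the PR's armv7m.yaml):
registers: [{name: sp, value: 0x2000fe70}, {name: pc, value: 0x0020392c}]

# A hypothetical flatter style -- plain key/value pairs:
# registers:
#   sp: 0x2000fe70
#   pc: 0x0020392c
```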
The Linux PR pre-merge testing is failing because lldb/tool/yaml2macho-core is not being built. I think I need to add a dependency maybe in test/CMakeLists.txt? I was doing all of my development with simply
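One plausible fix, mirroring how yaml2obj is wired into the test dependencies; the helper name below is how lldb's test CMake registers tool dependencies, but the exact spelling should be checked against lldb/test/CMakeLists.txt:

```cmake
# Ensure the new tool is built before the API tests run, alongside yaml2obj.
if(TARGET yaml2macho-core)
  add_lldb_test_dependency(yaml2macho-core)
endif()
```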
When a processor faults, is interrupted, or gets an exception, it will stop running code and jump to an exception catcher routine. Most processors will store the pc that was executing in a system register, and the catcher functions have special instructions to retrieve that and possibly other registers. The catcher may then save those values to the stack, and the author can add .cfi directives to tell lldb's unwinder where to find those saved values.

ARM Cortex-M (microcontroller) processors have a simpler mechanism, where a fixed set of registers is saved to the stack on an exception, and a unique value is put in the link register to indicate to the caller that this has taken place. No special handling needs to be written into the exception catcher, unless it wants to inspect these preserved values, and it is possible for a general stack walker to walk the stack with no special knowledge about what the catch function does.

This patch adds an Architecture plugin method to allow an Architecture to override/augment the UnwindPlan that lldb would use for a stack frame, given the contents of the return address register. It resembles the feature where the LanguageRuntime can replace/augment the unwind plan for a function, but it does it offset by one level: the LanguageRuntime looks at the local register context and/or symbol name to decide if it will override the unwind rules, while for the Cortex-M exception unwinds, we need to modify THIS frame's unwind plan if the CALLER's LR had a specific value. RegisterContextUnwind has to retrieve the caller's LR value before it has completely decided on the UnwindPlan it will use for THIS stack frame. This does mean that we need one additional read of stack memory than we currently do when unwinding.

The unwinder walks the stack lazily, as stack frames are requested, so now if you ask for 2 stack frames, we will read enough stack to walk 2 frames, plus one extra word of memory: the spilled RA value from the stack (see RegisterContextUnwind::AdoptArchitectureUnwindPlan()). In practice, with 512-byte memory cache reads, this is unlikely to be a problem, but I'm wondering if I should add an Architecture method of "does this Architecture implement GetArchitectureUnwindPlan" -- and only do the memory read if it does, so the performance impact would be limited to armv7/Cortex-M debug sessions.

This PR includes a test with a yaml corefile description and a JSON ObjectFile, incorporating all of the necessary stack memory and symbol names from a real debug session I worked on. The architectural default unwind plans are used for all stack frames except the 0th, because there are no instructions for the functions and no unwind info. I may need to add an encoding of unwind rules to ObjectFileJSON in the future as we create more test cases like this.

This PR depends on the yaml2macho-core utility from llvm#153911

rdar://110663219
I think @labath would point out that I'm doing an end-run around making a sufficient Mock Process capability, with memory and threads and symbols, to write unit tests. @medismailben would point out that we could write a Scripted Process python script that would ingest this same information and vend a Process, just as well as using the corefile container for the information. For that matter, a little gdb remote serial protocol stub written in python could present this same information as if it were a live process with threads, registers, and memory. Because I already had several mach-o corefile creator tools (and needed a new one each time I needed to test another part of the mach-o corefile reader part of lldb), it seemed most natural to me to go that route.

The most important part for me is the simplicity of taking a real world debug problem situation, live or corefile, which may involve giant binaries/corefiles that cannot be used in a test for size or confidentiality reasons, and extracting the core bits of registers and memory that are sufficient to show the issue being fixed. We can't test issues dealing with debug info with this mechanism -- say, something specific to firmware debugging that can't be replicated in a userland process -- but for a lot of memory-and-stack-and-register type bugs, I think this could be a handy tool.
new yaml2macho-core tool. Thanks to Felipe for the guidance.
uint32_t et al, for some reason I thought those were built-in with C++.
This is a really cool utility! I'm no expert, but I made a few suggestions inline. Thanks for sharing. :)
enum Endian { Big = 0, Little = 1 };
enum MemoryType { UInt8 = 0, UInt32, UInt64 };
suggestion: MemoryType -> WordSize? MemoryType seems a little vague.
I'm not convinced that it was emitting big-endian Mach-O files correctly, and until this is actually needed, there's no point in carrying around dubious code.
I know this is a simple tool, but it seems like adding a bit of organization could go a long way. Essentially, the tool consists of 3 parts:
- A reader that takes YAML and creates an in-memory/intermediate representation (CoreSpec).
- A writer that takes a CoreSpec and emits a binary.
- The glue that holds (1) and (2) together as well as command-line parsing and I/O.
If it were up to me, that's how I would structure this tool. I think that will make it a lot easier to understand and extend in the future.
main.cpp
yaml2corespec.cpp

Nit: filename suggestions.

Suggested change:
main.cpp -> yaml2macho.cpp
yaml2corespec.cpp -> CoreSpec.cpp
std::vector<uint8_t> bytes;
std::vector<uint32_t> words;
std::vector<uint64_t> doublewords;

If they're mutually exclusive, you could do:

using Bytes = std::vector<uint8_t>;
using Words = std::vector<uint32_t>;
using Doublewords = std::vector<uint64_t>;
std::variant<Bytes, Words, Doublewords> data;

but this might complicate the YAML traits. We do this for protocol messages in DAP and MCP.
yeah, I wasn't super thrilled with having these three, but wanted to maintain the input formatting for endian switching (which I then later abandoned lol). At one point it was an anonymous union. I ended up bailing on all of that and just having three members, only one of which is active; they're not accessed directly in many places.
//===----------------------------------------------------------------------===//

#include "lldb/Utility/UUID.h"
Nit: no newline between header includes, the order is handled by clang-format.
lldb/tools/yaml2macho-core/main.cpp (Outdated)

int main(int argc, char **argv) {
  const char *const short_opts = "i:o:u:h";
  const option long_opts[] = {{"input", required_argument, nullptr, 'i'},
This should use cl::opt (https://llvm.org/docs/CommandLine.html). Even if the current implementation is slightly simpler, it makes it harder to extend in the future. It's used by every llvm tool (that doesn't use tablegen).
Hm, I took a stab at this but it's a little confusing how it works. I tried looking at how lldb-test uses this, but it's also not very straightforward. Maybe I'll let that sit for now, getopt is very simple...
#include "llvm/BinaryFormat/MachO.h"

void create_lc_note_binary_load_cmd(const CoreSpec &spec,
Surprised this lives in its own file? Why not merge this with utility?
It's a tiny one method file today, but I can see me adding additional LC_NOTEs in the future. So I wanted the "thread writer" file which emits LC_THREAD load commands, the "memory writer" file which emits LC_SEGMENTs and the blocks of memory, and the "LC_NOTE writer" which emits LC_NOTE load commands and bytes of the note command payloads. It makes sense in my head for these three to be in their own files.
#include "Utility.h"
#include "CoreSpec.h"

void add_uint64(const CoreSpec &spec, std::vector<uint8_t> &buf, uint64_t val) {
This takes a CoreSpec but doesn't actually use it? Same for add_uint32 below.
Originally you could specify the Endianness in the YAML and I needed to pass that to the add_uint methods to fix byte ordering in the output file. I eventually lost confidence that I was actually creating correctly-formatted Big Endian mach-o files and I removed that feature, forgot to clean up and remove this argument.
Thanks for all the comments, addressing them now. Yeah, before I wrote this I didn't have a clear idea of what it would look like when finished (for some reason it seems obvious now, but in the beginning there were some poor choices made and fixed along the way). You could imagine someone adding an ELF corefile output capability, for instance. The YAML files I'm using as input are just a way of specifying an architecture, some registers for threads, and some memory. I don't know if I want to structure it more generally yet, with subdirectories or whatever, separating the YAML-to-intermediate-representation step from the Mach-O corefile writing. If anyone does want to restructure it for additional input/output methods, I think it will be easy to do at that point. It may end up never growing beyond this simple set of features (I know, unlikely).
✅ With the latest revision this PR passed the C/C++ code formatter.
I think it's a little easier to follow the register context writing methods when it's more explicit what is being written.
And the ability to specify the number of bits used in addressing on this cpu.
I needed this utility for an upcoming patch for ARM Cortex-M processors, to create a test for the change. I took the opportunity to remove two of the "trivial mach-o corefile" creator utilities I've written in the past, which also restricted the tests to only run on Darwin systems because I was using the system headers for Mach-O constant values.
rdar://110663219