-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[lldb/crashlog] Implement speculative binary lookup for target creation #154975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[lldb/crashlog] Implement speculative binary lookup for target creation #154975
Conversation
@llvm/pr-subscribers-lldb Author: Med Ismail Bennani (medismailben) ChangesThis patch changes the way the main executable gets looked up when creating the target in the crashlog script. On macOS, when a process crashes while being built from a user home directory, the path to the binary gets obfuscated in the crashlog to avoid revealing any user personal information like the username of even the user file system. This safety feature breaks the main target creating lookup for crashlogs since the obfuscated path in the crashlog doesn't exists on disk. This patch addresses this issue by speculatively patching the obfuscated binary patch and scanning every executable and subdirectories (in parallel) in said path, and looking for a matching UUID as the one provided in the crash report. This usually enough to find the executable locally. Full diff: https://github.com/llvm/llvm-project/pull/154975.diff 1 Files Affected:
diff --git a/lldb/examples/python/crashlog.py b/lldb/examples/python/crashlog.py
index b466be6a62428..d4cdd084a9830 100755
--- a/lldb/examples/python/crashlog.py
+++ b/lldb/examples/python/crashlog.py
@@ -38,6 +38,7 @@
import plistlib
import re
import shlex
+import stat
import string
import subprocess
import sys
@@ -320,9 +321,9 @@ def show_symbol_progress(self):
or self.path.startswith("/usr/lib/")
)
- def find_matching_slice(self):
+ def find_matching_slice(self, path):
dwarfdump_cmd_output = subprocess.check_output(
- 'dwarfdump --uuid "%s"' % self.path, shell=True
+ 'dwarfdump --uuid "%s"' % path, shell=True
).decode("utf-8")
self_uuid = self.get_uuid()
for line in dwarfdump_cmd_output.splitlines():
@@ -331,7 +332,7 @@ def find_matching_slice(self):
dwarf_uuid_str = match.group(1)
dwarf_uuid = uuid.UUID(dwarf_uuid_str)
if self_uuid == dwarf_uuid:
- self.resolved_path = self.path
+ self.resolved_path = path
self.arch = match.group(2)
return True
if not self.resolved_path:
@@ -340,11 +341,101 @@ def find_matching_slice(self):
print(
(
"error\n error: unable to locate '%s' with UUID %s"
- % (self.path, self.get_normalized_uuid_string())
+ % (path, self.get_normalized_uuid_string())
)
)
return False
+ def patch_binary_search_path(self):
+ home = os.path.expanduser("~")
+
+ patched_search_path = self.path.replace("/Users/USER", home)
+
+ if "*" in patched_search_path:
+ patched_search_path = patched_search_path[
+ : patched_search_path.index("*")
+ ]
+
+ return patched_search_path
+
+ def find_binary_with_speculative_path(self, target_uuid):
+ search_path = self.patch_binary_search_path()
+
+ target_uuid = target_uuid.lower()
+ stop_flag = {"found": False}
+
+ with print_lock:
+ print(
+ "Scanning for '%s' for '%s' ... (ˆC to interrupt)"
+ % (search_path, os.path.basename(self.path))
+ )
+
+ def is_executable(path):
+ try:
+ st = os.stat(path)
+ return stat.S_ISREG(st.st_mode) and (
+ st.st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
+ )
+ except:
+ return False
+
+ def check_uuid(path, target_uuid):
+ try:
+ output = subprocess.check_output(
+ ["dwarfdump", "--uuid", path],
+ text=True,
+ stderr=subprocess.DEVNULL,
+ )
+ for line in output.splitlines():
+ if target_uuid in line.lower():
+ return path
+ except:
+ return None
+ return None
+
+ def scan_directory_recursive(dirpath, target_uuid, stop_flag):
+ for root, _, files in os.walk(dirpath):
+ if stop_flag.get("found"):
+ break
+ for name in files:
+ path = os.path.join(root, name)
+ if stop_flag.get("found"):
+ break
+ if is_executable(path):
+ result = check_uuid(path, target_uuid)
+ if result:
+ stop_flag["found"] = True
+ return result
+ return None
+
+ try:
+ subdirs = [
+ os.path.join(search_path, d)
+ for d in os.listdir(search_path)
+ if os.path.isdir(os.path.join(search_path, d))
+ ]
+ except PermissionError:
+ subdirs = []
+
+ # Include root itself in case it contains files
+ subdirs.insert(0, search_path)
+
+ with concurrent.futures.ThreadPoolExecutor() as executor:
+ futures = {
+ executor.submit(
+ scan_directory_recursive, d, target_uuid, stop_flag
+ ): d
+ for d in subdirs
+ }
+ for fut in concurrent.futures.as_completed(futures):
+ result = fut.result()
+ if result:
+ print("Found:", result)
+ return result
+
+ print("No match found.")
+ return None
+
def locate_module_and_debug_symbols(self):
# Don't load a module twice...
if self.resolved:
@@ -397,7 +488,11 @@ def locate_module_and_debug_symbols(self):
if not os.path.exists(source_path):
unavailable_source_paths.add(source_path)
if not self.resolved_path and os.path.exists(self.path):
- if not self.find_matching_slice():
+ if not self.find_matching_slice(self.path):
+ return False
+ speculative_path = self.find_binary_with_speculative_path(uuid_str)
+ if not self.resolved_path and os.path.exists(speculative_path):
+ if not self.find_matching_slice(speculative_path):
return False
if not self.resolved_path and not os.path.exists(self.path):
try:
|
e4a9b13
to
97d05a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't say I'm an expert but I don't see any glaring issues here. LGTM
97d05a3
to
6c50a83
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's appropriate for the crashlog script to recurse through someone's home directory. This is going to result in TCC (Transparency Consent and Control) popups when trying to access certain directories (like Photos, Downloads, etc). That's by design and a good indication that this isn't a good idea. As a user, if I saw a script/program do this without my explicit direction to do so, I would be very unhappy.
Potential alternatives that I think are fine:
- Replacing
/Users/USER
by~
but presumably everything after is stripped to, which is why you're doing this in the first place. - Using mdfind to find the dSYM for the binary, and looking next to it, similar to what LLDB does.
- Relying on
dsymForUUID
to give you the symbol rich executable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jonas makes a great point, I didn't consider the privacy aspect of this change.
I agree with him, I don't think this change should go in as-is.
This patch changes the way the main executable gets looked up when creating the target in the crashlog script. On macOS, when a process crashes while being built from a user home directory, the path to the binary gets obfuscated in the crashlog to avoid revealing any user personal information like the username of even the user file system. This safety feature breaks the main target creating lookup for crashlogs since the obfuscated path in the crashlog doesn't exists on disk. This patch addresses this issue by speculatively patching the obfuscated binary patch and scanning every executable and subdirectories (in parallel) in said path, and looking for a matching UUID as the one provided in the crash report. This usually enough to find the executable locally. Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
6c50a83
to
9c737f0
Compare
This patch changes the way the main executable gets looked up when creating the target in the crashlog script.
On macOS, when a process crashes while being built from a user home directory, the path to the binary gets obfuscated in the crashlog to avoid revealing any user personal information like the username of even the user file system.
This safety feature breaks the main target creating lookup for crashlogs since the obfuscated path in the crashlog doesn't exists on disk.
This patch addresses this issue by speculatively patching the obfuscated binary patch and scanning every executable and subdirectories (in parallel) in said path, and looking for a matching UUID as the one provided in the crash report.
This usually enough to find the executable locally.