-
Notifications
You must be signed in to change notification settings - Fork 14.9k
Stateful variable-location annotations in Disassembler::PrintInstructions() (follow-up to #147460) #152887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stateful variable-location annotations in Disassembler::PrintInstructions() (follow-up to #147460) #152887
Conversation
…DWARFExpressionList This introduces a new API for retrieving DWARF expression metadata associated with variable location entries at a given PC address. It provides the base, end, and expression pointer for downstream consumers such as disassembler annotations. Intended for use in richer instruction annotations in Instruction::Dump().
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Updated comment for GetExpressionEntryAtAddress to directly refer to struct DWARFExpressionEntry Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
updating code style for function GetExpressionEntryAtAddress Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Replace raw base/end with `AddressRange` in `DWARFExpressionEntry` and cleans up helper comments to follow Doxygen convention. Using `AddressRange` makes the intent clearer, avoids duplication of basic `AddressRange` logic usage
Converts `GetExpressionEntryAtAddress` to return `llvm::Expected<DWARFExpressionEntry>` using the updated `DWARFExpressionEntry`. Updates the implementation to compute a single `AddressRange file_range` for each DWARF location interval.
Updated commenting style for struct DWARFExpressionEntry
Updated function `llvm::Expected<DWARFExpressionList::DWARFExpressionEntry> DWARFExpressionList::GetExpressionEntryAtAddress` to use `FindEntryThatContains` instead of `FindEntryIndexThatContains`
Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>
This patch adds explicit checks: - ensure `load_addr >= func_load_addr` to avoid underflow, - compute and verify a temporary delta variable, then verify `delta + m_func_file_addr` does not exceed `addr_t` max to avoid overflow.
…bly output. Right now just checks if DW annotations show up for a basic program and that a variable location is annotated (i.e 'a = DW_OP_reg...').
- Fixed an issue where variable location annotations were not shown if the current instruction address did not exactly match the DWARF base address. Now, annotations are shown as long as the PC is within the valid range. - Improved alignment of annotation comments in Instruction::Dump(). While `FillLastLineToColumn` can sometimes overcompensate due to internal formatting or byte-width mismatches, the overall alignment is now significantly more consistent and readable.
Previously, when a DWARF expression contained any decoding error, the entire variable location annotation was printed with the error, e.g. `c = DW_OP_addr 0x0, <decoding error> 00 00 00`. This was misleading and cluttered the disassembly view. This patch improves the formatting by stripping out the `<decoding error ...>` fragments while preserving the valid portions of the expression, so that partial information like `c = DW_OP_addr 0x0` can still be shown. This allows the rich disassembler to give more meaningful variable annotations, especially in optimized (-O1/-O2) builds where partial DWARF corruption or unsupported expressions may occur.
Handled edge case where the entire DWARF expression is a `<decoding error>`, ensuring no misleading or empty annotations are printed for such variables.
This patch adds API tests to verify that DWARF variable location annotations are correctly displayed in the disassembly output. The tests cover: - Local variables in loops and functions - Multiple stack variables - Control flow edge cases - Different optimization levels (-O0, -O1, -O2) - Ensuring decoding errors are excluded from output
…try API This rebases the `add-disassembler-annotations` work onto the latest `add-dwarfexprentry-api` branch so that the instruction annotation patches sit cleanly atop the new DWARFExpressionEntry struct and helper API. All conflicts have been resolved and the annotation code now integrates with the updated std::optional<AddressRange>-based GetExpressionEntryAtAddress signature.
…w `DWARFExpression::DumpLocationWithOptions` for simplified expression printing This patch introduces a PrintRegisterOnly flag to the DIDumpOptions struct, enabling concise rendering of DWARF expressions for disassembler annotations. Key changes: - Added DumpLocationWithOptions to DWARFExpression for flexible dumping with DIDumpOptions. - Updated DWARFExpression::print and Operation::print to respect PrintRegisterOnly, rendering registers like RDI without DW_OP_ or llvm: prefixes. - Suppressed <decoding error> output when PrintRegisterOnly is true to avoid clutter during register-only disassembly output. These changes are motivated by LLDB’s rich disassembler feature, where annotations should match user-facing register names without DWARF-level detail. Test impact: Some rich-disassembler tests that relied on DW_OP_ for validation were deprecated. Updated tests aligned with the new formatting will be added next.
…n Instruction::Dump
…tate_variable` lambda function
…IDumpOptions instead of having to introduce new function
Resolve conflicts in DWARFExpression files after upstream refactor
Should this file be deleted? |
|
@@ -154,6 +154,10 @@ Status CommandObjectDisassemble::CommandOptions::SetOptionValue( | |||
} | |||
} break; | |||
|
|||
case 'v': //< --rich variable annotations | |||
enable_rich_annotations = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename rich
to variable
in most places?
lldb/source/Core/Disassembler.cpp
Outdated
|
||
// Track live variables across instructions (keyed by stable LLDB user_id_t. 8 | ||
// is a good small-buffer guess. | ||
llvm::SmallDenseMap<lldb::user_id_t, VarState, 8> live_vars; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would just use a DenseMap here, I doubt there are any performance reasons to use a SmallDenseMap here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's fair.
lldb/test/API/functionalities/disassembler-variables/d_original_example
Outdated
Show resolved
Hide resolved
Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>
…tionsDisassembler.
Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once the outstanding comments are addressed.
Thanks for all the hard work!
Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>
Thank you for all the feedback! I'm addressing the outstanding comments right now. Really appreciate all the guidance and support throughout the whole process it's been a great learning experience! |
struct VarState { | ||
std::string name; ///< Display name. | ||
std::string last_loc; ///< Last printed location (empty means <undef>). | ||
bool seen_this_inst = false; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's define this as a private class type, or at least outside the function. Given our column limit, we also prefer the comments to come before the member to avoid wrapping.
struct VarState { | |
std::string name; ///< Display name. | |
std::string last_loc; ///< Last printed location (empty means <undef>). | |
bool seen_this_inst = false; | |
}; | |
struct VarState { | |
/// Display name. | |
std::string name; | |
/// Last printed location (empty means <undef>). | |
std::string last_loc; | |
bool seen_this_inst = false; | |
}; |
lldb::ProcessSP no_process; | ||
lldb::ABISP abi_sp = ABI::FindPlugin(no_process, target.GetArchitecture()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think nullptr is more obvious:
lldb::ProcessSP no_process; | |
lldb::ABISP abi_sp = ABI::FindPlugin(no_process, target.GetArchitecture()); | |
lldb::ABISP abi_sp = ABI::FindPlugin(nullptr, target.GetArchitecture()); |
// | ||
// The goal is to give users helpful live variable hints alongside the | ||
// disassembled instruction stream, similar to how debug information | ||
// enhances source-level debugging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it could be its own abstraction, like a small subclass in the disassembler file? Especially if there's state that needs to be shared.
// Reset per-instruction seen flags. | ||
for (auto &kv : live_vars) | ||
kv.second.seen_this_inst = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to remember the key still? Can you just clear the map?
const Address &iaddr = inst.GetAddress(); | ||
if (!module_sp) { | ||
// Everything previously live becomes <undef>. | ||
for (auto I = live_vars.begin(), E = live_vars.end(); I != E;) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is using the LLVM coding style.
// No function context: everything dies here. | ||
for (auto I = live_vars.begin(), E = live_vars.end(); I != E;) { | ||
auto Cur = I++; | ||
events.push_back( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
events.push_back( | |
events.emplace_back( |
Block *B = sc.block; ///< Innermost block containing iaddr. | ||
VariableList var_list; | ||
if (B) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Block *B = sc.block; ///< Innermost block containing iaddr. | |
VariableList var_list; | |
if (B) { | |
VariableList var_list; | |
// Innermost block containing iaddr. | |
if (Block *B = sc.block) { |
if (abi) | ||
opts.PrintRegisterOnly = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (abi) | |
opts.PrintRegisterOnly = true; | |
opts.PrintRegisterOnly = static_cast<bool>(abi_sp); |
Sorry, now that this PR is merged, how should I go about making the suggested changes? |
@UltimateForce21 , the
would you take care of it? |
Were these changes AI generated? |
I believe the failures on the lldb-remote-linux-ubuntu and lldb-remote-linux-win builders are happening because the .s test input in TestVariableAnnotationsDisassembler.py was generated for x86_64 and uses assembly syntax that isn’t accepted by the assemblers on AArch64/other targets (for example, directives like .asciz with # comments). That’s why the test builds fine on x86_64 but fails on those other builders. I’ll follow up with a change that skips this test on unsupported architectures. That way, it will still run where the assembly input is valid (x86_64), but won’t cause failures on platforms where the test input can’t assemble. |
Do I have to create a new PR, to push this change in the tester? |
I was experimenting with some AI to help me build the LLDB test API code, since I’m still getting up to speed with the tester. But I made sure to adapt, review, and test everything before committing. Appreciate everyone's patience as I’m still learning the LLVM development process here. |
Yes, would you? Thank you. |
I can do that now. |
Here is the new pull request: #155942 (new with only the tester change) |
Context
Follow-up to #147460, which added the ability to surface register-resident variable locations.
This PR moves the annotation logic out of
Instruction::Dump()
and intoDisassembler::PrintInstructions()
, and adds lightweight state tracking so we only print changes at range starts and when variables go out of scope.What this does
While iterating the instructions for a function, we maintain a “live variable map” keyed by
lldb::user_id_t
(theVariable
’s ID) to remember each variable’s last emitted location string. For each instruction:name = <location>
once at the start of its DWARF location range, cache it.name = <undef>
and drop it.This produces concise, stateful annotations that highlight variable lifetime transitions without spamming every line.
Why in
PrintInstructions()
?Instruction
stateless and avoids changing theInstruction::Dump()
virtual API.prev → current
) inside the single driver loop.How it works (high-level)
StackFrame::GetInScopeVariableList(/*get_parent=*/true)
.Variable
, queryDWARFExpressionList::GetExpressionEntryAtAddress(func_load_addr, current_pc)
(added in [lldb] Add DWARFExpressionEntry and GetExpressionEntryAtAddress() to … #144238).DumpLocation(..., eDescriptionLevelBrief, abi)
to get a short, ABI-aware location string (e.g.,DW_OP_reg3 RBX → RBX
).name = <location>
and record it.name = <undef>
for any that disappeared.Internally:
DWARFExpressionList
.Example output (x86_64, simplified)
Only transitions are shown: the start of a location, changes, and end-of-lifetime.
Scope & limitations (by design)
DumpLocation
).Implementation notes
Disassembler::PrintInstructions()
.std::unordered_map<lldb::user_id_t, std::string>
as the live map.Instruction
interface.Requested feedback
<undef>
marker.ExecutionContext
).Thanks for reviewing! Happy to adjust behavior/format based on feedback.