Stateful variable-location annotations in Disassembler::PrintInstructions() (follow-up to #147460) #152887

UltimateForce21 · 2025-08-10T02:48:45Z

Context
Follow-up to #147460, which added the ability to surface register-resident variable locations.
This PR moves the annotation logic out of Instruction::Dump() and into Disassembler::PrintInstructions(), and adds lightweight state tracking so we only print changes at range starts and when variables go out of scope.

What this does

While iterating the instructions for a function, we maintain a “live variable map” keyed by lldb::user_id_t (the Variable’s ID) to remember each variable’s last emitted location string. For each instruction:

New (or newly visible) variable → print name = <location> once at the start of its DWARF location range, cache it.
Location changed (e.g., DWARF range switched to a different register/const) → print the updated mapping.
Out of scope (was tracked previously but not found for the current PC) → print name = <undef> and drop it.

This produces concise, stateful annotations that highlight variable lifetime transitions without spamming every line.

Why in `PrintInstructions()`?

Keeps Instruction stateless and avoids changing the Instruction::Dump() virtual API.
Makes it straightforward to diff state across instructions (prev → current) inside the single driver loop.

How it works (high-level)

For the current PC, get in-scope variables via StackFrame::GetInScopeVariableList(/*get_parent=*/true).
For each Variable, query DWARFExpressionList::GetExpressionEntryAtAddress(func_load_addr, current_pc) (added in [lldb] Add DWARFExpressionEntry and GetExpressionEntryAtAddress() to … #144238).
If the entry exists, call DumpLocation(..., eDescriptionLevelBrief, abi) to get a short, ABI-aware location string (e.g., DW_OP_reg3 RBX → RBX).
Compare against the last emitted location in the live map:
- If not present → emit name = <location> and record it.
- If different → emit updated mapping and record it.
After processing current in-scope variables, compute the set difference vs. the previous map and emit name = <undef> for any that disappeared.

Internally:

We respect file↔load address translation already provided by DWARFExpressionList.
We reuse the ABI to map LLVM register numbers to arch register names.

Example output (x86_64, simplified)

->  0x55c6f5f6a140 <+0>:  cmpl   $0x2, %edi                                                         ; argc = RDI, argv = RSI
    0x55c6f5f6a143 <+3>:  jl     0x55c6f5f6a176            ; <+54> at d_original_example.c:6:3
    0x55c6f5f6a145 <+5>:  pushq  %r15
    0x55c6f5f6a147 <+7>:  pushq  %r14
    0x55c6f5f6a149 <+9>:  pushq  %rbx
    0x55c6f5f6a14a <+10>: movq   %rsi, %rbx
    0x55c6f5f6a14d <+13>: movl   %edi, %r14d
    0x55c6f5f6a150 <+16>: movl   $0x1, %r15d                                                        ; argc = R14
    0x55c6f5f6a156 <+22>: nopw   %cs:(%rax,%rax)                                                    ; i = R15, argv = RBX
    0x55c6f5f6a160 <+32>: movq   (%rbx,%r15,8), %rdi
    0x55c6f5f6a164 <+36>: callq  0x55c6f5f6a030            ; symbol stub for: puts
    0x55c6f5f6a169 <+41>: incq   %r15
    0x55c6f5f6a16c <+44>: cmpq   %r15, %r14
    0x55c6f5f6a16f <+47>: jne    0x55c6f5f6a160            ; <+32> at d_original_example.c:5:10
    0x55c6f5f6a171 <+49>: popq   %rbx                                                               ; i = <undef>
    0x55c6f5f6a172 <+50>: popq   %r14                                                               ; argv = RSI
    0x55c6f5f6a174 <+52>: popq   %r15                                                               ; argc = RDI
    0x55c6f5f6a176 <+54>: xorl   %eax, %eax
    0x55c6f5f6a178 <+56>: retq

Only transitions are shown: the start of a location, changes, and end-of-lifetime.

Scope & limitations (by design)

Handles simple locations first (registers, const-in-register cases surfaced by DumpLocation).
Memory/composite locations are out of scope for this PR.
Annotations appear only at range boundaries (start/change/end) to minimize noise.
Output is target-independent; register names come from the target ABI.

Implementation notes

All annotation printing now happens in Disassembler::PrintInstructions().
Uses std::unordered_map<lldb::user_id_t, std::string> as the live map.
No persistent state across calls; the map is rebuilt while walking instruction by instruction.
No changes to the Instruction interface.

Requested feedback

Placement and wording of the <undef> marker.
Whether we should optionally gate this behind a setting (currently always on when disassembling with an ExecutionContext).
Preference for immediate inclusion of tests vs. follow-up patch.

Thanks for reviewing! Happy to adjust behavior/format based on feedback.

…DWARFExpressionList This introduces a new API for retrieving DWARF expression metadata associated with variable location entries at a given PC address. It provides the base, end, and expression pointer for downstream consumers such as disassembler annotations. Intended for use in richer instruction annotations in Instruction::Dump().

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>

Updated comment for GetExpressionEntryAtAddress to directly refer to struct DWARFExpressionEntry Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>

updating code style for function GetExpressionEntryAtAddress Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>

Replace raw base/end with `AddressRange` in `DWARFExpressionEntry` and cleans up helper comments to follow Doxygen convention. Using `AddressRange` makes the intent clearer, avoids duplication of basic `AddressRange` logic usage

Converts `GetExpressionEntryAtAddress` to return `llvm::Expected<DWARFExpressionEntry>` using the updated `DWARFExpressionEntry`. Updates the implementation to compute a single `AddressRange file_range` for each DWARF location interval.

Updated commenting style for struct DWARFExpressionEntry

Updated function `llvm::Expected<DWARFExpressionList::DWARFExpressionEntry> DWARFExpressionList::GetExpressionEntryAtAddress` to use `FindEntryThatContains` instead of `FindEntryIndexThatContains`

… llvm::Expected

Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>

This patch adds explicit checks: - ensure `load_addr >= func_load_addr` to avoid underflow, - compute and verify a temporary delta variable, then verify `delta + m_func_file_addr` does not exceed `addr_t` max to avoid overflow.

…bly output. Right now just checks if DW annotations show up for a basic program and that a variable location is annotated (i.e 'a = DW_OP_reg...').

- Fixed an issue where variable location annotations were not shown if the current instruction address did not exactly match the DWARF base address. Now, annotations are shown as long as the PC is within the valid range. - Improved alignment of annotation comments in Instruction::Dump(). While `FillLastLineToColumn` can sometimes overcompensate due to internal formatting or byte-width mismatches, the overall alignment is now significantly more consistent and readable.

Previously, when a DWARF expression contained any decoding error, the entire variable location annotation was printed with the error, e.g. `c = DW_OP_addr 0x0, <decoding error> 00 00 00`. This was misleading and cluttered the disassembly view. This patch improves the formatting by stripping out the `<decoding error ...>` fragments while preserving the valid portions of the expression, so that partial information like `c = DW_OP_addr 0x0` can still be shown. This allows the rich disassembler to give more meaningful variable annotations, especially in optimized (-O1/-O2) builds where partial DWARF corruption or unsupported expressions may occur.

Handled edge case where the entire DWARF expression is a `<decoding error>`, ensuring no misleading or empty annotations are printed for such variables.

This patch adds API tests to verify that DWARF variable location annotations are correctly displayed in the disassembly output. The tests cover: - Local variables in loops and functions - Multiple stack variables - Control flow edge cases - Different optimization levels (-O0, -O1, -O2) - Ensuring decoding errors are excluded from output

…try API This rebases the `add-disassembler-annotations` work onto the latest `add-dwarfexprentry-api` branch so that the instruction annotation patches sit cleanly atop the new DWARFExpressionEntry struct and helper API. All conflicts have been resolved and the annotation code now integrates with the updated std::optional<AddressRange>-based GetExpressionEntryAtAddress signature.

…w `DWARFExpression::DumpLocationWithOptions` for simplified expression printing This patch introduces a PrintRegisterOnly flag to the DIDumpOptions struct, enabling concise rendering of DWARF expressions for disassembler annotations. Key changes: - Added DumpLocationWithOptions to DWARFExpression for flexible dumping with DIDumpOptions. - Updated DWARFExpression::print and Operation::print to respect PrintRegisterOnly, rendering registers like RDI without DW_OP_ or llvm: prefixes. - Suppressed <decoding error> output when PrintRegisterOnly is true to avoid clutter during register-only disassembly output. These changes are motivated by LLDB’s rich disassembler feature, where annotations should match user-facing register names without DWARF-level detail. Test impact: Some rich-disassembler tests that relied on DW_OP_ for validation were deprecated. Updated tests aligned with the new formatting will be added next.

…n Instruction::Dump

…:Dump

…tate_variable` lambda function

…Dump

…IDumpOptions instead of having to introduce new function

Resolve conflicts in DWARFExpression files after upstream refactor

adrian-prantl · 2025-08-26T19:19:15Z

Should this file be deleted?

adrian-prantl · 2025-08-26T19:19:22Z

[lldb/test/API/functionalities/disassembler-variables/d_original_example](https://github.com/llvm/llvm-project/pull/152887#diff-b7942b849ea60f4e92ce3d85f3eb55811b20ea9507da4380e9c1b77f4336d61d)

adrian-prantl · 2025-08-26T19:20:27Z

lldb/source/Commands/CommandObjectDisassemble.cpp

@@ -154,6 +154,10 @@ Status CommandObjectDisassemble::CommandOptions::SetOptionValue(
    }
  } break;

+  case 'v': //< --rich variable annotations
+    enable_rich_annotations = true;


Can you rename rich to variable in most places?

lldb/source/Core/Disassembler.cpp

adrian-prantl · 2025-08-26T19:22:18Z

lldb/source/Core/Disassembler.cpp

+
+  // Track live variables across instructions (keyed by stable LLDB user_id_t. 8
+  // is a good small-buffer guess.
+  llvm::SmallDenseMap<lldb::user_id_t, VarState, 8> live_vars;


I think I would just use a DenseMap here, I doubt there are any performance reasons to use a SmallDenseMap here, right?

Yes, that's fair.

lldb/test/API/functionalities/disassembler-variables/d_original_example

Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>

…tionsDisassembler.

lldb/source/Commands/CommandObjectDisassemble.cpp

Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>

adrian-prantl

LGTM once the outstanding comments are addressed.
Thanks for all the hard work!

Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>

UltimateForce21 · 2025-08-27T17:00:05Z

LGTM once the outstanding comments are addressed. Thanks for all the hard work!

Thank you for all the feedback! I'm addressing the outstanding comments right now. Really appreciate all the guidance and support throughout the whole process it's been a great learning experience!

JDevlieghere · 2025-08-28T18:35:10Z

lldb/source/Core/Disassembler.cpp

+  struct VarState {
+    std::string name;     ///< Display name.
+    std::string last_loc; ///< Last printed location (empty means <undef>).
+    bool seen_this_inst = false;
+  };


Let's define this as a private class type, or at least outside the function. Given our column limit, we also prefer the comments to come before the member to avoid wrapping.

Suggested change

struct VarState {

std::string name; ///< Display name.

std::string last_loc; ///< Last printed location (empty means <undef>).

bool seen_this_inst = false;

};

struct VarState {

/// Display name.

std::string name;

/// Last printed location (empty means <undef>).

std::string last_loc;

bool seen_this_inst = false;

};

JDevlieghere · 2025-08-28T18:36:49Z

lldb/source/Core/Disassembler.cpp

+    lldb::ProcessSP no_process;
+    lldb::ABISP abi_sp = ABI::FindPlugin(no_process, target.GetArchitecture());


I think nullptr is more obvious:

Suggested change

lldb::ProcessSP no_process;

lldb::ABISP abi_sp = ABI::FindPlugin(no_process, target.GetArchitecture());

lldb::ABISP abi_sp = ABI::FindPlugin(nullptr, target.GetArchitecture());

JDevlieghere · 2025-08-28T18:37:04Z

lldb/source/Core/Disassembler.cpp

+  //
+  // The goal is to give users helpful live variable hints alongside the
+  // disassembled instruction stream, similar to how debug information
+  // enhances source-level debugging.


This feels like it could be its own abstraction, like a small subclass in the disassembler file? Especially if there's state that needs to be shared.

JDevlieghere · 2025-08-28T18:37:57Z

lldb/source/Core/Disassembler.cpp

+    // Reset per-instruction seen flags.
+    for (auto &kv : live_vars)
+      kv.second.seen_this_inst = false;


Do you need to remember the key still? Can you just clear the map?

JDevlieghere · 2025-08-28T18:38:23Z

lldb/source/Core/Disassembler.cpp

+    const Address &iaddr = inst.GetAddress();
+    if (!module_sp) {
+      // Everything previously live becomes <undef>.
+      for (auto I = live_vars.begin(), E = live_vars.end(); I != E;) {


This is using the LLVM coding style.

JDevlieghere · 2025-08-28T18:38:55Z

lldb/source/Core/Disassembler.cpp

+      // No function context: everything dies here.
+      for (auto I = live_vars.begin(), E = live_vars.end(); I != E;) {
+        auto Cur = I++;
+        events.push_back(


Suggested change

events.push_back(

events.emplace_back(

JDevlieghere · 2025-08-28T18:39:12Z

lldb/source/Core/Disassembler.cpp

+    Block *B = sc.block; ///< Innermost block containing iaddr.
+    VariableList var_list;
+    if (B) {


Suggested change

Block *B = sc.block; ///< Innermost block containing iaddr.

VariableList var_list;

if (B) {

VariableList var_list;

// Innermost block containing iaddr.

if (Block *B = sc.block) {

JDevlieghere · 2025-08-28T18:40:01Z

lldb/source/Core/Disassembler.cpp

+    if (abi)
+      opts.PrintRegisterOnly = true;


Suggested change

if (abi)

opts.PrintRegisterOnly = true;

opts.PrintRegisterOnly = static_cast<bool>(abi_sp);

UltimateForce21 · 2025-08-28T20:19:41Z

Sorry, now that this PR is merged, how should I go about making the suggested changes?

vvereschaka · 2025-08-28T21:44:24Z

@UltimateForce21 , the lldb-api::TestVariableAnnotationsDisassembler.py test gets failed on the lldb-remote-linux-ubuntu/win builders:

would you take care of it?

tstellar · 2025-08-28T21:46:30Z

Were these changes AI generated?

UltimateForce21 · 2025-08-28T22:23:08Z

@UltimateForce21 , the lldb-api::TestVariableAnnotationsDisassembler.py test gets failed on the lldb-remote-linux-ubuntu/win builders:

https://lab.llvm.org/buildbot/#/builders/195/builds/13890

https://lab.llvm.org/buildbot/#/builders/197/builds/8515

would you take care of it?

I believe the failures on the lldb-remote-linux-ubuntu and lldb-remote-linux-win builders are happening because the .s test input in TestVariableAnnotationsDisassembler.py was generated for x86_64 and uses assembly syntax that isn’t accepted by the assemblers on AArch64/other targets (for example, directives like .asciz with # comments). That’s why the test builds fine on x86_64 but fails on those other builders.

I’ll follow up with a change that skips this test on unsupported architectures. That way, it will still run where the assembly input is valid (x86_64), but won’t cause failures on platforms where the test input can’t assemble.

UltimateForce21 · 2025-08-28T22:38:02Z

Do I have to create a new PR, to push this change in the tester?
Here is the link to the committed change on my fork: (ad70817)

UltimateForce21 · 2025-08-28T22:43:38Z

Were these changes AI generated?

I was experimenting with some AI to help me build the LLDB test API code, since I’m still getting up to speed with the tester. But I made sure to adapt, review, and test everything before committing.

Appreciate everyone's patience as I’m still learning the LLVM development process here.

vvereschaka · 2025-08-28T22:47:10Z

Do I have to create a new PR, to push this change in the tester?

Yes, would you? Thank you.

UltimateForce21 · 2025-08-28T22:47:54Z

Yes, would you? Thank you.

I can do that now.

UltimateForce21 · 2025-08-28T22:52:10Z

Here is the new pull request: #155942 (new with only the tester change)

UltimateForce21 and others added 30 commits June 11, 2025 00:29

Update lldb/include/lldb/Expression/DWARFExpressionList.h

1db5002

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>

Update lldb/include/lldb/Expression/DWARFExpressionList.h

a26010b

Updated comment for GetExpressionEntryAtAddress to directly refer to struct DWARFExpressionEntry Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>

Update lldb/source/Expression/DWARFExpressionList.cpp

72237b7

updating code style for function GetExpressionEntryAtAddress Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>

Update DWARFExpressionList.h

94e4951

Replace raw base/end with `AddressRange` in `DWARFExpressionEntry` and cleans up helper comments to follow Doxygen convention. Using `AddressRange` makes the intent clearer, avoids duplication of basic `AddressRange` logic usage

Update DWARFExpressionList.cpp

e8142da

Converts `GetExpressionEntryAtAddress` to return `llvm::Expected<DWARFExpressionEntry>` using the updated `DWARFExpressionEntry`. Updates the implementation to compute a single `AddressRange file_range` for each DWARF location interval.

Update DWARFExpressionList.h

7e8741e

Updated commenting style for struct DWARFExpressionEntry

Update DWARFExpressionList.cpp

c4cd77f

Updated function `llvm::Expected<DWARFExpressionList::DWARFExpressionEntry> DWARFExpressionList::GetExpressionEntryAtAddress` to use `FindEntryThatContains` instead of `FindEntryIndexThatContains`

Change GetExpressionEntryAtAddress to return std::optional instead of…

62c02a9

… llvm::Expected

Update DWARFExpressionList.cpp

d015971

Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>

Make file_range optional in DWARFExpressionEntry for always-valid expr

3462165

Annotate Instruction::Dump() with DWARF variable locations

2ed8443

Added Initial Basic API test for rich variable annotation in disassem…

8c6b22d

…bly output. Right now just checks if DW annotations show up for a basic program and that a variable location is annotated (i.e 'a = DW_OP_reg...').

Ignore annotations with only decoding errors

6bbc8aa

Handled edge case where the entire DWARF expression is a `<decoding error>`, ensuring no misleading or empty annotations are printed for such variables.

Add high-level comment explaining rich disassembly annotation logic i…

09c4d04

…n Instruction::Dump

Add comment clarifying annotation column length check in Instruction:…

6e17f77

…:Dump

Refactor variable annotation logic in Instruction::Dump using `anno…

31431c0

…tate_variable` lambda function

Use range-based for loop for variable list iteration in Instruction::…

9c5cb8f

…Dump

Consolidated DumpLocation and DumpLocationWithOptions using default D…

ca8510c

…IDumpOptions instead of having to introduce new function

Use llvm::join to simplify annotation output formatting

ffefe5f

Merge branch 'main' into add-disassembler-annotations

fae745a

Resolve conflicts in DWARFExpression files after upstream refactor

Fix formatting to match LLVM style

dcddf16

More formatting fixes

7bac074

Fix formatting for code and tests

79c0a9e

adrian-prantl reviewed Aug 26, 2025

View reviewed changes

lldb/source/Core/Disassembler.cpp Outdated Show resolved Hide resolved

adrian-prantl reviewed Aug 26, 2025

View reviewed changes

lldb/source/Core/Disassembler.cpp Outdated Show resolved Hide resolved

adrian-prantl reviewed Aug 26, 2025

View reviewed changes

lldb/test/API/functionalities/disassembler-variables/d_original_example Outdated Show resolved Hide resolved

UltimateForce21 and others added 4 commits August 27, 2025 12:18

Update lldb/source/Core/Disassembler.cpp

11d85ea

Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>

Changed all naming rich annotations for variables to variable.

c7d489b

Removed executable that was added by accident for test cases.

a82957c

Changed name of tester for TestRichDisassembler to TestVariableAnnota…

2cda69c

…tionsDisassembler.

adrian-prantl reviewed Aug 27, 2025

View reviewed changes

lldb/source/Commands/CommandObjectDisassemble.cpp Outdated Show resolved Hide resolved

Update lldb/source/Core/Disassembler.cpp

6b24fff

Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>

adrian-prantl approved these changes Aug 27, 2025

View reviewed changes

Update lldb/source/Commands/CommandObjectDisassemble.cpp

cc0aee9

Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>

UltimateForce21 and others added 2 commits August 27, 2025 13:08

Switched live_vars from llvm::SmallDenseMap to llvm::DenseMap.

e1d0f91

Merge branch 'main' into live-state-disassmbler-annotations

c0babd5

adrian-prantl enabled auto-merge (squash) August 28, 2025 17:41

adrian-prantl merged commit 8ec4db5 into llvm:main Aug 28, 2025
9 checks passed

JDevlieghere reviewed Aug 28, 2025

View reviewed changes

		lldb::ProcessSP no_process;
		lldb::ABISP abi_sp = ABI::FindPlugin(no_process, target.GetArchitecture());

	lldb::ProcessSP no_process;
	lldb::ABISP abi_sp = ABI::FindPlugin(no_process, target.GetArchitecture());
	lldb::ABISP abi_sp = ABI::FindPlugin(nullptr, target.GetArchitecture());

	if (abi)
	opts.PrintRegisterOnly = true;
	opts.PrintRegisterOnly = static_cast<bool>(abi_sp);

Stateful variable-location annotations in Disassembler::PrintInstructions() (follow-up to #147460) #152887

Stateful variable-location annotations in Disassembler::PrintInstructions() (follow-up to #147460) #152887

Conversation

UltimateForce21 commented Aug 10, 2025

What this does

Why in PrintInstructions()?

How it works (high-level)

Example output (x86_64, simplified)

Scope & limitations (by design)

Implementation notes

Requested feedback

Uh oh!

adrian-prantl commented Aug 26, 2025

Uh oh!

adrian-prantl commented Aug 26, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

adrian-prantl left a comment

Choose a reason for hiding this comment

Uh oh!

UltimateForce21 commented Aug 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

UltimateForce21 commented Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vvereschaka commented Aug 28, 2025

Uh oh!

tstellar commented Aug 28, 2025

Uh oh!

UltimateForce21 commented Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

UltimateForce21 commented Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

UltimateForce21 commented Aug 28, 2025

Uh oh!

vvereschaka commented Aug 28, 2025

Uh oh!

UltimateForce21 commented Aug 28, 2025

Uh oh!

UltimateForce21 commented Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Why in `PrintInstructions()`?

UltimateForce21 commented Aug 27, 2025 •

edited

Loading

UltimateForce21 commented Aug 28, 2025 •

edited

Loading

UltimateForce21 commented Aug 28, 2025 •

edited

Loading

UltimateForce21 commented Aug 28, 2025 •

edited

Loading

UltimateForce21 commented Aug 28, 2025 •

edited

Loading