sjoerdmeijer
Collaborator

Subject says it all: implement the loop counter decrement-and-jump function, and reserve X19 for the loop counter.

@llvmbot
Member

llvmbot commented Aug 21, 2025

@llvm/pr-subscribers-tools-llvm-exegesis

Author: Sjoerd Meijer (sjoerdmeijer)

Changes

Subject says it all: implement the loop counter decrement-and-jump function, and reserve X19 for the loop counter.


Full diff: https://github.com/llvm/llvm-project/pull/154751.diff

2 Files Affected:

  • (added) llvm/test/tools/llvm-exegesis/AArch64/loop-register.s (+17)
  • (modified) llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp (+31)
diff --git a/llvm/test/tools/llvm-exegesis/AArch64/loop-register.s b/llvm/test/tools/llvm-exegesis/AArch64/loop-register.s
new file mode 100644
index 0000000000000..2e67937ad0ef6
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/AArch64/loop-register.s
@@ -0,0 +1,17 @@
+REQUIRES: aarch64-registered-target, asserts
+
+RUN: llvm-exegesis -mcpu=neoverse-v2 --use-dummy-perf-counters --mode=latency --debug-only=print-gen-assembly --opcode-name=ADDVv4i16v -repetition-mode=loop 2>&1 | FileCheck %s
+
+CHECK:        0:  {{.*}}  str     x19, [sp, #-16]!
+CHECK-NEXT:   4:  {{.*}}  movi    d[[REG:[0-9]+]], #0000000000000000
+CHECK-NEXT:   8:  {{.*}}  mov     x19, #10000
+CHECK-NEXT:   c:  {{.*}}  nop
+CHECK-NEXT:   10: {{.*}}  nop
+CHECK-NEXT:   14: {{.*}}  nop
+CHECK-NEXT:   18: {{.*}}  nop
+CHECK-NEXT:   1c: {{.*}}  nop
+CHECK-NEXT:   20: {{.*}}  addv    h[[REG]], v[[REG]].4h
+CHECK-NEXT:   24: {{.*}}  subs    x19, x19, #1
+CHECK-NEXT:   28: {{.*}}  cbnz    x19, #-8
+CHECK-NEXT:   2c: {{.*}}  ldr     x19, [sp], #16
+CHECK-NEXT:   30: {{.*}}  ret
diff --git a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
index 3a0021e3c132d..d59dd1688dfa4 100644
--- a/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
+++ b/llvm/tools/llvm-exegesis/lib/AArch64/Target.cpp
@@ -28,6 +28,8 @@
 #define GET_AVAILABLE_OPCODE_CHECKER
 #include "AArch64GenInstrInfo.inc"
 
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+
 namespace llvm {
 namespace exegesis {
 
@@ -109,6 +111,10 @@ static MCInst loadFPImmediate(MCRegister Reg, unsigned RegBitWidth,
 
 namespace {
 
+// Use X19 as the loop counter register since it's a callee-saved register
+// that's available for temporary use.
+constexpr const MCPhysReg kDefaultLoopCounterReg = AArch64::X19;
+
 class ExegesisAArch64Target : public ExegesisTarget {
 public:
   ExegesisAArch64Target()
@@ -141,6 +147,31 @@ class ExegesisAArch64Target : public ExegesisTarget {
     errs() << "setRegTo is not implemented, results will be unreliable\n";
     return {};
   }
+  MCRegister getDefaultLoopCounterRegister(const Triple &) const override {
+    return kDefaultLoopCounterReg;
+  }
+
+  void decrementLoopCounterAndJump(
+      MachineBasicBlock &MBB, MachineBasicBlock &TargetMBB,
+      const MCInstrInfo &MII, MCRegister LoopRegister) const override {
+    // subs LoopRegister, LoopRegister, #1
+    BuildMI(&MBB, DebugLoc(), MII.get(AArch64::SUBSXri))
+        .addDef(LoopRegister)
+        .addUse(LoopRegister)
+        .addImm(1)   // Subtract 1
+        .addImm(0);  // No shift amount
+    // cbnz LoopRegister, TargetMBB
+    BuildMI(&MBB, DebugLoc(), MII.get(AArch64::CBNZX))
+        .addUse(LoopRegister)
+        .addMBB(&TargetMBB);
+  }
+
+
+  // Registers that should not be selected for use in snippets.
+  const MCPhysReg UnavailableRegisters[1] = {kDefaultLoopCounterReg};
+  ArrayRef<MCPhysReg> getUnavailableRegisters() const override {
+    return UnavailableRegisters;
+  }
 
   bool matchesArch(Triple::ArchType Arch) const override {
     return Arch == Triple::aarch64 || Arch == Triple::aarch64_be;
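
The CHECK lines in loop-register.s can be sanity-checked with a little address arithmetic. The sketch below is a standalone illustration, not part of the patch. The helper names `branchTarget` and `instructionsInLoop` are hypothetical, and it assumes that the printed `#-8` is a byte offset relative to the address of the `cbnz` itself and that every instruction is 4 bytes wide.

```cpp
#include <cstdint>

// Hypothetical helper: address a PC-relative branch lands on, assuming the
// printed offset is in bytes and relative to the branch instruction itself.
constexpr std::uint64_t branchTarget(std::uint64_t BranchAddr,
                                     std::int64_t ByteOffset) {
  return BranchAddr + static_cast<std::uint64_t>(ByteOffset);
}

// Hypothetical helper: number of fixed-width 4-byte instructions from the
// branch target up to and including the branch.
constexpr std::uint64_t instructionsInLoop(std::uint64_t TargetAddr,
                                           std::uint64_t BranchAddr) {
  return (BranchAddr - TargetAddr) / 4 + 1;
}

// Values copied from the CHECK lines: cbnz is at 0x28 with offset #-8.
static_assert(branchTarget(0x28, -8) == 0x20,
              "the back-edge lands on the addv at 0x20");
static_assert(instructionsInLoop(0x20, 0x28) == 3,
              "the loop is addv, subs, cbnz");
```

Under these assumptions the five `nop` lines sit outside the loop, and only the snippet plus the two counter instructions repeat.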

github-actions bot commented Aug 21, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Contributor

@boomanaiden154 boomanaiden154 left a comment

LGTM modulo one nit and code formatting fixes.

@@ -28,6 +28,8 @@
#define GET_AVAILABLE_OPCODE_CHECKER
#include "AArch64GenInstrInfo.inc"

#include "llvm/CodeGen/MachineInstrBuilder.h"
Contributor

Nit: This can probably go with the group of includes at the top of the file?

@sjoerdmeijer
Collaborator Author

sjoerdmeijer commented Aug 21, 2025

Thanks for the review, @boomanaiden154! I will fix that before merging this.

One semi-related question: do you know if --min-instructions is supposed to work with -repetition-mode=loop? I would like to increase the size of the loop body, but --min-instructions only seems to have an effect in duplicate mode. The middle-half-loop mode seems to achieve what I want and gives me more accurate numbers for the couple of examples I've tried, but I was wondering if I could control this behaviour myself with --min-instructions.

@boomanaiden154
Contributor

One semi-related question: do you know if --min-instructions is supposed to work with -repetition-mode=loop? I would like to increase the size of the loop body, but --min-instructions only seems to have an effect in duplicate mode. The middle-half-loop mode seems to achieve what I want and gives me more accurate numbers for the couple of examples I've tried, but I was wondering if I could control this behaviour myself with --min-instructions.

I haven't touched the code in a while, but I'm pretty sure --min-instructions in loop mode changes the number of loop iterations rather than the number of instructions in the loop. middle-half-loop will do two runs, one with double the number of loop iterations of the other, and then subtract the two results to cancel out setup costs.

I believe you're looking for the --loop-body-size flag.
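
The subtraction described for middle-half-loop can be illustrated with a toy cost model. This is only a sketch of the idea as stated in the comment above, not the llvm-exegesis implementation: the function names and all numbers are made up, and it assumes a fixed setup cost plus a constant cost per iteration.

```cpp
// Hypothetical cost model: a fixed setup/teardown overhead plus a constant
// cost per loop iteration. Real measurements are noisier than this.
constexpr double measuredCycles(double SetupCycles, double CyclesPerIter,
                                double Iterations) {
  return SetupCycles + CyclesPerIter * Iterations;
}

// Naive estimate: divide a single measurement by its iteration count.
constexpr double naiveEstimate(double Cycles, double Iterations) {
  return Cycles / Iterations;
}

// Subtracting the N-iteration run from the 2N-iteration run leaves N
// iterations' worth of cycles, so the setup term cancels.
constexpr double differenceEstimate(double CyclesN, double Cycles2N,
                                    double IterationsN) {
  return (Cycles2N - CyclesN) / IterationsN;
}

// Made-up numbers: 5000 cycles of setup, 3 cycles per iteration, N = 10000.
constexpr double kShortRun = measuredCycles(5000, 3, 10000); // 35000 cycles
constexpr double kLongRun = measuredCycles(5000, 3, 20000);  // 65000 cycles
static_assert(naiveEstimate(kShortRun, 10000) == 3.5,
              "setup cost inflates the naive per-iteration estimate");
static_assert(differenceEstimate(kShortRun, kLongRun, 10000) == 3.0,
              "the setup cost cancels in the difference");
```

In this model the difference recovers the per-iteration cost exactly; on real hardware it would only reduce the influence of setup costs.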

@sjoerdmeijer
Collaborator Author

I believe you're looking for the --loop-body-size flag.

Oh wow, I have somehow missed that. Yes, that looks like it. Thanks for the info!
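
To make the distinction between the two flags concrete, here is a rough arithmetic sketch. It is an assumption based on the discussion above rather than something taken from the llvm-exegesis sources: it supposes that --loop-body-size duplicates the snippet until the loop body holds at least that many instructions, and that the iteration count is then chosen so that about --min-instructions instructions execute in total. Both function names are hypothetical.

```cpp
#include <cstdint>

// Assumed behaviour (not taken from the llvm-exegesis sources): the snippet is
// duplicated until the loop body holds at least LoopBodySize instructions.
// SnippetSize must be at least 1.
constexpr std::uint64_t copiesPerIteration(std::uint64_t SnippetSize,
                                           std::uint64_t LoopBodySize) {
  const std::uint64_t Copies = (LoopBodySize + SnippetSize - 1) / SnippetSize;
  return Copies == 0 ? 1 : Copies; // the body always holds the snippet once
}

// Assumed behaviour: iterate often enough that roughly MinInstructions
// snippet instructions execute in total (rounded up).
constexpr std::uint64_t loopIterations(std::uint64_t SnippetSize,
                                       std::uint64_t LoopBodySize,
                                       std::uint64_t MinInstructions) {
  const std::uint64_t BodySize =
      copiesPerIteration(SnippetSize, LoopBodySize) * SnippetSize;
  return (MinInstructions + BodySize - 1) / BodySize;
}

// One-instruction snippet, no requested body size, budget of 10000:
// the loop counter would start at 10000.
static_assert(loopIterations(1, 0, 10000) == 10000, "counter-only scaling");
// Requesting a 100-instruction body trades iterations for a larger body.
static_assert(copiesPerIteration(1, 100) == 100, "100 copies per iteration");
static_assert(loopIterations(1, 100, 10000) == 100, "100 iterations");
```

Under this model, raising the instruction budget alone only changes the initial counter value, while the body-size setting changes how many snippet copies sit between the loop head and the `subs`/`cbnz` pair.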

.addImm(1) // Subtract 1
.addImm(0); // No shift amount
// cbnz LoopRegister, TargetMBB
BuildMI(&MBB, DebugLoc(), MII.get(AArch64::CBNZX))
Collaborator

Could this be either SUB+CBNZ or SUBS+Bcc? I don't know if there is a lot of difference between the two if LoopRegister needs to get updated. (There might be a chance of fusing the SUBS+Bcc, I'm not sure when that does and doesn't happen).
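
The functional side of this question can be checked with a small model. The sketch below is an illustration only: it models just the counter register and the Z flag, picks `b.ne` as the Bcc variant (an assumption), uses hypothetical names throughout, and says nothing about fusion or performance.

```cpp
#include <cstdint>

// Minimal model of the state the loop tail touches: the counter register and
// the Z (zero) condition flag.
struct LoopState {
  std::uint64_t Counter;
  bool ZeroFlag;
};

// The three candidate loop tails discussed in the review.
enum class LoopTail { SubsCbnz, SubCbnz, SubsBne };

// Runs a do-while style loop and returns how many times the body executed.
constexpr std::uint64_t tripCount(LoopTail Tail, std::uint64_t InitialCount) {
  LoopState S{InitialCount, false};
  std::uint64_t Trips = 0;
  bool BranchTaken = true;
  while (BranchTaken) {
    ++Trips;        // the snippet body would run here
    S.Counter -= 1; // sub and subs both write the decremented counter
    if (Tail != LoopTail::SubCbnz)
      S.ZeroFlag = (S.Counter == 0); // only subs updates the flags
    // cbnz tests the register itself; b.ne tests the Z flag.
    BranchTaken =
        (Tail == LoopTail::SubsBne) ? !S.ZeroFlag : (S.Counter != 0);
  }
  return Trips;
}

static_assert(tripCount(LoopTail::SubsCbnz, 5) == 5, "pair used in the patch");
static_assert(tripCount(LoopTail::SubCbnz, 5) == 5, "flags are not needed");
static_assert(tripCount(LoopTail::SubsBne, 5) == 5, "flag-based branch");
```

All three tails execute the body the same number of times in this model, so any reason to prefer one over another would have to come from the microarchitecture rather than from loop semantics.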

Collaborator

@davemgreen davemgreen left a comment

Other than my comment about which instruction it can use, this LGTM.

Subject says it all: implement the loop counter decrement-and-jump
function, and reserve X19 for the loop counter.
@sjoerdmeijer sjoerdmeijer merged commit 8e4d2b5 into llvm:main Aug 26, 2025
9 checks passed
@sjoerdmeijer sjoerdmeijer deleted the exegesis-loop-mode branch August 26, 2025 13:07
sjoerdmeijer added a commit that referenced this pull request Aug 26, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 26, 2025