Skip to content

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Aug 28, 2025

Use a template function for the implementation, and use the macro
to define a constant function pointer with the expected name. Not
sure if there's a cleaner way to do this. This worked out to less
code using variadic templates to forward the arguments, but it added
a noticable ~10 seconds to compilation time on this file.

This will help avoid another copy-paste version of this function
in a future change.

Copy link
Contributor Author

arsenm commented Aug 28, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@llvmbot
Copy link
Member

llvmbot commented Aug 28, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Use a template function for the implementation, and use the macro
to define a constant function pointer with the expected name. Not
sure if there's a cleaner way to do this. This worked out to less
code using variadic templates to forward the arguments, but it added
a noticable ~10 seconds to compilation time on this file.

This will help avoid another copy-paste version of this function
in a future change.


Full diff: https://github.com/llvm/llvm-project/pull/155755.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp (+15-10)
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 6a2beeed41dfd..8651ddc89dce2 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -146,17 +146,22 @@ static DecodeStatus decodeDpp8FI(MCInst &Inst, unsigned Val, uint64_t Addr,
     return addOperand(Inst, DAsm->DecoderName(Imm));                           \
   }
 
-// Decoder for registers, decode directly using RegClassID. Imm(8-bit) is
-// number of register. Used by VGPR only and AGPR only operands.
+// Decoder for registers, decode directly using RegClassID. Imm(8-bit) is number
+// of register. Used by VGPR only and AGPR only operands.
+template <unsigned RegClassID>
+static DecodeStatus decodeRegisterClassImpl(MCInst &Inst, unsigned Imm,
+                                            uint64_t /*Addr*/,
+                                            const MCDisassembler *Decoder) {
+  assert(Imm < (1 << 8) && "8-bit encoding");
+  auto DAsm = static_cast<const AMDGPUDisassembler *>(Decoder);
+  return addOperand(Inst, DAsm->createRegOperand(RegClassID, Imm));
+}
+
+using RegClassDecoder = decltype(&decodeRegisterClassImpl<0>);
+
 #define DECODE_OPERAND_REG_8(RegClass)                                         \
-  static DecodeStatus Decode##RegClass##RegisterClass(                         \
-      MCInst &Inst, unsigned Imm, uint64_t /*Addr*/,                           \
-      const MCDisassembler *Decoder) {                                         \
-    assert(Imm < (1 << 8) && "8-bit encoding");                                \
-    auto DAsm = static_cast<const AMDGPUDisassembler *>(Decoder);              \
-    return addOperand(                                                         \
-        Inst, DAsm->createRegOperand(AMDGPU::RegClass##RegClassID, Imm));      \
-  }
+  static const constexpr RegClassDecoder Decode##RegClass##RegisterClass =     \
+      decodeRegisterClassImpl<AMDGPU::RegClass##RegClassID>;
 
 #define DECODE_SrcOp(Name, EncSize, OpWidth, EncImm)                           \
   static DecodeStatus Name(MCInst &Inst, unsigned Imm, uint64_t /*Addr*/,      \

@arsenm arsenm marked this pull request as ready for review August 28, 2025 05:42
Comment on lines +163 to +164
static const constexpr RegClassDecoder Decode##RegClass##RegisterClass = \
decodeRegisterClassImpl<AMDGPU::RegClass##RegClassID>;
Copy link
Contributor

@Pierre-vh Pierre-vh Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static const constexpr RegClassDecoder Decode##RegClass##RegisterClass = \
decodeRegisterClassImpl<AMDGPU::RegClass##RegClassID>;
static const constexpr RegClassDecoder Decode##RegClass##RegisterClass = [](/* params go here*/){return decodeRegisterClassImpl(AMDGPU::RegClass##RegClassID, /* forward rest here */);

What about a constexpr lambda + no template ?
Could use std::bind too but I don't know if that's constexpr everywhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might as well be a macro at that point, the reason I did it this was was to avoid repeating the long argument list 3 times

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static const constexpr RegClassDecoder Decode##RegClass##RegisterClass = \
decodeRegisterClassImpl<AMDGPU::RegClass##RegClassID>;
static const constexpr RegClassDecoder Decode##RegClass##RegisterClass = [](auto... args) { return decodeRegisterClassImpl(AMDGPU::RegClass##RegClassID, args...); }

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though with that one I think you need to make the RegClassDecoder signature more explicit, or even manually type it out, so you still need to type the args twice unfortunately

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#define DECODE_OPERAND_REG_8(RegClass)                                         \
  static const constexpr auto Decode##RegClass##RegisterClass =                \
      [](auto... args) {                                                       \
        return decodeRegisterClassImpl<AMDGPU::RegClass##RegClassID>(args...); \
      };

Works, but it seems to cost about 5 seconds of build time

arsenm added 3 commits August 28, 2025 22:07
Use a template function for the implementation, and use the macro
to define a constant function pointer with the expected name. Not
sure if there's a cleaner way to do this. This worked out to less
code using variadic templates to forward the arguments, but it added
a noticable ~10 seconds to compilation time on this file.

This will help avoid another copy-paste version of this function
in a future change.
This seems to cost 4-5 seconds in build time in the file
This reverts commit 79d03ff80ed3ac9d8f072955c2bae5d7567ff56a.
@arsenm arsenm force-pushed the users/arsenm/amdgpu/disassembler-reduce-code-in-macro branch from 0f88c29 to e6cd6ef Compare August 28, 2025 13:08
@arsenm
Copy link
Contributor Author

arsenm commented Sep 2, 2025

ping, due to the compile time issue I think this should avoid the template. I'm thinking of opening a clang bug for it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants