-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[LLVM][MC][DecoderEmitter] Add support to specialize decoder per bitwidth #154865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
46bbe78
to
3297671
Compare
Hi @s-barannikov and @topperc, this is the new simplified version of the earlier change to compile out decoders that are unused for a particular bitwidth. This approach of using a type-trait seems simpler and is much less code changes than before. Note that it includes both the per-bitwidth decoder specialization as well as numbering decoders for each bitwidth, yielding some savings on the size of the decoder tables as well. I'll fill in the description soon to explain the motivation and changes. |
For AMDGPU, for example, the generated code looks like:
|
69c8e70
to
0930ffe
Compare
Just fixing some comments. |
Looks good to me overall. One thing I'm not sure of is using std::bitset where a larger integer type would do (e.g., using std::bitset<48> instead of uint64_t). Probably not a big deal, but I thought it is worth pointing out. Secondly, what will happen if I use a "wrong" type of instruction when calling decodeInstruction? Will it silently use the wrong decodeToMCInst? If so, this is not great, and it should be possible to add a runtime check that calling decodeInstruction with a particular table resolves to the correct decodeToMCInst. |
Right, for the bitwidth specialization to work though, we need different types, so using the same type for different bitwidth (like uint64_t for both 48 and 64 bit instructions) is not supported. I guess for RISCV, if 64-bit instructions are not supported, there is a choice is just using 64-bits. I'd hope though that std::bitset<48> would internally just use a uint64_t. One thing I thought of is generalizing this to allow > 1 bitwidth for the same type. The trait can be a per-type function like:
This will allow using the same C++ type for > 1 bitwidths. Then we would need to remove the current specializations in MCDecoder.h as they assume a specific usage and then backends can choose. Not sure if that generalization is needed.
Yeah, @topperc had suggested that in the earlier version. I plan to do that in a follow-on MR. The first byte of the generated decoder table will be the supported bitwidth and we can do an initial runtime check. |
LMK if this is something to pursue. I suspect it may not be too much effort to do this and offers additional flexibility for backends. |
I did and now RISCV can continue using uint64_t for 48-bit insts. Will update the PR soon |
The windows build failure seems to be this : https://stackoverflow.com/questions/29502052/template-specialization-and-enable-if-problems |
The specific form that seems to work with MSVC (and clang and gcc) was derived here: https://godbolt.org/z/Eqdf4Tehz And this was the solution: https://stackoverflow.com/questions/75686697/how-to-work-around-function-template-has-already-been-defined-when-using-std |
Looks like MSVC builds with compiler version 14.x do not support either form, so maybe not possible to pursue this at the moment. |
@s-barannikov this is ready for another round of review. The delta is that there are no longer default implementations of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but please for @topperc review
Meanwhile, a couple of tests wouldn't hurt.
The description needs to be updated |
Yes thanks. Will add tests and update the description with one final set of numbers. |
Updated description and added a unit test (which found that this option did not work with |
@llvm/pr-subscribers-tablegen Author: Rahul Joshi (jurahul) ChangesThis change adds an option to specialize decoders per bitwidth, which can help reduce the (compiled) code size of the decoder code. Current state: Several backends call New state: Additionally, under this mode, decoders are uniqued only within a given bitwidth (as opposed to across all bitwidths without this option), so the decoder index values assigned are smaller, and consume less bytes in their ULEB128 encoding. As a result, the generated decoder tables can also reduce in size. Adopt this feature for the AMDGPU and RISCV backend. In a release build, this results in a net 55% reduction in the .text size of libLLVMAMDGPUDisassembler.so and a 4.5% reduction in the .rodata size. For RISCV, which today uses a single Actual measured sizes are as follows:
Patch is 35.73 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154865.diff 14 Files Affected:
diff --git a/llvm/include/llvm/MC/MCDecoder.h b/llvm/include/llvm/MC/MCDecoder.h
index 70762a4a5ebae..459c8a6a5ea34 100644
--- a/llvm/include/llvm/MC/MCDecoder.h
+++ b/llvm/include/llvm/MC/MCDecoder.h
@@ -12,6 +12,7 @@
#include "llvm/MC/MCDisassembler/MCDisassembler.h"
#include "llvm/Support/MathExtras.h"
+#include <bitset>
#include <cassert>
namespace llvm::MCD {
@@ -48,6 +49,15 @@ fieldFromInstruction(const InsnType &Insn, unsigned StartBit,
return Insn.extractBitsAsZExtValue(NumBits, StartBit);
}
+template <size_t N>
+uint64_t fieldFromInstruction(const std::bitset<N> &Insn, unsigned StartBit,
+ unsigned NumBits) {
+ assert(StartBit + NumBits <= N && "Instruction field out of bounds!");
+ assert(NumBits <= 64 && "Cannot support >64-bit extractions!");
+ const std::bitset<N> Mask(maskTrailingOnes<uint64_t>(NumBits));
+ return ((Insn >> StartBit) & Mask).to_ullong();
+}
+
// Helper function for inserting bits extracted from an encoded instruction into
// an integer-typed field.
template <typename IntType>
@@ -62,6 +72,13 @@ insertBits(IntType &field, IntType bits, unsigned startBit, unsigned numBits) {
field |= bits << startBit;
}
+// InsnBitWidth is essentially a type trait used by the decoder emitter to query
+// the supported bitwidth for a given type. But default, the value is 0, making
+// it an invalid type for use as `InsnType` when instantiating the decoder.
+// Individual targets are expected to provide specializations for these based
+// on their usage.
+template <typename T> static constexpr uint32_t InsnBitWidth = 0;
+
} // namespace llvm::MCD
#endif // LLVM_MC_MCDECODER_H
diff --git a/llvm/lib/Target/AMDGPU/CMakeLists.txt b/llvm/lib/Target/AMDGPU/CMakeLists.txt
index dc9dd220130ea..56dc6f9e8e6e9 100644
--- a/llvm/lib/Target/AMDGPU/CMakeLists.txt
+++ b/llvm/lib/Target/AMDGPU/CMakeLists.txt
@@ -6,7 +6,8 @@ tablegen(LLVM AMDGPUGenAsmMatcher.inc -gen-asm-matcher)
tablegen(LLVM AMDGPUGenAsmWriter.inc -gen-asm-writer)
tablegen(LLVM AMDGPUGenCallingConv.inc -gen-callingconv)
tablegen(LLVM AMDGPUGenDAGISel.inc -gen-dag-isel)
-tablegen(LLVM AMDGPUGenDisassemblerTables.inc -gen-disassembler)
+tablegen(LLVM AMDGPUGenDisassemblerTables.inc -gen-disassembler
+ --specialize-decoders-per-bitwidth)
tablegen(LLVM AMDGPUGenInstrInfo.inc -gen-instr-info)
tablegen(LLVM AMDGPUGenMCCodeEmitter.inc -gen-emitter)
tablegen(LLVM AMDGPUGenMCPseudoLowering.inc -gen-pseudo-lowering)
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 6a2beeed41dfd..80d194afa926b 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -38,6 +38,7 @@
#include "llvm/Support/Compiler.h"
using namespace llvm;
+using namespace llvm::MCD;
#define DEBUG_TYPE "amdgpu-disassembler"
@@ -446,6 +447,14 @@ static DecodeStatus decodeVersionImm(MCInst &Inst, unsigned Imm,
#include "AMDGPUGenDisassemblerTables.inc"
+// Define bitwidths for various types used to instantiate the decoder.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint32_t> = 32;
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint64_t> = 64;
+template <>
+static constexpr uint32_t llvm::MCD::InsnBitWidth<std::bitset<96>> = 96;
+template <>
+static constexpr uint32_t llvm::MCD::InsnBitWidth<std::bitset<128>> = 128;
+
//===----------------------------------------------------------------------===//
//
//===----------------------------------------------------------------------===//
@@ -498,26 +507,24 @@ template <typename T> static inline T eatBytes(ArrayRef<uint8_t>& Bytes) {
return Res;
}
-static inline DecoderUInt128 eat12Bytes(ArrayRef<uint8_t> &Bytes) {
+static inline std::bitset<96> eat12Bytes(ArrayRef<uint8_t> &Bytes) {
+ using namespace llvm::support::endian;
assert(Bytes.size() >= 12);
- uint64_t Lo =
- support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<96> Lo(read<uint64_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(8);
- uint64_t Hi =
- support::endian::read<uint32_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<96> Hi(read<uint32_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(4);
- return DecoderUInt128(Lo, Hi);
+ return (Hi << 64) | Lo;
}
-static inline DecoderUInt128 eat16Bytes(ArrayRef<uint8_t> &Bytes) {
+static inline std::bitset<128> eat16Bytes(ArrayRef<uint8_t> &Bytes) {
+ using namespace llvm::support::endian;
assert(Bytes.size() >= 16);
- uint64_t Lo =
- support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<128> Lo(read<uint64_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(8);
- uint64_t Hi =
- support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<128> Hi(read<uint64_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(8);
- return DecoderUInt128(Lo, Hi);
+ return (Hi << 64) | Lo;
}
void AMDGPUDisassembler::decodeImmOperands(MCInst &MI,
@@ -600,14 +607,14 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
// Try to decode DPP and SDWA first to solve conflict with VOP1 and VOP2
// encodings
if (isGFX1250() && Bytes.size() >= 16) {
- DecoderUInt128 DecW = eat16Bytes(Bytes);
+ std::bitset<128> DecW = eat16Bytes(Bytes);
if (tryDecodeInst(DecoderTableGFX1250128, MI, DecW, Address, CS))
break;
Bytes = Bytes_.slice(0, MaxInstBytesNum);
}
if (isGFX11Plus() && Bytes.size() >= 12) {
- DecoderUInt128 DecW = eat12Bytes(Bytes);
+ std::bitset<96> DecW = eat12Bytes(Bytes);
if (isGFX11() &&
tryDecodeInst(DecoderTableGFX1196, DecoderTableGFX11_FAKE1696, MI,
@@ -642,7 +649,7 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
} else if (Bytes.size() >= 16 &&
STI.hasFeature(AMDGPU::FeatureGFX950Insts)) {
- DecoderUInt128 DecW = eat16Bytes(Bytes);
+ std::bitset<128> DecW = eat16Bytes(Bytes);
if (tryDecodeInst(DecoderTableGFX940128, MI, DecW, Address, CS))
break;
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
index f4d164bf10c3c..ded447b6f8d5a 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
@@ -32,44 +32,6 @@ class MCOperand;
class MCSubtargetInfo;
class Twine;
-// Exposes an interface expected by autogenerated code in
-// FixedLenDecoderEmitter
-class DecoderUInt128 {
-private:
- uint64_t Lo = 0;
- uint64_t Hi = 0;
-
-public:
- DecoderUInt128() = default;
- DecoderUInt128(uint64_t Lo, uint64_t Hi = 0) : Lo(Lo), Hi(Hi) {}
- operator bool() const { return Lo || Hi; }
- uint64_t extractBitsAsZExtValue(unsigned NumBits,
- unsigned BitPosition) const {
- assert(NumBits && NumBits <= 64);
- assert(BitPosition < 128);
- uint64_t Val;
- if (BitPosition < 64)
- Val = Lo >> BitPosition | Hi << 1 << (63 - BitPosition);
- else
- Val = Hi >> (BitPosition - 64);
- return Val & ((uint64_t(2) << (NumBits - 1)) - 1);
- }
- DecoderUInt128 operator&(const DecoderUInt128 &RHS) const {
- return DecoderUInt128(Lo & RHS.Lo, Hi & RHS.Hi);
- }
- DecoderUInt128 operator&(const uint64_t &RHS) const {
- return *this & DecoderUInt128(RHS);
- }
- DecoderUInt128 operator~() const { return DecoderUInt128(~Lo, ~Hi); }
- bool operator==(const DecoderUInt128 &RHS) {
- return Lo == RHS.Lo && Hi == RHS.Hi;
- }
- bool operator!=(const DecoderUInt128 &RHS) {
- return Lo != RHS.Lo || Hi != RHS.Hi;
- }
- bool operator!=(const int &RHS) { return *this != DecoderUInt128(RHS); }
-};
-
//===----------------------------------------------------------------------===//
// AMDGPUDisassembler
//===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/DirectX/CMakeLists.txt b/llvm/lib/Target/DirectX/CMakeLists.txt
index 8100f941c8d94..6c079517e22d6 100644
--- a/llvm/lib/Target/DirectX/CMakeLists.txt
+++ b/llvm/lib/Target/DirectX/CMakeLists.txt
@@ -41,6 +41,7 @@ add_llvm_target(DirectXCodeGen
LINK_COMPONENTS
Analysis
AsmPrinter
+ BinaryFormat
CodeGen
CodeGenTypes
Core
diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt
index 47329b2c2f4d2..531238ae85029 100644
--- a/llvm/lib/Target/RISCV/CMakeLists.txt
+++ b/llvm/lib/Target/RISCV/CMakeLists.txt
@@ -7,7 +7,8 @@ tablegen(LLVM RISCVGenAsmWriter.inc -gen-asm-writer)
tablegen(LLVM RISCVGenCompressInstEmitter.inc -gen-compress-inst-emitter)
tablegen(LLVM RISCVGenMacroFusion.inc -gen-macro-fusion-pred)
tablegen(LLVM RISCVGenDAGISel.inc -gen-dag-isel)
-tablegen(LLVM RISCVGenDisassemblerTables.inc -gen-disassembler)
+tablegen(LLVM RISCVGenDisassemblerTables.inc -gen-disassembler
+ --specialize-decoders-per-bitwidth)
tablegen(LLVM RISCVGenInstrInfo.inc -gen-instr-info)
tablegen(LLVM RISCVGenMCCodeEmitter.inc -gen-emitter)
tablegen(LLVM RISCVGenMCPseudoLowering.inc -gen-pseudo-lowering)
diff --git a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
index dbb16fce8390a..4becd3bbb25f2 100644
--- a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+++ b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
@@ -558,7 +558,7 @@ static DecodeStatus decodeXqccmpRlistS0(MCInst &Inst, uint32_t Imm,
return decodeZcmpRlist(Inst, Imm, Address, Decoder);
}
-static DecodeStatus decodeCSSPushPopchk(MCInst &Inst, uint32_t Insn,
+static DecodeStatus decodeCSSPushPopchk(MCInst &Inst, uint16_t Insn,
uint64_t Address,
const MCDisassembler *Decoder) {
uint32_t Rs1 = fieldFromInstruction(Insn, 7, 5);
@@ -700,6 +700,12 @@ static constexpr DecoderListEntry DecoderList32[]{
{DecoderTableZdinxRV32Only32, {}, "RV32-only Zdinx (Double in Integer)"},
};
+// Define bitwidths for various types used to instantiate the decoder.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint16_t> = 16;
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint32_t> = 32;
+// Use uint64_t to represent 48 bit instructions.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint64_t> = 48;
+
DecodeStatus RISCVDisassembler::getInstruction32(MCInst &MI, uint64_t &Size,
ArrayRef<uint8_t> Bytes,
uint64_t Address,
@@ -710,9 +716,7 @@ DecodeStatus RISCVDisassembler::getInstruction32(MCInst &MI, uint64_t &Size,
}
Size = 4;
- // Use uint64_t to match getInstruction48. decodeInstruction is templated
- // on the Insn type.
- uint64_t Insn = support::endian::read32le(Bytes.data());
+ uint32_t Insn = support::endian::read32le(Bytes.data());
for (const DecoderListEntry &Entry : DecoderList32) {
if (!Entry.haveContainedFeatures(STI.getFeatureBits()))
@@ -758,9 +762,7 @@ DecodeStatus RISCVDisassembler::getInstruction16(MCInst &MI, uint64_t &Size,
}
Size = 2;
- // Use uint64_t to match getInstruction48. decodeInstruction is templated
- // on the Insn type.
- uint64_t Insn = support::endian::read16le(Bytes.data());
+ uint16_t Insn = support::endian::read16le(Bytes.data());
for (const DecoderListEntry &Entry : DecoderList16) {
if (!Entry.haveContainedFeatures(STI.getFeatureBits()))
diff --git a/llvm/test/TableGen/DecoderEmitterBitwidthSpecialization.td b/llvm/test/TableGen/DecoderEmitterBitwidthSpecialization.td
new file mode 100644
index 0000000000000..6fd9676939bde
--- /dev/null
+++ b/llvm/test/TableGen/DecoderEmitterBitwidthSpecialization.td
@@ -0,0 +1,169 @@
+// RUN: llvm-tblgen -gen-disassembler -I %p/../../include %s | FileCheck %s --check-prefix=CHECK-DEFAULT
+// RUN: llvm-tblgen -gen-disassembler -specialize-decoders-per-bitwidth -I %p/../../include %s | FileCheck %s --check-prefix=CHECK-SPECIALIZE-NO-TABLE
+// RUN: llvm-tblgen -gen-disassembler -specialize-decoders-per-bitwidth -use-fn-table-in-decode-to-mcinst -I %p/../../include %s | FileCheck %s --check-prefix=CHECK-SPECIALIZE-TABLE
+
+
+include "llvm/Target/Target.td"
+
+def archInstrInfo : InstrInfo { }
+
+def arch : Target {
+ let InstructionSet = archInstrInfo;
+}
+
+let Namespace = "arch" in {
+ def R0 : Register<"r0">;
+ def R1 : Register<"r1">;
+ def R2 : Register<"r2">;
+ def R3 : Register<"r3">;
+}
+def Regs : RegisterClass<"Regs", [i32], 32, (add R0, R1, R2, R3)>;
+
+// Bit 0 of the encoding determines the size (8 or 16 bits).
+// Bits {3..1} define the number define the number of operands encoded.
+class Instruction8Bit<int NumOps> : Instruction {
+ let Size = 1;
+ let OutOperandList = (outs);
+ field bits<8> Inst;
+ let Inst{0} = 0;
+ let Inst{3-1} = NumOps;
+}
+
+class Instruction16Bit<int NumOps> : Instruction {
+ let Size = 2;
+ let OutOperandList = (outs);
+ field bits<16> Inst;
+ let Inst{0} = 1;
+ let Inst{3-1} = NumOps;
+}
+
+// Define instructions to generate 4 cases in decodeToMCInst.
+// Each register operand needs 2 bits to encode.
+
+// An instruction with no inputs.
+def Inst0 : Instruction8Bit<0> {
+ let Inst{7-4} = 0;
+ let InOperandList = (ins);
+ let AsmString = "Inst0";
+}
+
+// An instruction with a single input.
+def Inst1 : Instruction8Bit<1> {
+ bits<2> r0;
+ let Inst{5-4} = r0;
+ let Inst{7-6} = 0;
+ let InOperandList = (ins Regs:$r0);
+ let AsmString = "Inst1";
+}
+
+// An instruction with two inputs.
+def Inst2 : Instruction16Bit<2> {
+ bits<2> r0;
+ bits<2> r1;
+ let Inst{5-4} = r0;
+ let Inst{7-6} = r1;
+ let Inst{15-8} = 0;
+ let InOperandList = (ins Regs:$r0, Regs:$r1);
+ let AsmString = "Inst2";
+}
+
+// An instruction with three inputs. .
+def Inst3 : Instruction16Bit<3> {
+ bits<2> r0;
+ bits<2> r1;
+ bits<2> r2;
+ let Inst{5-4} = r0;
+ let Inst{7-6} = r1;
+ let Inst{9-8} = r2;
+ let Inst{15-10} = 0;
+ let InOperandList = (ins Regs:$r0, Regs:$r1, Regs:$r2);
+ let AsmString = "Inst3";
+}
+
+// -----------------------------------------------------------------------------
+// In the default case, we emit a single decodeToMCinst function and DecodeIdx
+// is shared across all bitwidths.
+
+// CHECK-DEFAULT-LABEL: DecoderTable8[25]
+// CHECK-DEFAULT: DecodeIdx: 0
+// CHECK-DEFAULT: DecodeIdx: 1
+// CHECK-DEFAULT: };
+
+// CHECK-DEFAULT-LABEL: DecoderTable16[25]
+// CHECK-DEFAULT: DecodeIdx: 2
+// CHECK-DEFAULT: DecodeIdx: 3
+// CHECK-DEFAULT: };
+
+// CHECK-DEFAULT-LABEL: template <typename InsnType>
+// CHECK-DEFAULT-NEXT: static DecodeStatus decodeToMCInst
+// CHECK-DEFAULT: case 0
+// CHECK-DEFAULT: case 1
+// CHECK-DEFAULT: case 2
+// CHECK-DEFAULT: case 3
+
+// -----------------------------------------------------------------------------
+// When we specialize per bitwidth, we emit 2 decodeToMCInst functions and
+// DecodeIdx is assigned per bit width.
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: DecoderTable8[25]
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 0
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 1
+// CHECK-SPECIALIZE-NO-TABLE: };
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-NO-TABLE: case 0
+// CHECK-SPECIALIZE-NO-TABLE: case 1
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: DecoderTable16[25]
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 0
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 1
+// CHECK-SPECIALIZE-NO-TABLE: };
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-NO-TABLE: case 0
+// CHECK-SPECIALIZE-NO-TABLE: case 1
+
+// -----------------------------------------------------------------------------
+// Per bitwidth specialization with function table.
+// CHECK-SPECIALIZE-TABLE-LABEL: DecoderTable8[25]
+// CHECK-SPECIALIZE-TABLE: DecodeIdx: 0
+// CHECK-SPECIALIZE-TABLE: DecodeIdx: 1
+// CHECK-SPECIALIZE-TABLE: };
+
+// 8 bit functions and function table.
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_0
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_1
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-TABLE-LABEL: static constexpr DecodeFnTy decodeFnTable[] = {
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_0,
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_1,
+// CHECK-SPECIALIZE-TABLE-NEXT: };
+
+// 16 bit functions and function table.
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_0
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_1
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-TABLE-LABEL: static constexpr DecodeFnTy decodeFnTable[] = {
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_0,
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_1,
+// CHECK-SPECIALIZE-TABLE-NEXT: };
diff --git a/llvm/test/TableGen/DecoderEmitterFnTable.td b/llvm/test/TableGen/DecoderEmitterFnTable.td
index 7bed18c19a513..8929e6da716e6 100644
--- a/llvm/test/TableGen/DecoderEmitterFnTable.td
+++ b/llvm/test/TableGen/DecoderEmitterFnTable.td
@@ -71,14 +71,14 @@ def Inst3 : TestInstruction {
let AsmString = "Inst3";
}
-// CHECK-LABEL: DecodeStatus decodeFn0(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn1(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn2(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn3(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_0(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_1(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_2(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_3(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
// CHECK-LABEL: decodeToMCInst(unsigned Idx, DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
// CHECK: static constexpr DecodeFnTy decodeFnTable[]
-// CHECK-NEXT: decodeFn0,
-// CHECK-NEX...
[truncated]
|
@llvm/pr-subscribers-backend-amdgpu Author: Rahul Joshi (jurahul) ChangesThis change adds an option to specialize decoders per bitwidth, which can help reduce the (compiled) code size of the decoder code. Current state: Several backends call New state: Additionally, under this mode, decoders are uniqued only within a given bitwidth (as opposed to across all bitwidths without this option), so the decoder index values assigned are smaller, and consume less bytes in their ULEB128 encoding. As a result, the generated decoder tables can also reduce in size. Adopt this feature for the AMDGPU and RISCV backend. In a release build, this results in a net 55% reduction in the .text size of libLLVMAMDGPUDisassembler.so and a 4.5% reduction in the .rodata size. For RISCV, which today uses a single Actual measured sizes are as follows:
Patch is 35.73 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154865.diff 14 Files Affected:
diff --git a/llvm/include/llvm/MC/MCDecoder.h b/llvm/include/llvm/MC/MCDecoder.h
index 70762a4a5ebae..459c8a6a5ea34 100644
--- a/llvm/include/llvm/MC/MCDecoder.h
+++ b/llvm/include/llvm/MC/MCDecoder.h
@@ -12,6 +12,7 @@
#include "llvm/MC/MCDisassembler/MCDisassembler.h"
#include "llvm/Support/MathExtras.h"
+#include <bitset>
#include <cassert>
namespace llvm::MCD {
@@ -48,6 +49,15 @@ fieldFromInstruction(const InsnType &Insn, unsigned StartBit,
return Insn.extractBitsAsZExtValue(NumBits, StartBit);
}
+template <size_t N>
+uint64_t fieldFromInstruction(const std::bitset<N> &Insn, unsigned StartBit,
+ unsigned NumBits) {
+ assert(StartBit + NumBits <= N && "Instruction field out of bounds!");
+ assert(NumBits <= 64 && "Cannot support >64-bit extractions!");
+ const std::bitset<N> Mask(maskTrailingOnes<uint64_t>(NumBits));
+ return ((Insn >> StartBit) & Mask).to_ullong();
+}
+
// Helper function for inserting bits extracted from an encoded instruction into
// an integer-typed field.
template <typename IntType>
@@ -62,6 +72,13 @@ insertBits(IntType &field, IntType bits, unsigned startBit, unsigned numBits) {
field |= bits << startBit;
}
+// InsnBitWidth is essentially a type trait used by the decoder emitter to query
+// the supported bitwidth for a given type. But default, the value is 0, making
+// it an invalid type for use as `InsnType` when instantiating the decoder.
+// Individual targets are expected to provide specializations for these based
+// on their usage.
+template <typename T> static constexpr uint32_t InsnBitWidth = 0;
+
} // namespace llvm::MCD
#endif // LLVM_MC_MCDECODER_H
diff --git a/llvm/lib/Target/AMDGPU/CMakeLists.txt b/llvm/lib/Target/AMDGPU/CMakeLists.txt
index dc9dd220130ea..56dc6f9e8e6e9 100644
--- a/llvm/lib/Target/AMDGPU/CMakeLists.txt
+++ b/llvm/lib/Target/AMDGPU/CMakeLists.txt
@@ -6,7 +6,8 @@ tablegen(LLVM AMDGPUGenAsmMatcher.inc -gen-asm-matcher)
tablegen(LLVM AMDGPUGenAsmWriter.inc -gen-asm-writer)
tablegen(LLVM AMDGPUGenCallingConv.inc -gen-callingconv)
tablegen(LLVM AMDGPUGenDAGISel.inc -gen-dag-isel)
-tablegen(LLVM AMDGPUGenDisassemblerTables.inc -gen-disassembler)
+tablegen(LLVM AMDGPUGenDisassemblerTables.inc -gen-disassembler
+ --specialize-decoders-per-bitwidth)
tablegen(LLVM AMDGPUGenInstrInfo.inc -gen-instr-info)
tablegen(LLVM AMDGPUGenMCCodeEmitter.inc -gen-emitter)
tablegen(LLVM AMDGPUGenMCPseudoLowering.inc -gen-pseudo-lowering)
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 6a2beeed41dfd..80d194afa926b 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -38,6 +38,7 @@
#include "llvm/Support/Compiler.h"
using namespace llvm;
+using namespace llvm::MCD;
#define DEBUG_TYPE "amdgpu-disassembler"
@@ -446,6 +447,14 @@ static DecodeStatus decodeVersionImm(MCInst &Inst, unsigned Imm,
#include "AMDGPUGenDisassemblerTables.inc"
+// Define bitwidths for various types used to instantiate the decoder.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint32_t> = 32;
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint64_t> = 64;
+template <>
+static constexpr uint32_t llvm::MCD::InsnBitWidth<std::bitset<96>> = 96;
+template <>
+static constexpr uint32_t llvm::MCD::InsnBitWidth<std::bitset<128>> = 128;
+
//===----------------------------------------------------------------------===//
//
//===----------------------------------------------------------------------===//
@@ -498,26 +507,24 @@ template <typename T> static inline T eatBytes(ArrayRef<uint8_t>& Bytes) {
return Res;
}
-static inline DecoderUInt128 eat12Bytes(ArrayRef<uint8_t> &Bytes) {
+static inline std::bitset<96> eat12Bytes(ArrayRef<uint8_t> &Bytes) {
+ using namespace llvm::support::endian;
assert(Bytes.size() >= 12);
- uint64_t Lo =
- support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<96> Lo(read<uint64_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(8);
- uint64_t Hi =
- support::endian::read<uint32_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<96> Hi(read<uint32_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(4);
- return DecoderUInt128(Lo, Hi);
+ return (Hi << 64) | Lo;
}
-static inline DecoderUInt128 eat16Bytes(ArrayRef<uint8_t> &Bytes) {
+static inline std::bitset<128> eat16Bytes(ArrayRef<uint8_t> &Bytes) {
+ using namespace llvm::support::endian;
assert(Bytes.size() >= 16);
- uint64_t Lo =
- support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<128> Lo(read<uint64_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(8);
- uint64_t Hi =
- support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<128> Hi(read<uint64_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(8);
- return DecoderUInt128(Lo, Hi);
+ return (Hi << 64) | Lo;
}
void AMDGPUDisassembler::decodeImmOperands(MCInst &MI,
@@ -600,14 +607,14 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
// Try to decode DPP and SDWA first to solve conflict with VOP1 and VOP2
// encodings
if (isGFX1250() && Bytes.size() >= 16) {
- DecoderUInt128 DecW = eat16Bytes(Bytes);
+ std::bitset<128> DecW = eat16Bytes(Bytes);
if (tryDecodeInst(DecoderTableGFX1250128, MI, DecW, Address, CS))
break;
Bytes = Bytes_.slice(0, MaxInstBytesNum);
}
if (isGFX11Plus() && Bytes.size() >= 12) {
- DecoderUInt128 DecW = eat12Bytes(Bytes);
+ std::bitset<96> DecW = eat12Bytes(Bytes);
if (isGFX11() &&
tryDecodeInst(DecoderTableGFX1196, DecoderTableGFX11_FAKE1696, MI,
@@ -642,7 +649,7 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
} else if (Bytes.size() >= 16 &&
STI.hasFeature(AMDGPU::FeatureGFX950Insts)) {
- DecoderUInt128 DecW = eat16Bytes(Bytes);
+ std::bitset<128> DecW = eat16Bytes(Bytes);
if (tryDecodeInst(DecoderTableGFX940128, MI, DecW, Address, CS))
break;
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
index f4d164bf10c3c..ded447b6f8d5a 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
@@ -32,44 +32,6 @@ class MCOperand;
class MCSubtargetInfo;
class Twine;
-// Exposes an interface expected by autogenerated code in
-// FixedLenDecoderEmitter
-class DecoderUInt128 {
-private:
- uint64_t Lo = 0;
- uint64_t Hi = 0;
-
-public:
- DecoderUInt128() = default;
- DecoderUInt128(uint64_t Lo, uint64_t Hi = 0) : Lo(Lo), Hi(Hi) {}
- operator bool() const { return Lo || Hi; }
- uint64_t extractBitsAsZExtValue(unsigned NumBits,
- unsigned BitPosition) const {
- assert(NumBits && NumBits <= 64);
- assert(BitPosition < 128);
- uint64_t Val;
- if (BitPosition < 64)
- Val = Lo >> BitPosition | Hi << 1 << (63 - BitPosition);
- else
- Val = Hi >> (BitPosition - 64);
- return Val & ((uint64_t(2) << (NumBits - 1)) - 1);
- }
- DecoderUInt128 operator&(const DecoderUInt128 &RHS) const {
- return DecoderUInt128(Lo & RHS.Lo, Hi & RHS.Hi);
- }
- DecoderUInt128 operator&(const uint64_t &RHS) const {
- return *this & DecoderUInt128(RHS);
- }
- DecoderUInt128 operator~() const { return DecoderUInt128(~Lo, ~Hi); }
- bool operator==(const DecoderUInt128 &RHS) {
- return Lo == RHS.Lo && Hi == RHS.Hi;
- }
- bool operator!=(const DecoderUInt128 &RHS) {
- return Lo != RHS.Lo || Hi != RHS.Hi;
- }
- bool operator!=(const int &RHS) { return *this != DecoderUInt128(RHS); }
-};
-
//===----------------------------------------------------------------------===//
// AMDGPUDisassembler
//===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/DirectX/CMakeLists.txt b/llvm/lib/Target/DirectX/CMakeLists.txt
index 8100f941c8d94..6c079517e22d6 100644
--- a/llvm/lib/Target/DirectX/CMakeLists.txt
+++ b/llvm/lib/Target/DirectX/CMakeLists.txt
@@ -41,6 +41,7 @@ add_llvm_target(DirectXCodeGen
LINK_COMPONENTS
Analysis
AsmPrinter
+ BinaryFormat
CodeGen
CodeGenTypes
Core
diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt
index 47329b2c2f4d2..531238ae85029 100644
--- a/llvm/lib/Target/RISCV/CMakeLists.txt
+++ b/llvm/lib/Target/RISCV/CMakeLists.txt
@@ -7,7 +7,8 @@ tablegen(LLVM RISCVGenAsmWriter.inc -gen-asm-writer)
tablegen(LLVM RISCVGenCompressInstEmitter.inc -gen-compress-inst-emitter)
tablegen(LLVM RISCVGenMacroFusion.inc -gen-macro-fusion-pred)
tablegen(LLVM RISCVGenDAGISel.inc -gen-dag-isel)
-tablegen(LLVM RISCVGenDisassemblerTables.inc -gen-disassembler)
+tablegen(LLVM RISCVGenDisassemblerTables.inc -gen-disassembler
+ --specialize-decoders-per-bitwidth)
tablegen(LLVM RISCVGenInstrInfo.inc -gen-instr-info)
tablegen(LLVM RISCVGenMCCodeEmitter.inc -gen-emitter)
tablegen(LLVM RISCVGenMCPseudoLowering.inc -gen-pseudo-lowering)
diff --git a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
index dbb16fce8390a..4becd3bbb25f2 100644
--- a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+++ b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
@@ -558,7 +558,7 @@ static DecodeStatus decodeXqccmpRlistS0(MCInst &Inst, uint32_t Imm,
return decodeZcmpRlist(Inst, Imm, Address, Decoder);
}
-static DecodeStatus decodeCSSPushPopchk(MCInst &Inst, uint32_t Insn,
+static DecodeStatus decodeCSSPushPopchk(MCInst &Inst, uint16_t Insn,
uint64_t Address,
const MCDisassembler *Decoder) {
uint32_t Rs1 = fieldFromInstruction(Insn, 7, 5);
@@ -700,6 +700,12 @@ static constexpr DecoderListEntry DecoderList32[]{
{DecoderTableZdinxRV32Only32, {}, "RV32-only Zdinx (Double in Integer)"},
};
+// Define bitwidths for various types used to instantiate the decoder.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint16_t> = 16;
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint32_t> = 32;
+// Use uint64_t to represent 48 bit instructions.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint64_t> = 48;
+
DecodeStatus RISCVDisassembler::getInstruction32(MCInst &MI, uint64_t &Size,
ArrayRef<uint8_t> Bytes,
uint64_t Address,
@@ -710,9 +716,7 @@ DecodeStatus RISCVDisassembler::getInstruction32(MCInst &MI, uint64_t &Size,
}
Size = 4;
- // Use uint64_t to match getInstruction48. decodeInstruction is templated
- // on the Insn type.
- uint64_t Insn = support::endian::read32le(Bytes.data());
+ uint32_t Insn = support::endian::read32le(Bytes.data());
for (const DecoderListEntry &Entry : DecoderList32) {
if (!Entry.haveContainedFeatures(STI.getFeatureBits()))
@@ -758,9 +762,7 @@ DecodeStatus RISCVDisassembler::getInstruction16(MCInst &MI, uint64_t &Size,
}
Size = 2;
- // Use uint64_t to match getInstruction48. decodeInstruction is templated
- // on the Insn type.
- uint64_t Insn = support::endian::read16le(Bytes.data());
+ uint16_t Insn = support::endian::read16le(Bytes.data());
for (const DecoderListEntry &Entry : DecoderList16) {
if (!Entry.haveContainedFeatures(STI.getFeatureBits()))
diff --git a/llvm/test/TableGen/DecoderEmitterBitwidthSpecialization.td b/llvm/test/TableGen/DecoderEmitterBitwidthSpecialization.td
new file mode 100644
index 0000000000000..6fd9676939bde
--- /dev/null
+++ b/llvm/test/TableGen/DecoderEmitterBitwidthSpecialization.td
@@ -0,0 +1,169 @@
+// RUN: llvm-tblgen -gen-disassembler -I %p/../../include %s | FileCheck %s --check-prefix=CHECK-DEFAULT
+// RUN: llvm-tblgen -gen-disassembler -specialize-decoders-per-bitwidth -I %p/../../include %s | FileCheck %s --check-prefix=CHECK-SPECIALIZE-NO-TABLE
+// RUN: llvm-tblgen -gen-disassembler -specialize-decoders-per-bitwidth -use-fn-table-in-decode-to-mcinst -I %p/../../include %s | FileCheck %s --check-prefix=CHECK-SPECIALIZE-TABLE
+
+
+include "llvm/Target/Target.td"
+
+def archInstrInfo : InstrInfo { }
+
+def arch : Target {
+ let InstructionSet = archInstrInfo;
+}
+
+let Namespace = "arch" in {
+ def R0 : Register<"r0">;
+ def R1 : Register<"r1">;
+ def R2 : Register<"r2">;
+ def R3 : Register<"r3">;
+}
+def Regs : RegisterClass<"Regs", [i32], 32, (add R0, R1, R2, R3)>;
+
+// Bit 0 of the encoding determines the size (8 or 16 bits).
+// Bits {3..1} define the number define the number of operands encoded.
+class Instruction8Bit<int NumOps> : Instruction {
+ let Size = 1;
+ let OutOperandList = (outs);
+ field bits<8> Inst;
+ let Inst{0} = 0;
+ let Inst{3-1} = NumOps;
+}
+
+class Instruction16Bit<int NumOps> : Instruction {
+ let Size = 2;
+ let OutOperandList = (outs);
+ field bits<16> Inst;
+ let Inst{0} = 1;
+ let Inst{3-1} = NumOps;
+}
+
+// Define instructions to generate 4 cases in decodeToMCInst.
+// Each register operand needs 2 bits to encode.
+
+// An instruction with no inputs.
+def Inst0 : Instruction8Bit<0> {
+ let Inst{7-4} = 0;
+ let InOperandList = (ins);
+ let AsmString = "Inst0";
+}
+
+// An instruction with a single input.
+def Inst1 : Instruction8Bit<1> {
+ bits<2> r0;
+ let Inst{5-4} = r0;
+ let Inst{7-6} = 0;
+ let InOperandList = (ins Regs:$r0);
+ let AsmString = "Inst1";
+}
+
+// An instruction with two inputs.
+def Inst2 : Instruction16Bit<2> {
+ bits<2> r0;
+ bits<2> r1;
+ let Inst{5-4} = r0;
+ let Inst{7-6} = r1;
+ let Inst{15-8} = 0;
+ let InOperandList = (ins Regs:$r0, Regs:$r1);
+ let AsmString = "Inst2";
+}
+
+// An instruction with three inputs. .
+def Inst3 : Instruction16Bit<3> {
+ bits<2> r0;
+ bits<2> r1;
+ bits<2> r2;
+ let Inst{5-4} = r0;
+ let Inst{7-6} = r1;
+ let Inst{9-8} = r2;
+ let Inst{15-10} = 0;
+ let InOperandList = (ins Regs:$r0, Regs:$r1, Regs:$r2);
+ let AsmString = "Inst3";
+}
+
+// -----------------------------------------------------------------------------
+// In the default case, we emit a single decodeToMCinst function and DecodeIdx
+// is shared across all bitwidths.
+
+// CHECK-DEFAULT-LABEL: DecoderTable8[25]
+// CHECK-DEFAULT: DecodeIdx: 0
+// CHECK-DEFAULT: DecodeIdx: 1
+// CHECK-DEFAULT: };
+
+// CHECK-DEFAULT-LABEL: DecoderTable16[25]
+// CHECK-DEFAULT: DecodeIdx: 2
+// CHECK-DEFAULT: DecodeIdx: 3
+// CHECK-DEFAULT: };
+
+// CHECK-DEFAULT-LABEL: template <typename InsnType>
+// CHECK-DEFAULT-NEXT: static DecodeStatus decodeToMCInst
+// CHECK-DEFAULT: case 0
+// CHECK-DEFAULT: case 1
+// CHECK-DEFAULT: case 2
+// CHECK-DEFAULT: case 3
+
+// -----------------------------------------------------------------------------
+// When we specialize per bitwidth, we emit 2 decodeToMCInst functions and
+// DecodeIdx is assigned per bit width.
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: DecoderTable8[25]
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 0
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 1
+// CHECK-SPECIALIZE-NO-TABLE: };
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-NO-TABLE: case 0
+// CHECK-SPECIALIZE-NO-TABLE: case 1
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: DecoderTable16[25]
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 0
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 1
+// CHECK-SPECIALIZE-NO-TABLE: };
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-NO-TABLE: case 0
+// CHECK-SPECIALIZE-NO-TABLE: case 1
+
+// -----------------------------------------------------------------------------
+// Per bitwidth specialization with function table.
+// CHECK-SPECIALIZE-TABLE-LABEL: DecoderTable8[25]
+// CHECK-SPECIALIZE-TABLE: DecodeIdx: 0
+// CHECK-SPECIALIZE-TABLE: DecodeIdx: 1
+// CHECK-SPECIALIZE-TABLE: };
+
+// 8 bit functions and function table.
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_0
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_1
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-TABLE-LABEL: static constexpr DecodeFnTy decodeFnTable[] = {
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_0,
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_1,
+// CHECK-SPECIALIZE-TABLE-NEXT: };
+
+// 16 bit functions and function table.
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_0
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_1
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-TABLE-LABEL: static constexpr DecodeFnTy decodeFnTable[] = {
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_0,
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_1,
+// CHECK-SPECIALIZE-TABLE-NEXT: };
diff --git a/llvm/test/TableGen/DecoderEmitterFnTable.td b/llvm/test/TableGen/DecoderEmitterFnTable.td
index 7bed18c19a513..8929e6da716e6 100644
--- a/llvm/test/TableGen/DecoderEmitterFnTable.td
+++ b/llvm/test/TableGen/DecoderEmitterFnTable.td
@@ -71,14 +71,14 @@ def Inst3 : TestInstruction {
let AsmString = "Inst3";
}
-// CHECK-LABEL: DecodeStatus decodeFn0(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn1(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn2(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn3(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_0(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_1(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_2(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_3(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
// CHECK-LABEL: decodeToMCInst(unsigned Idx, DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
// CHECK: static constexpr DecodeFnTy decodeFnTable[]
-// CHECK-NEXT: decodeFn0,
-// CHECK-NEX...
[truncated]
|
@llvm/pr-subscribers-backend-risc-v Author: Rahul Joshi (jurahul) ChangesThis change adds an option to specialize decoders per bitwidth, which can help reduce the (compiled) code size of the decoder code. Current state: Several backends call New state: Additionally, under this mode, decoders are uniqued only within a given bitwidth (as opposed to across all bitwidths without this option), so the decoder index values assigned are smaller, and consume less bytes in their ULEB128 encoding. As a result, the generated decoder tables can also reduce in size. Adopt this feature for the AMDGPU and RISCV backend. In a release build, this results in a net 55% reduction in the .text size of libLLVMAMDGPUDisassembler.so and a 4.5% reduction in the .rodata size. For RISCV, which today uses a single Actual measured sizes are as follows:
Patch is 35.73 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154865.diff 14 Files Affected:
diff --git a/llvm/include/llvm/MC/MCDecoder.h b/llvm/include/llvm/MC/MCDecoder.h
index 70762a4a5ebae..459c8a6a5ea34 100644
--- a/llvm/include/llvm/MC/MCDecoder.h
+++ b/llvm/include/llvm/MC/MCDecoder.h
@@ -12,6 +12,7 @@
#include "llvm/MC/MCDisassembler/MCDisassembler.h"
#include "llvm/Support/MathExtras.h"
+#include <bitset>
#include <cassert>
namespace llvm::MCD {
@@ -48,6 +49,15 @@ fieldFromInstruction(const InsnType &Insn, unsigned StartBit,
return Insn.extractBitsAsZExtValue(NumBits, StartBit);
}
+template <size_t N>
+uint64_t fieldFromInstruction(const std::bitset<N> &Insn, unsigned StartBit,
+ unsigned NumBits) {
+ assert(StartBit + NumBits <= N && "Instruction field out of bounds!");
+ assert(NumBits <= 64 && "Cannot support >64-bit extractions!");
+ const std::bitset<N> Mask(maskTrailingOnes<uint64_t>(NumBits));
+ return ((Insn >> StartBit) & Mask).to_ullong();
+}
+
// Helper function for inserting bits extracted from an encoded instruction into
// an integer-typed field.
template <typename IntType>
@@ -62,6 +72,13 @@ insertBits(IntType &field, IntType bits, unsigned startBit, unsigned numBits) {
field |= bits << startBit;
}
+// InsnBitWidth is essentially a type trait used by the decoder emitter to query
+// the supported bitwidth for a given type. But default, the value is 0, making
+// it an invalid type for use as `InsnType` when instantiating the decoder.
+// Individual targets are expected to provide specializations for these based
+// on their usage.
+template <typename T> static constexpr uint32_t InsnBitWidth = 0;
+
} // namespace llvm::MCD
#endif // LLVM_MC_MCDECODER_H
diff --git a/llvm/lib/Target/AMDGPU/CMakeLists.txt b/llvm/lib/Target/AMDGPU/CMakeLists.txt
index dc9dd220130ea..56dc6f9e8e6e9 100644
--- a/llvm/lib/Target/AMDGPU/CMakeLists.txt
+++ b/llvm/lib/Target/AMDGPU/CMakeLists.txt
@@ -6,7 +6,8 @@ tablegen(LLVM AMDGPUGenAsmMatcher.inc -gen-asm-matcher)
tablegen(LLVM AMDGPUGenAsmWriter.inc -gen-asm-writer)
tablegen(LLVM AMDGPUGenCallingConv.inc -gen-callingconv)
tablegen(LLVM AMDGPUGenDAGISel.inc -gen-dag-isel)
-tablegen(LLVM AMDGPUGenDisassemblerTables.inc -gen-disassembler)
+tablegen(LLVM AMDGPUGenDisassemblerTables.inc -gen-disassembler
+ --specialize-decoders-per-bitwidth)
tablegen(LLVM AMDGPUGenInstrInfo.inc -gen-instr-info)
tablegen(LLVM AMDGPUGenMCCodeEmitter.inc -gen-emitter)
tablegen(LLVM AMDGPUGenMCPseudoLowering.inc -gen-pseudo-lowering)
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 6a2beeed41dfd..80d194afa926b 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -38,6 +38,7 @@
#include "llvm/Support/Compiler.h"
using namespace llvm;
+using namespace llvm::MCD;
#define DEBUG_TYPE "amdgpu-disassembler"
@@ -446,6 +447,14 @@ static DecodeStatus decodeVersionImm(MCInst &Inst, unsigned Imm,
#include "AMDGPUGenDisassemblerTables.inc"
+// Define bitwidths for various types used to instantiate the decoder.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint32_t> = 32;
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint64_t> = 64;
+template <>
+static constexpr uint32_t llvm::MCD::InsnBitWidth<std::bitset<96>> = 96;
+template <>
+static constexpr uint32_t llvm::MCD::InsnBitWidth<std::bitset<128>> = 128;
+
//===----------------------------------------------------------------------===//
//
//===----------------------------------------------------------------------===//
@@ -498,26 +507,24 @@ template <typename T> static inline T eatBytes(ArrayRef<uint8_t>& Bytes) {
return Res;
}
-static inline DecoderUInt128 eat12Bytes(ArrayRef<uint8_t> &Bytes) {
+static inline std::bitset<96> eat12Bytes(ArrayRef<uint8_t> &Bytes) {
+ using namespace llvm::support::endian;
assert(Bytes.size() >= 12);
- uint64_t Lo =
- support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<96> Lo(read<uint64_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(8);
- uint64_t Hi =
- support::endian::read<uint32_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<96> Hi(read<uint32_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(4);
- return DecoderUInt128(Lo, Hi);
+ return (Hi << 64) | Lo;
}
-static inline DecoderUInt128 eat16Bytes(ArrayRef<uint8_t> &Bytes) {
+static inline std::bitset<128> eat16Bytes(ArrayRef<uint8_t> &Bytes) {
+ using namespace llvm::support::endian;
assert(Bytes.size() >= 16);
- uint64_t Lo =
- support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<128> Lo(read<uint64_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(8);
- uint64_t Hi =
- support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<128> Hi(read<uint64_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(8);
- return DecoderUInt128(Lo, Hi);
+ return (Hi << 64) | Lo;
}
void AMDGPUDisassembler::decodeImmOperands(MCInst &MI,
@@ -600,14 +607,14 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
// Try to decode DPP and SDWA first to solve conflict with VOP1 and VOP2
// encodings
if (isGFX1250() && Bytes.size() >= 16) {
- DecoderUInt128 DecW = eat16Bytes(Bytes);
+ std::bitset<128> DecW = eat16Bytes(Bytes);
if (tryDecodeInst(DecoderTableGFX1250128, MI, DecW, Address, CS))
break;
Bytes = Bytes_.slice(0, MaxInstBytesNum);
}
if (isGFX11Plus() && Bytes.size() >= 12) {
- DecoderUInt128 DecW = eat12Bytes(Bytes);
+ std::bitset<96> DecW = eat12Bytes(Bytes);
if (isGFX11() &&
tryDecodeInst(DecoderTableGFX1196, DecoderTableGFX11_FAKE1696, MI,
@@ -642,7 +649,7 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
} else if (Bytes.size() >= 16 &&
STI.hasFeature(AMDGPU::FeatureGFX950Insts)) {
- DecoderUInt128 DecW = eat16Bytes(Bytes);
+ std::bitset<128> DecW = eat16Bytes(Bytes);
if (tryDecodeInst(DecoderTableGFX940128, MI, DecW, Address, CS))
break;
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
index f4d164bf10c3c..ded447b6f8d5a 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
@@ -32,44 +32,6 @@ class MCOperand;
class MCSubtargetInfo;
class Twine;
-// Exposes an interface expected by autogenerated code in
-// FixedLenDecoderEmitter
-class DecoderUInt128 {
-private:
- uint64_t Lo = 0;
- uint64_t Hi = 0;
-
-public:
- DecoderUInt128() = default;
- DecoderUInt128(uint64_t Lo, uint64_t Hi = 0) : Lo(Lo), Hi(Hi) {}
- operator bool() const { return Lo || Hi; }
- uint64_t extractBitsAsZExtValue(unsigned NumBits,
- unsigned BitPosition) const {
- assert(NumBits && NumBits <= 64);
- assert(BitPosition < 128);
- uint64_t Val;
- if (BitPosition < 64)
- Val = Lo >> BitPosition | Hi << 1 << (63 - BitPosition);
- else
- Val = Hi >> (BitPosition - 64);
- return Val & ((uint64_t(2) << (NumBits - 1)) - 1);
- }
- DecoderUInt128 operator&(const DecoderUInt128 &RHS) const {
- return DecoderUInt128(Lo & RHS.Lo, Hi & RHS.Hi);
- }
- DecoderUInt128 operator&(const uint64_t &RHS) const {
- return *this & DecoderUInt128(RHS);
- }
- DecoderUInt128 operator~() const { return DecoderUInt128(~Lo, ~Hi); }
- bool operator==(const DecoderUInt128 &RHS) {
- return Lo == RHS.Lo && Hi == RHS.Hi;
- }
- bool operator!=(const DecoderUInt128 &RHS) {
- return Lo != RHS.Lo || Hi != RHS.Hi;
- }
- bool operator!=(const int &RHS) { return *this != DecoderUInt128(RHS); }
-};
-
//===----------------------------------------------------------------------===//
// AMDGPUDisassembler
//===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/DirectX/CMakeLists.txt b/llvm/lib/Target/DirectX/CMakeLists.txt
index 8100f941c8d94..6c079517e22d6 100644
--- a/llvm/lib/Target/DirectX/CMakeLists.txt
+++ b/llvm/lib/Target/DirectX/CMakeLists.txt
@@ -41,6 +41,7 @@ add_llvm_target(DirectXCodeGen
LINK_COMPONENTS
Analysis
AsmPrinter
+ BinaryFormat
CodeGen
CodeGenTypes
Core
diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt
index 47329b2c2f4d2..531238ae85029 100644
--- a/llvm/lib/Target/RISCV/CMakeLists.txt
+++ b/llvm/lib/Target/RISCV/CMakeLists.txt
@@ -7,7 +7,8 @@ tablegen(LLVM RISCVGenAsmWriter.inc -gen-asm-writer)
tablegen(LLVM RISCVGenCompressInstEmitter.inc -gen-compress-inst-emitter)
tablegen(LLVM RISCVGenMacroFusion.inc -gen-macro-fusion-pred)
tablegen(LLVM RISCVGenDAGISel.inc -gen-dag-isel)
-tablegen(LLVM RISCVGenDisassemblerTables.inc -gen-disassembler)
+tablegen(LLVM RISCVGenDisassemblerTables.inc -gen-disassembler
+ --specialize-decoders-per-bitwidth)
tablegen(LLVM RISCVGenInstrInfo.inc -gen-instr-info)
tablegen(LLVM RISCVGenMCCodeEmitter.inc -gen-emitter)
tablegen(LLVM RISCVGenMCPseudoLowering.inc -gen-pseudo-lowering)
diff --git a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
index dbb16fce8390a..4becd3bbb25f2 100644
--- a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+++ b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
@@ -558,7 +558,7 @@ static DecodeStatus decodeXqccmpRlistS0(MCInst &Inst, uint32_t Imm,
return decodeZcmpRlist(Inst, Imm, Address, Decoder);
}
-static DecodeStatus decodeCSSPushPopchk(MCInst &Inst, uint32_t Insn,
+static DecodeStatus decodeCSSPushPopchk(MCInst &Inst, uint16_t Insn,
uint64_t Address,
const MCDisassembler *Decoder) {
uint32_t Rs1 = fieldFromInstruction(Insn, 7, 5);
@@ -700,6 +700,12 @@ static constexpr DecoderListEntry DecoderList32[]{
{DecoderTableZdinxRV32Only32, {}, "RV32-only Zdinx (Double in Integer)"},
};
+// Define bitwidths for various types used to instantiate the decoder.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint16_t> = 16;
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint32_t> = 32;
+// Use uint64_t to represent 48 bit instructions.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint64_t> = 48;
+
DecodeStatus RISCVDisassembler::getInstruction32(MCInst &MI, uint64_t &Size,
ArrayRef<uint8_t> Bytes,
uint64_t Address,
@@ -710,9 +716,7 @@ DecodeStatus RISCVDisassembler::getInstruction32(MCInst &MI, uint64_t &Size,
}
Size = 4;
- // Use uint64_t to match getInstruction48. decodeInstruction is templated
- // on the Insn type.
- uint64_t Insn = support::endian::read32le(Bytes.data());
+ uint32_t Insn = support::endian::read32le(Bytes.data());
for (const DecoderListEntry &Entry : DecoderList32) {
if (!Entry.haveContainedFeatures(STI.getFeatureBits()))
@@ -758,9 +762,7 @@ DecodeStatus RISCVDisassembler::getInstruction16(MCInst &MI, uint64_t &Size,
}
Size = 2;
- // Use uint64_t to match getInstruction48. decodeInstruction is templated
- // on the Insn type.
- uint64_t Insn = support::endian::read16le(Bytes.data());
+ uint16_t Insn = support::endian::read16le(Bytes.data());
for (const DecoderListEntry &Entry : DecoderList16) {
if (!Entry.haveContainedFeatures(STI.getFeatureBits()))
diff --git a/llvm/test/TableGen/DecoderEmitterBitwidthSpecialization.td b/llvm/test/TableGen/DecoderEmitterBitwidthSpecialization.td
new file mode 100644
index 0000000000000..6fd9676939bde
--- /dev/null
+++ b/llvm/test/TableGen/DecoderEmitterBitwidthSpecialization.td
@@ -0,0 +1,169 @@
+// RUN: llvm-tblgen -gen-disassembler -I %p/../../include %s | FileCheck %s --check-prefix=CHECK-DEFAULT
+// RUN: llvm-tblgen -gen-disassembler -specialize-decoders-per-bitwidth -I %p/../../include %s | FileCheck %s --check-prefix=CHECK-SPECIALIZE-NO-TABLE
+// RUN: llvm-tblgen -gen-disassembler -specialize-decoders-per-bitwidth -use-fn-table-in-decode-to-mcinst -I %p/../../include %s | FileCheck %s --check-prefix=CHECK-SPECIALIZE-TABLE
+
+
+include "llvm/Target/Target.td"
+
+def archInstrInfo : InstrInfo { }
+
+def arch : Target {
+ let InstructionSet = archInstrInfo;
+}
+
+let Namespace = "arch" in {
+ def R0 : Register<"r0">;
+ def R1 : Register<"r1">;
+ def R2 : Register<"r2">;
+ def R3 : Register<"r3">;
+}
+def Regs : RegisterClass<"Regs", [i32], 32, (add R0, R1, R2, R3)>;
+
+// Bit 0 of the encoding determines the size (8 or 16 bits).
+// Bits {3..1} define the number define the number of operands encoded.
+class Instruction8Bit<int NumOps> : Instruction {
+ let Size = 1;
+ let OutOperandList = (outs);
+ field bits<8> Inst;
+ let Inst{0} = 0;
+ let Inst{3-1} = NumOps;
+}
+
+class Instruction16Bit<int NumOps> : Instruction {
+ let Size = 2;
+ let OutOperandList = (outs);
+ field bits<16> Inst;
+ let Inst{0} = 1;
+ let Inst{3-1} = NumOps;
+}
+
+// Define instructions to generate 4 cases in decodeToMCInst.
+// Each register operand needs 2 bits to encode.
+
+// An instruction with no inputs.
+def Inst0 : Instruction8Bit<0> {
+ let Inst{7-4} = 0;
+ let InOperandList = (ins);
+ let AsmString = "Inst0";
+}
+
+// An instruction with a single input.
+def Inst1 : Instruction8Bit<1> {
+ bits<2> r0;
+ let Inst{5-4} = r0;
+ let Inst{7-6} = 0;
+ let InOperandList = (ins Regs:$r0);
+ let AsmString = "Inst1";
+}
+
+// An instruction with two inputs.
+def Inst2 : Instruction16Bit<2> {
+ bits<2> r0;
+ bits<2> r1;
+ let Inst{5-4} = r0;
+ let Inst{7-6} = r1;
+ let Inst{15-8} = 0;
+ let InOperandList = (ins Regs:$r0, Regs:$r1);
+ let AsmString = "Inst2";
+}
+
+// An instruction with three inputs. .
+def Inst3 : Instruction16Bit<3> {
+ bits<2> r0;
+ bits<2> r1;
+ bits<2> r2;
+ let Inst{5-4} = r0;
+ let Inst{7-6} = r1;
+ let Inst{9-8} = r2;
+ let Inst{15-10} = 0;
+ let InOperandList = (ins Regs:$r0, Regs:$r1, Regs:$r2);
+ let AsmString = "Inst3";
+}
+
+// -----------------------------------------------------------------------------
+// In the default case, we emit a single decodeToMCinst function and DecodeIdx
+// is shared across all bitwidths.
+
+// CHECK-DEFAULT-LABEL: DecoderTable8[25]
+// CHECK-DEFAULT: DecodeIdx: 0
+// CHECK-DEFAULT: DecodeIdx: 1
+// CHECK-DEFAULT: };
+
+// CHECK-DEFAULT-LABEL: DecoderTable16[25]
+// CHECK-DEFAULT: DecodeIdx: 2
+// CHECK-DEFAULT: DecodeIdx: 3
+// CHECK-DEFAULT: };
+
+// CHECK-DEFAULT-LABEL: template <typename InsnType>
+// CHECK-DEFAULT-NEXT: static DecodeStatus decodeToMCInst
+// CHECK-DEFAULT: case 0
+// CHECK-DEFAULT: case 1
+// CHECK-DEFAULT: case 2
+// CHECK-DEFAULT: case 3
+
+// -----------------------------------------------------------------------------
+// When we specialize per bitwidth, we emit 2 decodeToMCInst functions and
+// DecodeIdx is assigned per bit width.
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: DecoderTable8[25]
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 0
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 1
+// CHECK-SPECIALIZE-NO-TABLE: };
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-NO-TABLE: case 0
+// CHECK-SPECIALIZE-NO-TABLE: case 1
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: DecoderTable16[25]
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 0
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 1
+// CHECK-SPECIALIZE-NO-TABLE: };
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-NO-TABLE: case 0
+// CHECK-SPECIALIZE-NO-TABLE: case 1
+
+// -----------------------------------------------------------------------------
+// Per bitwidth specialization with function table.
+// CHECK-SPECIALIZE-TABLE-LABEL: DecoderTable8[25]
+// CHECK-SPECIALIZE-TABLE: DecodeIdx: 0
+// CHECK-SPECIALIZE-TABLE: DecodeIdx: 1
+// CHECK-SPECIALIZE-TABLE: };
+
+// 8 bit functions and function table.
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_0
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_1
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-TABLE-LABEL: static constexpr DecodeFnTy decodeFnTable[] = {
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_0,
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_1,
+// CHECK-SPECIALIZE-TABLE-NEXT: };
+
+// 16 bit functions and function table.
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_0
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_1
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-TABLE-LABEL: static constexpr DecodeFnTy decodeFnTable[] = {
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_0,
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_1,
+// CHECK-SPECIALIZE-TABLE-NEXT: };
diff --git a/llvm/test/TableGen/DecoderEmitterFnTable.td b/llvm/test/TableGen/DecoderEmitterFnTable.td
index 7bed18c19a513..8929e6da716e6 100644
--- a/llvm/test/TableGen/DecoderEmitterFnTable.td
+++ b/llvm/test/TableGen/DecoderEmitterFnTable.td
@@ -71,14 +71,14 @@ def Inst3 : TestInstruction {
let AsmString = "Inst3";
}
-// CHECK-LABEL: DecodeStatus decodeFn0(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn1(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn2(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn3(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_0(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_1(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_2(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_3(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
// CHECK-LABEL: decodeToMCInst(unsigned Idx, DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
// CHECK: static constexpr DecodeFnTy decodeFnTable[]
-// CHECK-NEXT: decodeFn0,
-// CHECK-NEX...
[truncated]
|
@llvm/pr-subscribers-backend-directx Author: Rahul Joshi (jurahul) ChangesThis change adds an option to specialize decoders per bitwidth, which can help reduce the (compiled) code size of the decoder code. Current state: Several backends call New state: Additionally, under this mode, decoders are uniqued only within a given bitwidth (as opposed to across all bitwidths without this option), so the decoder index values assigned are smaller, and consume less bytes in their ULEB128 encoding. As a result, the generated decoder tables can also reduce in size. Adopt this feature for the AMDGPU and RISCV backend. In a release build, this results in a net 55% reduction in the .text size of libLLVMAMDGPUDisassembler.so and a 4.5% reduction in the .rodata size. For RISCV, which today uses a single Actual measured sizes are as follows:
Patch is 35.73 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154865.diff 14 Files Affected:
diff --git a/llvm/include/llvm/MC/MCDecoder.h b/llvm/include/llvm/MC/MCDecoder.h
index 70762a4a5ebae..459c8a6a5ea34 100644
--- a/llvm/include/llvm/MC/MCDecoder.h
+++ b/llvm/include/llvm/MC/MCDecoder.h
@@ -12,6 +12,7 @@
#include "llvm/MC/MCDisassembler/MCDisassembler.h"
#include "llvm/Support/MathExtras.h"
+#include <bitset>
#include <cassert>
namespace llvm::MCD {
@@ -48,6 +49,15 @@ fieldFromInstruction(const InsnType &Insn, unsigned StartBit,
return Insn.extractBitsAsZExtValue(NumBits, StartBit);
}
+template <size_t N>
+uint64_t fieldFromInstruction(const std::bitset<N> &Insn, unsigned StartBit,
+ unsigned NumBits) {
+ assert(StartBit + NumBits <= N && "Instruction field out of bounds!");
+ assert(NumBits <= 64 && "Cannot support >64-bit extractions!");
+ const std::bitset<N> Mask(maskTrailingOnes<uint64_t>(NumBits));
+ return ((Insn >> StartBit) & Mask).to_ullong();
+}
+
// Helper function for inserting bits extracted from an encoded instruction into
// an integer-typed field.
template <typename IntType>
@@ -62,6 +72,13 @@ insertBits(IntType &field, IntType bits, unsigned startBit, unsigned numBits) {
field |= bits << startBit;
}
+// InsnBitWidth is essentially a type trait used by the decoder emitter to query
+// the supported bitwidth for a given type. But default, the value is 0, making
+// it an invalid type for use as `InsnType` when instantiating the decoder.
+// Individual targets are expected to provide specializations for these based
+// on their usage.
+template <typename T> static constexpr uint32_t InsnBitWidth = 0;
+
} // namespace llvm::MCD
#endif // LLVM_MC_MCDECODER_H
diff --git a/llvm/lib/Target/AMDGPU/CMakeLists.txt b/llvm/lib/Target/AMDGPU/CMakeLists.txt
index dc9dd220130ea..56dc6f9e8e6e9 100644
--- a/llvm/lib/Target/AMDGPU/CMakeLists.txt
+++ b/llvm/lib/Target/AMDGPU/CMakeLists.txt
@@ -6,7 +6,8 @@ tablegen(LLVM AMDGPUGenAsmMatcher.inc -gen-asm-matcher)
tablegen(LLVM AMDGPUGenAsmWriter.inc -gen-asm-writer)
tablegen(LLVM AMDGPUGenCallingConv.inc -gen-callingconv)
tablegen(LLVM AMDGPUGenDAGISel.inc -gen-dag-isel)
-tablegen(LLVM AMDGPUGenDisassemblerTables.inc -gen-disassembler)
+tablegen(LLVM AMDGPUGenDisassemblerTables.inc -gen-disassembler
+ --specialize-decoders-per-bitwidth)
tablegen(LLVM AMDGPUGenInstrInfo.inc -gen-instr-info)
tablegen(LLVM AMDGPUGenMCCodeEmitter.inc -gen-emitter)
tablegen(LLVM AMDGPUGenMCPseudoLowering.inc -gen-pseudo-lowering)
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 6a2beeed41dfd..80d194afa926b 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -38,6 +38,7 @@
#include "llvm/Support/Compiler.h"
using namespace llvm;
+using namespace llvm::MCD;
#define DEBUG_TYPE "amdgpu-disassembler"
@@ -446,6 +447,14 @@ static DecodeStatus decodeVersionImm(MCInst &Inst, unsigned Imm,
#include "AMDGPUGenDisassemblerTables.inc"
+// Define bitwidths for various types used to instantiate the decoder.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint32_t> = 32;
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint64_t> = 64;
+template <>
+static constexpr uint32_t llvm::MCD::InsnBitWidth<std::bitset<96>> = 96;
+template <>
+static constexpr uint32_t llvm::MCD::InsnBitWidth<std::bitset<128>> = 128;
+
//===----------------------------------------------------------------------===//
//
//===----------------------------------------------------------------------===//
@@ -498,26 +507,24 @@ template <typename T> static inline T eatBytes(ArrayRef<uint8_t>& Bytes) {
return Res;
}
-static inline DecoderUInt128 eat12Bytes(ArrayRef<uint8_t> &Bytes) {
+static inline std::bitset<96> eat12Bytes(ArrayRef<uint8_t> &Bytes) {
+ using namespace llvm::support::endian;
assert(Bytes.size() >= 12);
- uint64_t Lo =
- support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<96> Lo(read<uint64_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(8);
- uint64_t Hi =
- support::endian::read<uint32_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<96> Hi(read<uint32_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(4);
- return DecoderUInt128(Lo, Hi);
+ return (Hi << 64) | Lo;
}
-static inline DecoderUInt128 eat16Bytes(ArrayRef<uint8_t> &Bytes) {
+static inline std::bitset<128> eat16Bytes(ArrayRef<uint8_t> &Bytes) {
+ using namespace llvm::support::endian;
assert(Bytes.size() >= 16);
- uint64_t Lo =
- support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<128> Lo(read<uint64_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(8);
- uint64_t Hi =
- support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+ std::bitset<128> Hi(read<uint64_t, endianness::little>(Bytes.data()));
Bytes = Bytes.slice(8);
- return DecoderUInt128(Lo, Hi);
+ return (Hi << 64) | Lo;
}
void AMDGPUDisassembler::decodeImmOperands(MCInst &MI,
@@ -600,14 +607,14 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
// Try to decode DPP and SDWA first to solve conflict with VOP1 and VOP2
// encodings
if (isGFX1250() && Bytes.size() >= 16) {
- DecoderUInt128 DecW = eat16Bytes(Bytes);
+ std::bitset<128> DecW = eat16Bytes(Bytes);
if (tryDecodeInst(DecoderTableGFX1250128, MI, DecW, Address, CS))
break;
Bytes = Bytes_.slice(0, MaxInstBytesNum);
}
if (isGFX11Plus() && Bytes.size() >= 12) {
- DecoderUInt128 DecW = eat12Bytes(Bytes);
+ std::bitset<96> DecW = eat12Bytes(Bytes);
if (isGFX11() &&
tryDecodeInst(DecoderTableGFX1196, DecoderTableGFX11_FAKE1696, MI,
@@ -642,7 +649,7 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
} else if (Bytes.size() >= 16 &&
STI.hasFeature(AMDGPU::FeatureGFX950Insts)) {
- DecoderUInt128 DecW = eat16Bytes(Bytes);
+ std::bitset<128> DecW = eat16Bytes(Bytes);
if (tryDecodeInst(DecoderTableGFX940128, MI, DecW, Address, CS))
break;
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
index f4d164bf10c3c..ded447b6f8d5a 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
@@ -32,44 +32,6 @@ class MCOperand;
class MCSubtargetInfo;
class Twine;
-// Exposes an interface expected by autogenerated code in
-// FixedLenDecoderEmitter
-class DecoderUInt128 {
-private:
- uint64_t Lo = 0;
- uint64_t Hi = 0;
-
-public:
- DecoderUInt128() = default;
- DecoderUInt128(uint64_t Lo, uint64_t Hi = 0) : Lo(Lo), Hi(Hi) {}
- operator bool() const { return Lo || Hi; }
- uint64_t extractBitsAsZExtValue(unsigned NumBits,
- unsigned BitPosition) const {
- assert(NumBits && NumBits <= 64);
- assert(BitPosition < 128);
- uint64_t Val;
- if (BitPosition < 64)
- Val = Lo >> BitPosition | Hi << 1 << (63 - BitPosition);
- else
- Val = Hi >> (BitPosition - 64);
- return Val & ((uint64_t(2) << (NumBits - 1)) - 1);
- }
- DecoderUInt128 operator&(const DecoderUInt128 &RHS) const {
- return DecoderUInt128(Lo & RHS.Lo, Hi & RHS.Hi);
- }
- DecoderUInt128 operator&(const uint64_t &RHS) const {
- return *this & DecoderUInt128(RHS);
- }
- DecoderUInt128 operator~() const { return DecoderUInt128(~Lo, ~Hi); }
- bool operator==(const DecoderUInt128 &RHS) {
- return Lo == RHS.Lo && Hi == RHS.Hi;
- }
- bool operator!=(const DecoderUInt128 &RHS) {
- return Lo != RHS.Lo || Hi != RHS.Hi;
- }
- bool operator!=(const int &RHS) { return *this != DecoderUInt128(RHS); }
-};
-
//===----------------------------------------------------------------------===//
// AMDGPUDisassembler
//===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/DirectX/CMakeLists.txt b/llvm/lib/Target/DirectX/CMakeLists.txt
index 8100f941c8d94..6c079517e22d6 100644
--- a/llvm/lib/Target/DirectX/CMakeLists.txt
+++ b/llvm/lib/Target/DirectX/CMakeLists.txt
@@ -41,6 +41,7 @@ add_llvm_target(DirectXCodeGen
LINK_COMPONENTS
Analysis
AsmPrinter
+ BinaryFormat
CodeGen
CodeGenTypes
Core
diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt
index 47329b2c2f4d2..531238ae85029 100644
--- a/llvm/lib/Target/RISCV/CMakeLists.txt
+++ b/llvm/lib/Target/RISCV/CMakeLists.txt
@@ -7,7 +7,8 @@ tablegen(LLVM RISCVGenAsmWriter.inc -gen-asm-writer)
tablegen(LLVM RISCVGenCompressInstEmitter.inc -gen-compress-inst-emitter)
tablegen(LLVM RISCVGenMacroFusion.inc -gen-macro-fusion-pred)
tablegen(LLVM RISCVGenDAGISel.inc -gen-dag-isel)
-tablegen(LLVM RISCVGenDisassemblerTables.inc -gen-disassembler)
+tablegen(LLVM RISCVGenDisassemblerTables.inc -gen-disassembler
+ --specialize-decoders-per-bitwidth)
tablegen(LLVM RISCVGenInstrInfo.inc -gen-instr-info)
tablegen(LLVM RISCVGenMCCodeEmitter.inc -gen-emitter)
tablegen(LLVM RISCVGenMCPseudoLowering.inc -gen-pseudo-lowering)
diff --git a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
index dbb16fce8390a..4becd3bbb25f2 100644
--- a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+++ b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
@@ -558,7 +558,7 @@ static DecodeStatus decodeXqccmpRlistS0(MCInst &Inst, uint32_t Imm,
return decodeZcmpRlist(Inst, Imm, Address, Decoder);
}
-static DecodeStatus decodeCSSPushPopchk(MCInst &Inst, uint32_t Insn,
+static DecodeStatus decodeCSSPushPopchk(MCInst &Inst, uint16_t Insn,
uint64_t Address,
const MCDisassembler *Decoder) {
uint32_t Rs1 = fieldFromInstruction(Insn, 7, 5);
@@ -700,6 +700,12 @@ static constexpr DecoderListEntry DecoderList32[]{
{DecoderTableZdinxRV32Only32, {}, "RV32-only Zdinx (Double in Integer)"},
};
+// Define bitwidths for various types used to instantiate the decoder.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint16_t> = 16;
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint32_t> = 32;
+// Use uint64_t to represent 48 bit instructions.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint64_t> = 48;
+
DecodeStatus RISCVDisassembler::getInstruction32(MCInst &MI, uint64_t &Size,
ArrayRef<uint8_t> Bytes,
uint64_t Address,
@@ -710,9 +716,7 @@ DecodeStatus RISCVDisassembler::getInstruction32(MCInst &MI, uint64_t &Size,
}
Size = 4;
- // Use uint64_t to match getInstruction48. decodeInstruction is templated
- // on the Insn type.
- uint64_t Insn = support::endian::read32le(Bytes.data());
+ uint32_t Insn = support::endian::read32le(Bytes.data());
for (const DecoderListEntry &Entry : DecoderList32) {
if (!Entry.haveContainedFeatures(STI.getFeatureBits()))
@@ -758,9 +762,7 @@ DecodeStatus RISCVDisassembler::getInstruction16(MCInst &MI, uint64_t &Size,
}
Size = 2;
- // Use uint64_t to match getInstruction48. decodeInstruction is templated
- // on the Insn type.
- uint64_t Insn = support::endian::read16le(Bytes.data());
+ uint16_t Insn = support::endian::read16le(Bytes.data());
for (const DecoderListEntry &Entry : DecoderList16) {
if (!Entry.haveContainedFeatures(STI.getFeatureBits()))
diff --git a/llvm/test/TableGen/DecoderEmitterBitwidthSpecialization.td b/llvm/test/TableGen/DecoderEmitterBitwidthSpecialization.td
new file mode 100644
index 0000000000000..6fd9676939bde
--- /dev/null
+++ b/llvm/test/TableGen/DecoderEmitterBitwidthSpecialization.td
@@ -0,0 +1,169 @@
+// RUN: llvm-tblgen -gen-disassembler -I %p/../../include %s | FileCheck %s --check-prefix=CHECK-DEFAULT
+// RUN: llvm-tblgen -gen-disassembler -specialize-decoders-per-bitwidth -I %p/../../include %s | FileCheck %s --check-prefix=CHECK-SPECIALIZE-NO-TABLE
+// RUN: llvm-tblgen -gen-disassembler -specialize-decoders-per-bitwidth -use-fn-table-in-decode-to-mcinst -I %p/../../include %s | FileCheck %s --check-prefix=CHECK-SPECIALIZE-TABLE
+
+
+include "llvm/Target/Target.td"
+
+def archInstrInfo : InstrInfo { }
+
+def arch : Target {
+ let InstructionSet = archInstrInfo;
+}
+
+let Namespace = "arch" in {
+ def R0 : Register<"r0">;
+ def R1 : Register<"r1">;
+ def R2 : Register<"r2">;
+ def R3 : Register<"r3">;
+}
+def Regs : RegisterClass<"Regs", [i32], 32, (add R0, R1, R2, R3)>;
+
+// Bit 0 of the encoding determines the size (8 or 16 bits).
+// Bits {3..1} define the number define the number of operands encoded.
+class Instruction8Bit<int NumOps> : Instruction {
+ let Size = 1;
+ let OutOperandList = (outs);
+ field bits<8> Inst;
+ let Inst{0} = 0;
+ let Inst{3-1} = NumOps;
+}
+
+class Instruction16Bit<int NumOps> : Instruction {
+ let Size = 2;
+ let OutOperandList = (outs);
+ field bits<16> Inst;
+ let Inst{0} = 1;
+ let Inst{3-1} = NumOps;
+}
+
+// Define instructions to generate 4 cases in decodeToMCInst.
+// Each register operand needs 2 bits to encode.
+
+// An instruction with no inputs.
+def Inst0 : Instruction8Bit<0> {
+ let Inst{7-4} = 0;
+ let InOperandList = (ins);
+ let AsmString = "Inst0";
+}
+
+// An instruction with a single input.
+def Inst1 : Instruction8Bit<1> {
+ bits<2> r0;
+ let Inst{5-4} = r0;
+ let Inst{7-6} = 0;
+ let InOperandList = (ins Regs:$r0);
+ let AsmString = "Inst1";
+}
+
+// An instruction with two inputs.
+def Inst2 : Instruction16Bit<2> {
+ bits<2> r0;
+ bits<2> r1;
+ let Inst{5-4} = r0;
+ let Inst{7-6} = r1;
+ let Inst{15-8} = 0;
+ let InOperandList = (ins Regs:$r0, Regs:$r1);
+ let AsmString = "Inst2";
+}
+
+// An instruction with three inputs. .
+def Inst3 : Instruction16Bit<3> {
+ bits<2> r0;
+ bits<2> r1;
+ bits<2> r2;
+ let Inst{5-4} = r0;
+ let Inst{7-6} = r1;
+ let Inst{9-8} = r2;
+ let Inst{15-10} = 0;
+ let InOperandList = (ins Regs:$r0, Regs:$r1, Regs:$r2);
+ let AsmString = "Inst3";
+}
+
+// -----------------------------------------------------------------------------
+// In the default case, we emit a single decodeToMCinst function and DecodeIdx
+// is shared across all bitwidths.
+
+// CHECK-DEFAULT-LABEL: DecoderTable8[25]
+// CHECK-DEFAULT: DecodeIdx: 0
+// CHECK-DEFAULT: DecodeIdx: 1
+// CHECK-DEFAULT: };
+
+// CHECK-DEFAULT-LABEL: DecoderTable16[25]
+// CHECK-DEFAULT: DecodeIdx: 2
+// CHECK-DEFAULT: DecodeIdx: 3
+// CHECK-DEFAULT: };
+
+// CHECK-DEFAULT-LABEL: template <typename InsnType>
+// CHECK-DEFAULT-NEXT: static DecodeStatus decodeToMCInst
+// CHECK-DEFAULT: case 0
+// CHECK-DEFAULT: case 1
+// CHECK-DEFAULT: case 2
+// CHECK-DEFAULT: case 3
+
+// -----------------------------------------------------------------------------
+// When we specialize per bitwidth, we emit 2 decodeToMCInst functions and
+// DecodeIdx is assigned per bit width.
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: DecoderTable8[25]
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 0
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 1
+// CHECK-SPECIALIZE-NO-TABLE: };
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-NO-TABLE: case 0
+// CHECK-SPECIALIZE-NO-TABLE: case 1
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: DecoderTable16[25]
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 0
+// CHECK-SPECIALIZE-NO-TABLE: DecodeIdx: 1
+// CHECK-SPECIALIZE-NO-TABLE: };
+
+// CHECK-SPECIALIZE-NO-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-NO-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-NO-TABLE: case 0
+// CHECK-SPECIALIZE-NO-TABLE: case 1
+
+// -----------------------------------------------------------------------------
+// Per bitwidth specialization with function table.
+// CHECK-SPECIALIZE-TABLE-LABEL: DecoderTable8[25]
+// CHECK-SPECIALIZE-TABLE: DecodeIdx: 0
+// CHECK-SPECIALIZE-TABLE: DecodeIdx: 1
+// CHECK-SPECIALIZE-TABLE: };
+
+// 8 bit functions and function table.
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_0
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_1
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 8, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-TABLE-LABEL: static constexpr DecodeFnTy decodeFnTable[] = {
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_0,
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_8bit_1,
+// CHECK-SPECIALIZE-TABLE-NEXT: };
+
+// 16 bit functions and function table.
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_0
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_1
+
+// CHECK-SPECIALIZE-TABLE-LABEL: template <typename InsnType>
+// CHECK-SPECIALIZE-TABLE-NEXT: static std::enable_if_t<InsnBitWidth<InsnType> == 16, DecodeStatus>
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeToMCInst
+// CHECK-SPECIALIZE-TABLE-LABEL: static constexpr DecodeFnTy decodeFnTable[] = {
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_0,
+// CHECK-SPECIALIZE-TABLE-NEXT: decodeFn_16bit_1,
+// CHECK-SPECIALIZE-TABLE-NEXT: };
diff --git a/llvm/test/TableGen/DecoderEmitterFnTable.td b/llvm/test/TableGen/DecoderEmitterFnTable.td
index 7bed18c19a513..8929e6da716e6 100644
--- a/llvm/test/TableGen/DecoderEmitterFnTable.td
+++ b/llvm/test/TableGen/DecoderEmitterFnTable.td
@@ -71,14 +71,14 @@ def Inst3 : TestInstruction {
let AsmString = "Inst3";
}
-// CHECK-LABEL: DecodeStatus decodeFn0(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn1(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn2(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn3(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_0(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_1(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_2(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_3(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
// CHECK-LABEL: decodeToMCInst(unsigned Idx, DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
// CHECK: static constexpr DecodeFnTy decodeFnTable[]
-// CHECK-NEXT: decodeFn0,
-// CHECK-NEX...
[truncated]
|
@topperc can you please review as well? |
Gentle ping @topperc, would like to get this in early next week (assuming review is ok). |
template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint16_t> = 16; | ||
template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint32_t> = 32; | ||
// Use uint64_t to represent 48 bit instructions. | ||
template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint64_t> = 48; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need to add 64-bit RISC-V instructions in the future does this break? Will we need to use std::bitset<48> to make 48 unique?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this seems backwards. Shouldn't the decoder know what bit width it's for rather than trying to reinfer it from the type that ended up being used to represent that bit width?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@topperc yeah, if we have true 64-bit insts in future, then 48-bit insts need to use a different type. It will not break silently though, since InsnBitWidth
specializations need to be defined at that time and if we have > 1 type with the same InsnBitWidth
value, we will get compiler warnings about ambiguous templating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrtc27 The decoder know the number of bits, but not the actual C++ type to be used for that bitwidth (it generates templated code and backends are free to use whatever compatible types they want to use). InsnBitWidth
is a way for the particular target to communicate this intent to the decoder generated code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A previous iteration of this was trying to enforce the actual type as well, but that turned out to be much larger change (see #146593).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So why not flip this as a template on uint32_t that gives the type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So something like?
template <int BitWidth>
struct TypeForBitWidth {
using T = void;
};
template <>
struct TypeForBitWidth<32> {
using T = uint32_t;
};
template <>
struct TypeForBitWidth<64> {
using T = uint64_t;
};
int decodeToMCInst(TypeForBitWidth<32>::T Insn) {
return 32;
}
int decodeToMCInst(TypeForBitWidth<64>::T Insn) {
return 64;
}
int main() {
uint32_t x = 0;
std::cout << decodeToMCInst(x) << '\n';
uint64_t y = 0;
std::cout << decodeToMCInst(y) << '\n';
}
@topperc and @s-barannikov WDYT? It conveys the same information, but I agree its more directly expressing things. Note that with this though it won't directly work if/when we want to support > 1 bitwidth for a given type (which I did attempt in this PR but gave up because MSVC broke).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure overloading on type is a good idea. For this reason and calling the function with the wrong type issue raised here #154865 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I propose we keep the current scheme for now,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I guess we'll find out if there are any out of tree RISC-V vendors with 64 bit instructions.
using InnerKeyTy = std::pair<StringRef, unsigned>; | ||
using InnerMapTy = std::map<InnerKeyTy, std::vector<unsigned>>; | ||
std::map<unsigned, InnerMapTy> EncMap; | ||
|
||
for (const auto &[HwModeID, EncodingIDs] : EncodingIDsByHwMode) { | ||
for (unsigned EncodingID : EncodingIDs) { | ||
const InstructionEncoding &Encoding = Encodings[EncodingID]; | ||
const Record *EncodingDef = Encoding.getRecord(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dead variable
This change adds an option to specialize decoders per bitwidth, which can help reduce the (compiled) code size of the decoder code.
Current state:
Currently, the code generated by the decoder emitter consists of two key functions:
decodeInstruction
which is the entry point into the generated code anddecodeToMCInst
which is invoked when a decode op is reached while traversing through the decoder table. Both functions are templated onInsnType
which is the raw instruction bits that are supplied todecodeInstruction
.Several backends call
decodeInstruction
with differentInsnType
types, leading to several template instantiations of these functions in the final code. As an example, AMDGPU instantiates this function with typeDecoderUInt128
type for decoding 96/128-bit instructions,uint64_t
for decoding 64-bit instructions, anduint32_t
for decoding 32-bit instructions. Since there is just onedecodeToMCInst
in the generated code, it has code that handles decoding for all instruction sizes. However, the decoders emitted for different instructions sizes rarely have any intersection with each other. That means, in the AMDGPU case, the instantiation with InsnType == DecoderUInt128 has decoder code for 32/64-bit instructions that is never exercised. Conversely, the instantiation with InsnType == uint64_t has decoder code for 128/96/32-bit instructions that is never exercised. This leads to unnecessary dead code in the generated disassembler binary (that the compiler cannot eliminate by itself).New state:
With this change, we introduce an option
specialize-decoders-per-bitwidth
. Under this mode, the DecoderEmitter will generate several versions ofdecodeToMCInst
function, one for each bitwidth. The code is still templated, but will require backends to specify, for eachInsnType
used, the bitwidth of the instruction that the type is used to represent using a type-traitInsnBitWidth
. This will enable the templated code to choose the right variant ofdecodeToMCInst
. Under this mode, a particular instantiation will only end up instantiating a single variant ofdecodeToMCInst
generated and that will include only those decoders that are applicable to a single bitwidth, resulting in elimination of the code duplication through instantiation and a reduction in code size.Additionally, under this mode, decoders are uniqued only within a given bitwidth (as opposed to across all bitwidths without this option), so the decoder index values assigned are smaller, and consume less bytes in their ULEB128 encoding. As a result, the generated decoder tables can also reduce in size.
Adopt this feature for the AMDGPU and RISCV backend. In a release build, this results in a net 55% reduction in the .text size of libLLVMAMDGPUDisassembler.so and a 5% reduction in the .rodata size. For RISCV, which today uses a single
uint64_t
type, this results in a 3.7% increase in code size (expected as we instantiate the code 3 times now).Actual measured sizes are as follows: