[AMDGPU] Do not use 64-bit literals with VOP3* encodings #156602
Conversation
Encoding sometimes uses a 64-bit instead of a 32-bit literal because it does not know the signedness of the operand: if the value does not fit in both a 32-bit signed and a 32-bit unsigned integer, it uses a 64-bit literal for safety. However, it should never do this for VOP3- and VOP3P-encoded instructions, because these encodings do not allow 64-bit literal operands.
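For reference, the conservative check boils down to requiring that the raw bits fit both 32-bit interpretations. A minimal sketch, assuming LLVM's MathExtras.h helpers (the same isInt<32>/isUInt<32> tests that appear in the diff below), with the value 0xaf123456 taken from the affected tests:

```cpp
#include "llvm/Support/MathExtras.h"
#include <cstdint>
#include <cstdio>

int main() {
  uint64_t Imm = 0xaf123456; // fits unsigned 32-bit, but not signed 32-bit
  // Without knowing the operand's signedness, a 32-bit literal is only
  // trusted when the value fits both interpretations; otherwise the
  // encoder falls back to a 64-bit literal.
  bool Use64BitLiteral = !llvm::isInt<32>(Imm) || !llvm::isUInt<32>(Imm);
  printf("use 64-bit literal: %d\n", Use64BitLiteral); // prints 1
}
```

Since 0xaf123456 passes the unsigned test but fails the signed one, the encoder previously fell back to lit64 even for VOP3-encoded instructions, which is the bug this PR fixes.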
Do uses of …
It depends on what they're doing, but there is nothing fundamentally wrong with calling …
It looks like the right thing to do.
Is it true that handling for the 64-bit literals is already correct in codegen, so this patch would not change emitted code?
It is a little suspicious that all the affected test cases are VOPC. Perhaps there is some additional handling for VOP1/VOP2 that is not present for VOPC?
v_lshlrev_b64 v[6:7], v1, 0x3f717273
// GFX10: v_lshlrev_b64 v[6:7], v1, 0x3f717273 ; encoding: [0x06,0x00,0xff,0xd6,0x01,0xff,0x01,0x00,0x73,0x72,0x71,0x3f]
// GFX1250: v_lshlrev_b64_e64 v[6:7], v1, 0x3f717273 ; encoding: [0x06,0x00,0x1f,0xd5,0x01,0xff,0x01,0x00,0x73,0x72,0x71,0x3f]
v_lshlrev_b64 v[6:7], v1, 0xbf717273
Why change the input here?
You can see in the first commit in the PR that it introduces a bogus lit64, which the second commit fixes.
It is a little suspicious that all the affected test cases are VOPC.
... and this demonstrates a non-VOPC case that is affected.
The patch could change emitted code if codegen created anything like: …
Currently I don't think codegen does this, but only because SIFoldOperands is conservative about when it folds 64-bit constants into their uses. If codegen did generate something like this, then I would not consider it a bug in codegen.
Thanks for the clarifications. LGTM
raw_ostream &O, bool IsFP) {
  // This part needs to align with AMDGPUOperand::addLiteralImmOperand.
  bool CanUse64BitLiterals =
      STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
      !(Desc.TSFlags & (SIInstrFlags::VOP3 | SIInstrFlags::VOP3P));
FYI, given the multitude of encodings, I was using Desc.getSize() != 4 for that instead. This is the real reason lit64 cannot be used: the instruction itself takes more than a DWORD.
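A sketch of the size-based check described here, under the assumption that only 4-byte base encodings leave room for the extra literal dwords (the reply below notes V_FMAAK_F64 as a counterexample):

```cpp
#include "llvm/MC/MCInstrDesc.h"
#include "llvm/MC/MCSubtargetInfo.h"

// Illustrative only: accept 64-bit literals when the base encoding is a
// single dword, instead of testing the VOP3/VOP3P flags explicitly.
// AMDGPU::Feature64BitLiterals comes from the target's generated headers.
static bool canUse64BitLiterals(const llvm::MCSubtargetInfo &STI,
                                const llvm::MCInstrDesc &Desc) {
  return STI.hasFeature(llvm::AMDGPU::Feature64BitLiterals) &&
         Desc.getSize() == 4;
}
```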
Nice idea, but in practice that doesn't work for instructions like V_FMAAK_F64. This is VOP2 but Size is set to 12, i.e. the size including the mandatory literal. Changing that feels like a big project...
Argh, you are right. Maybe I need to check other places.
LGTM, but I'd probably check Desc size directly instead of flags.
@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

Encoding sometimes uses a 64-bit instead of a 32-bit literal because it does not know the signedness of the operand: if the value does not fit in both a 32-bit signed and a 32-bit unsigned integer, it uses a 64-bit literal for safety. However, it should never do this for VOP3- and VOP3P-encoded instructions, because these encodings do not allow 64-bit literal operands.

Patch is 41.70 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/156602.diff

6 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
index af6b7a9e7fdeb..ad122390e1f03 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
@@ -80,8 +80,9 @@ void AMDGPUInstPrinter::printFP64ImmOperand(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,
raw_ostream &O) {
// KIMM64
+ const MCInstrDesc &Desc = MII.get(MI->getOpcode());
uint64_t Imm = MI->getOperand(OpNo).getImm();
- printLiteral64(Imm, STI, O, /*IsFP=*/true);
+ printLiteral64(Desc, Imm, STI, O, /*IsFP=*/true);
}
void AMDGPUInstPrinter::printNamedBit(const MCInst *MI, unsigned OpNo,
@@ -590,7 +591,7 @@ void AMDGPUInstPrinter::printImmediate32(uint32_t Imm,
O << formatHex(static_cast<uint64_t>(Imm));
}
-void AMDGPUInstPrinter::printImmediate64(uint64_t Imm,
+void AMDGPUInstPrinter::printImmediate64(const MCInstrDesc &Desc, uint64_t Imm,
const MCSubtargetInfo &STI,
raw_ostream &O, bool IsFP) {
int64_t SImm = static_cast<int64_t>(Imm);
@@ -621,20 +622,23 @@ void AMDGPUInstPrinter::printImmediate64(uint64_t Imm,
STI.hasFeature(AMDGPU::FeatureInv2PiInlineImm))
O << "0.15915494309189532";
else
- printLiteral64(Imm, STI, O, IsFP);
+ printLiteral64(Desc, Imm, STI, O, IsFP);
}
-void AMDGPUInstPrinter::printLiteral64(uint64_t Imm, const MCSubtargetInfo &STI,
+void AMDGPUInstPrinter::printLiteral64(const MCInstrDesc &Desc, uint64_t Imm,
+ const MCSubtargetInfo &STI,
raw_ostream &O, bool IsFP) {
// This part needs to align with AMDGPUOperand::addLiteralImmOperand.
+ bool CanUse64BitLiterals =
+ STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
+ !(Desc.TSFlags & (SIInstrFlags::VOP3 | SIInstrFlags::VOP3P));
if (IsFP) {
- if (STI.hasFeature(AMDGPU::Feature64BitLiterals) && Lo_32(Imm))
+ if (CanUse64BitLiterals && Lo_32(Imm))
O << "lit64(" << formatHex(static_cast<uint64_t>(Imm)) << ')';
else
O << formatHex(static_cast<uint64_t>(Hi_32(Imm)));
} else {
- if (STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
- (!isInt<32>(Imm) || !isUInt<32>(Imm)))
+ if (CanUse64BitLiterals && (!isInt<32>(Imm) || !isUInt<32>(Imm)))
O << "lit64(" << formatHex(static_cast<uint64_t>(Imm)) << ')';
else
O << formatHex(static_cast<uint64_t>(Imm));
@@ -749,12 +753,12 @@ void AMDGPUInstPrinter::printRegularOperand(const MCInst *MI, unsigned OpNo,
break;
case AMDGPU::OPERAND_REG_IMM_INT64:
case AMDGPU::OPERAND_REG_INLINE_C_INT64:
- printImmediate64(Op.getImm(), STI, O, false);
+ printImmediate64(Desc, Op.getImm(), STI, O, false);
break;
case AMDGPU::OPERAND_REG_IMM_FP64:
case AMDGPU::OPERAND_REG_INLINE_C_FP64:
case AMDGPU::OPERAND_REG_INLINE_AC_FP64:
- printImmediate64(Op.getImm(), STI, O, true);
+ printImmediate64(Desc, Op.getImm(), STI, O, true);
break;
case AMDGPU::OPERAND_REG_INLINE_C_INT16:
case AMDGPU::OPERAND_REG_IMM_INT16:
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
index 0c4aca4eee472..a92f99c3c0e4b 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
@@ -87,10 +87,10 @@ class AMDGPUInstPrinter : public MCInstPrinter {
raw_ostream &O);
void printImmediate32(uint32_t Imm, const MCSubtargetInfo &STI,
raw_ostream &O);
- void printImmediate64(uint64_t Imm, const MCSubtargetInfo &STI,
- raw_ostream &O, bool IsFP);
- void printLiteral64(uint64_t Imm, const MCSubtargetInfo &STI, raw_ostream &O,
- bool IsFP);
+ void printImmediate64(const MCInstrDesc &Desc, uint64_t Imm,
+ const MCSubtargetInfo &STI, raw_ostream &O, bool IsFP);
+ void printLiteral64(const MCInstrDesc &Desc, uint64_t Imm,
+ const MCSubtargetInfo &STI, raw_ostream &O, bool IsFP);
void printOperand(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
raw_ostream &O);
void printRegularOperand(const MCInst *MI, unsigned OpNo,
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
index fc28b9c96f5af..895f9e7078f27 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
@@ -88,7 +88,7 @@ class AMDGPUMCCodeEmitter : public MCCodeEmitter {
/// Encode an fp or int literal.
std::optional<uint64_t>
- getLitEncoding(const MCOperand &MO, const MCOperandInfo &OpInfo,
+ getLitEncoding(const MCInstrDesc &Desc, const MCOperand &MO, unsigned OpNo,
const MCSubtargetInfo &STI,
bool HasMandatoryLiteral = false) const;
@@ -219,8 +219,8 @@ static uint32_t getLit16IntEncoding(uint32_t Val, const MCSubtargetInfo &STI) {
return getLit32Encoding(Val, STI);
}
-static uint32_t getLit64Encoding(uint64_t Val, const MCSubtargetInfo &STI,
- bool IsFP) {
+static uint32_t getLit64Encoding(const MCInstrDesc &Desc, uint64_t Val,
+ const MCSubtargetInfo &STI, bool IsFP) {
uint32_t IntImm = getIntInlineImmEncoding(static_cast<int64_t>(Val));
if (IntImm != 0)
return IntImm;
@@ -255,20 +255,21 @@ static uint32_t getLit64Encoding(uint64_t Val, const MCSubtargetInfo &STI,
// The rest part needs to align with AMDGPUInstPrinter::printLiteral64.
+ bool CanUse64BitLiterals =
+ STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
+ !(Desc.TSFlags & (SIInstrFlags::VOP3 | SIInstrFlags::VOP3P));
if (IsFP) {
- return STI.hasFeature(AMDGPU::Feature64BitLiterals) && Lo_32(Val) ? 254
- : 255;
+ return CanUse64BitLiterals && Lo_32(Val) ? 254 : 255;
}
- return STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
- (!isInt<32>(Val) || !isUInt<32>(Val))
- ? 254
- : 255;
+ return CanUse64BitLiterals && (!isInt<32>(Val) || !isUInt<32>(Val)) ? 254
+ : 255;
}
std::optional<uint64_t> AMDGPUMCCodeEmitter::getLitEncoding(
- const MCOperand &MO, const MCOperandInfo &OpInfo,
+ const MCInstrDesc &Desc, const MCOperand &MO, unsigned OpNo,
const MCSubtargetInfo &STI, bool HasMandatoryLiteral) const {
+ const MCOperandInfo &OpInfo = Desc.operands()[OpNo];
int64_t Imm;
if (MO.isExpr()) {
if (!MO.getExpr()->evaluateAsAbsolute(Imm))
@@ -296,14 +297,14 @@ std::optional<uint64_t> AMDGPUMCCodeEmitter::getLitEncoding(
case AMDGPU::OPERAND_REG_IMM_INT64:
case AMDGPU::OPERAND_REG_INLINE_C_INT64:
- return getLit64Encoding(static_cast<uint64_t>(Imm), STI, false);
+ return getLit64Encoding(Desc, static_cast<uint64_t>(Imm), STI, false);
case AMDGPU::OPERAND_REG_INLINE_C_FP64:
case AMDGPU::OPERAND_REG_INLINE_AC_FP64:
- return getLit64Encoding(static_cast<uint64_t>(Imm), STI, true);
+ return getLit64Encoding(Desc, static_cast<uint64_t>(Imm), STI, true);
case AMDGPU::OPERAND_REG_IMM_FP64: {
- auto Enc = getLit64Encoding(static_cast<uint64_t>(Imm), STI, true);
+ auto Enc = getLit64Encoding(Desc, static_cast<uint64_t>(Imm), STI, true);
return (HasMandatoryLiteral && Enc == 255) ? 254 : Enc;
}
@@ -444,7 +445,7 @@ void AMDGPUMCCodeEmitter::encodeInstruction(const MCInst &MI,
// Is this operand a literal immediate?
const MCOperand &Op = MI.getOperand(i);
- auto Enc = getLitEncoding(Op, Desc.operands()[i], STI);
+ auto Enc = getLitEncoding(Desc, Op, i, STI);
if (!Enc || (*Enc != 255 && *Enc != 254))
continue;
@@ -518,7 +519,7 @@ void AMDGPUMCCodeEmitter::getSDWASrcEncoding(const MCInst &MI, unsigned OpNo,
return;
} else {
const MCInstrDesc &Desc = MCII.get(MI.getOpcode());
- auto Enc = getLitEncoding(MO, Desc.operands()[OpNo], STI);
+ auto Enc = getLitEncoding(Desc, MO, OpNo, STI);
if (Enc && *Enc != 255) {
Op = *Enc | SDWA9EncValues::SRC_SGPR_MASK;
return;
@@ -701,8 +702,7 @@ void AMDGPUMCCodeEmitter::getMachineOpValueCommon(
if (AMDGPU::isSISrcOperand(Desc, OpNo)) {
bool HasMandatoryLiteral =
AMDGPU::hasNamedOperand(MI.getOpcode(), AMDGPU::OpName::imm);
- if (auto Enc = getLitEncoding(MO, Desc.operands()[OpNo], STI,
- HasMandatoryLiteral)) {
+ if (auto Enc = getLitEncoding(Desc, MO, OpNo, STI, HasMandatoryLiteral)) {
Op = *Enc;
return;
}
diff --git a/llvm/test/MC/AMDGPU/gfx1250_asm_vop3cx.s b/llvm/test/MC/AMDGPU/gfx1250_asm_vop3cx.s
index 4aea7b32f13e5..e0ee928190ada 100644
--- a/llvm/test/MC/AMDGPU/gfx1250_asm_vop3cx.s
+++ b/llvm/test/MC/AMDGPU/gfx1250_asm_vop3cx.s
@@ -368,7 +368,7 @@ v_cmpx_eq_i64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_eq_i64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xd2,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_eq_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_eq_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd2,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_eq_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd2,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_eq_i64_e64 exec, src_scc
// GFX1250: v_cmpx_eq_i64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xd2,0xd4,0x7e,0xfa,0x01,0x00]
@@ -386,7 +386,7 @@ v_cmpx_eq_i64_e64 src_scc, exec
// GFX1250: v_cmpx_eq_i64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xd2,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_eq_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_eq_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd2,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_eq_i64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xd2,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_eq_u16_e64 v1, v2
// GFX1250: v_cmpx_eq_u16_e64 v1, v2 ; encoding: [0x7e,0x00,0xba,0xd4,0x01,0x05,0x02,0x00]
@@ -494,7 +494,7 @@ v_cmpx_eq_u64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_eq_u64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xda,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_eq_u64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_eq_u64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xda,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_eq_u64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xda,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_eq_u64_e64 exec, src_scc
// GFX1250: v_cmpx_eq_u64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xda,0xd4,0x7e,0xfa,0x01,0x00]
@@ -512,7 +512,7 @@ v_cmpx_eq_u64_e64 src_scc, exec
// GFX1250: v_cmpx_eq_u64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xda,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_eq_u64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_eq_u64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xda,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_eq_u64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xda,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_ge_f16_e64 v1, v2
// GFX1250: v_cmpx_ge_f16_e64 v1, v2 ; encoding: [0x7e,0x00,0x86,0xd4,0x01,0x05,0x02,0x00]
@@ -746,7 +746,7 @@ v_cmpx_ge_i64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_ge_i64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xd6,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_ge_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_ge_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd6,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_ge_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd6,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_ge_i64_e64 exec, src_scc
// GFX1250: v_cmpx_ge_i64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xd6,0xd4,0x7e,0xfa,0x01,0x00]
@@ -764,7 +764,7 @@ v_cmpx_ge_i64_e64 src_scc, exec
// GFX1250: v_cmpx_ge_i64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xd6,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_ge_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_ge_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd6,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_ge_i64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xd6,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_ge_u16_e64 v1, v2
// GFX1250: v_cmpx_ge_u16_e64 v1, v2 ; encoding: [0x7e,0x00,0xbe,0xd4,0x01,0x05,0x02,0x00]
@@ -872,7 +872,7 @@ v_cmpx_ge_u64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_ge_u64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xde,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_ge_u64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_ge_u64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xde,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_ge_u64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xde,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_ge_u64_e64 exec, src_scc
// GFX1250: v_cmpx_ge_u64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xde,0xd4,0x7e,0xfa,0x01,0x00]
@@ -890,7 +890,7 @@ v_cmpx_ge_u64_e64 src_scc, exec
// GFX1250: v_cmpx_ge_u64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xde,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_ge_u64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_ge_u64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xde,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_ge_u64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xde,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_gt_f16_e64 v1, v2
// GFX1250: v_cmpx_gt_f16_e64 v1, v2 ; encoding: [0x7e,0x00,0x84,0xd4,0x01,0x05,0x02,0x00]
@@ -1124,7 +1124,7 @@ v_cmpx_gt_i64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_gt_i64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xd4,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_gt_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_gt_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd4,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_gt_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd4,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_gt_i64_e64 exec, src_scc
// GFX1250: v_cmpx_gt_i64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xd4,0xd4,0x7e,0xfa,0x01,0x00]
@@ -1142,7 +1142,7 @@ v_cmpx_gt_i64_e64 src_scc, exec
// GFX1250: v_cmpx_gt_i64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xd4,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_gt_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_gt_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd4,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_gt_i64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xd4,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_gt_u16_e64 v1, v2
// GFX1250: v_cmpx_gt_u16_e64 v1, v2 ; encoding: [0x7e,0x00,0xbc,0xd4,0x01,0x05,0x02,0x00]
@@ -1250,7 +1250,7 @@ v_cmpx_gt_u64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_gt_u64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xdc,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_gt_u64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_gt_u64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xdc,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_gt_u64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xdc,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_gt_u64_e64 exec, src_scc
// GFX1250: v_cmpx_gt_u64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xdc,0xd4,0x7e,0xfa,0x01,0x00]
@@ -1268,7 +1268,7 @@ v_cmpx_gt_u64_e64 src_scc, exec
// GFX1250: v_cmpx_gt_u64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xdc,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_gt_u64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_gt_u64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xdc,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_gt_u64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xdc,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_le_f16_e64 v1, v2
// GFX1250: v_cmpx_le_f16_e64 v1, v2 ; encoding: [0x7e,0x00,0x83,0xd4,0x01,0x05,0x02,0x00]
@@ -1502,7 +1502,7 @@ v_cmpx_le_i64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_le_i64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xd3,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_le_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_le_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd3,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_le_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd3,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_le_i64_e64 exec, src_scc
// GFX1250: v_cmpx_le_i64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xd3,0xd4,0x7e,0xfa,0x01,0x00]
@@ -1520,7 +1520,7 @@ v_cmpx_le_i64_e64 src_scc, exec
// GFX1250: v_cmpx_le_i64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xd3,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_le_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_le_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd3,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_le_i64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xd3,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_le_u16_e64 v1, v2
// GFX1250: v_cmpx_le_u16_e64 v1, v2 ; encoding: [0x7e,0x00,0xbb,0xd4,0x01,0x05,0x02,0x00]
@@ -1628,7 +1628,7 @@ v_cmpx_le_u64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_le_u64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xdb,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_le_u64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_le_u64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xdb,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_le_u64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xdb,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_le_u64_e64 exec, src_scc
// GFX1250: v_cmpx_le_u64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xdb,0xd4,0x7e,0xfa,0x01,0x00]
@@ -1646,7 +1646,7 @@ v_cmpx_le_u64_e64 src_scc, exec
// GFX1250: v_cmpx_le_u64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xdb,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_le_u64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_le_u64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xdb,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_le_u64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xdb,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_lg_f16_e64 v1, v2
// GFX1250: v_cmpx_lg_f16_e64 v1, v2 ; encoding: [0x7e,0x00,0x85,0xd4,0x01,0x05,0x02,0x00]
@@ -2006,7 +2006,7 @@ v_cmpx_lt_i64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_lt_i64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xd1,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_lt_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_lt_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd1,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_lt_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd1,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_lt_i64_e64 exec, src_scc
// GFX1250: v_cmpx_lt_i64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xd1,0xd4,0x7e,0xfa,0x01,0x00]
@@ -2024,7 +2024,7 @@ v_cmpx_lt_i64_e64 src_scc, exec
// GFX1250: v_cmpx_lt_i64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xd1,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_lt_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_lt_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd1,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_lt_i64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xd1,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_lt_u16_e64 v1, v2
// GFX1250: v_cmpx_lt_u16_e64 v1, v2 ; encoding: [0x7e,0x00,0xb9,0xd4,0x01,0x05,0x02,0x00]
@@ -213...
[truncated]
This seems like a situation when folding could have happened, but was gated by the conservative condition …
Then in general we probably will need signed and unsigned integer operands to fully use it.
LGTM anyway, it fixes the bug, and we can think about how to optimize more separately.
Exactly! If/when we know the signedness of all operands, then we won't have to make conservative decisions based on …
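A purely hypothetical sketch of that direction — the signedness tag and function here are invented for illustration and are not existing LLVM API:

```cpp
#include <cstdint>

// Hypothetical: if operand info carried signedness, the encoder could
// test exactly one 32-bit range instead of conservatively requiring both.
enum class LiteralSignedness { Signed, Unsigned };

static bool fitsIn32BitLiteral(int64_t Imm, LiteralSignedness S) {
  if (S == LiteralSignedness::Signed)
    return Imm >= INT32_MIN && Imm <= INT32_MAX;
  return Imm >= 0 && Imm <= static_cast<int64_t>(UINT32_MAX);
}
```

With something like this attached to each immediate operand, a value such as 0xaf123456 could always use a 32-bit literal on unsigned operands.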