Conversation

@jayfoad (Contributor) commented Sep 3, 2025

Encoding sometimes uses a 64-bit instead of a 32-bit literal because it
does not know the signedness of the operand: unless the value fits in
both a 32-bit signed and a 32-bit unsigned integer, it will use a 64-bit
literal for safety. However, it should never do this for VOP3- and
VOP3P-encoded instructions, because these encodings do not allow 64-bit
literal operands.
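
For illustration, here is a minimal standalone sketch of that conservative fit check (plain C++ with illustrative helper names; the patch itself uses LLVM's `isInt<32>`/`isUInt<32>` templates):

```cpp
// Sketch of the "fits both signed and unsigned 32-bit" rule described
// above. Helper names are illustrative, not the actual LLVM helpers.
#include <cstdint>
#include <cstdio>

static bool fitsInt32(int64_t V) { return V >= INT32_MIN && V <= INT32_MAX; }
static bool fitsUInt32(int64_t V) { return V >= 0 && V <= UINT32_MAX; }

// Without knowing the operand's signedness, a 32-bit literal is only safe
// if the value survives both sign-extension and zero-extension.
static bool needs64BitLiteral(int64_t Imm) {
  return !fitsInt32(Imm) || !fitsUInt32(Imm);
}

int main() {
  printf("%d\n", needs64BitLiteral(0x3f717273)); // 0: fits both ways
  printf("%d\n", needs64BitLiteral(0xaf123456)); // 1: fits unsigned only
  printf("%d\n", needs64BitLiteral(-1));         // 1: fits signed only
}
```

On VOP3/VOP3P instructions this rule must be suppressed, since those encodings simply have no 64-bit literal slot; that is what the patch does.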

@JanekvO (Contributor) commented Sep 3, 2025

Do uses of GCNSubtarget::has64BitLiterals also require extra gating? Or is that one already fine, as is?

@jayfoad (Contributor, Author) commented Sep 3, 2025

> Do uses of GCNSubtarget::has64BitLiterals also require extra gating? Or is that one already fine, as is?

It depends on what they're doing, but there is nothing fundamentally wrong with calling GCNSubtarget::has64BitLiterals.

@Sisyph (Contributor) left a comment

It looks like the right thing to do.

Is it true that handling for the 64-bit literals is already correct in codegen, so this patch would not change emitted code?

It is a little suspicious that all the affected test cases are VOPC. Perhaps there is some additional handling for VOP1/VOP2 that is not present for VOPC?

v_lshlrev_b64 v[6:7], v1, 0x3f717273
// GFX10: v_lshlrev_b64 v[6:7], v1, 0x3f717273 ; encoding: [0x06,0x00,0xff,0xd6,0x01,0xff,0x01,0x00,0x73,0x72,0x71,0x3f]
// GFX1250: v_lshlrev_b64_e64 v[6:7], v1, 0x3f717273 ; encoding: [0x06,0x00,0x1f,0xd5,0x01,0xff,0x01,0x00,0x73,0x72,0x71,0x3f]
v_lshlrev_b64 v[6:7], v1, 0xbf717273
A Contributor asked:

Why change the input here?

@jayfoad (Author) replied:

You can see in the first commit in the PR that it introduces a bogus lit64, which the second commit fixes.

@jayfoad (Author) replied:

> It is a little suspicious that all the affected test cases are VOPC.

... and this demonstrates a non-VOPC case that is affected.

@jayfoad (Contributor, Author) commented Sep 3, 2025

> Is it true that handling for the 64-bit literals is already correct in codegen, so this patch would not change emitted code?

The patch could change emitted code if codegen created anything like:

$vgpr0_vgpr1, $sgpr0 = V_MAD_U64_U32_e64 $vgpr0, $vgpr0, 0x80000000, 0, implicit $exec

Currently I don't think codegen does this, but only because SIFoldOperands is conservative about when it folds 64-bit constants into their uses. If codegen did generate something like this then I would not consider it a bug in codegen.

@Sisyph (Contributor) left a comment

Thanks for the clarifications. LGTM

                                       raw_ostream &O, bool IsFP) {
  // This part needs to align with AMDGPUOperand::addLiteralImmOperand.
  bool CanUse64BitLiterals =
      STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
      !(Desc.TSFlags & (SIInstrFlags::VOP3 | SIInstrFlags::VOP3P));
A Collaborator commented:

FYI, given the multitude of encodings I was using Desc.getSize() != 4 for that instead. This is the real reason lit64 cannot be used: the instruction itself takes more than a DWORD.

@jayfoad (Author) replied:

Nice idea, but in practice that doesn't work for instructions like V_FMAAK_F64. This is VOP2 but Size is set to 12, i.e. the size including the mandatory literal. Changing that feels like a big project...
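
To make the trade-off concrete, here is a toy model of the size-based check and of the V_FMAAK_F64 counter-example (the names and sizes follow the discussion above, but the types are illustrative, not the real MC layer):

```cpp
// Gating lit64 on "base encoding is a single DWORD" works for plain
// 32-bit encodings, but getSize() for instructions with a mandatory
// literal already includes that literal, so a lit64-capable VOP2 like
// v_fmaak_f64 (4-byte encoding + 8-byte literal) would be excluded.
#include <cstdio>

struct ToyDesc {
  const char *Name;
  unsigned Size; // bytes, as MCInstrDesc::getSize() would report
};

static bool canUseLit64BySize(const ToyDesc &D) { return D.Size == 4; }

int main() {
  ToyDesc Vop2  = {"VOP2 base encoding", 4};
  ToyDesc Vop3  = {"VOP3 base encoding", 8};
  ToyDesc Fmaak = {"v_fmaak_f64 (VOP2 + mandatory 64-bit literal)", 12};
  printf("%s -> %d\n", Vop2.Name,  canUseLit64BySize(Vop2));  // 1: correct
  printf("%s -> %d\n", Vop3.Name,  canUseLit64BySize(Vop3));  // 0: correct
  printf("%s -> %d\n", Fmaak.Name, canUseLit64BySize(Fmaak)); // 0: wrong
}
```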

A Collaborator replied:

Argh, you are right. Maybe I need to check other places.

@rampitec (Collaborator) left a comment

LGTM, but I'd probably check Desc size directly instead of flags.

@jayfoad enabled auto-merge (squash) September 4, 2025 08:29
@llvmbot (Member) commented Sep 4, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

Patch is 41.70 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/156602.diff

6 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp (+13-9)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h (+4-4)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp (+17-17)
  • (modified) llvm/test/MC/AMDGPU/gfx1250_asm_vop3cx.s (+24-24)
  • (modified) llvm/test/MC/AMDGPU/vop3-literal.s (+8-8)
  • (modified) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3cx.txt (+24-24)
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
index af6b7a9e7fdeb..ad122390e1f03 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
@@ -80,8 +80,9 @@ void AMDGPUInstPrinter::printFP64ImmOperand(const MCInst *MI, unsigned OpNo,
                                             const MCSubtargetInfo &STI,
                                             raw_ostream &O) {
   // KIMM64
+  const MCInstrDesc &Desc = MII.get(MI->getOpcode());
   uint64_t Imm = MI->getOperand(OpNo).getImm();
-  printLiteral64(Imm, STI, O, /*IsFP=*/true);
+  printLiteral64(Desc, Imm, STI, O, /*IsFP=*/true);
 }
 
 void AMDGPUInstPrinter::printNamedBit(const MCInst *MI, unsigned OpNo,
@@ -590,7 +591,7 @@ void AMDGPUInstPrinter::printImmediate32(uint32_t Imm,
   O << formatHex(static_cast<uint64_t>(Imm));
 }
 
-void AMDGPUInstPrinter::printImmediate64(uint64_t Imm,
+void AMDGPUInstPrinter::printImmediate64(const MCInstrDesc &Desc, uint64_t Imm,
                                          const MCSubtargetInfo &STI,
                                          raw_ostream &O, bool IsFP) {
   int64_t SImm = static_cast<int64_t>(Imm);
@@ -621,20 +622,23 @@ void AMDGPUInstPrinter::printImmediate64(uint64_t Imm,
            STI.hasFeature(AMDGPU::FeatureInv2PiInlineImm))
     O << "0.15915494309189532";
   else
-    printLiteral64(Imm, STI, O, IsFP);
+    printLiteral64(Desc, Imm, STI, O, IsFP);
 }
 
-void AMDGPUInstPrinter::printLiteral64(uint64_t Imm, const MCSubtargetInfo &STI,
+void AMDGPUInstPrinter::printLiteral64(const MCInstrDesc &Desc, uint64_t Imm,
+                                       const MCSubtargetInfo &STI,
                                        raw_ostream &O, bool IsFP) {
   // This part needs to align with AMDGPUOperand::addLiteralImmOperand.
+  bool CanUse64BitLiterals =
+      STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
+      !(Desc.TSFlags & (SIInstrFlags::VOP3 | SIInstrFlags::VOP3P));
   if (IsFP) {
-    if (STI.hasFeature(AMDGPU::Feature64BitLiterals) && Lo_32(Imm))
+    if (CanUse64BitLiterals && Lo_32(Imm))
       O << "lit64(" << formatHex(static_cast<uint64_t>(Imm)) << ')';
     else
       O << formatHex(static_cast<uint64_t>(Hi_32(Imm)));
   } else {
-    if (STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
-        (!isInt<32>(Imm) || !isUInt<32>(Imm)))
+    if (CanUse64BitLiterals && (!isInt<32>(Imm) || !isUInt<32>(Imm)))
       O << "lit64(" << formatHex(static_cast<uint64_t>(Imm)) << ')';
     else
       O << formatHex(static_cast<uint64_t>(Imm));
@@ -749,12 +753,12 @@ void AMDGPUInstPrinter::printRegularOperand(const MCInst *MI, unsigned OpNo,
       break;
     case AMDGPU::OPERAND_REG_IMM_INT64:
     case AMDGPU::OPERAND_REG_INLINE_C_INT64:
-      printImmediate64(Op.getImm(), STI, O, false);
+      printImmediate64(Desc, Op.getImm(), STI, O, false);
       break;
     case AMDGPU::OPERAND_REG_IMM_FP64:
     case AMDGPU::OPERAND_REG_INLINE_C_FP64:
     case AMDGPU::OPERAND_REG_INLINE_AC_FP64:
-      printImmediate64(Op.getImm(), STI, O, true);
+      printImmediate64(Desc, Op.getImm(), STI, O, true);
       break;
     case AMDGPU::OPERAND_REG_INLINE_C_INT16:
     case AMDGPU::OPERAND_REG_IMM_INT16:
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
index 0c4aca4eee472..a92f99c3c0e4b 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
@@ -87,10 +87,10 @@ class AMDGPUInstPrinter : public MCInstPrinter {
                              raw_ostream &O);
   void printImmediate32(uint32_t Imm, const MCSubtargetInfo &STI,
                         raw_ostream &O);
-  void printImmediate64(uint64_t Imm, const MCSubtargetInfo &STI,
-                        raw_ostream &O, bool IsFP);
-  void printLiteral64(uint64_t Imm, const MCSubtargetInfo &STI, raw_ostream &O,
-                      bool IsFP);
+  void printImmediate64(const MCInstrDesc &Desc, uint64_t Imm,
+                        const MCSubtargetInfo &STI, raw_ostream &O, bool IsFP);
+  void printLiteral64(const MCInstrDesc &Desc, uint64_t Imm,
+                      const MCSubtargetInfo &STI, raw_ostream &O, bool IsFP);
   void printOperand(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
                     raw_ostream &O);
   void printRegularOperand(const MCInst *MI, unsigned OpNo,
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
index fc28b9c96f5af..895f9e7078f27 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
@@ -88,7 +88,7 @@ class AMDGPUMCCodeEmitter : public MCCodeEmitter {
 
   /// Encode an fp or int literal.
   std::optional<uint64_t>
-  getLitEncoding(const MCOperand &MO, const MCOperandInfo &OpInfo,
+  getLitEncoding(const MCInstrDesc &Desc, const MCOperand &MO, unsigned OpNo,
                  const MCSubtargetInfo &STI,
                  bool HasMandatoryLiteral = false) const;
 
@@ -219,8 +219,8 @@ static uint32_t getLit16IntEncoding(uint32_t Val, const MCSubtargetInfo &STI) {
   return getLit32Encoding(Val, STI);
 }
 
-static uint32_t getLit64Encoding(uint64_t Val, const MCSubtargetInfo &STI,
-                                 bool IsFP) {
+static uint32_t getLit64Encoding(const MCInstrDesc &Desc, uint64_t Val,
+                                 const MCSubtargetInfo &STI, bool IsFP) {
   uint32_t IntImm = getIntInlineImmEncoding(static_cast<int64_t>(Val));
   if (IntImm != 0)
     return IntImm;
@@ -255,20 +255,21 @@ static uint32_t getLit64Encoding(uint64_t Val, const MCSubtargetInfo &STI,
 
   // The rest part needs to align with AMDGPUInstPrinter::printLiteral64.
 
+  bool CanUse64BitLiterals =
+      STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
+      !(Desc.TSFlags & (SIInstrFlags::VOP3 | SIInstrFlags::VOP3P));
   if (IsFP) {
-    return STI.hasFeature(AMDGPU::Feature64BitLiterals) && Lo_32(Val) ? 254
-                                                                      : 255;
+    return CanUse64BitLiterals && Lo_32(Val) ? 254 : 255;
   }
 
-  return STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
-                 (!isInt<32>(Val) || !isUInt<32>(Val))
-             ? 254
-             : 255;
+  return CanUse64BitLiterals && (!isInt<32>(Val) || !isUInt<32>(Val)) ? 254
+                                                                      : 255;
 }
 
 std::optional<uint64_t> AMDGPUMCCodeEmitter::getLitEncoding(
-    const MCOperand &MO, const MCOperandInfo &OpInfo,
+    const MCInstrDesc &Desc, const MCOperand &MO, unsigned OpNo,
     const MCSubtargetInfo &STI, bool HasMandatoryLiteral) const {
+  const MCOperandInfo &OpInfo = Desc.operands()[OpNo];
   int64_t Imm;
   if (MO.isExpr()) {
     if (!MO.getExpr()->evaluateAsAbsolute(Imm))
@@ -296,14 +297,14 @@ std::optional<uint64_t> AMDGPUMCCodeEmitter::getLitEncoding(
 
   case AMDGPU::OPERAND_REG_IMM_INT64:
   case AMDGPU::OPERAND_REG_INLINE_C_INT64:
-    return getLit64Encoding(static_cast<uint64_t>(Imm), STI, false);
+    return getLit64Encoding(Desc, static_cast<uint64_t>(Imm), STI, false);
 
   case AMDGPU::OPERAND_REG_INLINE_C_FP64:
   case AMDGPU::OPERAND_REG_INLINE_AC_FP64:
-    return getLit64Encoding(static_cast<uint64_t>(Imm), STI, true);
+    return getLit64Encoding(Desc, static_cast<uint64_t>(Imm), STI, true);
 
   case AMDGPU::OPERAND_REG_IMM_FP64: {
-    auto Enc = getLit64Encoding(static_cast<uint64_t>(Imm), STI, true);
+    auto Enc = getLit64Encoding(Desc, static_cast<uint64_t>(Imm), STI, true);
     return (HasMandatoryLiteral && Enc == 255) ? 254 : Enc;
   }
 
@@ -444,7 +445,7 @@ void AMDGPUMCCodeEmitter::encodeInstruction(const MCInst &MI,
 
     // Is this operand a literal immediate?
     const MCOperand &Op = MI.getOperand(i);
-    auto Enc = getLitEncoding(Op, Desc.operands()[i], STI);
+    auto Enc = getLitEncoding(Desc, Op, i, STI);
     if (!Enc || (*Enc != 255 && *Enc != 254))
       continue;
 
@@ -518,7 +519,7 @@ void AMDGPUMCCodeEmitter::getSDWASrcEncoding(const MCInst &MI, unsigned OpNo,
     return;
   } else {
     const MCInstrDesc &Desc = MCII.get(MI.getOpcode());
-    auto Enc = getLitEncoding(MO, Desc.operands()[OpNo], STI);
+    auto Enc = getLitEncoding(Desc, MO, OpNo, STI);
     if (Enc && *Enc != 255) {
       Op = *Enc | SDWA9EncValues::SRC_SGPR_MASK;
       return;
@@ -701,8 +702,7 @@ void AMDGPUMCCodeEmitter::getMachineOpValueCommon(
   if (AMDGPU::isSISrcOperand(Desc, OpNo)) {
     bool HasMandatoryLiteral =
         AMDGPU::hasNamedOperand(MI.getOpcode(), AMDGPU::OpName::imm);
-    if (auto Enc = getLitEncoding(MO, Desc.operands()[OpNo], STI,
-                                  HasMandatoryLiteral)) {
+    if (auto Enc = getLitEncoding(Desc, MO, OpNo, STI, HasMandatoryLiteral)) {
       Op = *Enc;
       return;
     }
diff --git a/llvm/test/MC/AMDGPU/gfx1250_asm_vop3cx.s b/llvm/test/MC/AMDGPU/gfx1250_asm_vop3cx.s
index 4aea7b32f13e5..e0ee928190ada 100644
--- a/llvm/test/MC/AMDGPU/gfx1250_asm_vop3cx.s
+++ b/llvm/test/MC/AMDGPU/gfx1250_asm_vop3cx.s
@@ -368,7 +368,7 @@ v_cmpx_eq_i64_e64 vcc, ttmp[14:15]
 // GFX1250: v_cmpx_eq_i64_e64 vcc, ttmp[14:15]      ; encoding: [0x7e,0x00,0xd2,0xd4,0x6a,0xf4,0x00,0x00]
 
 v_cmpx_eq_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_eq_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd2,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_eq_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd2,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_eq_i64_e64 exec, src_scc
 // GFX1250: v_cmpx_eq_i64_e64 exec, src_scc         ; encoding: [0x7e,0x00,0xd2,0xd4,0x7e,0xfa,0x01,0x00]
@@ -386,7 +386,7 @@ v_cmpx_eq_i64_e64 src_scc, exec
 // GFX1250: v_cmpx_eq_i64_e64 src_scc, exec         ; encoding: [0x7e,0x00,0xd2,0xd4,0xfd,0xfc,0x00,0x00]
 
 v_cmpx_eq_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_eq_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd2,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_eq_i64_e64 0xaf123456, vcc       ; encoding: [0x7e,0x00,0xd2,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_eq_u16_e64 v1, v2
 // GFX1250: v_cmpx_eq_u16_e64 v1, v2                ; encoding: [0x7e,0x00,0xba,0xd4,0x01,0x05,0x02,0x00]
@@ -494,7 +494,7 @@ v_cmpx_eq_u64_e64 vcc, ttmp[14:15]
 // GFX1250: v_cmpx_eq_u64_e64 vcc, ttmp[14:15]      ; encoding: [0x7e,0x00,0xda,0xd4,0x6a,0xf4,0x00,0x00]
 
 v_cmpx_eq_u64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_eq_u64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xda,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_eq_u64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xda,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_eq_u64_e64 exec, src_scc
 // GFX1250: v_cmpx_eq_u64_e64 exec, src_scc         ; encoding: [0x7e,0x00,0xda,0xd4,0x7e,0xfa,0x01,0x00]
@@ -512,7 +512,7 @@ v_cmpx_eq_u64_e64 src_scc, exec
 // GFX1250: v_cmpx_eq_u64_e64 src_scc, exec         ; encoding: [0x7e,0x00,0xda,0xd4,0xfd,0xfc,0x00,0x00]
 
 v_cmpx_eq_u64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_eq_u64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xda,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_eq_u64_e64 0xaf123456, vcc       ; encoding: [0x7e,0x00,0xda,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_ge_f16_e64 v1, v2
 // GFX1250: v_cmpx_ge_f16_e64 v1, v2                ; encoding: [0x7e,0x00,0x86,0xd4,0x01,0x05,0x02,0x00]
@@ -746,7 +746,7 @@ v_cmpx_ge_i64_e64 vcc, ttmp[14:15]
 // GFX1250: v_cmpx_ge_i64_e64 vcc, ttmp[14:15]      ; encoding: [0x7e,0x00,0xd6,0xd4,0x6a,0xf4,0x00,0x00]
 
 v_cmpx_ge_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_ge_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd6,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_ge_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd6,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_ge_i64_e64 exec, src_scc
 // GFX1250: v_cmpx_ge_i64_e64 exec, src_scc         ; encoding: [0x7e,0x00,0xd6,0xd4,0x7e,0xfa,0x01,0x00]
@@ -764,7 +764,7 @@ v_cmpx_ge_i64_e64 src_scc, exec
 // GFX1250: v_cmpx_ge_i64_e64 src_scc, exec         ; encoding: [0x7e,0x00,0xd6,0xd4,0xfd,0xfc,0x00,0x00]
 
 v_cmpx_ge_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_ge_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd6,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_ge_i64_e64 0xaf123456, vcc       ; encoding: [0x7e,0x00,0xd6,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_ge_u16_e64 v1, v2
 // GFX1250: v_cmpx_ge_u16_e64 v1, v2                ; encoding: [0x7e,0x00,0xbe,0xd4,0x01,0x05,0x02,0x00]
@@ -872,7 +872,7 @@ v_cmpx_ge_u64_e64 vcc, ttmp[14:15]
 // GFX1250: v_cmpx_ge_u64_e64 vcc, ttmp[14:15]      ; encoding: [0x7e,0x00,0xde,0xd4,0x6a,0xf4,0x00,0x00]
 
 v_cmpx_ge_u64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_ge_u64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xde,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_ge_u64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xde,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_ge_u64_e64 exec, src_scc
 // GFX1250: v_cmpx_ge_u64_e64 exec, src_scc         ; encoding: [0x7e,0x00,0xde,0xd4,0x7e,0xfa,0x01,0x00]
@@ -890,7 +890,7 @@ v_cmpx_ge_u64_e64 src_scc, exec
 // GFX1250: v_cmpx_ge_u64_e64 src_scc, exec         ; encoding: [0x7e,0x00,0xde,0xd4,0xfd,0xfc,0x00,0x00]
 
 v_cmpx_ge_u64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_ge_u64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xde,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_ge_u64_e64 0xaf123456, vcc       ; encoding: [0x7e,0x00,0xde,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_gt_f16_e64 v1, v2
 // GFX1250: v_cmpx_gt_f16_e64 v1, v2                ; encoding: [0x7e,0x00,0x84,0xd4,0x01,0x05,0x02,0x00]
@@ -1124,7 +1124,7 @@ v_cmpx_gt_i64_e64 vcc, ttmp[14:15]
 // GFX1250: v_cmpx_gt_i64_e64 vcc, ttmp[14:15]      ; encoding: [0x7e,0x00,0xd4,0xd4,0x6a,0xf4,0x00,0x00]
 
 v_cmpx_gt_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_gt_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd4,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_gt_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd4,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_gt_i64_e64 exec, src_scc
 // GFX1250: v_cmpx_gt_i64_e64 exec, src_scc         ; encoding: [0x7e,0x00,0xd4,0xd4,0x7e,0xfa,0x01,0x00]
@@ -1142,7 +1142,7 @@ v_cmpx_gt_i64_e64 src_scc, exec
 // GFX1250: v_cmpx_gt_i64_e64 src_scc, exec         ; encoding: [0x7e,0x00,0xd4,0xd4,0xfd,0xfc,0x00,0x00]
 
 v_cmpx_gt_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_gt_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd4,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_gt_i64_e64 0xaf123456, vcc       ; encoding: [0x7e,0x00,0xd4,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_gt_u16_e64 v1, v2
 // GFX1250: v_cmpx_gt_u16_e64 v1, v2                ; encoding: [0x7e,0x00,0xbc,0xd4,0x01,0x05,0x02,0x00]
@@ -1250,7 +1250,7 @@ v_cmpx_gt_u64_e64 vcc, ttmp[14:15]
 // GFX1250: v_cmpx_gt_u64_e64 vcc, ttmp[14:15]      ; encoding: [0x7e,0x00,0xdc,0xd4,0x6a,0xf4,0x00,0x00]
 
 v_cmpx_gt_u64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_gt_u64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xdc,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_gt_u64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xdc,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_gt_u64_e64 exec, src_scc
 // GFX1250: v_cmpx_gt_u64_e64 exec, src_scc         ; encoding: [0x7e,0x00,0xdc,0xd4,0x7e,0xfa,0x01,0x00]
@@ -1268,7 +1268,7 @@ v_cmpx_gt_u64_e64 src_scc, exec
 // GFX1250: v_cmpx_gt_u64_e64 src_scc, exec         ; encoding: [0x7e,0x00,0xdc,0xd4,0xfd,0xfc,0x00,0x00]
 
 v_cmpx_gt_u64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_gt_u64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xdc,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_gt_u64_e64 0xaf123456, vcc       ; encoding: [0x7e,0x00,0xdc,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_le_f16_e64 v1, v2
 // GFX1250: v_cmpx_le_f16_e64 v1, v2                ; encoding: [0x7e,0x00,0x83,0xd4,0x01,0x05,0x02,0x00]
@@ -1502,7 +1502,7 @@ v_cmpx_le_i64_e64 vcc, ttmp[14:15]
 // GFX1250: v_cmpx_le_i64_e64 vcc, ttmp[14:15]      ; encoding: [0x7e,0x00,0xd3,0xd4,0x6a,0xf4,0x00,0x00]
 
 v_cmpx_le_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_le_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd3,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_le_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd3,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_le_i64_e64 exec, src_scc
 // GFX1250: v_cmpx_le_i64_e64 exec, src_scc         ; encoding: [0x7e,0x00,0xd3,0xd4,0x7e,0xfa,0x01,0x00]
@@ -1520,7 +1520,7 @@ v_cmpx_le_i64_e64 src_scc, exec
 // GFX1250: v_cmpx_le_i64_e64 src_scc, exec         ; encoding: [0x7e,0x00,0xd3,0xd4,0xfd,0xfc,0x00,0x00]
 
 v_cmpx_le_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_le_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd3,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_le_i64_e64 0xaf123456, vcc       ; encoding: [0x7e,0x00,0xd3,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_le_u16_e64 v1, v2
 // GFX1250: v_cmpx_le_u16_e64 v1, v2                ; encoding: [0x7e,0x00,0xbb,0xd4,0x01,0x05,0x02,0x00]
@@ -1628,7 +1628,7 @@ v_cmpx_le_u64_e64 vcc, ttmp[14:15]
 // GFX1250: v_cmpx_le_u64_e64 vcc, ttmp[14:15]      ; encoding: [0x7e,0x00,0xdb,0xd4,0x6a,0xf4,0x00,0x00]
 
 v_cmpx_le_u64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_le_u64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xdb,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_le_u64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xdb,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_le_u64_e64 exec, src_scc
 // GFX1250: v_cmpx_le_u64_e64 exec, src_scc         ; encoding: [0x7e,0x00,0xdb,0xd4,0x7e,0xfa,0x01,0x00]
@@ -1646,7 +1646,7 @@ v_cmpx_le_u64_e64 src_scc, exec
 // GFX1250: v_cmpx_le_u64_e64 src_scc, exec         ; encoding: [0x7e,0x00,0xdb,0xd4,0xfd,0xfc,0x00,0x00]
 
 v_cmpx_le_u64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_le_u64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xdb,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_le_u64_e64 0xaf123456, vcc       ; encoding: [0x7e,0x00,0xdb,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_lg_f16_e64 v1, v2
 // GFX1250: v_cmpx_lg_f16_e64 v1, v2                ; encoding: [0x7e,0x00,0x85,0xd4,0x01,0x05,0x02,0x00]
@@ -2006,7 +2006,7 @@ v_cmpx_lt_i64_e64 vcc, ttmp[14:15]
 // GFX1250: v_cmpx_lt_i64_e64 vcc, ttmp[14:15]      ; encoding: [0x7e,0x00,0xd1,0xd4,0x6a,0xf4,0x00,0x00]
 
 v_cmpx_lt_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_lt_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd1,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_lt_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd1,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_lt_i64_e64 exec, src_scc
 // GFX1250: v_cmpx_lt_i64_e64 exec, src_scc         ; encoding: [0x7e,0x00,0xd1,0xd4,0x7e,0xfa,0x01,0x00]
@@ -2024,7 +2024,7 @@ v_cmpx_lt_i64_e64 src_scc, exec
 // GFX1250: v_cmpx_lt_i64_e64 src_scc, exec         ; encoding: [0x7e,0x00,0xd1,0xd4,0xfd,0xfc,0x00,0x00]
 
 v_cmpx_lt_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_lt_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd1,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_lt_i64_e64 0xaf123456, vcc       ; encoding: [0x7e,0x00,0xd1,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
 
 v_cmpx_lt_u16_e64 v1, v2
 // GFX1250: v_cmpx_lt_u16_e64 v1, v2                ; encoding: [0x7e,0x00,0xb9,0xd4,0x01,0x05,0x02,0x00]
@@ -213...
[truncated]

@rampitec (Collaborator) commented Sep 4, 2025

>> Is it true that handling for the 64-bit literals is already correct in codegen, so this patch would not change emitted code?
>
> The patch could change emitted code if codegen created anything like:
>
> $vgpr0_vgpr1, $sgpr0 = V_MAD_U64_U32_e64 $vgpr0, $vgpr0, 0x80000000, 0, implicit $exec
>
> Currently I don't think codegen does this, but only because SIFoldOperands is conservative about when it folds 64-bit constants into their uses. If codegen did generate something like this then I would not consider it a bug in codegen.

This seems like a situation where folding could have happened, but was gated by the conservative condition Desc.getSize() != 4.

@rampitec (Collaborator) commented Sep 4, 2025

Then in general we will probably need signed and unsigned integer operands to fully use it.

@rampitec (Collaborator) commented Sep 4, 2025

LGTM anyway; it fixes the bug, and we can think about how to optimize further separately.

@jayfoad (Contributor, Author) commented Sep 4, 2025

> Then in general we will probably need signed and unsigned integer operands to fully use it.

Exactly! If/when we know the signedness of all operands, then we won't have to make conservative decisions based on !isInt<32>(Imm) || !isUInt<32>(Imm), and we can revert this patch.
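
A sketch of what that would allow, assuming the convention behind today's conservative conjunction (a 32-bit literal is sign-extended into a signed 64-bit operand and zero-extended into an unsigned one); helper names are illustrative only:

```cpp
// With known signedness, each operand only needs its own extension to
// round-trip, so each check alone accepts values the conservative
// !isInt<32>(Imm) || !isUInt<32>(Imm) rule must reject.
#include <cstdint>
#include <cstdio>

static bool needs64IfSigned(int64_t V) {
  return V < INT32_MIN || V > INT32_MAX;
}
static bool needs64IfUnsigned(uint64_t V) { return V > UINT32_MAX; }

int main() {
  // 0xaf123456 takes a 64-bit literal today, but a known-unsigned operand
  // could use the 32-bit form.
  printf("%d\n", needs64IfUnsigned(0xaf123456u)); // 0
  // The sign-extended counterpart likewise fits a known-signed operand.
  printf("%d\n", needs64IfSigned((int64_t)0xffffffffaf123456ull)); // 0
}
```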

@jayfoad merged commit 88effbf into llvm:main Sep 4, 2025
10 checks passed
@jayfoad deleted the vop3-lit64 branch September 4, 2025 09:36