[AMDGPU] Do not use 64-bit literals with VOP3* encodings #156602
Conversation
Encoding sometimes uses a 64-bit instead of a 32-bit literal because it does not know the signedness of the operand: if the value does not fit in both a 32-bit signed and a 32-bit unsigned integer, it uses a 64-bit literal for safety. However, it should never do this for VOP3- and VOP3P-encoded instructions, because these encodings do not allow 64-bit literal operands.
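For reference, the conservative check boils down to requiring that the raw bits fit both 32-bit interpretations. A minimal sketch, assuming LLVM's MathExtras.h helpers (the same isInt<32>/isUInt<32> tests that appear in the diff below), with the value 0xaf123456 taken from the affected tests:

```cpp
#include "llvm/Support/MathExtras.h"
#include <cstdint>
#include <cstdio>

int main() {
  uint64_t Imm = 0xaf123456; // fits unsigned 32-bit, but not signed 32-bit
  // Without knowing the operand's signedness, a 32-bit literal is only
  // trusted when the value fits both interpretations; otherwise the
  // encoder falls back to a 64-bit literal.
  bool Use64BitLiteral = !llvm::isInt<32>(Imm) || !llvm::isUInt<32>(Imm);
  printf("use 64-bit literal: %d\n", Use64BitLiteral); // prints 1
}
```

Since 0xaf123456 passes the unsigned test but fails the signed one, the encoder previously fell back to lit64 even for VOP3-encoded instructions, which is the bug this PR fixes.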
Do uses of …
It depends on what they're doing, but there is nothing fundamentally wrong with calling …
It looks like the right thing to do.
Is it true that handling for the 64-bit literals is already correct in codegen, so this patch would not change emitted code?
It is a little suspicious that all the affected test cases are VOPC. Perhaps there is some additional handling for VOP1/VOP2 that is not present for VOPC?
v_lshlrev_b64 v[6:7], v1, 0x3f717273
// GFX10: v_lshlrev_b64 v[6:7], v1, 0x3f717273 ; encoding: [0x06,0x00,0xff,0xd6,0x01,0xff,0x01,0x00,0x73,0x72,0x71,0x3f]
// GFX1250: v_lshlrev_b64_e64 v[6:7], v1, 0x3f717273 ; encoding: [0x06,0x00,0x1f,0xd5,0x01,0xff,0x01,0x00,0x73,0x72,0x71,0x3f]
v_lshlrev_b64 v[6:7], v1, 0xbf717273
Why change the input here?
You can see in the first commit in the PR that it introduces a bogus lit64, which the second commit fixes.
It is a little suspicious that all the affected test cases are VOPC.
... and this demonstrates a non-VOPC case that is affected.
The patch could change emitted code if codegen created anything like: …
Currently I don't think codegen does this, but only because SIFoldOperands is conservative about when it folds 64-bit constants into their uses. If codegen did generate something like this, then I would not consider it a bug in codegen.
Thanks for the clarifications. LGTM
raw_ostream &O, bool IsFP) {
  // This part needs to align with AMDGPUOperand::addLiteralImmOperand.
  bool CanUse64BitLiterals =
      STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
      !(Desc.TSFlags & (SIInstrFlags::VOP3 | SIInstrFlags::VOP3P));
FYI, given the multitude of encodings, I was using Desc.getSize() != 4 for that instead. This is the real reason lit64 cannot be used: the instruction itself takes more than a DWORD.
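A sketch of the size-based check described here, under the assumption that only 4-byte base encodings leave room for the extra literal dwords (the reply below notes V_FMAAK_F64 as a counterexample):

```cpp
#include "llvm/MC/MCInstrDesc.h"
#include "llvm/MC/MCSubtargetInfo.h"

// Illustrative only: accept 64-bit literals when the base encoding is a
// single dword, instead of testing the VOP3/VOP3P flags explicitly.
// AMDGPU::Feature64BitLiterals comes from the target's generated headers.
static bool canUse64BitLiterals(const llvm::MCSubtargetInfo &STI,
                                const llvm::MCInstrDesc &Desc) {
  return STI.hasFeature(llvm::AMDGPU::Feature64BitLiterals) &&
         Desc.getSize() == 4;
}
```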
Nice idea, but in practice that doesn't work for instructions like V_FMAAK_F64. This is VOP2 but Size is set to 12, i.e. the size including the mandatory literal. Changing that feels like a big project...
Argh, you are right. Maybe I need to check other places.
LGTM, but I'd probably check Desc size directly instead of flags.
@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

Encoding sometimes uses a 64-bit instead of a 32-bit literal because it does not know the signedness of the operand: if the value does not fit in both a 32-bit signed and a 32-bit unsigned integer, it uses a 64-bit literal for safety. However, it should never do this for VOP3- and VOP3P-encoded instructions, because these encodings do not allow 64-bit literal operands.

Patch is 41.70 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/156602.diff

6 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
index af6b7a9e7fdeb..ad122390e1f03 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
@@ -80,8 +80,9 @@ void AMDGPUInstPrinter::printFP64ImmOperand(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,
raw_ostream &O) {
// KIMM64
+ const MCInstrDesc &Desc = MII.get(MI->getOpcode());
uint64_t Imm = MI->getOperand(OpNo).getImm();
- printLiteral64(Imm, STI, O, /*IsFP=*/true);
+ printLiteral64(Desc, Imm, STI, O, /*IsFP=*/true);
}
void AMDGPUInstPrinter::printNamedBit(const MCInst *MI, unsigned OpNo,
@@ -590,7 +591,7 @@ void AMDGPUInstPrinter::printImmediate32(uint32_t Imm,
O << formatHex(static_cast<uint64_t>(Imm));
}
-void AMDGPUInstPrinter::printImmediate64(uint64_t Imm,
+void AMDGPUInstPrinter::printImmediate64(const MCInstrDesc &Desc, uint64_t Imm,
const MCSubtargetInfo &STI,
raw_ostream &O, bool IsFP) {
int64_t SImm = static_cast<int64_t>(Imm);
@@ -621,20 +622,23 @@ void AMDGPUInstPrinter::printImmediate64(uint64_t Imm,
STI.hasFeature(AMDGPU::FeatureInv2PiInlineImm))
O << "0.15915494309189532";
else
- printLiteral64(Imm, STI, O, IsFP);
+ printLiteral64(Desc, Imm, STI, O, IsFP);
}
-void AMDGPUInstPrinter::printLiteral64(uint64_t Imm, const MCSubtargetInfo &STI,
+void AMDGPUInstPrinter::printLiteral64(const MCInstrDesc &Desc, uint64_t Imm,
+ const MCSubtargetInfo &STI,
raw_ostream &O, bool IsFP) {
// This part needs to align with AMDGPUOperand::addLiteralImmOperand.
+ bool CanUse64BitLiterals =
+ STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
+ !(Desc.TSFlags & (SIInstrFlags::VOP3 | SIInstrFlags::VOP3P));
if (IsFP) {
- if (STI.hasFeature(AMDGPU::Feature64BitLiterals) && Lo_32(Imm))
+ if (CanUse64BitLiterals && Lo_32(Imm))
O << "lit64(" << formatHex(static_cast<uint64_t>(Imm)) << ')';
else
O << formatHex(static_cast<uint64_t>(Hi_32(Imm)));
} else {
- if (STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
- (!isInt<32>(Imm) || !isUInt<32>(Imm)))
+ if (CanUse64BitLiterals && (!isInt<32>(Imm) || !isUInt<32>(Imm)))
O << "lit64(" << formatHex(static_cast<uint64_t>(Imm)) << ')';
else
O << formatHex(static_cast<uint64_t>(Imm));
@@ -749,12 +753,12 @@ void AMDGPUInstPrinter::printRegularOperand(const MCInst *MI, unsigned OpNo,
break;
case AMDGPU::OPERAND_REG_IMM_INT64:
case AMDGPU::OPERAND_REG_INLINE_C_INT64:
- printImmediate64(Op.getImm(), STI, O, false);
+ printImmediate64(Desc, Op.getImm(), STI, O, false);
break;
case AMDGPU::OPERAND_REG_IMM_FP64:
case AMDGPU::OPERAND_REG_INLINE_C_FP64:
case AMDGPU::OPERAND_REG_INLINE_AC_FP64:
- printImmediate64(Op.getImm(), STI, O, true);
+ printImmediate64(Desc, Op.getImm(), STI, O, true);
break;
case AMDGPU::OPERAND_REG_INLINE_C_INT16:
case AMDGPU::OPERAND_REG_IMM_INT16:
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
index 0c4aca4eee472..a92f99c3c0e4b 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
@@ -87,10 +87,10 @@ class AMDGPUInstPrinter : public MCInstPrinter {
raw_ostream &O);
void printImmediate32(uint32_t Imm, const MCSubtargetInfo &STI,
raw_ostream &O);
- void printImmediate64(uint64_t Imm, const MCSubtargetInfo &STI,
- raw_ostream &O, bool IsFP);
- void printLiteral64(uint64_t Imm, const MCSubtargetInfo &STI, raw_ostream &O,
- bool IsFP);
+ void printImmediate64(const MCInstrDesc &Desc, uint64_t Imm,
+ const MCSubtargetInfo &STI, raw_ostream &O, bool IsFP);
+ void printLiteral64(const MCInstrDesc &Desc, uint64_t Imm,
+ const MCSubtargetInfo &STI, raw_ostream &O, bool IsFP);
void printOperand(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
raw_ostream &O);
void printRegularOperand(const MCInst *MI, unsigned OpNo,
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
index fc28b9c96f5af..895f9e7078f27 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
@@ -88,7 +88,7 @@ class AMDGPUMCCodeEmitter : public MCCodeEmitter {
/// Encode an fp or int literal.
std::optional<uint64_t>
- getLitEncoding(const MCOperand &MO, const MCOperandInfo &OpInfo,
+ getLitEncoding(const MCInstrDesc &Desc, const MCOperand &MO, unsigned OpNo,
const MCSubtargetInfo &STI,
bool HasMandatoryLiteral = false) const;
@@ -219,8 +219,8 @@ static uint32_t getLit16IntEncoding(uint32_t Val, const MCSubtargetInfo &STI) {
return getLit32Encoding(Val, STI);
}
-static uint32_t getLit64Encoding(uint64_t Val, const MCSubtargetInfo &STI,
- bool IsFP) {
+static uint32_t getLit64Encoding(const MCInstrDesc &Desc, uint64_t Val,
+ const MCSubtargetInfo &STI, bool IsFP) {
uint32_t IntImm = getIntInlineImmEncoding(static_cast<int64_t>(Val));
if (IntImm != 0)
return IntImm;
@@ -255,20 +255,21 @@ static uint32_t getLit64Encoding(uint64_t Val, const MCSubtargetInfo &STI,
// The rest part needs to align with AMDGPUInstPrinter::printLiteral64.
+ bool CanUse64BitLiterals =
+ STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
+ !(Desc.TSFlags & (SIInstrFlags::VOP3 | SIInstrFlags::VOP3P));
if (IsFP) {
- return STI.hasFeature(AMDGPU::Feature64BitLiterals) && Lo_32(Val) ? 254
- : 255;
+ return CanUse64BitLiterals && Lo_32(Val) ? 254 : 255;
}
- return STI.hasFeature(AMDGPU::Feature64BitLiterals) &&
- (!isInt<32>(Val) || !isUInt<32>(Val))
- ? 254
- : 255;
+ return CanUse64BitLiterals && (!isInt<32>(Val) || !isUInt<32>(Val)) ? 254
+ : 255;
}
std::optional<uint64_t> AMDGPUMCCodeEmitter::getLitEncoding(
- const MCOperand &MO, const MCOperandInfo &OpInfo,
+ const MCInstrDesc &Desc, const MCOperand &MO, unsigned OpNo,
const MCSubtargetInfo &STI, bool HasMandatoryLiteral) const {
+ const MCOperandInfo &OpInfo = Desc.operands()[OpNo];
int64_t Imm;
if (MO.isExpr()) {
if (!MO.getExpr()->evaluateAsAbsolute(Imm))
@@ -296,14 +297,14 @@ std::optional<uint64_t> AMDGPUMCCodeEmitter::getLitEncoding(
case AMDGPU::OPERAND_REG_IMM_INT64:
case AMDGPU::OPERAND_REG_INLINE_C_INT64:
- return getLit64Encoding(static_cast<uint64_t>(Imm), STI, false);
+ return getLit64Encoding(Desc, static_cast<uint64_t>(Imm), STI, false);
case AMDGPU::OPERAND_REG_INLINE_C_FP64:
case AMDGPU::OPERAND_REG_INLINE_AC_FP64:
- return getLit64Encoding(static_cast<uint64_t>(Imm), STI, true);
+ return getLit64Encoding(Desc, static_cast<uint64_t>(Imm), STI, true);
case AMDGPU::OPERAND_REG_IMM_FP64: {
- auto Enc = getLit64Encoding(static_cast<uint64_t>(Imm), STI, true);
+ auto Enc = getLit64Encoding(Desc, static_cast<uint64_t>(Imm), STI, true);
return (HasMandatoryLiteral && Enc == 255) ? 254 : Enc;
}
@@ -444,7 +445,7 @@ void AMDGPUMCCodeEmitter::encodeInstruction(const MCInst &MI,
// Is this operand a literal immediate?
const MCOperand &Op = MI.getOperand(i);
- auto Enc = getLitEncoding(Op, Desc.operands()[i], STI);
+ auto Enc = getLitEncoding(Desc, Op, i, STI);
if (!Enc || (*Enc != 255 && *Enc != 254))
continue;
@@ -518,7 +519,7 @@ void AMDGPUMCCodeEmitter::getSDWASrcEncoding(const MCInst &MI, unsigned OpNo,
return;
} else {
const MCInstrDesc &Desc = MCII.get(MI.getOpcode());
- auto Enc = getLitEncoding(MO, Desc.operands()[OpNo], STI);
+ auto Enc = getLitEncoding(Desc, MO, OpNo, STI);
if (Enc && *Enc != 255) {
Op = *Enc | SDWA9EncValues::SRC_SGPR_MASK;
return;
@@ -701,8 +702,7 @@ void AMDGPUMCCodeEmitter::getMachineOpValueCommon(
if (AMDGPU::isSISrcOperand(Desc, OpNo)) {
bool HasMandatoryLiteral =
AMDGPU::hasNamedOperand(MI.getOpcode(), AMDGPU::OpName::imm);
- if (auto Enc = getLitEncoding(MO, Desc.operands()[OpNo], STI,
- HasMandatoryLiteral)) {
+ if (auto Enc = getLitEncoding(Desc, MO, OpNo, STI, HasMandatoryLiteral)) {
Op = *Enc;
return;
}
diff --git a/llvm/test/MC/AMDGPU/gfx1250_asm_vop3cx.s b/llvm/test/MC/AMDGPU/gfx1250_asm_vop3cx.s
index 4aea7b32f13e5..e0ee928190ada 100644
--- a/llvm/test/MC/AMDGPU/gfx1250_asm_vop3cx.s
+++ b/llvm/test/MC/AMDGPU/gfx1250_asm_vop3cx.s
@@ -368,7 +368,7 @@ v_cmpx_eq_i64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_eq_i64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xd2,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_eq_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_eq_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd2,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_eq_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd2,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_eq_i64_e64 exec, src_scc
// GFX1250: v_cmpx_eq_i64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xd2,0xd4,0x7e,0xfa,0x01,0x00]
@@ -386,7 +386,7 @@ v_cmpx_eq_i64_e64 src_scc, exec
// GFX1250: v_cmpx_eq_i64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xd2,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_eq_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_eq_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd2,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_eq_i64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xd2,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_eq_u16_e64 v1, v2
// GFX1250: v_cmpx_eq_u16_e64 v1, v2 ; encoding: [0x7e,0x00,0xba,0xd4,0x01,0x05,0x02,0x00]
@@ -494,7 +494,7 @@ v_cmpx_eq_u64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_eq_u64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xda,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_eq_u64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_eq_u64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xda,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_eq_u64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xda,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_eq_u64_e64 exec, src_scc
// GFX1250: v_cmpx_eq_u64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xda,0xd4,0x7e,0xfa,0x01,0x00]
@@ -512,7 +512,7 @@ v_cmpx_eq_u64_e64 src_scc, exec
// GFX1250: v_cmpx_eq_u64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xda,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_eq_u64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_eq_u64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xda,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_eq_u64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xda,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_ge_f16_e64 v1, v2
// GFX1250: v_cmpx_ge_f16_e64 v1, v2 ; encoding: [0x7e,0x00,0x86,0xd4,0x01,0x05,0x02,0x00]
@@ -746,7 +746,7 @@ v_cmpx_ge_i64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_ge_i64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xd6,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_ge_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_ge_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd6,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_ge_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd6,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_ge_i64_e64 exec, src_scc
// GFX1250: v_cmpx_ge_i64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xd6,0xd4,0x7e,0xfa,0x01,0x00]
@@ -764,7 +764,7 @@ v_cmpx_ge_i64_e64 src_scc, exec
// GFX1250: v_cmpx_ge_i64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xd6,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_ge_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_ge_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd6,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_ge_i64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xd6,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_ge_u16_e64 v1, v2
// GFX1250: v_cmpx_ge_u16_e64 v1, v2 ; encoding: [0x7e,0x00,0xbe,0xd4,0x01,0x05,0x02,0x00]
@@ -872,7 +872,7 @@ v_cmpx_ge_u64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_ge_u64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xde,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_ge_u64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_ge_u64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xde,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_ge_u64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xde,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_ge_u64_e64 exec, src_scc
// GFX1250: v_cmpx_ge_u64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xde,0xd4,0x7e,0xfa,0x01,0x00]
@@ -890,7 +890,7 @@ v_cmpx_ge_u64_e64 src_scc, exec
// GFX1250: v_cmpx_ge_u64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xde,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_ge_u64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_ge_u64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xde,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_ge_u64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xde,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_gt_f16_e64 v1, v2
// GFX1250: v_cmpx_gt_f16_e64 v1, v2 ; encoding: [0x7e,0x00,0x84,0xd4,0x01,0x05,0x02,0x00]
@@ -1124,7 +1124,7 @@ v_cmpx_gt_i64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_gt_i64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xd4,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_gt_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_gt_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd4,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_gt_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd4,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_gt_i64_e64 exec, src_scc
// GFX1250: v_cmpx_gt_i64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xd4,0xd4,0x7e,0xfa,0x01,0x00]
@@ -1142,7 +1142,7 @@ v_cmpx_gt_i64_e64 src_scc, exec
// GFX1250: v_cmpx_gt_i64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xd4,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_gt_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_gt_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd4,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_gt_i64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xd4,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_gt_u16_e64 v1, v2
// GFX1250: v_cmpx_gt_u16_e64 v1, v2 ; encoding: [0x7e,0x00,0xbc,0xd4,0x01,0x05,0x02,0x00]
@@ -1250,7 +1250,7 @@ v_cmpx_gt_u64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_gt_u64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xdc,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_gt_u64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_gt_u64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xdc,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_gt_u64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xdc,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_gt_u64_e64 exec, src_scc
// GFX1250: v_cmpx_gt_u64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xdc,0xd4,0x7e,0xfa,0x01,0x00]
@@ -1268,7 +1268,7 @@ v_cmpx_gt_u64_e64 src_scc, exec
// GFX1250: v_cmpx_gt_u64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xdc,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_gt_u64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_gt_u64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xdc,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_gt_u64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xdc,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_le_f16_e64 v1, v2
// GFX1250: v_cmpx_le_f16_e64 v1, v2 ; encoding: [0x7e,0x00,0x83,0xd4,0x01,0x05,0x02,0x00]
@@ -1502,7 +1502,7 @@ v_cmpx_le_i64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_le_i64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xd3,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_le_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_le_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd3,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_le_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd3,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_le_i64_e64 exec, src_scc
// GFX1250: v_cmpx_le_i64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xd3,0xd4,0x7e,0xfa,0x01,0x00]
@@ -1520,7 +1520,7 @@ v_cmpx_le_i64_e64 src_scc, exec
// GFX1250: v_cmpx_le_i64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xd3,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_le_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_le_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd3,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_le_i64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xd3,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_le_u16_e64 v1, v2
// GFX1250: v_cmpx_le_u16_e64 v1, v2 ; encoding: [0x7e,0x00,0xbb,0xd4,0x01,0x05,0x02,0x00]
@@ -1628,7 +1628,7 @@ v_cmpx_le_u64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_le_u64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xdb,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_le_u64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_le_u64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xdb,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_le_u64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xdb,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_le_u64_e64 exec, src_scc
// GFX1250: v_cmpx_le_u64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xdb,0xd4,0x7e,0xfa,0x01,0x00]
@@ -1646,7 +1646,7 @@ v_cmpx_le_u64_e64 src_scc, exec
// GFX1250: v_cmpx_le_u64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xdb,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_le_u64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_le_u64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xdb,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_le_u64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xdb,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_lg_f16_e64 v1, v2
// GFX1250: v_cmpx_lg_f16_e64 v1, v2 ; encoding: [0x7e,0x00,0x85,0xd4,0x01,0x05,0x02,0x00]
@@ -2006,7 +2006,7 @@ v_cmpx_lt_i64_e64 vcc, ttmp[14:15]
// GFX1250: v_cmpx_lt_i64_e64 vcc, ttmp[14:15] ; encoding: [0x7e,0x00,0xd1,0xd4,0x6a,0xf4,0x00,0x00]
v_cmpx_lt_i64_e64 ttmp[14:15], 0xaf123456
-// GFX1250: v_cmpx_lt_i64_e64 ttmp[14:15], lit64(0xaf123456) ; encoding: [0x7e,0x00,0xd1,0xd4,0x7a,0xfc,0x01,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_lt_i64_e64 ttmp[14:15], 0xaf123456 ; encoding: [0x7e,0x00,0xd1,0xd4,0x7a,0xfe,0x01,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_lt_i64_e64 exec, src_scc
// GFX1250: v_cmpx_lt_i64_e64 exec, src_scc ; encoding: [0x7e,0x00,0xd1,0xd4,0x7e,0xfa,0x01,0x00]
@@ -2024,7 +2024,7 @@ v_cmpx_lt_i64_e64 src_scc, exec
// GFX1250: v_cmpx_lt_i64_e64 src_scc, exec ; encoding: [0x7e,0x00,0xd1,0xd4,0xfd,0xfc,0x00,0x00]
v_cmpx_lt_i64_e64 0xaf123456, vcc
-// GFX1250: v_cmpx_lt_i64_e64 lit64(0xaf123456), vcc ; encoding: [0x7e,0x00,0xd1,0xd4,0xfe,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf,0x00,0x00,0x00,0x00]
+// GFX1250: v_cmpx_lt_i64_e64 0xaf123456, vcc ; encoding: [0x7e,0x00,0xd1,0xd4,0xff,0xd4,0x00,0x00,0x56,0x34,0x12,0xaf]
v_cmpx_lt_u16_e64 v1, v2
// GFX1250: v_cmpx_lt_u16_e64 v1, v2 ; encoding: [0x7e,0x00,0xb9,0xd4,0x01,0x05,0x02,0x00]
@@ -213...
[truncated]
This seems like a situation when folding could have happened, but was gated by the conservative condition …
Then in general we probably will need signed and unsigned integer operands to fully use it.
LGTM anyway, it fixes the bug, and we can think about how to optimize more separately.
Exactly! If/when we know the signedness of all operands, then we won't have to make conservative decisions based on …
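A purely hypothetical sketch of that direction — the signedness tag and function here are invented for illustration and are not existing LLVM API:

```cpp
#include <cstdint>

// Hypothetical: if operand info carried signedness, the encoder could
// test exactly one 32-bit range instead of conservatively requiring both.
enum class LiteralSignedness { Signed, Unsigned };

static bool fitsIn32BitLiteral(int64_t Imm, LiteralSignedness S) {
  if (S == LiteralSignedness::Signed)
    return Imm >= INT32_MIN && Imm <= INT32_MAX;
  return Imm >= 0 && Imm <= static_cast<int64_t>(UINT32_MAX);
}
```

With something like this attached to each immediate operand, a value such as 0xaf123456 could always use a 32-bit literal on unsigned operands.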