[AArch64] Transform add(x, abs(y)) -> saba(x, y, 0) #156615
base: main
Conversation
Nice, I'm surprised this is the first use of SDPatternMatch.
Do we need the new combine / node, or could this be done with a tablegen pattern directly?
I was surprised too, but it does seem to be (llvm::PatternMatch is used, but not llvm::SDPatternMatch).
I was thinking that this couldn't be done via tablegen, because in order to transform this we would need to generate two instructions.
We generate the instructions directly in some of the raddhn patterns.
Generating two instructions in a pattern has its downsides, so both approaches have advantages and disadvantages, but it is likely a little better than a new AArch64ISD node that isn't otherwise optimized.
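For illustration, a minimal sketch of what such a two-instruction pattern could look like (hypothetical: the MOVIv2d_ns zero materialisation and the dsub subregister extract for the 64-bit case are assumptions for this sketch, not necessarily the patterns that finally landed):

// Sketch: match add(x, abs(y)) directly and emit MOVI #0 + SABA,
// one pattern per vector type. `add` is commutative, so a single
// pattern covers both operand orders.
def : Pat<(v16i8 (add V128:$Rd, (abs V128:$Rn))),
          (SABAv16i8 V128:$Rd, V128:$Rn,
                     (v16i8 (MOVIv2d_ns (i32 0))))>;
def : Pat<(v8i8 (add V64:$Rd, (abs V64:$Rn))),
          (SABAv8i8 V64:$Rd, V64:$Rn,
                    (EXTRACT_SUBREG (MOVIv2d_ns (i32 0)), dsub))>;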
      IsZExt = false;
      return true;
    }
    if (sd_match(V0, SDPatternMatch::m_ZExt(
I think you can also look for ANY_EXTEND opcodes and treat them as a zero-extend.
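For example (a sketch; this assumes SDPatternMatch provides an m_AnyExt matcher alongside m_ZExt):

    // Sketch: ANY_EXTEND leaves the high bits undefined, so we are
    // free to realise it as a zero-extend and still form SABAL.
    if (sd_match(V0, SDPatternMatch::m_AnyExt(
                         m_Abs(SDPatternMatch::m_Value(AbsOp))))) {
      IsZExt = true;
      return true;
    }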
  auto MatchAbsOrZExtAbs = [](SDValue V0, SDValue V1, SDValue &AbsOp,
                              SDValue &Other, bool &IsZExt) {
    Other = V1;
I realise with your code below it's not a problem right now, but I think it's still worth only setting Other if we found a match.
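i.e. roughly (sketch):

    // Sketch: defer the assignments until a pattern actually matched.
    if (sd_match(V0, m_Abs(SDPatternMatch::m_Value(AbsOp)))) {
      Other = V1;
      IsZExt = false;
      return true;
    }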
    return SDValue();

  SDLoc DL(N);
  SDValue Zero = DCI.DAG.getConstant(0, DL, MVT::i64);
I think you need to get a constant that matches the element type of AbsOp, right?
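Something like (sketch):

  // Sketch: derive the zero's type from AbsOp's element type instead
  // of hard-coding MVT::i64.
  EVT EltVT = AbsOp.getValueType().getVectorElementType();
  SDValue Zero = DCI.DAG.getConstant(0, DL, EltVT);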
// Transform the following:
//  - add(x, abs(y)) -> saba(x, y, 0)
//  - add(x, zext(abs(y))) -> sabal(x, y, 0)
static SDValue performAddSABACombine(SDNode *N,
Should we limit this to post-legalisation only or do we specifically want to catch cases pre-legalisation?
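If post-legalisation only, the guard would be a one-liner at the top of the combine (sketch):

  // Sketch: run only after operation legalisation, so SABA/SABAL is
  // formed just for types the target has made legal.
  if (DCI.isBeforeLegalizeOps())
    return SDValue();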
  SDValue Zeros = DCI.DAG.getSplatVector(AbsOp.getValueType(), DL, Zero);

  unsigned Opcode = IsZExt ? AArch64ISD::SABAL : AArch64ISD::SABA;
  return DCI.DAG.getNode(Opcode, DL, VT, Other, AbsOp, Zeros);
Shouldn't we also be checking if the operation is legal for the instruction? If we allow this DAG combine to run pre-legalisation then we could be zero-extending 64-bit to 128-bit and I don't think we support that.
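For instance, a guard along these lines (sketch; SABAL only supports widening a 64-bit source vector to a 128-bit result):

  // Sketch: reject type combinations SABAL cannot encode, e.g. a
  // zero-extend from a 128-bit source vector.
  if (IsZExt && (!AbsOp.getValueType().is64BitVector() ||
                 !VT.is128BitVector()))
    return SDValue();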
Thanks @david-arm for reviewing this - at @davemgreen's suggestion I've removed this function in favor of the tablegen-only approach.
@llvm/pr-subscribers-backend-aarch64

Author: Hari Limaye (hazzlim)

Changes

Add a DAGCombine to perform the following transformations:
- add(x, abs(y)) -> saba(x, y, 0)
- add(x, zext(abs(y))) -> sabal(x, y, 0)
As well as being a useful generic transformation, this also fixes an issue where LLVM de-optimises [US]ABA NEON ACLE intrinsics into separate ABD+ADD instructions when one of the operands is a zero vector.

Full diff: https://github.com/llvm/llvm-project/pull/156615.diff

3 Files Affected:
- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
- llvm/lib/Target/AArch64/AArch64InstrInfo.td
- llvm/test/CodeGen/AArch64/neon-saba.ll
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index f1e8fb7f77652..b22cd04a6f3c0 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -50,6 +50,7 @@
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/SDPatternMatch.h"
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"
#include "llvm/CodeGen/TargetCallingConv.h"
@@ -21913,6 +21914,56 @@ static SDValue performExtBinopLoadFold(SDNode *N, SelectionDAG &DAG) {
return DAG.getNode(N->getOpcode(), DL, VT, Ext0, NShift);
}
+// Transform the following:
+// - add(x, abs(y)) -> saba(x, y, 0)
+// - add(x, zext(abs(y))) -> sabal(x, y, 0)
+static SDValue performAddSABACombine(SDNode *N,
+ TargetLowering::DAGCombinerInfo &DCI) {
+ if (N->getOpcode() != ISD::ADD)
+ return SDValue();
+
+ EVT VT = N->getValueType(0);
+ if (!VT.isFixedLengthVector())
+ return SDValue();
+
+ SDValue N0 = N->getOperand(0);
+ SDValue N1 = N->getOperand(1);
+
+ auto MatchAbsOrZExtAbs = [](SDValue V0, SDValue V1, SDValue &AbsOp,
+ SDValue &Other, bool &IsZExt) {
+ Other = V1;
+ if (sd_match(V0, m_Abs(SDPatternMatch::m_Value(AbsOp)))) {
+ IsZExt = false;
+ return true;
+ }
+ if (sd_match(V0, SDPatternMatch::m_ZExt(
+ m_Abs(SDPatternMatch::m_Value(AbsOp))))) {
+ IsZExt = true;
+ return true;
+ }
+
+ return false;
+ };
+
+ SDValue AbsOp;
+ SDValue Other;
+ bool IsZExt;
+ if (!MatchAbsOrZExtAbs(N0, N1, AbsOp, Other, IsZExt) &&
+ !MatchAbsOrZExtAbs(N1, N0, AbsOp, Other, IsZExt))
+ return SDValue();
+
+ // Don't perform this on abs(sub), as this will become an ABD/ABA anyway.
+ if (AbsOp.getOpcode() == ISD::SUB)
+ return SDValue();
+
+ SDLoc DL(N);
+ SDValue Zero = DCI.DAG.getConstant(0, DL, MVT::i64);
+ SDValue Zeros = DCI.DAG.getSplatVector(AbsOp.getValueType(), DL, Zero);
+
+ unsigned Opcode = IsZExt ? AArch64ISD::SABAL : AArch64ISD::SABA;
+ return DCI.DAG.getNode(Opcode, DL, VT, Other, AbsOp, Zeros);
+}
+
static SDValue performAddSubCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI) {
// Try to change sum of two reductions.
@@ -21938,6 +21989,9 @@ static SDValue performAddSubCombine(SDNode *N,
if (SDValue Val = performExtBinopLoadFold(N, DCI.DAG))
return Val;
+ if (SDValue Val = performAddSABACombine(N, DCI))
+ return Val;
+
return performAddSubLongCombine(N, DCI);
}
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 62b26b5239365..fdfde5ea1dc37 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -1059,6 +1059,10 @@ def AArch64sdot : SDNode<"AArch64ISD::SDOT", SDT_AArch64Dot>;
def AArch64udot : SDNode<"AArch64ISD::UDOT", SDT_AArch64Dot>;
def AArch64usdot : SDNode<"AArch64ISD::USDOT", SDT_AArch64Dot>;
+// saba/sabal
+def AArch64neonsaba : SDNode<"AArch64ISD::SABA", SDT_AArch64trivec>;
+def AArch64neonsabal : SDNode<"AArch64ISD::SABAL", SDT_AArch64Dot>;
+
// Vector across-lanes addition
// Only the lower result lane is defined.
def AArch64saddv : SDNode<"AArch64ISD::SADDV", SDT_AArch64UnaryVec>;
@@ -6121,6 +6125,19 @@ defm SQRDMLAH : SIMDThreeSameVectorSQRDMLxHTiedHS<1,0b10000,"sqrdmlah",
defm SQRDMLSH : SIMDThreeSameVectorSQRDMLxHTiedHS<1,0b10001,"sqrdmlsh",
int_aarch64_neon_sqrdmlsh>;
+def : Pat<(AArch64neonsaba (v8i8 V64:$Rd), V64:$Rn, V64:$Rm),
+ (SABAv8i8 V64:$Rd, V64:$Rn, V64:$Rm)>;
+def : Pat<(AArch64neonsaba (v4i16 V64:$Rd), V64:$Rn, V64:$Rm),
+ (SABAv4i16 V64:$Rd, V64:$Rn, V64:$Rm)>;
+def : Pat<(AArch64neonsaba (v2i32 V64:$Rd), V64:$Rn, V64:$Rm),
+ (SABAv2i32 V64:$Rd, V64:$Rn, V64:$Rm)>;
+def : Pat<(AArch64neonsaba (v16i8 V128:$Rd), V128:$Rn, V128:$Rm),
+ (SABAv16i8 V128:$Rd, V128:$Rn, V128:$Rm)>;
+def : Pat<(AArch64neonsaba (v8i16 V128:$Rd), V128:$Rn, V128:$Rm),
+ (SABAv8i16 V128:$Rd, V128:$Rn, V128:$Rm)>;
+def : Pat<(AArch64neonsaba (v4i32 V128:$Rd), V128:$Rn, V128:$Rm),
+ (SABAv4i32 V128:$Rd, V128:$Rn, V128:$Rm)>;
+
defm AND : SIMDLogicalThreeVector<0, 0b00, "and", and>;
defm BIC : SIMDLogicalThreeVector<0, 0b01, "bic",
BinOpFrag<(and node:$LHS, (vnot node:$RHS))> >;
@@ -7008,6 +7025,14 @@ defm : AddSubHNPatterns<ADDHNv2i64_v2i32, ADDHNv2i64_v4i32,
SUBHNv2i64_v2i32, SUBHNv2i64_v4i32,
v2i32, v2i64, 32>;
+// Patterns for SABAL
+def : Pat<(AArch64neonsabal (v8i16 V128:$Rd), (v8i8 V64:$Rn), (v8i8 V64:$Rm)),
+ (SABALv8i8_v8i16 V128:$Rd, V64:$Rn, V64:$Rm)>;
+def : Pat<(AArch64neonsabal (v4i32 V128:$Rd), (v4i16 V64:$Rn), (v4i16 V64:$Rm)),
+ (SABALv4i16_v4i32 V128:$Rd, V64:$Rn, V64:$Rm)>;
+def : Pat<(AArch64neonsabal (v2i64 V128:$Rd), (v2i32 V64:$Rn), (v2i32 V64:$Rm)),
+ (SABALv2i32_v2i64 V128:$Rd, V64:$Rn, V64:$Rm)>;
+
//----------------------------------------------------------------------------
// AdvSIMD bitwise extract from vector instruction.
//----------------------------------------------------------------------------
diff --git a/llvm/test/CodeGen/AArch64/neon-saba.ll b/llvm/test/CodeGen/AArch64/neon-saba.ll
index 19967bd1a69ec..c8de6b21e9764 100644
--- a/llvm/test/CodeGen/AArch64/neon-saba.ll
+++ b/llvm/test/CodeGen/AArch64/neon-saba.ll
@@ -174,6 +174,268 @@ define <8 x i8> @saba_sabd_8b(<8 x i8> %a, <8 x i8> %b, <8 x i8> %c) #0 {
ret <8 x i8> %add
}
+; SABA from ADD(SABD(X, ZEROS))
+
+define <4 x i32> @saba_sabd_zeros_4s(<4 x i32> %a, <4 x i32> %b) #0 {
+; CHECK-LABEL: saba_sabd_zeros_4s:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movi v2.2d, #0000000000000000
+; CHECK-NEXT: saba v0.4s, v1.4s, v2.4s
+; CHECK-NEXT: ret
+ %sabd = call <4 x i32> @llvm.aarch64.neon.sabd.v4i32(<4 x i32> %b, <4 x i32> zeroinitializer)
+ %add = add <4 x i32> %sabd, %a
+ ret <4 x i32> %add
+}
+
+define <2 x i32> @saba_sabd_zeros_2s(<2 x i32> %a, <2 x i32> %b) #0 {
+; CHECK-LABEL: saba_sabd_zeros_2s:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movi v2.2d, #0000000000000000
+; CHECK-NEXT: saba v0.2s, v1.2s, v2.2s
+; CHECK-NEXT: ret
+ %sabd = call <2 x i32> @llvm.aarch64.neon.sabd.v2i32(<2 x i32> %b, <2 x i32> zeroinitializer)
+ %add = add <2 x i32> %sabd, %a
+ ret <2 x i32> %add
+}
+
+define <8 x i16> @saba_sabd_zeros_8h(<8 x i16> %a, <8 x i16> %b) #0 {
+; CHECK-LABEL: saba_sabd_zeros_8h:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movi v2.2d, #0000000000000000
+; CHECK-NEXT: saba v0.8h, v1.8h, v2.8h
+; CHECK-NEXT: ret
+ %sabd = call <8 x i16> @llvm.aarch64.neon.sabd.v8i16(<8 x i16> %b, <8 x i16> zeroinitializer)
+ %add = add <8 x i16> %sabd, %a
+ ret <8 x i16> %add
+}
+
+define <4 x i16> @saba_sabd_zeros_4h(<4 x i16> %a, <4 x i16> %b) #0 {
+; CHECK-LABEL: saba_sabd_zeros_4h:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movi v2.2d, #0000000000000000
+; CHECK-NEXT: saba v0.4h, v1.4h, v2.4h
+; CHECK-NEXT: ret
+ %sabd = call <4 x i16> @llvm.aarch64.neon.sabd.v4i16(<4 x i16> %b, <4 x i16> zeroinitializer)
+ %add = add <4 x i16> %sabd, %a
+ ret <4 x i16> %add
+}
+
+define <16 x i8> @saba_sabd_zeros_16b(<16 x i8> %a, <16 x i8> %b) #0 {
+; CHECK-LABEL: saba_sabd_zeros_16b:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movi v2.2d, #0000000000000000
+; CHECK-NEXT: saba v0.16b, v1.16b, v2.16b
+; CHECK-NEXT: ret
+ %sabd = call <16 x i8> @llvm.aarch64.neon.sabd.v16i8(<16 x i8> %b, <16 x i8> zeroinitializer)
+ %add = add <16 x i8> %sabd, %a
+ ret <16 x i8> %add
+}
+
+define <8 x i8> @saba_sabd_zeros_8b(<8 x i8> %a, <8 x i8> %b) #0 {
+; CHECK-LABEL: saba_sabd_zeros_8b:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movi v2.2d, #0000000000000000
+; CHECK-NEXT: saba v0.8b, v1.8b, v2.8b
+; CHECK-NEXT: ret
+ %sabd = call <8 x i8> @llvm.aarch64.neon.sabd.v8i8(<8 x i8> %b, <8 x i8> zeroinitializer)
+ %add = add <8 x i8> %sabd, %a
+ ret <8 x i8> %add
+}
+
+define <4 x i32> @saba_abs_zeros_4s(<4 x i32> %a, <4 x i32> %b) #0 {
+; CHECK-SD-LABEL: saba_abs_zeros_4s:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: movi v2.2d, #0000000000000000
+; CHECK-SD-NEXT: saba v0.4s, v1.4s, v2.4s
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: saba_abs_zeros_4s:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: abs v1.4s, v1.4s
+; CHECK-GI-NEXT: add v0.4s, v0.4s, v1.4s
+; CHECK-GI-NEXT: ret
+ %abs = call <4 x i32> @llvm.abs.v4i32(<4 x i32> %b, i1 true)
+ %add = add <4 x i32> %a, %abs
+ ret <4 x i32> %add
+}
+
+define <2 x i32> @saba_abs_zeros_2s(<2 x i32> %a, <2 x i32> %b) #0 {
+; CHECK-SD-LABEL: saba_abs_zeros_2s:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: movi v2.2d, #0000000000000000
+; CHECK-SD-NEXT: saba v0.2s, v1.2s, v2.2s
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: saba_abs_zeros_2s:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: abs v1.2s, v1.2s
+; CHECK-GI-NEXT: add v0.2s, v0.2s, v1.2s
+; CHECK-GI-NEXT: ret
+ %abs = call <2 x i32> @llvm.abs.v2i32(<2 x i32> %b, i1 true)
+ %add = add <2 x i32> %a, %abs
+ ret <2 x i32> %add
+}
+
+define <8 x i16> @saba_abs_zeros_8h(<8 x i16> %a, <8 x i16> %b) #0 {
+; CHECK-SD-LABEL: saba_abs_zeros_8h:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: movi v2.2d, #0000000000000000
+; CHECK-SD-NEXT: saba v0.8h, v1.8h, v2.8h
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: saba_abs_zeros_8h:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: abs v1.8h, v1.8h
+; CHECK-GI-NEXT: add v0.8h, v0.8h, v1.8h
+; CHECK-GI-NEXT: ret
+ %abs = call <8 x i16> @llvm.abs.v8i16(<8 x i16> %b, i1 true)
+ %add = add <8 x i16> %a, %abs
+ ret <8 x i16> %add
+}
+
+define <4 x i16> @saba_abs_zeros_4h(<4 x i16> %a, <4 x i16> %b) #0 {
+; CHECK-SD-LABEL: saba_abs_zeros_4h:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: movi v2.2d, #0000000000000000
+; CHECK-SD-NEXT: saba v0.4h, v1.4h, v2.4h
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: saba_abs_zeros_4h:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: abs v1.4h, v1.4h
+; CHECK-GI-NEXT: add v0.4h, v0.4h, v1.4h
+; CHECK-GI-NEXT: ret
+ %abs = call <4 x i16> @llvm.abs.v4i16(<4 x i16> %b, i1 true)
+ %add = add <4 x i16> %a, %abs
+ ret <4 x i16> %add
+}
+
+define <16 x i8> @saba_abs_zeros_16b(<16 x i8> %a, <16 x i8> %b) #0 {
+; CHECK-SD-LABEL: saba_abs_zeros_16b:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: movi v2.2d, #0000000000000000
+; CHECK-SD-NEXT: saba v0.16b, v1.16b, v2.16b
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: saba_abs_zeros_16b:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: abs v1.16b, v1.16b
+; CHECK-GI-NEXT: add v0.16b, v0.16b, v1.16b
+; CHECK-GI-NEXT: ret
+ %abs = call <16 x i8> @llvm.abs.v16i8(<16 x i8> %b, i1 true)
+ %add = add <16 x i8> %a, %abs
+ ret <16 x i8> %add
+}
+
+define <8 x i8> @saba_abs_zeros_8b(<8 x i8> %a, <8 x i8> %b) #0 {
+; CHECK-SD-LABEL: saba_abs_zeros_8b:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: movi v2.2d, #0000000000000000
+; CHECK-SD-NEXT: saba v0.8b, v1.8b, v2.8b
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: saba_abs_zeros_8b:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: abs v1.8b, v1.8b
+; CHECK-GI-NEXT: add v0.8b, v0.8b, v1.8b
+; CHECK-GI-NEXT: ret
+ %abs = call <8 x i8> @llvm.abs.v8i8(<8 x i8> %b, i1 true)
+ %add = add <8 x i8> %a, %abs
+ ret <8 x i8> %add
+}
+
+; SABAL from ADD(ZEXT(SABD(X, ZEROS)))
+
+define <2 x i64> @sabal_sabd_zeros_2s(<2 x i64> %a, <2 x i32> %b) #0 {
+; CHECK-LABEL: sabal_sabd_zeros_2s:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movi v2.2d, #0000000000000000
+; CHECK-NEXT: sabal v0.2d, v1.2s, v2.2s
+; CHECK-NEXT: ret
+ %sabd = call <2 x i32> @llvm.aarch64.neon.sabd.v2i32(<2 x i32> %b, <2 x i32> zeroinitializer)
+ %sabd.zext = zext <2 x i32> %sabd to <2 x i64>
+ %add = add <2 x i64> %a, %sabd.zext
+ ret <2 x i64> %add
+}
+
+define <4 x i32> @sabal_sabd_zeros_4h(<4 x i32> %a, <4 x i16> %b) #0 {
+; CHECK-LABEL: sabal_sabd_zeros_4h:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movi v2.2d, #0000000000000000
+; CHECK-NEXT: sabal v0.4s, v1.4h, v2.4h
+; CHECK-NEXT: ret
+ %sabd = call <4 x i16> @llvm.aarch64.neon.sabd.v4i16(<4 x i16> %b, <4 x i16> zeroinitializer)
+ %sabd.zext = zext <4 x i16> %sabd to <4 x i32>
+ %add = add <4 x i32> %a, %sabd.zext
+ ret <4 x i32> %add
+}
+
+define <8 x i16> @sabal_sabd_zeros_8b(<8 x i16> %a, <8 x i8> %b) #0 {
+; CHECK-LABEL: sabal_sabd_zeros_8b:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movi v2.2d, #0000000000000000
+; CHECK-NEXT: sabal v0.8h, v1.8b, v2.8b
+; CHECK-NEXT: ret
+ %sabd = call <8 x i8> @llvm.aarch64.neon.sabd.v8i8(<8 x i8> %b, <8 x i8> zeroinitializer)
+ %sabd.zext = zext <8 x i8> %sabd to <8 x i16>
+ %add = add <8 x i16> %a, %sabd.zext
+ ret <8 x i16> %add
+}
+
+define <2 x i64> @sabal_abs_zeros_2s(<2 x i64> %a, <2 x i32> %b) #0 {
+; CHECK-SD-LABEL: sabal_abs_zeros_2s:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: movi v2.2d, #0000000000000000
+; CHECK-SD-NEXT: sabal v0.2d, v1.2s, v2.2s
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: sabal_abs_zeros_2s:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: abs v1.2s, v1.2s
+; CHECK-GI-NEXT: uaddw v0.2d, v0.2d, v1.2s
+; CHECK-GI-NEXT: ret
+ %abs = call <2 x i32> @llvm.abs.v2i32(<2 x i32> %b, i1 true)
+ %abs.zext = zext <2 x i32> %abs to <2 x i64>
+ %add = add <2 x i64> %a, %abs.zext
+ ret <2 x i64> %add
+}
+
+define <4 x i32> @sabal_abs_zeros_4h(<4 x i32> %a, <4 x i16> %b) #0 {
+; CHECK-SD-LABEL: sabal_abs_zeros_4h:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: movi v2.2d, #0000000000000000
+; CHECK-SD-NEXT: sabal v0.4s, v1.4h, v2.4h
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: sabal_abs_zeros_4h:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: abs v1.4h, v1.4h
+; CHECK-GI-NEXT: uaddw v0.4s, v0.4s, v1.4h
+; CHECK-GI-NEXT: ret
+ %abs = call <4 x i16> @llvm.abs.v4i16(<4 x i16> %b, i1 true)
+ %abs.zext = zext <4 x i16> %abs to <4 x i32>
+ %add = add <4 x i32> %a, %abs.zext
+ ret <4 x i32> %add
+}
+
+define <8 x i16> @sabal_abs_zeros_8b(<8 x i16> %a, <8 x i8> %b) #0 {
+; CHECK-SD-LABEL: sabal_abs_zeros_8b:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: movi v2.2d, #0000000000000000
+; CHECK-SD-NEXT: sabal v0.8h, v1.8b, v2.8b
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: sabal_abs_zeros_8b:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: abs v1.8b, v1.8b
+; CHECK-GI-NEXT: uaddw v0.8h, v0.8h, v1.8b
+; CHECK-GI-NEXT: ret
+ %abs = call <8 x i8> @llvm.abs.v8i8(<8 x i8> %b, i1 true)
+ %abs.zext = zext <8 x i8> %abs to <8 x i16>
+ %add = add <8 x i16> %a, %abs.zext
+ ret <8 x i16> %add
+}
+
declare <4 x i32> @llvm.abs.v4i32(<4 x i32>, i1)
declare <2 x i32> @llvm.abs.v2i32(<2 x i32>, i1)
declare <8 x i16> @llvm.abs.v8i16(<8 x i16>, i1)
Thanks for the pointer here - I have reverted the AArch64ISD node/combine changes and done this directly in tablegen.
Thank you. LGTM