[RISCV] Use slideup to lower build_vector when all operand are (extract_element X, 0) #154450
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)

Changes: The general lowering of build_vector starts with splatting the first operand before sliding down the other operands one by one. However, the initial splat can be avoided if the last operand is an extract from a reduction result: in that case we can use the original vector reduction result as the start value and slide up the other operands (in reverse order) one by one.

Original context: #154175 (comment)

Full diff: https://github.com/llvm/llvm-project/pull/154450.diff

4 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 4a1db80076530..ce6fc8425856a 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -4512,33 +4512,94 @@ static SDValue lowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG,
"Illegal type which will result in reserved encoding");
const unsigned Policy = RISCVVType::TAIL_AGNOSTIC | RISCVVType::MASK_AGNOSTIC;
+ auto getVSlide = [&](bool SlideUp, EVT ContainerVT, SDValue Passthru,
+ SDValue Vec, SDValue Offset, SDValue Mask,
+ SDValue VL) -> SDValue {
+ if (SlideUp)
+ return getVSlideup(DAG, Subtarget, DL, ContainerVT, Passthru, Vec, Offset,
+ Mask, VL, Policy);
+ return getVSlidedown(DAG, Subtarget, DL, ContainerVT, Passthru, Vec, Offset,
+ Mask, VL, Policy);
+ };
+
+ // General case: splat the first operand and sliding other operands down one
+ // by one to form a vector. Alternatively, if the last operand is an
+ // extraction from a reduction result, we can use the original vector
+ // reduction result as the start value and slide up instead of slide down.
+ // Such that we can avoid the splat.
+ SmallVector<SDValue> Operands(Op->op_begin(), Op->op_end());
+ SDValue Reduce;
+ bool SlideUp = false;
+ // Find the first first non-undef from the tail.
+ auto ItLastNonUndef = find_if(Operands.rbegin(), Operands.rend(),
+ [](SDValue V) { return !V.isUndef(); });
+ if (ItLastNonUndef != Operands.rend()) {
+ using namespace SDPatternMatch;
+ // Check if the last non-undef operand was extracted from a reduction.
+ for (unsigned Opc :
+ {RISCVISD::VECREDUCE_ADD_VL, RISCVISD::VECREDUCE_UMAX_VL,
+ RISCVISD::VECREDUCE_SMAX_VL, RISCVISD::VECREDUCE_UMIN_VL,
+ RISCVISD::VECREDUCE_SMIN_VL, RISCVISD::VECREDUCE_AND_VL,
+ RISCVISD::VECREDUCE_OR_VL, RISCVISD::VECREDUCE_XOR_VL,
+ RISCVISD::VECREDUCE_FADD_VL, RISCVISD::VECREDUCE_SEQ_FADD_VL,
+ RISCVISD::VECREDUCE_FMAX_VL, RISCVISD::VECREDUCE_FMIN_VL}) {
+ SlideUp = sd_match(
+ *ItLastNonUndef,
+ m_ExtractElt(m_AllOf(m_Opc(Opc), m_Value(Reduce)), m_Zero()));
+ if (SlideUp)
+ break;
+ }
+ }
+
+ if (SlideUp) {
+ // Adapt Reduce's type into ContainerVT.
+ if (Reduce.getValueType().getVectorMinNumElements() <
+ ContainerVT.getVectorMinNumElements())
+ Reduce = DAG.getInsertSubvector(DL, DAG.getUNDEF(ContainerVT), Reduce, 0);
+ else
+ Reduce = DAG.getExtractSubvector(DL, ContainerVT, Reduce, 0);
+
+ // Reverse the elements as we're going to slide up from the last element.
+ for (unsigned i = 0U, N = Operands.size(), H = divideCeil(N, 2); i < H; ++i)
+ std::swap(Operands[i], Operands[N - 1 - i]);
+ }
SDValue Vec;
UndefCount = 0;
- for (SDValue V : Op->ops()) {
+ for (SDValue V : Operands) {
if (V.isUndef()) {
UndefCount++;
continue;
}
- // Start our sequence with a TA splat in the hopes that hardware is able to
- // recognize there's no dependency on the prior value of our temporary
- // register.
+ // Start our sequence with either a TA splat or a reduction result in the
+ // hopes that hardware is able to recognize there's no dependency on the
+ // prior value of our temporary register.
if (!Vec) {
- Vec = DAG.getSplatVector(VT, DL, V);
- Vec = convertToScalableVector(ContainerVT, Vec, DAG, Subtarget);
+ if (SlideUp) {
+ Vec = Reduce;
+ } else {
+ Vec = DAG.getSplatVector(VT, DL, V);
+ Vec = convertToScalableVector(ContainerVT, Vec, DAG, Subtarget);
+ }
+
UndefCount = 0;
continue;
}
if (UndefCount) {
const SDValue Offset = DAG.getConstant(UndefCount, DL, Subtarget.getXLenVT());
- Vec = getVSlidedown(DAG, Subtarget, DL, ContainerVT, DAG.getUNDEF(ContainerVT),
- Vec, Offset, Mask, VL, Policy);
+ Vec = getVSlide(SlideUp, ContainerVT, DAG.getUNDEF(ContainerVT), Vec,
+ Offset, Mask, VL);
UndefCount = 0;
}
- auto OpCode =
- VT.isFloatingPoint() ? RISCVISD::VFSLIDE1DOWN_VL : RISCVISD::VSLIDE1DOWN_VL;
+
+ unsigned OpCode;
+ if (VT.isFloatingPoint())
+ OpCode = SlideUp ? RISCVISD::VFSLIDE1UP_VL : RISCVISD::VFSLIDE1DOWN_VL;
+ else
+ OpCode = SlideUp ? RISCVISD::VSLIDE1UP_VL : RISCVISD::VSLIDE1DOWN_VL;
+
if (!VT.isFloatingPoint())
V = DAG.getNode(ISD::ANY_EXTEND, DL, Subtarget.getXLenVT(), V);
Vec = DAG.getNode(OpCode, DL, ContainerVT, DAG.getUNDEF(ContainerVT), Vec,
@@ -4546,8 +4607,8 @@ static SDValue lowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG,
}
if (UndefCount) {
const SDValue Offset = DAG.getConstant(UndefCount, DL, Subtarget.getXLenVT());
- Vec = getVSlidedown(DAG, Subtarget, DL, ContainerVT, DAG.getUNDEF(ContainerVT),
- Vec, Offset, Mask, VL, Policy);
+ Vec = getVSlide(SlideUp, ContainerVT, DAG.getUNDEF(ContainerVT), Vec,
+ Offset, Mask, VL);
}
return convertFromScalableVector(VT, Vec, DAG, Subtarget);
}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
index 3c3e08d387faa..a806e758d2758 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
@@ -1828,3 +1828,59 @@ define <8 x double> @buildvec_v8f64_zvl512(double %e0, double %e1, double %e2, d
%v7 = insertelement <8 x double> %v6, double %e7, i64 7
ret <8 x double> %v7
}
+
+define <4 x float> @buildvec_vfredusum(float %start, <8 x float> %arg1, <8 x float> %arg2, <8 x float> %arg3, <8 x float> %arg4) nounwind {
+; CHECK-LABEL: buildvec_vfredusum:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT: vfmv.s.f v16, fa0
+; CHECK-NEXT: vfredusum.vs v8, v8, v16
+; CHECK-NEXT: vfredusum.vs v9, v10, v16
+; CHECK-NEXT: vfredusum.vs v10, v12, v16
+; CHECK-NEXT: vfmv.f.s fa5, v8
+; CHECK-NEXT: vfmv.f.s fa4, v9
+; CHECK-NEXT: vfmv.f.s fa3, v10
+; CHECK-NEXT: vfredusum.vs v8, v14, v16
+; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT: vfslide1up.vf v9, v8, fa3
+; CHECK-NEXT: vfslide1up.vf v10, v9, fa4
+; CHECK-NEXT: vfslide1up.vf v8, v10, fa5
+; CHECK-NEXT: ret
+ %247 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg1)
+ %248 = insertelement <4 x float> poison, float %247, i64 0
+ %250 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg2)
+ %251 = insertelement <4 x float> %248, float %250, i64 1
+ %252 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg3)
+ %253 = insertelement <4 x float> %251, float %252, i64 2
+ %254 = tail call reassoc float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg4)
+ %255 = insertelement <4 x float> %253, float %254, i64 3
+ ret <4 x float> %255
+}
+
+define <4 x float> @buildvec_vfredosum(float %start, <8 x float> %arg1, <8 x float> %arg2, <8 x float> %arg3, <8 x float> %arg4) nounwind {
+; CHECK-LABEL: buildvec_vfredosum:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT: vfmv.s.f v16, fa0
+; CHECK-NEXT: vfredosum.vs v8, v8, v16
+; CHECK-NEXT: vfredosum.vs v9, v10, v16
+; CHECK-NEXT: vfredosum.vs v10, v12, v16
+; CHECK-NEXT: vfmv.f.s fa5, v8
+; CHECK-NEXT: vfmv.f.s fa4, v9
+; CHECK-NEXT: vfmv.f.s fa3, v10
+; CHECK-NEXT: vfredosum.vs v8, v14, v16
+; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT: vfslide1up.vf v9, v8, fa3
+; CHECK-NEXT: vfslide1up.vf v10, v9, fa4
+; CHECK-NEXT: vfslide1up.vf v8, v10, fa5
+; CHECK-NEXT: ret
+ %247 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg1)
+ %248 = insertelement <4 x float> poison, float %247, i64 0
+ %250 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg2)
+ %251 = insertelement <4 x float> %248, float %250, i64 1
+ %252 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg3)
+ %253 = insertelement <4 x float> %251, float %252, i64 2
+ %254 = tail call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %arg4)
+ %255 = insertelement <4 x float> %253, float %254, i64 3
+ ret <4 x float> %255
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
index d9bb007a10f71..a02117fdd2833 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
@@ -3416,5 +3416,204 @@ define <4 x i1> @buildvec_i1_splat(i1 %e1) {
ret <4 x i1> %v4
}
+define <4 x i32> @buildvec_vredsum(<8 x i32> %arg0, <8 x i32> %arg1, <8 x i32> %arg2, <8 x i32> %arg3) nounwind {
+; RV32-LABEL: buildvec_vredsum:
+; RV32: # %bb.0:
+; RV32-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV32-NEXT: vmv.s.x v16, zero
+; RV32-NEXT: vredsum.vs v8, v8, v16
+; RV32-NEXT: vredsum.vs v9, v10, v16
+; RV32-NEXT: vredsum.vs v10, v12, v16
+; RV32-NEXT: vmv.x.s a0, v8
+; RV32-NEXT: vmv.x.s a1, v9
+; RV32-NEXT: vmv.x.s a2, v10
+; RV32-NEXT: vredsum.vs v8, v14, v16
+; RV32-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; RV32-NEXT: vslide1up.vx v9, v8, a2
+; RV32-NEXT: vslide1up.vx v10, v9, a1
+; RV32-NEXT: vslide1up.vx v8, v10, a0
+; RV32-NEXT: ret
+;
+; RV64V-ONLY-LABEL: buildvec_vredsum:
+; RV64V-ONLY: # %bb.0:
+; RV64V-ONLY-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV64V-ONLY-NEXT: vmv.s.x v16, zero
+; RV64V-ONLY-NEXT: vredsum.vs v8, v8, v16
+; RV64V-ONLY-NEXT: vredsum.vs v9, v10, v16
+; RV64V-ONLY-NEXT: vredsum.vs v10, v12, v16
+; RV64V-ONLY-NEXT: vmv.x.s a0, v8
+; RV64V-ONLY-NEXT: vmv.x.s a1, v9
+; RV64V-ONLY-NEXT: vmv.x.s a2, v10
+; RV64V-ONLY-NEXT: vredsum.vs v8, v14, v16
+; RV64V-ONLY-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; RV64V-ONLY-NEXT: vslide1up.vx v9, v8, a2
+; RV64V-ONLY-NEXT: vslide1up.vx v10, v9, a1
+; RV64V-ONLY-NEXT: vslide1up.vx v8, v10, a0
+; RV64V-ONLY-NEXT: ret
+;
+; RVA22U64-LABEL: buildvec_vredsum:
+; RVA22U64: # %bb.0:
+; RVA22U64-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-NEXT: vmv.s.x v16, zero
+; RVA22U64-NEXT: vredsum.vs v8, v8, v16
+; RVA22U64-NEXT: vredsum.vs v9, v10, v16
+; RVA22U64-NEXT: vredsum.vs v10, v12, v16
+; RVA22U64-NEXT: vredsum.vs v11, v14, v16
+; RVA22U64-NEXT: vmv.x.s a0, v8
+; RVA22U64-NEXT: vmv.x.s a1, v9
+; RVA22U64-NEXT: vmv.x.s a2, v10
+; RVA22U64-NEXT: slli a1, a1, 32
+; RVA22U64-NEXT: add.uw a0, a0, a1
+; RVA22U64-NEXT: vmv.x.s a1, v11
+; RVA22U64-NEXT: slli a1, a1, 32
+; RVA22U64-NEXT: add.uw a1, a2, a1
+; RVA22U64-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-NEXT: vmv.v.x v8, a0
+; RVA22U64-NEXT: vslide1down.vx v8, v8, a1
+; RVA22U64-NEXT: ret
+;
+; RVA22U64-PACK-LABEL: buildvec_vredsum:
+; RVA22U64-PACK: # %bb.0:
+; RVA22U64-PACK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-PACK-NEXT: vmv.s.x v16, zero
+; RVA22U64-PACK-NEXT: vredsum.vs v8, v8, v16
+; RVA22U64-PACK-NEXT: vredsum.vs v9, v10, v16
+; RVA22U64-PACK-NEXT: vredsum.vs v10, v12, v16
+; RVA22U64-PACK-NEXT: vredsum.vs v11, v14, v16
+; RVA22U64-PACK-NEXT: vmv.x.s a0, v8
+; RVA22U64-PACK-NEXT: vmv.x.s a1, v9
+; RVA22U64-PACK-NEXT: vmv.x.s a2, v10
+; RVA22U64-PACK-NEXT: pack a0, a0, a1
+; RVA22U64-PACK-NEXT: vmv.x.s a1, v11
+; RVA22U64-PACK-NEXT: pack a1, a2, a1
+; RVA22U64-PACK-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-PACK-NEXT: vmv.v.x v8, a0
+; RVA22U64-PACK-NEXT: vslide1down.vx v8, v8, a1
+; RVA22U64-PACK-NEXT: ret
+;
+; RV64ZVE32-LABEL: buildvec_vredsum:
+; RV64ZVE32: # %bb.0:
+; RV64ZVE32-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV64ZVE32-NEXT: vmv.s.x v16, zero
+; RV64ZVE32-NEXT: vredsum.vs v8, v8, v16
+; RV64ZVE32-NEXT: vredsum.vs v9, v10, v16
+; RV64ZVE32-NEXT: vredsum.vs v10, v12, v16
+; RV64ZVE32-NEXT: vmv.x.s a0, v8
+; RV64ZVE32-NEXT: vmv.x.s a1, v9
+; RV64ZVE32-NEXT: vmv.x.s a2, v10
+; RV64ZVE32-NEXT: vredsum.vs v8, v14, v16
+; RV64ZVE32-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; RV64ZVE32-NEXT: vslide1up.vx v9, v8, a2
+; RV64ZVE32-NEXT: vslide1up.vx v10, v9, a1
+; RV64ZVE32-NEXT: vslide1up.vx v8, v10, a0
+; RV64ZVE32-NEXT: ret
+ %247 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg0)
+ %248 = insertelement <4 x i32> poison, i32 %247, i64 0
+ %250 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg1)
+ %251 = insertelement <4 x i32> %248, i32 %250, i64 1
+ %252 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg2)
+ %253 = insertelement <4 x i32> %251, i32 %252, i64 2
+ %254 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %arg3)
+ %255 = insertelement <4 x i32> %253, i32 %254, i64 3
+ ret <4 x i32> %255
+}
+
+define <4 x i32> @buildvec_vredmax(<8 x i32> %arg0, <8 x i32> %arg1, <8 x i32> %arg2, <8 x i32> %arg3) nounwind {
+; RV32-LABEL: buildvec_vredmax:
+; RV32: # %bb.0:
+; RV32-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV32-NEXT: vredmaxu.vs v8, v8, v8
+; RV32-NEXT: vredmaxu.vs v9, v10, v10
+; RV32-NEXT: vredmaxu.vs v10, v12, v12
+; RV32-NEXT: vmv.x.s a0, v8
+; RV32-NEXT: vmv.x.s a1, v9
+; RV32-NEXT: vmv.x.s a2, v10
+; RV32-NEXT: vredmaxu.vs v8, v14, v14
+; RV32-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; RV32-NEXT: vslide1up.vx v9, v8, a2
+; RV32-NEXT: vslide1up.vx v10, v9, a1
+; RV32-NEXT: vslide1up.vx v8, v10, a0
+; RV32-NEXT: ret
+;
+; RV64V-ONLY-LABEL: buildvec_vredmax:
+; RV64V-ONLY: # %bb.0:
+; RV64V-ONLY-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV64V-ONLY-NEXT: vredmaxu.vs v8, v8, v8
+; RV64V-ONLY-NEXT: vredmaxu.vs v9, v10, v10
+; RV64V-ONLY-NEXT: vredmaxu.vs v10, v12, v12
+; RV64V-ONLY-NEXT: vmv.x.s a0, v8
+; RV64V-ONLY-NEXT: vmv.x.s a1, v9
+; RV64V-ONLY-NEXT: vmv.x.s a2, v10
+; RV64V-ONLY-NEXT: vredmaxu.vs v8, v14, v14
+; RV64V-ONLY-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; RV64V-ONLY-NEXT: vslide1up.vx v9, v8, a2
+; RV64V-ONLY-NEXT: vslide1up.vx v10, v9, a1
+; RV64V-ONLY-NEXT: vslide1up.vx v8, v10, a0
+; RV64V-ONLY-NEXT: ret
+;
+; RVA22U64-LABEL: buildvec_vredmax:
+; RVA22U64: # %bb.0:
+; RVA22U64-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-NEXT: vredmaxu.vs v8, v8, v8
+; RVA22U64-NEXT: vredmaxu.vs v9, v10, v10
+; RVA22U64-NEXT: vredmaxu.vs v10, v12, v12
+; RVA22U64-NEXT: vredmaxu.vs v11, v14, v14
+; RVA22U64-NEXT: vmv.x.s a0, v8
+; RVA22U64-NEXT: vmv.x.s a1, v9
+; RVA22U64-NEXT: vmv.x.s a2, v10
+; RVA22U64-NEXT: slli a1, a1, 32
+; RVA22U64-NEXT: add.uw a0, a0, a1
+; RVA22U64-NEXT: vmv.x.s a1, v11
+; RVA22U64-NEXT: slli a1, a1, 32
+; RVA22U64-NEXT: add.uw a1, a2, a1
+; RVA22U64-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-NEXT: vmv.v.x v8, a0
+; RVA22U64-NEXT: vslide1down.vx v8, v8, a1
+; RVA22U64-NEXT: ret
+;
+; RVA22U64-PACK-LABEL: buildvec_vredmax:
+; RVA22U64-PACK: # %bb.0:
+; RVA22U64-PACK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RVA22U64-PACK-NEXT: vredmaxu.vs v8, v8, v8
+; RVA22U64-PACK-NEXT: vredmaxu.vs v9, v10, v10
+; RVA22U64-PACK-NEXT: vredmaxu.vs v10, v12, v12
+; RVA22U64-PACK-NEXT: vredmaxu.vs v11, v14, v14
+; RVA22U64-PACK-NEXT: vmv.x.s a0, v8
+; RVA22U64-PACK-NEXT: vmv.x.s a1, v9
+; RVA22U64-PACK-NEXT: vmv.x.s a2, v10
+; RVA22U64-PACK-NEXT: pack a0, a0, a1
+; RVA22U64-PACK-NEXT: vmv.x.s a1, v11
+; RVA22U64-PACK-NEXT: pack a1, a2, a1
+; RVA22U64-PACK-NEXT: vsetivli zero, 2, e64, m1, ta, ma
+; RVA22U64-PACK-NEXT: vmv.v.x v8, a0
+; RVA22U64-PACK-NEXT: vslide1down.vx v8, v8, a1
+; RVA22U64-PACK-NEXT: ret
+;
+; RV64ZVE32-LABEL: buildvec_vredmax:
+; RV64ZVE32: # %bb.0:
+; RV64ZVE32-NEXT: vsetivli zero, 8, e32, m2, ta, ma
+; RV64ZVE32-NEXT: vredmaxu.vs v8, v8, v8
+; RV64ZVE32-NEXT: vredmaxu.vs v9, v10, v10
+; RV64ZVE32-NEXT: vredmaxu.vs v10, v12, v12
+; RV64ZVE32-NEXT: vmv.x.s a0, v8
+; RV64ZVE32-NEXT: vmv.x.s a1, v9
+; RV64ZVE32-NEXT: vmv.x.s a2, v10
+; RV64ZVE32-NEXT: vredmaxu.vs v8, v14, v14
+; RV64ZVE32-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; RV64ZVE32-NEXT: vslide1up.vx v9, v8, a2
+; RV64ZVE32-NEXT: vslide1up.vx v10, v9, a1
+; RV64ZVE32-NEXT: vslide1up.vx v8, v10, a0
+; RV64ZVE32-NEXT: ret
+ %247 = tail call i32 @llvm.vector.reduce.umax.v8i32(<8 x i32> %arg0)
+ %248 = insertelement <4 x i32> poison, i32 %247, i64 0
+ %250 = tail call i32 @llvm.vector.reduce.umax.v8i32(<8 x i32> %arg1)
+ %251 = insertelement <4 x i32> %248, i32 %250, i64 1
+ %252 = tail call i32 @llvm.vector.reduce.umax.v8i32(<8 x i32> %arg2)
+ %253 = insertelement <4 x i32> %251, i32 %252, i64 2
+ %254 = tail call i32 @llvm.vector.reduce.umax.v8i32(<8 x i32> %arg3)
+ %255 = insertelement <4 x i32> %253, i32 %254, i64 3
+ ret <4 x i32> %255
+}
+
;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
; RV64: {{.*}}
diff --git a/llvm/test/CodeGen/RISCV/rvv/redundant-vfmvsf.ll b/llvm/test/CodeGen/RISCV/rvv/redundant-vfmvsf.ll
index da912bf401ec0..821d4240827fb 100644
--- a/llvm/test/CodeGen/RISCV/rvv/redundant-vfmvsf.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/redundant-vfmvsf.ll
@@ -9,12 +9,11 @@ define <2 x float> @redundant_vfmv(<2 x float> %arg0, <64 x float> %arg1, <64 x
; CHECK-NEXT: vfredusum.vs v9, v12, v8
; CHECK-NEXT: vsetivli zero, 1, e32, mf2, ta, ma
; CHECK-NEXT: vslidedown.vi v8, v8, 1
+; CHECK-NEXT: vfmv.f.s fa5, v9
; CHECK-NEXT: vsetvli zero, a0, e32, m4, ta, ma
-; CHECK-NEXT: vfredusum.vs v8, v16, v8
-; CHECK-NEXT: vfmv.f.s fa5, v8
+; CHECK-NEXT: vfredusum.vs v9, v16, v8
; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT: vrgather.vi v8, v9, 0
-; CHECK-NEXT: vfslide1down.vf v8, v8, fa5
+; CHECK-NEXT: vfslide1up.vf v8, v9, fa5
; CHECK-NEXT: ret
%s0 = extractelement <2 x float> %arg0, i64 0
%r0 = tail call reassoc float @llvm.vector.reduce.fadd.v64f32(float %s0, <64 x float> %arg1)
Force-pushed from e5aa2a8 to 217402a
Review comment on:

                        Mask, VL, Policy);
      };

      // General case: splat the first operand and sliding other operands down one

Suggested change:
    - // General case: splat the first operand and sliding other operands down one
    + // General case: splat the first operand and slide other operands down one

Reply: Fixed
Co-authored-by: Craig Topper <craig.topper@sifive.com>
Co-Authored-By: Luke Lau <luke@igalia.com>
Review comment on:

      auto OpCode =
          VT.isFloatingPoint() ? RISCVISD::VFSLIDE1DOWN_VL : RISCVISD::VSLIDE1DOWN_VL;

I think it's spelt as one word in the rest of LLVM.

Suggested change:
    - unsigned OpCode;
    + unsigned Opcode;

Reply: Fixed.
Reviewer: Gentle reverse ping! I don't see any new commits, were they pushed?

Author: Oops, yes, I forgot to push, thanks for reminding me. It should be synced now.
Review comment on:

        if (!VT.isFloatingPoint())
          V = DAG.getNode(ISD::ANY_EXTEND, DL, Subtarget.getXLenVT(), V);
        Vec = DAG.getNode(OpCode, DL, ContainerVT, DAG.getUNDEF(ContainerVT), Vec,
                          V, Mask, VL);
      }
      if (UndefCount) {
        const SDValue Offset = DAG.getConstant(UndefCount, DL, Subtarget.getXLenVT());
        Vec = getVSlide(SlideUp, ContainerVT, DAG.getUNDEF(ContainerVT), Vec,

Reviewer: I don't think we have test coverage for emitting a vslideup here. I think we can add a test for an undef in the middle of the build_vector, and another test with an undef at the start of the build_vector.

Author: I've added tests for both cases.
Reviewer: So, a high-level question on profitability. The reason we prefer slidedown is that slideup has a register constraint which forces the use of two registers through the build sequence. Particularly at high LMUL, doubling the register pressure is really expensive. We do cap at m2 today, so maybe that's not too bad a problem?

For the motivating sequence, have you considered using a normal slidedown sequence, and then a single vslideup (not slide1up) from the source register into the last destination? I think you end up with one extra slide1down (to put the other elements in the right spot), but this might be cheaper overall.

Author: I think this is a good idea, though to give even more context: my actual motivating example is a build_vector where every operand comes from a reduction -- that is, every operand is an (extract_element X, 0). With this patch, the slide-down sequence is turned into a slide-up sequence, but that alone only saves a single instruction (the splat) no matter how many operands there are. However, if we also take the follow-up patch #154847 into consideration, that patch further eliminates all the vector-to-scalar moves, and the number of eliminated moves is proportional to the number of build_vector operands. In other words, #154847 is a bigger win, and it more or less depends on this patch.

The thing is, I don't think we can implement #154847's move-elimination algorithm with vslidedown, because vslidedown reads past the VL, unless we concatenate v9 after v8 before sliding down, which I don't think will be profitable. (Note: the approach you proposed would eliminate the vector-to-scalar move for the last operand, but not for the other operands.)

That being said, I understand your concern about the register pressure imposed by vslideup/vslide1up. Let me think about this.
…extraction from first element
Author: I have limited the scope of this optimization: it now only triggers when every operand in the build_vector is an extraction from the first vector element. This is a tradeoff between the register pressure brought by vslide1up and the potential benefit of eliminating vector-to-scalar moves after #154847 lands.
Review comment on:

      // capture the wrong EVec.
      for (SDValue V : Operands) {
        using namespace SDPatternMatch;
        SlideUp = V.isUndef() || sd_match(V, m_ExtractElt(m_Value(EVec), m_Zero()));

Author: Note that even if we have interleaving undef "intervals", in the worst case the number of those intervals will only be one more than the number of non-undef values.
The general lowering of build_vector starts with splatting the first operand before sliding down the other operands one by one. However, if every operand is an extract_element from the first vector element, we can use the original vector (the source of the extraction) from the last build_vector operand as the start value and slide up the other operands (in reverse order) one by one. By doing so we avoid the initial splat and can later eliminate the vector-to-scalar movement, which is something we cannot do with vslidedown/vslide1down.

Original context: #154175 (comment)