s390x: pattern match saturated truncation #155377

folkertdev · 2025-08-26T08:57:55Z

This does not yet work, but should make it easier to talk about.

After legalizing smin and smax (I'm still matching the older patterns below, not sure if that is really needed any more?), the DAG is

Optimized legalized selection DAG: %bb.0 'i32_signed:bb2'
SelectionDAG has 17 nodes:
  t0: ch,glue = EntryToken
  t17: v4i32 = BUILD_VECTOR Constant:i32<-32768>, Constant:i32<-32768>, Constant:i32<-32768>, Constant:i32<-32768>
  t16: v4i32 = BUILD_VECTOR Constant:i32<32767>, Constant:i32<32767>, Constant:i32<32767>, Constant:i32<32767>
          t2: v4i32,ch = CopyFromReg t0, Register:v4i32 %0
        t18: v4i32 = smax t2, t17
      t20: v4i32 = smin t18, t16
          t4: v4i32,ch = CopyFromReg t0, Register:v4i32 %1
        t19: v4i32 = smax t4, t17
      t21: v4i32 = smin t19, t16
    t50: v8i16 = SystemZISD::PACK t20, t21
  t14: ch,glue = CopyToReg t0, Register:v8i16 $v24, t50
  t15: ch = SystemZISD::RET_GLUE t14, Register:v8i16 $v24, t14:1

However, the pattern still does not match. I'm now suspicious of those BUILD_VECTORs, because the rest just looks like it should work.
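
For reference, here is a minimal scalar sketch of what one lane of the DAG above computes for the v4i32 to v8i16 case. It assumes PACK simply truncates each element, so the clamp plus truncate together should behave like a saturating pack. The helper names (smax32, smin32, sat_trunc_i32_to_i16) are illustrative and do not appear in the patch.

```cpp
#include <cstdint>

// Scalar stand-ins for the smax/smin DAG nodes (illustrative names).
constexpr int32_t smax32(int32_t a, int32_t b) { return a > b ? a : b; }
constexpr int32_t smin32(int32_t a, int32_t b) { return a < b ? a : b; }

// One lane of: truncate(smin(smax(x, -32768), 32767)).
// The smax mirrors t18/t19, the smin mirrors t20/t21, and the cast
// stands in for the per-element truncation assumed for PACK.
constexpr int16_t sat_trunc_i32_to_i16(int32_t x) {
  return static_cast<int16_t>(smin32(smax32(x, INT16_MIN), INT16_MAX));
}

// Values outside the i16 range clamp; values inside pass through.
static_assert(sat_trunc_i32_to_i16(70000) == 32767, "clamps high");
static_assert(sat_trunc_i32_to_i16(-70000) == -32768, "clamps low");
static_assert(sat_trunc_i32_to_i16(1234) == 1234, "in range is unchanged");
```

After the clamp every lane already fits in 16 bits, so the truncation itself loses nothing. That is the property the pattern relies on.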

I'm also aware that the order of the patterns matters, which is why the VPKSF pattern is now before VPKF.

cc @uweigand

github-actions · 2025-08-26T09:02:06Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/lib/Target/SystemZ/SystemZInstrVector.td

folkertdev · 2025-08-26T09:48:15Z

That cleans things up a bunch, still no luck on actually matching that DAG though...

folkertdev · 2025-08-26T11:16:11Z

Found the issue: after some copy/pasting, the literal values for min and max were the same, which is not right. Also, to match on signed values, the pattern must use Imm.getSExtValue(); otherwise the match fails.
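
A small sketch of why the signed reading matters, using plain integers rather than APInt. The helpers read_zext and read_sext are hypothetical and only mimic the difference between a zero-extended and a sign-extended reading of a 32-bit element (the analogue of APInt::getZExtValue() versus APInt::getSExtValue()).

```cpp
#include <cstdint>

// Bit pattern of the 32-bit splat element -32768.
constexpr uint32_t kMinSplatBits = 0xFFFF8000u;

// Zero-extended reading: the bits are treated as an unsigned number.
constexpr int64_t read_zext(uint32_t bits) {
  return static_cast<int64_t>(bits);
}

// Sign-extended reading: the top bit is treated as the sign.
constexpr int64_t read_sext(uint32_t bits) {
  return bits >= 0x80000000u ? static_cast<int64_t>(bits) - 4294967296LL
                             : static_cast<int64_t>(bits);
}

// Only the sign-extended reading compares equal to the literal -32768.
static_assert(read_sext(kMinSplatBits) == -32768, "signed reading matches");
static_assert(read_zext(kMinSplatBits) == 4294934528LL, "unsigned reading does not");
```

A predicate that compares the zero-extended value against a negative literal can therefore never succeed, which is consistent with the failure described above.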

Now I'll just need to generalize this a bit.
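
One way to picture the generalization is to derive the bounds from the element width instead of hard-coding them, which is roughly what the final PatFrags in the diff do with APInt. The sketch below uses plain 64-bit arithmetic. The function names are made up for illustration, and it assumes element widths of 16, 32 or 64 bits only.

```cpp
#include <cstdint>

// Signed minimum of the half-width type, expressed at full width.
constexpr int64_t ssat_min(unsigned elt_bits) {
  return -(int64_t{1} << (elt_bits / 2 - 1));
}

// Signed maximum of the half-width type, expressed at full width.
constexpr int64_t ssat_max(unsigned elt_bits) {
  return (int64_t{1} << (elt_bits / 2 - 1)) - 1;
}

// Unsigned maximum of the half-width type, expressed at full width.
constexpr uint64_t usat_max(unsigned elt_bits) {
  return (uint64_t{1} << (elt_bits / 2)) - 1;
}

// These agree with the splat constants used in saturating-truncation.ll.
static_assert(ssat_min(32) == -32768 && ssat_max(32) == 32767, "i32 to i16");
static_assert(usat_max(32) == 65535, "u32 to u16");
```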

llvm/lib/Target/SystemZ/SystemZOperators.td

llvm/lib/Target/SystemZ/SystemZInstrVector.td

uweigand

Looking pretty good now! I'm wondering - should we handle PCKLS similarly?

llvm/lib/Target/SystemZ/SystemZInstrVector.td

folkertdev · 2025-08-26T13:53:13Z

I added the 128-bit integer comparisons and support for PCKLS, and moved the multiclasses. Probably some naming/formatting nits remain, but otherwise it's looking alright.
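
For the unsigned case a single umin is enough, because there is no lower bound to clamp against. A minimal scalar sketch of one lane, assuming the pack truncates each element (the helper name usat_trunc_u32_to_u16 is illustrative and not part of the patch):

```cpp
#include <cstdint>

// One lane of: truncate(umin(x, 65535)) for the v4i32 to v8i16 case.
constexpr uint16_t usat_trunc_u32_to_u16(uint32_t x) {
  return static_cast<uint16_t>(x > UINT16_MAX ? UINT16_MAX : x);
}

// Large values clamp to the u16 maximum; small values pass through.
static_assert(usat_trunc_u32_to_u16(70000u) == 65535, "clamps high");
static_assert(usat_trunc_u32_to_u16(42u) == 42, "in range is unchanged");
```

This is the shape the VPKLSH/VPKLSF/VPKLSG patterns in the diff look for, with the bound taken from the element width.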

uweigand

Looks all good to me. Only remaining question is whether we can drop the IntegerMinMaxVectorOps patterns now - if the tests still all pass, we should just do it as part of this patch.
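
As a sanity check on why the old patterns may be redundant, the four vselect shapes matched by IntegerMinMaxVectorOps are just spellings of min and max. The scalar sketch below mirrors them with a signed "greater than" compare. The function names are illustrative, and whether the compiler actually forms smin/smax nodes from such selects before instruction selection is an assumption here, not something this sketch shows.

```cpp
#include <cstdint>

constexpr int32_t smax32(int32_t a, int32_t b) { return a > b ? a : b; }
constexpr int32_t smin32(int32_t a, int32_t b) { return a < b ? a : b; }

// vselect (cmph x, y), x, y
constexpr int32_t sel_gt(int32_t x, int32_t y) { return (x > y) ? x : y; }
// vselect (cmph x, y), y, x
constexpr int32_t sel_gt_swapped(int32_t x, int32_t y) { return (x > y) ? y : x; }
// vselect (vnot (cmph x, y)), x, y
constexpr int32_t sel_not_gt(int32_t x, int32_t y) { return !(x > y) ? x : y; }
// vselect (vnot (cmph x, y)), y, x
constexpr int32_t sel_not_gt_swapped(int32_t x, int32_t y) { return !(x > y) ? y : x; }

// Each select shape agrees with the corresponding min or max.
static_assert(sel_gt(3, -7) == smax32(3, -7), "select form of max");
static_assert(sel_gt_swapped(3, -7) == smin32(3, -7), "select form of min");
static_assert(sel_not_gt(3, -7) == smin32(3, -7), "negated select form of min");
static_assert(sel_not_gt_swapped(3, -7) == smax32(3, -7), "negated select form of max");
```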

folkertdev · 2025-08-26T14:05:01Z

Looks like that works out: bin/llvm-lit ../llvm/test/CodeGen/SystemZ is clean locally, so I've removed those patterns.

uweigand

Excellent, thanks! I think this is ready to move out of draft status.

llvm/test/CodeGen/SystemZ/inline-thresh-adjust.ll

The more general smin/umin/smax/umax are now legal, so these patterns are no longer needed

llvmbot · 2025-08-26T14:20:25Z

@llvm/pr-subscribers-backend-systemz

Author: Folkert de Vries (folkertdev)

Changes

fixes #153655

Full diff: https://github.com/llvm/llvm-project/pull/155377.diff

6 Files Affected:

(modified) llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (+6)
(modified) llvm/lib/Target/SystemZ/SystemZInstrVector.td (+55-64)
(modified) llvm/lib/Target/SystemZ/SystemZOperators.td (+25)
(modified) llvm/test/CodeGen/SystemZ/int-max-02.ll (+8-8)
(modified) llvm/test/CodeGen/SystemZ/int-min-02.ll (+8-8)
(added) llvm/test/CodeGen/SystemZ/saturating-truncation.ll (+95)

diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index c73dc3021eb42..040909949dc1d 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -287,6 +287,9 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
     // Additional instructions available with z17.
     if (Subtarget.hasVectorEnhancements3()) {
       setOperationAction(ISD::ABS, MVT::i128, Legal);
+
+      setOperationAction({ISD::SMIN, ISD::UMIN, ISD::SMAX, ISD::UMAX},
+                         MVT::i128, Legal);
     }
   }
 
@@ -492,6 +495,9 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
       // Map SETCCs onto one of VCE, VCH or VCHL, swapping the operands
       // and inverting the result as necessary.
       setOperationAction(ISD::SETCC, VT, Custom);
+
+      setOperationAction({ISD::SMIN, ISD::UMIN, ISD::SMAX, ISD::UMAX}, VT,
+                         Legal);
     }
   }
 
diff --git a/llvm/lib/Target/SystemZ/SystemZInstrVector.td b/llvm/lib/Target/SystemZ/SystemZInstrVector.td
index 10de8b05cf45f..479bab5ce62b8 100644
--- a/llvm/lib/Target/SystemZ/SystemZInstrVector.td
+++ b/llvm/lib/Target/SystemZ/SystemZInstrVector.td
@@ -680,41 +680,41 @@ let Predicates = [FeatureVector] in {
   let isCommutable = 1 in {
     // Maximum.
     def VMX  : BinaryVRRcGeneric<"vmx", 0xE7FF>;
-    def VMXB : BinaryVRRc<"vmxb", 0xE7FF, null_frag, v128b, v128b, 0>;
-    def VMXH : BinaryVRRc<"vmxh", 0xE7FF, null_frag, v128h, v128h, 1>;
-    def VMXF : BinaryVRRc<"vmxf", 0xE7FF, null_frag, v128f, v128f, 2>;
-    def VMXG : BinaryVRRc<"vmxg", 0xE7FF, null_frag, v128g, v128g, 3>;
+    def VMXB : BinaryVRRc<"vmxb", 0xE7FF, smax, v128b, v128b, 0>;
+    def VMXH : BinaryVRRc<"vmxh", 0xE7FF, smax, v128h, v128h, 1>;
+    def VMXF : BinaryVRRc<"vmxf", 0xE7FF, smax, v128f, v128f, 2>;
+    def VMXG : BinaryVRRc<"vmxg", 0xE7FF, smax, v128g, v128g, 3>;
     let Predicates = [FeatureVectorEnhancements3] in
-      def VMXQ : BinaryVRRc<"vmxq", 0xE7FF, null_frag, v128q, v128q, 4>;
+      def VMXQ : BinaryVRRc<"vmxq", 0xE7FF, smax, v128q, v128q, 4>;
 
     // Maximum logical.
     def VMXL  : BinaryVRRcGeneric<"vmxl", 0xE7FD>;
-    def VMXLB : BinaryVRRc<"vmxlb", 0xE7FD, null_frag, v128b, v128b, 0>;
-    def VMXLH : BinaryVRRc<"vmxlh", 0xE7FD, null_frag, v128h, v128h, 1>;
-    def VMXLF : BinaryVRRc<"vmxlf", 0xE7FD, null_frag, v128f, v128f, 2>;
-    def VMXLG : BinaryVRRc<"vmxlg", 0xE7FD, null_frag, v128g, v128g, 3>;
+    def VMXLB : BinaryVRRc<"vmxlb", 0xE7FD, umax, v128b, v128b, 0>;
+    def VMXLH : BinaryVRRc<"vmxlh", 0xE7FD, umax, v128h, v128h, 1>;
+    def VMXLF : BinaryVRRc<"vmxlf", 0xE7FD, umax, v128f, v128f, 2>;
+    def VMXLG : BinaryVRRc<"vmxlg", 0xE7FD, umax, v128g, v128g, 3>;
     let Predicates = [FeatureVectorEnhancements3] in
-      def VMXLQ : BinaryVRRc<"vmxlq", 0xE7FD, null_frag, v128q, v128q, 4>;
+      def VMXLQ : BinaryVRRc<"vmxlq", 0xE7FD, umax, v128q, v128q, 4>;
   }
 
   let isCommutable = 1 in {
     // Minimum.
     def VMN  : BinaryVRRcGeneric<"vmn", 0xE7FE>;
-    def VMNB : BinaryVRRc<"vmnb", 0xE7FE, null_frag, v128b, v128b, 0>;
-    def VMNH : BinaryVRRc<"vmnh", 0xE7FE, null_frag, v128h, v128h, 1>;
-    def VMNF : BinaryVRRc<"vmnf", 0xE7FE, null_frag, v128f, v128f, 2>;
-    def VMNG : BinaryVRRc<"vmng", 0xE7FE, null_frag, v128g, v128g, 3>;
+    def VMNB : BinaryVRRc<"vmnb", 0xE7FE, smin, v128b, v128b, 0>;
+    def VMNH : BinaryVRRc<"vmnh", 0xE7FE, smin, v128h, v128h, 1>;
+    def VMNF : BinaryVRRc<"vmnf", 0xE7FE, smin, v128f, v128f, 2>;
+    def VMNG : BinaryVRRc<"vmng", 0xE7FE, smin, v128g, v128g, 3>;
     let Predicates = [FeatureVectorEnhancements3] in
-      def VMNQ : BinaryVRRc<"vmnq", 0xE7FE, null_frag, v128q, v128q, 4>;
+      def VMNQ : BinaryVRRc<"vmnq", 0xE7FE, smin, v128q, v128q, 4>;
 
     // Minimum logical.
     def VMNL  : BinaryVRRcGeneric<"vmnl", 0xE7FC>;
-    def VMNLB : BinaryVRRc<"vmnlb", 0xE7FC, null_frag, v128b, v128b, 0>;
-    def VMNLH : BinaryVRRc<"vmnlh", 0xE7FC, null_frag, v128h, v128h, 1>;
-    def VMNLF : BinaryVRRc<"vmnlf", 0xE7FC, null_frag, v128f, v128f, 2>;
-    def VMNLG : BinaryVRRc<"vmnlg", 0xE7FC, null_frag, v128g, v128g, 3>;
+    def VMNLB : BinaryVRRc<"vmnlb", 0xE7FC, umin, v128b, v128b, 0>;
+    def VMNLH : BinaryVRRc<"vmnlh", 0xE7FC, umin, v128h, v128h, 1>;
+    def VMNLF : BinaryVRRc<"vmnlf", 0xE7FC, umin, v128f, v128f, 2>;
+    def VMNLG : BinaryVRRc<"vmnlg", 0xE7FC, umin, v128g, v128g, 3>;
     let Predicates = [FeatureVectorEnhancements3] in
-      def VMNLQ : BinaryVRRc<"vmnlq", 0xE7FC, null_frag, v128q, v128q, 4>;
+      def VMNLQ : BinaryVRRc<"vmnlq", 0xE7FC, umin, v128q, v128q, 4>;
   }
 
   let isCommutable = 1 in {
@@ -1250,54 +1250,45 @@ defm : IntegerAbsoluteVectorOps<v8i16, VLCH, VLPH, 15>;
 defm : IntegerAbsoluteVectorOps<v4i32, VLCF, VLPF, 31>;
 defm : IntegerAbsoluteVectorOps<v2i64, VLCG, VLPG, 63>;
 
-// Instantiate minimum- and maximum-related patterns for TYPE.  CMPH is the
-// signed or unsigned "set if greater than" comparison instruction and
-// MIN and MAX are the associated minimum and maximum instructions.
-multiclass IntegerMinMaxVectorOps<ValueType type, SDPatternOperator cmph,
-                                  Instruction min, Instruction max> {
-  let Predicates = [FeatureVector] in {
-    def : Pat<(type (vselect (cmph VR128:$x, VR128:$y), VR128:$x, VR128:$y)),
-              (max VR128:$x, VR128:$y)>;
-    def : Pat<(type (vselect (cmph VR128:$x, VR128:$y), VR128:$y, VR128:$x)),
-              (min VR128:$x, VR128:$y)>;
-    def : Pat<(type (vselect (z_vnot (cmph VR128:$x, VR128:$y)),
-                             VR128:$x, VR128:$y)),
-              (min VR128:$x, VR128:$y)>;
-    def : Pat<(type (vselect (z_vnot (cmph VR128:$x, VR128:$y)),
-                             VR128:$y, VR128:$x)),
-              (max VR128:$x, VR128:$y)>;
-  }
+// Instantiate packs/packu: recognize a saturating truncation and convert
+// into the corresponding packs/packu instruction.
+multiclass SignedSaturatingTruncate<ValueType input, ValueType output,
+                                    Instruction packs> {
+  def : Pat<
+    (output (z_pack
+      (smin (smax (input VR128:$a), ssat_trunc_min_vec), ssat_trunc_max_vec),
+      (smin (smax (input VR128:$b), ssat_trunc_min_vec), ssat_trunc_max_vec)
+    )),
+    (packs VR128:$a, VR128:$b)
+  >;
+
+  def : Pat<
+    (output (z_pack
+      (smax (smin (input VR128:$a), ssat_trunc_max_vec), ssat_trunc_min_vec),
+      (smax (smin (input VR128:$b), ssat_trunc_max_vec), ssat_trunc_min_vec)
+    )),
+    (packs VR128:$a, VR128:$b)
+  >;
 }
 
-// Signed min/max.
-defm : IntegerMinMaxVectorOps<v16i8, z_vicmph, VMNB, VMXB>;
-defm : IntegerMinMaxVectorOps<v8i16, z_vicmph, VMNH, VMXH>;
-defm : IntegerMinMaxVectorOps<v4i32, z_vicmph, VMNF, VMXF>;
-defm : IntegerMinMaxVectorOps<v2i64, z_vicmph, VMNG, VMXG>;
-
-let Predicates = [FeatureVectorEnhancements3] in {
-  def : Pat<(i128 (or (and VR128:$x, (z_vicmph VR128:$x, VR128:$y)),
-                      (and VR128:$y, (not (z_vicmph VR128:$x, VR128:$y))))),
-            (VMXQ VR128:$x, VR128:$y)>;
-  def : Pat<(i128 (or (and VR128:$y, (z_vicmph VR128:$x, VR128:$y)),
-                      (and VR128:$x, (not (z_vicmph VR128:$x, VR128:$y))))),
-            (VMNQ VR128:$x, VR128:$y)>;
+defm : SignedSaturatingTruncate<v8i16, v16i8, VPKSH>;
+defm : SignedSaturatingTruncate<v4i32, v8i16, VPKSF>;
+defm : SignedSaturatingTruncate<v2i64, v4i32, VPKSG>;
+
+multiclass UnsignedSaturatingTruncate<ValueType input, ValueType output,
+                                      Instruction packu> {
+  def : Pat<
+    (output (z_pack
+      (umin (input VR128:$a), usat_trunc_max_vec),
+      (umin (input VR128:$b), usat_trunc_max_vec)
+    )),
+    (packu VR128:$a, VR128:$b)
+  >;
 }
 
-// Unsigned min/max.
-defm : IntegerMinMaxVectorOps<v16i8, z_vicmphl, VMNLB, VMXLB>;
-defm : IntegerMinMaxVectorOps<v8i16, z_vicmphl, VMNLH, VMXLH>;
-defm : IntegerMinMaxVectorOps<v4i32, z_vicmphl, VMNLF, VMXLF>;
-defm : IntegerMinMaxVectorOps<v2i64, z_vicmphl, VMNLG, VMXLG>;
-
-let Predicates = [FeatureVectorEnhancements3] in {
-  def : Pat<(i128 (or (and VR128:$x, (z_vicmphl VR128:$x, VR128:$y)),
-                      (and VR128:$y, (not (z_vicmphl VR128:$x, VR128:$y))))),
-            (VMXLQ VR128:$x, VR128:$y)>;
-  def : Pat<(i128 (or (and VR128:$y, (z_vicmphl VR128:$x, VR128:$y)),
-                      (and VR128:$x, (not (z_vicmphl VR128:$x, VR128:$y))))),
-            (VMNLQ VR128:$x, VR128:$y)>;
-}
+defm : UnsignedSaturatingTruncate<v8i16, v16i8, VPKLSH>;
+defm : UnsignedSaturatingTruncate<v4i32, v8i16, VPKLSF>;
+defm : UnsignedSaturatingTruncate<v2i64, v4i32, VPKLSG>;
 
 // Instantiate comparison patterns to recognize VACC/VSCBI for TYPE.
 multiclass IntegerComputeCarryOrBorrow<ValueType type,
diff --git a/llvm/lib/Target/SystemZ/SystemZOperators.td b/llvm/lib/Target/SystemZ/SystemZOperators.td
index 39e216b993b11..547d3dcf92804 100644
--- a/llvm/lib/Target/SystemZ/SystemZOperators.td
+++ b/llvm/lib/Target/SystemZ/SystemZOperators.td
@@ -1067,6 +1067,31 @@ def vsplat_imm_eq_1 : PatFrag<(ops), (build_vector), [{
 }]>;
 def z_vzext1 : PatFrag<(ops node:$x), (and node:$x, vsplat_imm_eq_1)>;
 
+// Vector constants for saturating truncation, containing the minimum and
+// maximum value for the integer type that is half of the element width.
+def ssat_trunc_min_vec: PatFrag<(ops), (build_vector), [{
+  APInt Imm;
+  EVT EltTy = N->getValueType(0).getVectorElementType();
+  unsigned SizeInBits = EltTy.getSizeInBits();
+  APInt min = APInt::getSignedMinValue(SizeInBits / 2).sext(SizeInBits);
+  return ISD::isConstantSplatVector(N, Imm) && APInt::isSameValue(Imm, min);
+}]>;
+def ssat_trunc_max_vec: PatFrag<(ops), (build_vector), [{
+  APInt Imm;
+  EVT EltTy = N->getValueType(0).getVectorElementType();
+  unsigned SizeInBits = EltTy.getSizeInBits();
+  APInt max = APInt::getSignedMaxValue(SizeInBits / 2).sext(SizeInBits);
+  return ISD::isConstantSplatVector(N, Imm) && APInt::isSameValue(Imm, max);
+}]>;
+
+def usat_trunc_max_vec: PatFrag<(ops), (build_vector), [{
+  APInt Imm;
+  EVT EltTy = N->getValueType(0).getVectorElementType();
+  unsigned SizeInBits = EltTy.getSizeInBits();
+  APInt max = APInt::getMaxValue(SizeInBits / 2).zext(SizeInBits);
+  return ISD::isConstantSplatVector(N, Imm) && APInt::isSameValue(Imm, max);
+}]>;
+
 // Signed "integer greater than zero" on vectors.
 def z_vicmph_zero : PatFrag<(ops node:$x), (z_vicmph node:$x, immAllZerosV)>;
 
diff --git a/llvm/test/CodeGen/SystemZ/int-max-02.ll b/llvm/test/CodeGen/SystemZ/int-max-02.ll
index 5f5188c66065d..00fd01a0ccd63 100644
--- a/llvm/test/CodeGen/SystemZ/int-max-02.ll
+++ b/llvm/test/CodeGen/SystemZ/int-max-02.ll
@@ -7,8 +7,8 @@
 define i128 @f1(i128 %val1, i128 %val2) {
 ; CHECK-LABEL: f1:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vl %v0, 0(%r3), 3
-; CHECK-NEXT:    vl %v1, 0(%r4), 3
+; CHECK-NEXT:    vl %v0, 0(%r4), 3
+; CHECK-NEXT:    vl %v1, 0(%r3), 3
 ; CHECK-NEXT:    vmxq %v0, %v1, %v0
 ; CHECK-NEXT:    vst %v0, 0(%r2), 3
 ; CHECK-NEXT:    br %r14
@@ -49,8 +49,8 @@ define i128 @f3(i128 %val1, i128 %val2) {
 define i128 @f4(i128 %val1, i128 %val2) {
 ; CHECK-LABEL: f4:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vl %v0, 0(%r3), 3
-; CHECK-NEXT:    vl %v1, 0(%r4), 3
+; CHECK-NEXT:    vl %v0, 0(%r4), 3
+; CHECK-NEXT:    vl %v1, 0(%r3), 3
 ; CHECK-NEXT:    vmxq %v0, %v1, %v0
 ; CHECK-NEXT:    vst %v0, 0(%r2), 3
 ; CHECK-NEXT:    br %r14
@@ -63,8 +63,8 @@ define i128 @f4(i128 %val1, i128 %val2) {
 define i128 @f5(i128 %val1, i128 %val2) {
 ; CHECK-LABEL: f5:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vl %v0, 0(%r3), 3
-; CHECK-NEXT:    vl %v1, 0(%r4), 3
+; CHECK-NEXT:    vl %v0, 0(%r4), 3
+; CHECK-NEXT:    vl %v1, 0(%r3), 3
 ; CHECK-NEXT:    vmxlq %v0, %v1, %v0
 ; CHECK-NEXT:    vst %v0, 0(%r2), 3
 ; CHECK-NEXT:    br %r14
@@ -105,8 +105,8 @@ define i128 @f7(i128 %val1, i128 %val2) {
 define i128 @f8(i128 %val1, i128 %val2) {
 ; CHECK-LABEL: f8:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vl %v0, 0(%r3), 3
-; CHECK-NEXT:    vl %v1, 0(%r4), 3
+; CHECK-NEXT:    vl %v0, 0(%r4), 3
+; CHECK-NEXT:    vl %v1, 0(%r3), 3
 ; CHECK-NEXT:    vmxlq %v0, %v1, %v0
 ; CHECK-NEXT:    vst %v0, 0(%r2), 3
 ; CHECK-NEXT:    br %r14
diff --git a/llvm/test/CodeGen/SystemZ/int-min-02.ll b/llvm/test/CodeGen/SystemZ/int-min-02.ll
index 3066af924fb8e..f13db7c4b8995 100644
--- a/llvm/test/CodeGen/SystemZ/int-min-02.ll
+++ b/llvm/test/CodeGen/SystemZ/int-min-02.ll
@@ -7,8 +7,8 @@
 define i128 @f1(i128 %val1, i128 %val2) {
 ; CHECK-LABEL: f1:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vl %v0, 0(%r4), 3
-; CHECK-NEXT:    vl %v1, 0(%r3), 3
+; CHECK-NEXT:    vl %v0, 0(%r3), 3
+; CHECK-NEXT:    vl %v1, 0(%r4), 3
 ; CHECK-NEXT:    vmnq %v0, %v1, %v0
 ; CHECK-NEXT:    vst %v0, 0(%r2), 3
 ; CHECK-NEXT:    br %r14
@@ -49,8 +49,8 @@ define i128 @f3(i128 %val1, i128 %val2) {
 define i128 @f4(i128 %val1, i128 %val2) {
 ; CHECK-LABEL: f4:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vl %v0, 0(%r4), 3
-; CHECK-NEXT:    vl %v1, 0(%r3), 3
+; CHECK-NEXT:    vl %v0, 0(%r3), 3
+; CHECK-NEXT:    vl %v1, 0(%r4), 3
 ; CHECK-NEXT:    vmnq %v0, %v1, %v0
 ; CHECK-NEXT:    vst %v0, 0(%r2), 3
 ; CHECK-NEXT:    br %r14
@@ -63,8 +63,8 @@ define i128 @f4(i128 %val1, i128 %val2) {
 define i128 @f5(i128 %val1, i128 %val2) {
 ; CHECK-LABEL: f5:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vl %v0, 0(%r4), 3
-; CHECK-NEXT:    vl %v1, 0(%r3), 3
+; CHECK-NEXT:    vl %v0, 0(%r3), 3
+; CHECK-NEXT:    vl %v1, 0(%r4), 3
 ; CHECK-NEXT:    vmnlq %v0, %v1, %v0
 ; CHECK-NEXT:    vst %v0, 0(%r2), 3
 ; CHECK-NEXT:    br %r14
@@ -105,8 +105,8 @@ define i128 @f7(i128 %val1, i128 %val2) {
 define i128 @f8(i128 %val1, i128 %val2) {
 ; CHECK-LABEL: f8:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vl %v0, 0(%r4), 3
-; CHECK-NEXT:    vl %v1, 0(%r3), 3
+; CHECK-NEXT:    vl %v0, 0(%r3), 3
+; CHECK-NEXT:    vl %v1, 0(%r4), 3
 ; CHECK-NEXT:    vmnlq %v0, %v1, %v0
 ; CHECK-NEXT:    vst %v0, 0(%r2), 3
 ; CHECK-NEXT:    br %r14
diff --git a/llvm/test/CodeGen/SystemZ/saturating-truncation.ll b/llvm/test/CodeGen/SystemZ/saturating-truncation.ll
new file mode 100644
index 0000000000000..0ea29202c1ef5
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/saturating-truncation.ll
@@ -0,0 +1,95 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z17 | FileCheck %s
+
+declare <8 x i32> @llvm.smin.v8i32(<8 x i32>, <8 x i32>) #2
+declare <8 x i32> @llvm.smax.v8i32(<8 x i32>, <8 x i32>) #2
+
+define <16 x i8> @i16_signed(<8 x i16> %a, <8 x i16> %b) {
+; CHECK-LABEL: i16_signed:
+; CHECK:       # %bb.0: # %bb2
+; CHECK-NEXT:    vpksh %v24, %v24, %v26
+; CHECK-NEXT:    br %r14
+bb2:
+  %0 = shufflevector <8 x i16> %a, <8 x i16> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+  %1 = tail call <16 x i16> @llvm.smax.v16i16(<16 x i16> %0, <16 x i16> splat (i16 -128))
+  %2 = tail call <16 x i16> @llvm.smin.v16i16(<16 x i16> %1, <16 x i16> splat (i16 127))
+  %3 = trunc nsw <16 x i16> %2 to <16 x i8>
+  ret <16 x i8> %3
+  ret <16 x i8> %3
+}
+
+define <8 x i16> @i32_signed(<4 x i32> %a, <4 x i32> %b) {
+; CHECK-LABEL: i32_signed:
+; CHECK:       # %bb.0: # %bb2
+; CHECK-NEXT:    vpksf %v24, %v24, %v26
+; CHECK-NEXT:    br %r14
+bb2:
+  %0 = shufflevector <4 x i32> %a, <4 x i32> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+  %1 = tail call <8 x i32> @llvm.smax.v8i32(<8 x i32> %0, <8 x i32> splat (i32 -32768))
+  %2 = tail call <8 x i32> @llvm.smin.v8i32(<8 x i32> %1, <8 x i32> splat (i32 32767))
+  %3 = trunc nsw <8 x i32> %2 to <8 x i16>
+  ret <8 x i16> %3
+}
+
+define <4 x i32> @i64_signed(<2 x i64> %a, <2 x i64> %b) {
+; CHECK-LABEL: i64_signed:
+; CHECK:       # %bb.0: # %bb2
+; CHECK-NEXT:    vpksg %v24, %v24, %v26
+; CHECK-NEXT:    br %r14
+bb2:
+  %0 = shufflevector <2 x i64> %a, <2 x i64> %b, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  %1 = tail call <4 x i64> @llvm.smax.v4i64(<4 x i64> %0, <4 x i64> splat (i64 -2147483648))
+  %2 = tail call <4 x i64> @llvm.smin.v4i64(<4 x i64> %1, <4 x i64> splat (i64 2147483647))
+  %3 = trunc nsw <4 x i64> %2 to <4 x i32>
+  ret <4 x i32> %3
+}
+
+define <4 x i32> @i64_signed_flipped(<2 x i64> %a, <2 x i64> %b) {
+; CHECK-LABEL: i64_signed_flipped:
+; CHECK:       # %bb.0: # %bb2
+; CHECK-NEXT:    vpksg %v24, %v24, %v26
+; CHECK-NEXT:    br %r14
+bb2:
+  %0 = shufflevector <2 x i64> %a, <2 x i64> %b, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  %1 = tail call <4 x i64> @llvm.smin.v4i64(<4 x i64> splat (i64 2147483647), <4 x i64> %0)
+  %2 = tail call <4 x i64> @llvm.smax.v4i64(<4 x i64> splat (i64 -2147483648), <4 x i64> %1)
+  %3 = trunc nsw <4 x i64> %2 to <4 x i32>
+  ret <4 x i32> %3
+}
+
+define <16 x i8> @i16_unsigned(<8 x i16> %a, <8 x i16> %b) {
+; CHECK-LABEL: i16_unsigned:
+; CHECK:       # %bb.0: # %bb2
+; CHECK-NEXT:    vpklsh %v24, %v24, %v26
+; CHECK-NEXT:    br %r14
+bb2:
+  %0 = shufflevector <8 x i16> %a, <8 x i16> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+  %1 = tail call <16 x i16> @llvm.umin.v16i16(<16 x i16> %0, <16 x i16> splat (i16 255))
+  %2 = trunc nuw <16 x i16> %1 to <16 x i8>
+  ret <16 x i8> %2
+}
+
+define <8 x i16> @i32_unsigned(<4 x i32> %a, <4 x i32> %b) {
+; CHECK-LABEL: i32_unsigned:
+; CHECK:       # %bb.0: # %bb2
+; CHECK-NEXT:    vpklsf %v24, %v24, %v26
+; CHECK-NEXT:    br %r14
+bb2:
+  %0 = shufflevector <4 x i32> %a, <4 x i32> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+  %1 = tail call <8 x i32> @llvm.umin.v8i32(<8 x i32> %0, <8 x i32> splat (i32 65535))
+  %2 = trunc nsw <8 x i32> %1 to <8 x i16>
+  ret <8 x i16> %2
+}
+
+define <4 x i32> @i64_unsigned(<2 x i64> %a, <2 x i64> %b) {
+; CHECK-LABEL: i64_unsigned:
+; CHECK:       # %bb.0: # %bb2
+; CHECK-NEXT:    vpklsg %v24, %v24, %v26
+; CHECK-NEXT:    br %r14
+bb2:
+  %0 = shufflevector <2 x i64> %a, <2 x i64> %b, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  %1 = tail call <4 x i64> @llvm.umin.v4i64(<4 x i64> %0, <4 x i64> splat (i64 4294967295))
+  %2 = trunc nuw <4 x i64> %1 to <4 x i32>
+  ret <4 x i32> %2
+}

uweigand

LGTM assuming CI passes. Thanks for working on this!

uweigand · 2025-08-26T14:59:30Z

The failure looks unrelated (LLDB attach fails due to setup issue on the build bot server), so I think this is good to go!

folkertdev · 2025-08-26T15:10:16Z

Thank you for guiding me through this! LLVM can be kind of inscrutable, so that help is really useful.

Am I supposed to do something to merge this?

uweigand · 2025-08-26T15:15:02Z

I can do the merge. Thanks again!

folkertdev changed the title ~~S390x saturated truncation~~ s390x: pattern match saturated truncation Aug 26, 2025

uweigand reviewed Aug 26, 2025

folkertdev force-pushed the s390x-saturated-truncation branch from 779ff44 to f254507 Compare August 26, 2025 09:42

folkertdev force-pushed the s390x-saturated-truncation branch from f254507 to 5fa5714 Compare August 26, 2025 11:14

uweigand reviewed Aug 26, 2025

folkertdev force-pushed the s390x-saturated-truncation branch 2 times, most recently from a4b8e8c to 8addc30 Compare August 26, 2025 13:06

uweigand reviewed Aug 26, 2025

s390x: legalize smin/smax/umin/umax for vectors

832f1f9

folkertdev force-pushed the s390x-saturated-truncation branch from 8addc30 to 6c164f4 Compare August 26, 2025 13:50

uweigand reviewed Aug 26, 2025

folkertdev added 3 commits August 26, 2025 16:18

s390x: legalize smin/smax/umin/umax for v128q

76bf0f3

s390x: map saturating truncation to a packs/packu

656bd57

s390x: remove IntegerMinMaxVectorOps multiclass

47175b2

The more general smin/umin/smax/umax are now legal, so these patterns are no longer needed

folkertdev force-pushed the s390x-saturated-truncation branch from 01d2371 to 47175b2 Compare August 26, 2025 14:18

folkertdev marked this pull request as ready for review August 26, 2025 14:19

llvmbot added the backend:SystemZ label Aug 26, 2025

uweigand approved these changes Aug 26, 2025

Merge branch 'main' into s390x-saturated-truncation

ee13a92

uweigand merged commit 5586572 into llvm:main Aug 26, 2025
6 of 9 checks passed

folkertdev mentioned this pull request Aug 26, 2025

webassembly: recognize saturating truncation #155470

Open

aemerson mentioned this pull request Aug 30, 2025

[NFC] [clangd] [Modules] remove dot in log #156207

Closed
