
Conversation

AlexMaclean
Member

No description provided.

@AlexMaclean AlexMaclean self-assigned this Aug 19, 2025
@llvmbot llvmbot added the backend:NVPTX and llvm:SelectionDAG labels Aug 19, 2025
@llvmbot
Member

llvmbot commented Aug 19, 2025

@llvm/pr-subscribers-backend-mips

@llvm/pr-subscribers-llvm-selectiondag

Author: Alex MacLean (AlexMaclean)

Changes

Patch is 130.65 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154251.diff

9 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+1-1)
  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+64-100)
  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.h (-2)
  • (modified) llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll (+5-5)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll (+180-198)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm70.ll (+180-198)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm90.ll (+180-198)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg.ll (+100-110)
  • (modified) llvm/test/CodeGen/NVPTX/mulwide.ll (+4-6)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index c16ccaf926bc7..e63568bc3fb63 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -15122,7 +15122,7 @@ SDValue DAGCombiner::visitANY_EXTEND(SDNode *N) {
       return foldedExt;
   } else if (ISD::isNON_EXTLoad(N0.getNode()) &&
              ISD::isUNINDEXEDLoad(N0.getNode()) &&
-             TLI.isLoadExtLegal(ISD::EXTLOAD, VT, N0.getValueType())) {
+             TLI.isLoadExtLegalOrCustom(ISD::EXTLOAD, VT, N0.getValueType())) {
     bool DoXform = true;
     SmallVector<SDNode *, 4> SetCCs;
     if (!N0.hasOneUse())
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 74e6c139c610d..6933d21c5d8a9 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -746,57 +746,56 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
   // intrinsics.
   setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::Other, Custom);
 
-  // Turn FP extload into load/fpextend
-  setLoadExtAction(ISD::EXTLOAD, MVT::f32, MVT::f16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::f64, MVT::f16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::f32, MVT::bf16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::f64, MVT::bf16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::f64, MVT::f32, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::v2f32, MVT::v2f16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::v2f64, MVT::v2f16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::v2f32, MVT::v2bf16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::v2f64, MVT::v2bf16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::v2f64, MVT::v2f32, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::v4f32, MVT::v4f16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::v4f64, MVT::v4f16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::v4f32, MVT::v4bf16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::v4f64, MVT::v4bf16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::v4f64, MVT::v4f32, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::v8f32, MVT::v8f16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::v8f64, MVT::v8f16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::v8f32, MVT::v8bf16, Expand);
-  setLoadExtAction(ISD::EXTLOAD, MVT::v8f64, MVT::v8bf16, Expand);
-  // Turn FP truncstore into trunc + store.
-  // FIXME: vector types should also be expanded
-  setTruncStoreAction(MVT::f32, MVT::f16, Expand);
-  setTruncStoreAction(MVT::f64, MVT::f16, Expand);
-  setTruncStoreAction(MVT::f32, MVT::bf16, Expand);
-  setTruncStoreAction(MVT::f64, MVT::bf16, Expand);
-  setTruncStoreAction(MVT::f64, MVT::f32, Expand);
-  setTruncStoreAction(MVT::v2f32, MVT::v2f16, Expand);
-  setTruncStoreAction(MVT::v2f32, MVT::v2bf16, Expand);
+  // FP extload/truncstore is not legal in PTX. We need to expand all these.
+  for (auto FloatVTs :
+       {MVT::fp_valuetypes(), MVT::fp_fixedlen_vector_valuetypes()}) {
+    for (MVT ValVT : FloatVTs) {
+      for (MVT MemVT : FloatVTs) {
+        setLoadExtAction(ISD::EXTLOAD, ValVT, MemVT, Expand);
+        setTruncStoreAction(ValVT, MemVT, Expand);
+      }
+    }
+  }
 
-  // PTX does not support load / store predicate registers
-  setOperationAction(ISD::LOAD, MVT::i1, Custom);
-  setOperationAction(ISD::STORE, MVT::i1, Custom);
+  // To improve CodeGen we'll legalize any-extend loads to zext loads. This is
+  // how they'll be lowered in ISel anyway, and by doing this a little earlier
+  // we allow for more DAG combine opportunities.
+  for (auto IntVTs :
+       {MVT::integer_valuetypes(), MVT::integer_fixedlen_vector_valuetypes()})
+    for (MVT ValVT : IntVTs)
+      for (MVT MemVT : IntVTs)
+        if (isTypeLegal(ValVT))
+          setLoadExtAction(ISD::EXTLOAD, ValVT, MemVT, Custom);
 
+  // PTX does not support load / store predicate registers
+  setOperationAction({ISD::LOAD, ISD::STORE}, MVT::i1, Custom);
   for (MVT VT : MVT::integer_valuetypes()) {
-    setLoadExtAction(ISD::SEXTLOAD, VT, MVT::i1, Promote);
-    setLoadExtAction(ISD::ZEXTLOAD, VT, MVT::i1, Promote);
-    setLoadExtAction(ISD::EXTLOAD, VT, MVT::i1, Promote);
+    setLoadExtAction({ISD::SEXTLOAD, ISD::ZEXTLOAD, ISD::EXTLOAD}, VT, MVT::i1,
+                     Promote);
     setTruncStoreAction(VT, MVT::i1, Expand);
   }
 
+  // Register custom handling for illegal type loads/stores. We'll try to custom
+  // lower almost all illegal types and logic in the lowering will discard cases
+  // we can't handle.
+  setOperationAction({ISD::LOAD, ISD::STORE}, {MVT::i128, MVT::f128}, Custom);
+  for (MVT VT : MVT::fixedlen_vector_valuetypes())
+    if (!isTypeLegal(VT) && VT.getStoreSizeInBits() <= 256)
+      setOperationAction({ISD::STORE, ISD::LOAD}, VT, Custom);
+
+  // Custom legalization for LDU intrinsics.
+  // TODO: The logic to lower these is not very robust and we should rewrite it.
+  //       Perhaps LDU should not be represented as an intrinsic at all.
+  setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i8, Custom);
+  for (MVT VT : MVT::fixedlen_vector_valuetypes())
+    if (IsPTXVectorType(VT))
+      setOperationAction(ISD::INTRINSIC_W_CHAIN, VT, Custom);
+
   setCondCodeAction({ISD::SETNE, ISD::SETEQ, ISD::SETUGE, ISD::SETULE,
                      ISD::SETUGT, ISD::SETULT, ISD::SETGT, ISD::SETLT,
                      ISD::SETGE, ISD::SETLE},
                     MVT::i1, Expand);
 
-  // expand extload of vector of integers.
-  setLoadExtAction({ISD::EXTLOAD, ISD::SEXTLOAD, ISD::ZEXTLOAD}, MVT::v2i16,
-                   MVT::v2i8, Expand);
-  setTruncStoreAction(MVT::v2i16, MVT::v2i8, Expand);
-
   // This is legal in NVPTX
   setOperationAction(ISD::ConstantFP, MVT::f64, Legal);
   setOperationAction(ISD::ConstantFP, MVT::f32, Legal);
@@ -811,24 +810,12 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
   // DEBUGTRAP can be lowered to PTX brkpt
   setOperationAction(ISD::DEBUGTRAP, MVT::Other, Legal);
 
-  // Register custom handling for vector loads/stores
-  for (MVT VT : MVT::fixedlen_vector_valuetypes())
-    if (IsPTXVectorType(VT))
-      setOperationAction({ISD::LOAD, ISD::STORE, ISD::INTRINSIC_W_CHAIN}, VT,
-                         Custom);
-
-  setOperationAction({ISD::LOAD, ISD::STORE, ISD::INTRINSIC_W_CHAIN},
-                     {MVT::i128, MVT::f128}, Custom);
-
   // Support varargs.
   setOperationAction(ISD::VASTART, MVT::Other, Custom);
   setOperationAction(ISD::VAARG, MVT::Other, Custom);
   setOperationAction(ISD::VACOPY, MVT::Other, Expand);
   setOperationAction(ISD::VAEND, MVT::Other, Expand);
 
-  // Custom handling for i8 intrinsics
-  setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i8, Custom);
-
   setOperationAction({ISD::ABS, ISD::SMIN, ISD::SMAX, ISD::UMIN, ISD::UMAX},
                      {MVT::i16, MVT::i32, MVT::i64}, Legal);
 
@@ -3135,39 +3122,14 @@ static void replaceLoadVector(SDNode *N, SelectionDAG &DAG,
                               SmallVectorImpl<SDValue> &Results,
                               const NVPTXSubtarget &STI);
 
-SDValue NVPTXTargetLowering::LowerLOAD(SDValue Op, SelectionDAG &DAG) const {
-  if (Op.getValueType() == MVT::i1)
-    return LowerLOADi1(Op, DAG);
-
-  EVT VT = Op.getValueType();
-
-  if (NVPTX::isPackedVectorTy(VT)) {
-    // v2f32/v2f16/v2bf16/v2i16/v4i8 are legal, so we can't rely on legalizer to
-    // handle unaligned loads and have to handle it here.
-    LoadSDNode *Load = cast<LoadSDNode>(Op);
-    EVT MemVT = Load->getMemoryVT();
-    if (!allowsMemoryAccessForAlignment(*DAG.getContext(), DAG.getDataLayout(),
-                                        MemVT, *Load->getMemOperand())) {
-      SDValue Ops[2];
-      std::tie(Ops[0], Ops[1]) = expandUnalignedLoad(Load, DAG);
-      return DAG.getMergeValues(Ops, SDLoc(Op));
-    }
-  }
-
-  return SDValue();
-}
-
 // v = ld i1* addr
 //   =>
 // v1 = ld i8* addr (-> i16)
 // v = trunc i16 to i1
-SDValue NVPTXTargetLowering::LowerLOADi1(SDValue Op, SelectionDAG &DAG) const {
-  SDNode *Node = Op.getNode();
-  LoadSDNode *LD = cast<LoadSDNode>(Node);
-  SDLoc dl(Node);
+static SDValue lowerLOADi1(LoadSDNode *LD, SelectionDAG &DAG) {
+  SDLoc dl(LD);
   assert(LD->getExtensionType() == ISD::NON_EXTLOAD);
-  assert(Node->getValueType(0) == MVT::i1 &&
-         "Custom lowering for i1 load only");
+  assert(LD->getValueType(0) == MVT::i1 && "Custom lowering for i1 load only");
   SDValue newLD = DAG.getExtLoad(ISD::ZEXTLOAD, dl, MVT::i16, LD->getChain(),
                                  LD->getBasePtr(), LD->getPointerInfo(),
                                  MVT::i8, LD->getAlign(),
@@ -3176,8 +3138,27 @@ SDValue NVPTXTargetLowering::LowerLOADi1(SDValue Op, SelectionDAG &DAG) const {
   // The legalizer (the caller) is expecting two values from the legalized
   // load, so we build a MergeValues node for it. See ExpandUnalignedLoad()
   // in LegalizeDAG.cpp which also uses MergeValues.
-  SDValue Ops[] = { result, LD->getChain() };
-  return DAG.getMergeValues(Ops, dl);
+  return DAG.getMergeValues({result, LD->getChain()}, dl);
+}
+
+SDValue NVPTXTargetLowering::LowerLOAD(SDValue Op, SelectionDAG &DAG) const {
+  LoadSDNode *LD = cast<LoadSDNode>(Op);
+
+  if (Op.getValueType() == MVT::i1)
+    return lowerLOADi1(LD, DAG);
+
+  // To improve CodeGen we'll legalize any-extend loads to zext loads. This is
+  // how they'll be lowered in ISel anyway, and by doing this a little earlier
+  // we allow for more DAG combine opportunities.
+  if (LD->getExtensionType() == ISD::EXTLOAD) {
+    assert(LD->getValueType(0).isInteger() && LD->getMemoryVT().isInteger() &&
+           "Unexpected fpext-load");
+    return DAG.getExtLoad(ISD::ZEXTLOAD, SDLoc(Op), Op.getValueType(),
+                          LD->getChain(), LD->getBasePtr(), LD->getMemoryVT(),
+                          LD->getMemOperand());
+  }
+
+  llvm_unreachable("Unexpected custom lowering for load");
 }
 
 SDValue NVPTXTargetLowering::LowerSTORE(SDValue Op, SelectionDAG &DAG) const {
@@ -3187,17 +3168,6 @@ SDValue NVPTXTargetLowering::LowerSTORE(SDValue Op, SelectionDAG &DAG) const {
   if (VT == MVT::i1)
     return LowerSTOREi1(Op, DAG);
 
-  // v2f32/v2f16/v2bf16/v2i16/v4i8 are legal, so we can't rely on legalizer to
-  // handle unaligned stores and have to handle it here.
-  if (NVPTX::isPackedVectorTy(VT) &&
-      !allowsMemoryAccessForAlignment(*DAG.getContext(), DAG.getDataLayout(),
-                                      VT, *Store->getMemOperand()))
-    return expandUnalignedStore(Store, DAG);
-
-  // v2f16/v2bf16/v2i16 don't need special handling.
-  if (NVPTX::isPackedVectorTy(VT) && VT.is32BitVector())
-    return SDValue();
-
   // Lower store of any other vector type, including v2f32 as we want to break
   // it apart since this is not a widely-supported type.
   return LowerSTOREVector(Op, DAG);
@@ -4051,14 +4021,8 @@ bool NVPTXTargetLowering::getTgtMemIntrinsic(
   case Intrinsic::nvvm_ldu_global_i:
   case Intrinsic::nvvm_ldu_global_f:
   case Intrinsic::nvvm_ldu_global_p: {
-    auto &DL = I.getDataLayout();
     Info.opc = ISD::INTRINSIC_W_CHAIN;
-    if (Intrinsic == Intrinsic::nvvm_ldu_global_i)
-      Info.memVT = getValueType(DL, I.getType());
-    else if(Intrinsic == Intrinsic::nvvm_ldu_global_p)
-      Info.memVT = getPointerTy(DL);
-    else
-      Info.memVT = getValueType(DL, I.getType());
+    Info.memVT = getValueType(I.getDataLayout(), I.getType());
     Info.ptrVal = I.getArgOperand(0);
     Info.offset = 0;
     Info.flags = MachineMemOperand::MOLoad;
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
index 27f099e220976..e7f1a4b4c98c4 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
@@ -309,8 +309,6 @@ class NVPTXTargetLowering : public TargetLowering {
   SDValue LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const;
 
   SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
-  SDValue LowerLOADi1(SDValue Op, SelectionDAG &DAG) const;
-
   SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSTOREi1(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSTOREVector(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll b/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
index b4641d01eb927..4fb294761e347 100644
--- a/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
@@ -711,11 +711,11 @@ define <2 x bfloat> @test_copysign(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; CHECK-NEXT:    .reg .b32 %r<6>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
-; CHECK-NEXT:    ld.param.b32 %r1, [test_copysign_param_0];
-; CHECK-NEXT:    ld.param.b32 %r2, [test_copysign_param_1];
-; CHECK-NEXT:    and.b32 %r3, %r2, -2147450880;
-; CHECK-NEXT:    and.b32 %r4, %r1, 2147450879;
-; CHECK-NEXT:    or.b32 %r5, %r4, %r3;
+; CHECK-NEXT:    ld.param.b32 %r1, [test_copysign_param_1];
+; CHECK-NEXT:    and.b32 %r2, %r1, -2147450880;
+; CHECK-NEXT:    ld.param.b32 %r3, [test_copysign_param_0];
+; CHECK-NEXT:    and.b32 %r4, %r3, 2147450879;
+; CHECK-NEXT:    or.b32 %r5, %r4, %r2;
 ; CHECK-NEXT:    st.param.b32 [func_retval0], %r5;
 ; CHECK-NEXT:    ret;
   %r = call <2 x bfloat> @llvm.copysign.f16(<2 x bfloat> %a, <2 x bfloat> %b)
diff --git a/llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll b/llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll
index 63c389c36e87e..5e7d1a1cd8af1 100644
--- a/llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll
+++ b/llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll
@@ -7,7 +7,7 @@ define i8 @monotonic_monotonic_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8
 ; SM60:       {
 ; SM60-NEXT:    .reg .pred %p<3>;
 ; SM60-NEXT:    .reg .b16 %rs<2>;
-; SM60-NEXT:    .reg .b32 %r<21>;
+; SM60-NEXT:    .reg .b32 %r<20>;
 ; SM60-NEXT:    .reg .b64 %rd<3>;
 ; SM60-EMPTY:
 ; SM60-NEXT:  // %bb.0:
@@ -22,23 +22,22 @@ define i8 @monotonic_monotonic_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8
 ; SM60-NEXT:    shl.b32 %r13, %r12, %r1;
 ; SM60-NEXT:    not.b32 %r2, %r13;
 ; SM60-NEXT:    cvt.u32.u16 %r14, %rs1;
-; SM60-NEXT:    and.b32 %r15, %r14, 255;
-; SM60-NEXT:    shl.b32 %r3, %r15, %r1;
+; SM60-NEXT:    shl.b32 %r3, %r14, %r1;
 ; SM60-NEXT:    shl.b32 %r4, %r9, %r1;
-; SM60-NEXT:    ld.global.b32 %r16, [%rd1];
-; SM60-NEXT:    and.b32 %r20, %r16, %r2;
+; SM60-NEXT:    ld.global.b32 %r15, [%rd1];
+; SM60-NEXT:    and.b32 %r19, %r15, %r2;
 ; SM60-NEXT:  $L__BB0_1: // %partword.cmpxchg.loop
 ; SM60-NEXT:    // =>This Inner Loop Header: Depth=1
-; SM60-NEXT:    or.b32 %r17, %r20, %r3;
-; SM60-NEXT:    or.b32 %r18, %r20, %r4;
-; SM60-NEXT:    atom.cta.global.cas.b32 %r7, [%rd1], %r18, %r17;
-; SM60-NEXT:    setp.eq.b32 %p1, %r7, %r18;
+; SM60-NEXT:    or.b32 %r16, %r19, %r3;
+; SM60-NEXT:    or.b32 %r17, %r19, %r4;
+; SM60-NEXT:    atom.cta.global.cas.b32 %r7, [%rd1], %r17, %r16;
+; SM60-NEXT:    setp.eq.b32 %p1, %r7, %r17;
 ; SM60-NEXT:    @%p1 bra $L__BB0_3;
 ; SM60-NEXT:  // %bb.2: // %partword.cmpxchg.failure
 ; SM60-NEXT:    // in Loop: Header=BB0_1 Depth=1
 ; SM60-NEXT:    and.b32 %r8, %r7, %r2;
-; SM60-NEXT:    setp.ne.b32 %p2, %r20, %r8;
-; SM60-NEXT:    mov.b32 %r20, %r8;
+; SM60-NEXT:    setp.ne.b32 %p2, %r19, %r8;
+; SM60-NEXT:    mov.b32 %r19, %r8;
 ; SM60-NEXT:    @%p2 bra $L__BB0_1;
 ; SM60-NEXT:  $L__BB0_3: // %partword.cmpxchg.end
 ; SM60-NEXT:    st.param.b32 [func_retval0], %r14;
@@ -52,7 +51,7 @@ define i8 @monotonic_acquire_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8 %
 ; SM60:       {
 ; SM60-NEXT:    .reg .pred %p<3>;
 ; SM60-NEXT:    .reg .b16 %rs<2>;
-; SM60-NEXT:    .reg .b32 %r<21>;
+; SM60-NEXT:    .reg .b32 %r<20>;
 ; SM60-NEXT:    .reg .b64 %rd<3>;
 ; SM60-EMPTY:
 ; SM60-NEXT:  // %bb.0:
@@ -67,23 +66,22 @@ define i8 @monotonic_acquire_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8 %
 ; SM60-NEXT:    shl.b32 %r13, %r12, %r1;
 ; SM60-NEXT:    not.b32 %r2, %r13;
 ; SM60-NEXT:    cvt.u32.u16 %r14, %rs1;
-; SM60-NEXT:    and.b32 %r15, %r14, 255;
-; SM60-NEXT:    shl.b32 %r3, %r15, %r1;
+; SM60-NEXT:    shl.b32 %r3, %r14, %r1;
 ; SM60-NEXT:    shl.b32 %r4, %r9, %r1;
-; SM60-NEXT:    ld.global.b32 %r16, [%rd1];
-; SM60-NEXT:    and.b32 %r20, %r16, %r2;
+; SM60-NEXT:    ld.global.b32 %r15, [%rd1];
+; SM60-NEXT:    and.b32 %r19, %r15, %r2;
 ; SM60-NEXT:  $L__BB1_1: // %partword.cmpxchg.loop
 ; SM60-NEXT:    // =>This Inner Loop Header: Depth=1
-; SM60-NEXT:    or.b32 %r17, %r20, %r3;
-; SM60-NEXT:    or.b32 %r18, %r20, %r4;
-; SM60-NEXT:    atom.cta.global.cas.b32 %r7, [%rd1], %r18, %r17;
-; SM60-NEXT:    setp.eq.b32 %p1, %r7, %r18;
+; SM60-NEXT:    or.b32 %r16, %r19, %r3;
+; SM60-NEXT:    or.b32 %r17, %r19, %r4;
+; SM60-NEXT:    atom.cta.global.cas.b32 %r7, [%rd1], %r17, %r16;
+; SM60-NEXT:    setp.eq.b32 %p1, %r7, %r17;
 ; SM60-NEXT:    @%p1 bra $L__BB1_3;
 ; SM60-NEXT:  // %bb.2: // %partword.cmpxchg.failure
 ; SM60-NEXT:    // in Loop: Header=BB1_1 Depth=1
 ; SM60-NEXT:    and.b32 %r8, %r7, %r2;
-; SM60-NEXT:    setp.ne.b32 %p2, %r20, %r8;
-; SM60-NEXT:    mov.b32 %r20, %r8;
+; SM60-NEXT:    setp.ne.b32 %p2, %r19, %r8;
+; SM60-NEXT:    mov.b32 %r19, %r8;
 ; SM60-NEXT:    @%p2 bra $L__BB1_1;
 ; SM60-NEXT:  $L__BB1_3: // %partword.cmpxchg.end
 ; SM60-NEXT:    membar.cta;
@@ -98,7 +96,7 @@ define i8 @monotonic_seq_cst_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8 %
 ; SM60:       {
 ; SM60-NEXT:    .reg .pred %p<3>;
 ; SM60-NEXT:    .reg .b16 %rs<2>;
-; SM60-NEXT:    .reg .b32 %r<21>;
+; SM60-NEXT:    .reg .b32 %r<20>;
 ; SM60-NEXT:    .reg .b64 %rd<3>;
 ; SM60-EMPTY:
 ; SM60-NEXT:  // %bb.0:
@@ -114,23 +112,22 @@ define i8 @monotonic_seq_cst_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8 %
 ; SM60-NEXT:    shl.b32 %r13, %r12, %r1;
 ; SM60-NEXT:    not.b32 %r2, %r13;
 ; SM60-NEXT:    cvt.u32.u16 %r14, %rs1;
-; SM60-NEXT:    and.b32 %r15, %r14, 255;
-; SM60-NEXT:    shl.b32 %r3, %r15, %r1;
+; SM60-NEXT:    shl.b32 %r3, %r14, %r1;
 ; SM60-NEXT:    shl.b32 %r4, %r9, %r1;
-; SM60-NEXT:    ld.global.b32 %r16, [%rd1];
-; SM60-NEXT:    and.b32 %r20, %r16, %r2;
+; SM60-NEXT:    ld.global.b32 %r15, [%rd1];
+; SM60-NEXT:    and.b32 %r19, %r15, %r2;
 ; SM60-NEXT:  $L__BB2_1: // %partword.cmpxchg.loop
 ; SM60-NEXT:    // =>This Inner Loop Header: Depth=1
-; SM60-NEXT:    or.b32 %r17, %r20, %r3;
-; SM60-NEXT:    or.b32 %r18, %r20, %r4;
-; SM60-NEXT:    atom.cta.global.cas.b32 %r7, [%rd1], %r18, %r17;
-; SM60-NEXT:    setp.eq.b32 %p1, %r7, %r18;
+; SM60-NEXT:    or.b32 %r16, %r19, %r3;
+; SM60-NEXT:    or.b32 %r17, %r19, %r4;
+; SM60-NEXT:    atom.cta.global.cas.b32 %r7, [%rd1], %r17, %r16;
+; SM60-NEXT:    setp.eq.b32 %p1, %r7, %r17;
 ; SM60-NEXT:    @%p1 bra $L__BB2_3;
 ; SM60-NEXT:  // %bb.2: // %partword.cmpxchg.failure
 ; SM60-NEXT:    // in Loop: Header=BB2_1 Depth=1
 ; SM60-NEXT:    and.b32 %r8, %r7, %r2;
-; SM60-NEXT:    setp.ne.b32 %p2, %r20, %r8;
-; SM60-NEXT:    mov.b32 %r20, %r8;
+; SM60-NEXT:    setp.ne.b32 %p2, %r19, %r8;
+; SM60-NEXT:    mov.b32 %r19, %r8;
 ; SM60-NEXT:    @%p2 bra $L__BB2_1;
 ; SM60-NEXT:  $L__BB2_3: // %partword.cmpxchg.end
 ; SM60-NEXT:    membar.cta;
@@ -145,7 +142,7 @@ define i8 @acquire_monotonic_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8 %
 ; SM60:       {
 ; SM60-NEXT:    .reg .pred %p<3>;
 ; SM60-NEXT:    .reg .b16 %rs<2>;
-; SM60-NEXT:    .reg .b32 %r<21>;
+; SM60-NEXT:    .reg .b32 %r<20>;
 ; SM60-NEXT:    .reg .b64 %rd<3>;
 ; SM60-EMPTY:
 ; SM60-NEXT:  // %bb.0:
@@ -160,23 +157,22 @@ define i8 @acquire_monotonic_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8 %
 ; SM60-NEXT:    shl.b32 %r13, %r12, %r1;
 ; SM60-NEXT:    not.b32 %r2, %r13;
 ; SM60-NEXT:    cvt.u32.u16 %r14, %rs1;
-; SM60-NEXT:    and.b32 %r15, %r14, 255;
-; SM60-NEXT:    shl.b32 %r3, %r15, %r1;
+; SM60-NEXT:    shl.b32 %r3, %r14, %r1;
 ; SM60-NEXT:    shl.b32 %r4, %r9, %r1;
-; SM60-NEXT:    ld.global.b32 %r16, [%rd1];
-; SM60-NEXT:    and.b32 %r20, %r16, %r2;
+; SM60-NEXT:    ld.global.b32 %r15, [%rd1];
+; SM60-NEXT:    and.b32 %r19, %r15, %r2;
 ; SM60-NEXT:  $L__BB3_1: // %partword.cmpxchg.loop
 ; SM60-NEXT:    // =>This Inner Loop Header: Depth=1
-; SM60-NEXT:    or.b32 %r17, %r20, %r3;
-; SM60-NEXT:    or.b32 %r18, %r20, %r4;
-; SM60-NEXT:    atom.cta.global.cas.b32 %r7, [%rd1], %r18, %r17;
-; SM60-NEXT:    setp.eq.b32 %p1, %r7, %r18;
+; SM60-NEXT:    or.b32 %r16, %r19, %r3;
+; SM60-NEXT:    or.b32 %r17,...
[truncated]

@llvmbot
Member

llvmbot commented Aug 19, 2025

@llvm/pr-subscribers-backend-nvptx

     Info.offset = 0;
     Info.flags = MachineMemOperand::MOLoad;
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
index 27f099e220976..e7f1a4b4c98c4 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
@@ -309,8 +309,6 @@ class NVPTXTargetLowering : public TargetLowering {
   SDValue LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const;
 
   SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
-  SDValue LowerLOADi1(SDValue Op, SelectionDAG &DAG) const;
-
   SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSTOREi1(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSTOREVector(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll b/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
index b4641d01eb927..4fb294761e347 100644
--- a/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
@@ -711,11 +711,11 @@ define <2 x bfloat> @test_copysign(<2 x bfloat> %a, <2 x bfloat> %b) #0 {
 ; CHECK-NEXT:    .reg .b32 %r<6>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
-; CHECK-NEXT:    ld.param.b32 %r1, [test_copysign_param_0];
-; CHECK-NEXT:    ld.param.b32 %r2, [test_copysign_param_1];
-; CHECK-NEXT:    and.b32 %r3, %r2, -2147450880;
-; CHECK-NEXT:    and.b32 %r4, %r1, 2147450879;
-; CHECK-NEXT:    or.b32 %r5, %r4, %r3;
+; CHECK-NEXT:    ld.param.b32 %r1, [test_copysign_param_1];
+; CHECK-NEXT:    and.b32 %r2, %r1, -2147450880;
+; CHECK-NEXT:    ld.param.b32 %r3, [test_copysign_param_0];
+; CHECK-NEXT:    and.b32 %r4, %r3, 2147450879;
+; CHECK-NEXT:    or.b32 %r5, %r4, %r2;
 ; CHECK-NEXT:    st.param.b32 [func_retval0], %r5;
 ; CHECK-NEXT:    ret;
   %r = call <2 x bfloat> @llvm.copysign.f16(<2 x bfloat> %a, <2 x bfloat> %b)
diff --git a/llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll b/llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll
index 63c389c36e87e..5e7d1a1cd8af1 100644
--- a/llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll
+++ b/llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll
@@ -7,7 +7,7 @@ define i8 @monotonic_monotonic_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8
 ; SM60:       {
 ; SM60-NEXT:    .reg .pred %p<3>;
 ; SM60-NEXT:    .reg .b16 %rs<2>;
-; SM60-NEXT:    .reg .b32 %r<21>;
+; SM60-NEXT:    .reg .b32 %r<20>;
 ; SM60-NEXT:    .reg .b64 %rd<3>;
 ; SM60-EMPTY:
 ; SM60-NEXT:  // %bb.0:
@@ -22,23 +22,22 @@ define i8 @monotonic_monotonic_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8
 ; SM60-NEXT:    shl.b32 %r13, %r12, %r1;
 ; SM60-NEXT:    not.b32 %r2, %r13;
 ; SM60-NEXT:    cvt.u32.u16 %r14, %rs1;
-; SM60-NEXT:    and.b32 %r15, %r14, 255;
-; SM60-NEXT:    shl.b32 %r3, %r15, %r1;
+; SM60-NEXT:    shl.b32 %r3, %r14, %r1;
 ; SM60-NEXT:    shl.b32 %r4, %r9, %r1;
-; SM60-NEXT:    ld.global.b32 %r16, [%rd1];
-; SM60-NEXT:    and.b32 %r20, %r16, %r2;
+; SM60-NEXT:    ld.global.b32 %r15, [%rd1];
+; SM60-NEXT:    and.b32 %r19, %r15, %r2;
 ; SM60-NEXT:  $L__BB0_1: // %partword.cmpxchg.loop
 ; SM60-NEXT:    // =>This Inner Loop Header: Depth=1
-; SM60-NEXT:    or.b32 %r17, %r20, %r3;
-; SM60-NEXT:    or.b32 %r18, %r20, %r4;
-; SM60-NEXT:    atom.cta.global.cas.b32 %r7, [%rd1], %r18, %r17;
-; SM60-NEXT:    setp.eq.b32 %p1, %r7, %r18;
+; SM60-NEXT:    or.b32 %r16, %r19, %r3;
+; SM60-NEXT:    or.b32 %r17, %r19, %r4;
+; SM60-NEXT:    atom.cta.global.cas.b32 %r7, [%rd1], %r17, %r16;
+; SM60-NEXT:    setp.eq.b32 %p1, %r7, %r17;
 ; SM60-NEXT:    @%p1 bra $L__BB0_3;
 ; SM60-NEXT:  // %bb.2: // %partword.cmpxchg.failure
 ; SM60-NEXT:    // in Loop: Header=BB0_1 Depth=1
 ; SM60-NEXT:    and.b32 %r8, %r7, %r2;
-; SM60-NEXT:    setp.ne.b32 %p2, %r20, %r8;
-; SM60-NEXT:    mov.b32 %r20, %r8;
+; SM60-NEXT:    setp.ne.b32 %p2, %r19, %r8;
+; SM60-NEXT:    mov.b32 %r19, %r8;
 ; SM60-NEXT:    @%p2 bra $L__BB0_1;
 ; SM60-NEXT:  $L__BB0_3: // %partword.cmpxchg.end
 ; SM60-NEXT:    st.param.b32 [func_retval0], %r14;
@@ -52,7 +51,7 @@ define i8 @monotonic_acquire_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8 %
 ; SM60:       {
 ; SM60-NEXT:    .reg .pred %p<3>;
 ; SM60-NEXT:    .reg .b16 %rs<2>;
-; SM60-NEXT:    .reg .b32 %r<21>;
+; SM60-NEXT:    .reg .b32 %r<20>;
 ; SM60-NEXT:    .reg .b64 %rd<3>;
 ; SM60-EMPTY:
 ; SM60-NEXT:  // %bb.0:
@@ -67,23 +66,22 @@ define i8 @monotonic_acquire_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8 %
 ; SM60-NEXT:    shl.b32 %r13, %r12, %r1;
 ; SM60-NEXT:    not.b32 %r2, %r13;
 ; SM60-NEXT:    cvt.u32.u16 %r14, %rs1;
-; SM60-NEXT:    and.b32 %r15, %r14, 255;
-; SM60-NEXT:    shl.b32 %r3, %r15, %r1;
+; SM60-NEXT:    shl.b32 %r3, %r14, %r1;
 ; SM60-NEXT:    shl.b32 %r4, %r9, %r1;
-; SM60-NEXT:    ld.global.b32 %r16, [%rd1];
-; SM60-NEXT:    and.b32 %r20, %r16, %r2;
+; SM60-NEXT:    ld.global.b32 %r15, [%rd1];
+; SM60-NEXT:    and.b32 %r19, %r15, %r2;
 ; SM60-NEXT:  $L__BB1_1: // %partword.cmpxchg.loop
 ; SM60-NEXT:    // =>This Inner Loop Header: Depth=1
-; SM60-NEXT:    or.b32 %r17, %r20, %r3;
-; SM60-NEXT:    or.b32 %r18, %r20, %r4;
-; SM60-NEXT:    atom.cta.global.cas.b32 %r7, [%rd1], %r18, %r17;
-; SM60-NEXT:    setp.eq.b32 %p1, %r7, %r18;
+; SM60-NEXT:    or.b32 %r16, %r19, %r3;
+; SM60-NEXT:    or.b32 %r17, %r19, %r4;
+; SM60-NEXT:    atom.cta.global.cas.b32 %r7, [%rd1], %r17, %r16;
+; SM60-NEXT:    setp.eq.b32 %p1, %r7, %r17;
 ; SM60-NEXT:    @%p1 bra $L__BB1_3;
 ; SM60-NEXT:  // %bb.2: // %partword.cmpxchg.failure
 ; SM60-NEXT:    // in Loop: Header=BB1_1 Depth=1
 ; SM60-NEXT:    and.b32 %r8, %r7, %r2;
-; SM60-NEXT:    setp.ne.b32 %p2, %r20, %r8;
-; SM60-NEXT:    mov.b32 %r20, %r8;
+; SM60-NEXT:    setp.ne.b32 %p2, %r19, %r8;
+; SM60-NEXT:    mov.b32 %r19, %r8;
 ; SM60-NEXT:    @%p2 bra $L__BB1_1;
 ; SM60-NEXT:  $L__BB1_3: // %partword.cmpxchg.end
 ; SM60-NEXT:    membar.cta;
@@ -98,7 +96,7 @@ define i8 @monotonic_seq_cst_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8 %
 ; SM60:       {
 ; SM60-NEXT:    .reg .pred %p<3>;
 ; SM60-NEXT:    .reg .b16 %rs<2>;
-; SM60-NEXT:    .reg .b32 %r<21>;
+; SM60-NEXT:    .reg .b32 %r<20>;
 ; SM60-NEXT:    .reg .b64 %rd<3>;
 ; SM60-EMPTY:
 ; SM60-NEXT:  // %bb.0:
@@ -114,23 +112,22 @@ define i8 @monotonic_seq_cst_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8 %
 ; SM60-NEXT:    shl.b32 %r13, %r12, %r1;
 ; SM60-NEXT:    not.b32 %r2, %r13;
 ; SM60-NEXT:    cvt.u32.u16 %r14, %rs1;
-; SM60-NEXT:    and.b32 %r15, %r14, 255;
-; SM60-NEXT:    shl.b32 %r3, %r15, %r1;
+; SM60-NEXT:    shl.b32 %r3, %r14, %r1;
 ; SM60-NEXT:    shl.b32 %r4, %r9, %r1;
-; SM60-NEXT:    ld.global.b32 %r16, [%rd1];
-; SM60-NEXT:    and.b32 %r20, %r16, %r2;
+; SM60-NEXT:    ld.global.b32 %r15, [%rd1];
+; SM60-NEXT:    and.b32 %r19, %r15, %r2;
 ; SM60-NEXT:  $L__BB2_1: // %partword.cmpxchg.loop
 ; SM60-NEXT:    // =>This Inner Loop Header: Depth=1
-; SM60-NEXT:    or.b32 %r17, %r20, %r3;
-; SM60-NEXT:    or.b32 %r18, %r20, %r4;
-; SM60-NEXT:    atom.cta.global.cas.b32 %r7, [%rd1], %r18, %r17;
-; SM60-NEXT:    setp.eq.b32 %p1, %r7, %r18;
+; SM60-NEXT:    or.b32 %r16, %r19, %r3;
+; SM60-NEXT:    or.b32 %r17, %r19, %r4;
+; SM60-NEXT:    atom.cta.global.cas.b32 %r7, [%rd1], %r17, %r16;
+; SM60-NEXT:    setp.eq.b32 %p1, %r7, %r17;
 ; SM60-NEXT:    @%p1 bra $L__BB2_3;
 ; SM60-NEXT:  // %bb.2: // %partword.cmpxchg.failure
 ; SM60-NEXT:    // in Loop: Header=BB2_1 Depth=1
 ; SM60-NEXT:    and.b32 %r8, %r7, %r2;
-; SM60-NEXT:    setp.ne.b32 %p2, %r20, %r8;
-; SM60-NEXT:    mov.b32 %r20, %r8;
+; SM60-NEXT:    setp.ne.b32 %p2, %r19, %r8;
+; SM60-NEXT:    mov.b32 %r19, %r8;
 ; SM60-NEXT:    @%p2 bra $L__BB2_1;
 ; SM60-NEXT:  $L__BB2_3: // %partword.cmpxchg.end
 ; SM60-NEXT:    membar.cta;
@@ -145,7 +142,7 @@ define i8 @acquire_monotonic_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8 %
 ; SM60:       {
 ; SM60-NEXT:    .reg .pred %p<3>;
 ; SM60-NEXT:    .reg .b16 %rs<2>;
-; SM60-NEXT:    .reg .b32 %r<21>;
+; SM60-NEXT:    .reg .b32 %r<20>;
 ; SM60-NEXT:    .reg .b64 %rd<3>;
 ; SM60-EMPTY:
 ; SM60-NEXT:  // %bb.0:
@@ -160,23 +157,22 @@ define i8 @acquire_monotonic_i8_global_cta(ptr addrspace(1) %addr, i8 %cmp, i8 %
 ; SM60-NEXT:    shl.b32 %r13, %r12, %r1;
 ; SM60-NEXT:    not.b32 %r2, %r13;
 ; SM60-NEXT:    cvt.u32.u16 %r14, %rs1;
-; SM60-NEXT:    and.b32 %r15, %r14, 255;
-; SM60-NEXT:    shl.b32 %r3, %r15, %r1;
+; SM60-NEXT:    shl.b32 %r3, %r14, %r1;
 ; SM60-NEXT:    shl.b32 %r4, %r9, %r1;
-; SM60-NEXT:    ld.global.b32 %r16, [%rd1];
-; SM60-NEXT:    and.b32 %r20, %r16, %r2;
+; SM60-NEXT:    ld.global.b32 %r15, [%rd1];
+; SM60-NEXT:    and.b32 %r19, %r15, %r2;
 ; SM60-NEXT:  $L__BB3_1: // %partword.cmpxchg.loop
 ; SM60-NEXT:    // =>This Inner Loop Header: Depth=1
-; SM60-NEXT:    or.b32 %r17, %r20, %r3;
-; SM60-NEXT:    or.b32 %r18, %r20, %r4;
-; SM60-NEXT:    atom.cta.global.cas.b32 %r7, [%rd1], %r18, %r17;
-; SM60-NEXT:    setp.eq.b32 %p1, %r7, %r18;
+; SM60-NEXT:    or.b32 %r16, %r19, %r3;
+; SM60-NEXT:    or.b32 %r17,...
[truncated]

Member

@Artem-B Artem-B left a comment


LGTM.

So, in practice, the only observable effect on the tests is on loading i8 scalars that we store in i16 registers.

@AlexMaclean AlexMaclean requested a review from topperc August 20, 2025 16:38
@AlexMaclean
Member Author

@topperc Do these changes to Mips tests look alright to you?

@AlexMaclean AlexMaclean force-pushed the dev/amaclean/ld-work branch from 6e5e4d4 to c359e0d Compare August 20, 2025 16:52
Member

Artem-B commented Aug 20, 2025

My hazy memory of MIPS is that signed/unsigned load only matters for byte/short variants lb/lh, so lw/lwu changes should be benign.

However, the changes that mention # implicit-def: $a0_64 may need a closer look from someone familiar with MIPS. MIPS assembly and ABIs are funky, and those extra register moves may have been there for a reason. It's probably OK, too. AFAICT, the code just loads function arguments into a0/a1/a2 (AKA $4/$5/$6) before the function call. If anything the old code looks somewhat questionable, as $1 is typically reserved for the assembler use on MIPS.

@topperc
Collaborator

topperc commented Aug 20, 2025

My hazy memory of MIPS is that signed/unsigned load only matters for byte/short variants lb/lh, so lw/lwu changes should be benign.

However, the changes that mention # implicit-def: $a0_64 may need a closer look from someone familiar with MIPS. MIPS assembly and ABIs are funky, and those extra register moves may have been there for a reason. It's probably OK, too. AFAICT, the code just loads function arguments into a0/a1/a2 (AKA $4/$5/$6) before the function call. If anything the old code looks somewhat questionable, as $1 is typically reserved for the assembler use on MIPS.

I think the Mips test with the extra moves was compiled with -O0 so some DAGCombine optimizations might have been disabled. This change may have caused some optimization on a different path.

@AlexMaclean AlexMaclean force-pushed the dev/amaclean/ld-work branch from 322a43f to 245886f Compare August 21, 2025 21:41
@AlexMaclean AlexMaclean merged commit a3ed96b into llvm:main Aug 21, 2025
9 checks passed
@llvm-ci
Collaborator

llvm-ci commented Aug 21, 2025

LLVM Buildbot has detected a new failure on builder clangd-ubuntu-tsan running on clangd-ubuntu-clang while building llvm at step 6 "test-build-clangd-clangd-index-server-clangd-in...".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/134/builds/24898

Here is the relevant piece of the build log for the reference
Step 6 (test-build-clangd-clangd-index-server-clangd-in...) failure: test (failure)
******************** TEST 'Clangd :: target_info.test' FAILED ********************
Exit Code: 66

Command Output (stderr):
--
rm -rf /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.dir && mkdir -p /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.dir # RUN: at line 5
+ rm -rf /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.dir
+ mkdir -p /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.dir
echo '[{"directory": "/vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.dir", "command": "/vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.dir/armv7-clang -x c++ the-file.cpp -v", "file": "the-file.cpp"}]' > /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.dir/compile_commands.json # RUN: at line 7
+ echo '[{"directory": "/vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.dir", "command": "/vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.dir/armv7-clang -x c++ the-file.cpp -v", "file": "the-file.cpp"}]'
sed -e "s|INPUT_DIR|/vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.dir|g" /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/llvm-project/clang-tools-extra/clangd/test/target_info.test > /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.test.1 # RUN: at line 9
+ sed -e 's|INPUT_DIR|/vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.dir|g' /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/llvm-project/clang-tools-extra/clangd/test/target_info.test
sed -E -e 's|"file://([A-Z]):/|"file:///\1:/|g' /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.test.1 > /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.test # RUN: at line 12
+ sed -E -e 's|"file://([A-Z]):/|"file:///\1:/|g' /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.test.1
clangd -lit-test < /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.test 2>&1 | /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/bin/FileCheck -strict-whitespace /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.test # RUN: at line 14
+ clangd -lit-test
+ /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/bin/FileCheck -strict-whitespace /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test/Output/target_info.test.tmp.test

--

********************


@jhuber6
Contributor

jhuber6 commented Aug 22, 2025

I bisected some new libc failures to this PR. After reverting it and rebuilding the tests all pass https://lab.llvm.org/buildbot/#/builders/69/builds/26327/steps/12/logs/stdio. Should we go ahead and revert this for now?

@AlexMaclean
Member Author

I bisected some new libc failures to this PR. After reverting it and rebuilding the tests all pass https://lab.llvm.org/buildbot/#/builders/69/builds/26327/steps/12/logs/stdio. Should we go ahead and revert this for now?

@jhuber6 would you be able to attach any more information such as LLVM IR or PTX before/after? or reproduction instructions? If this is possible to reproduce I'm happy to take a look right now. If you think investigation will take a long time I'd be happy to revert as well.

@jhuber6
Contributor

jhuber6 commented Aug 22, 2025

I bisected some new libc failures to this PR. After reverting it and rebuilding the tests all pass https://lab.llvm.org/buildbot/#/builders/69/builds/26327/steps/12/logs/stdio. Should we go ahead and revert this for now?

@jhuber6 would you be able to attach any more information such as LLVM IR or PTX before/after? or reproduction instructions? If this is possible to reproduce I'm happy to take a look right now. If you think investigation will take a long time I'd be happy to revert as well.

It caused a handful of tests to fail; you should be able to build and run the libc tests locally if you have a working CUDA installation + GPU. I spent over forty minutes trying and failing to get a reproducer because the NVPTX toolchain is so weird and I can't spend all day doing this.

The documentation for building is at https://libc.llvm.org/gpu/building.html#cmake-options and it would probably help to run these tests locally when you make changes since it covers a lot of uncommon use-cases.

@jhuber6
Contributor

jhuber6 commented Aug 22, 2025

Here: https://godbolt.org/z/Eqqjoon7b. It's a ton of LLVM IR, but the difference I see in the assembly is only a small change: the one on the left with three local stores works; the other one causes the error.

+ +--506 lines: // Generated by LLVM NVPTX Back-End··································································│+ +--506 lines: // Generated by LLVM NVPTX Back-End·································································
  // %bb.0:                               // %entry                                                                  │  // %bb.0:                               // %entry      
    mov.b64   %SPL, __local_depot9;                                                                                  │    mov.b64   %SPL, __local_depot9;      
    ld.param.b64  %rd158, [_ZN33LlvmLibcIntegerToStringTest_INT323RunEv_param_0];                                    │    ld.param.b64  %rd158, [_ZN33LlvmLibcIntegerToStringTest_INT323RunEv_param_0];      
    add.u64   %rd1, %SPL, 0;                                                                                         │    add.u64   %rd1, %SPL, 0;      
    add.u64   %rd159, %SPL, 24;                                                                                      │    add.u64   %rd159, %SPL, 24;      
    add.s64   %rd2, %rd159, 9;                                                                                       │    add.s64   %rd2, %rd159, 9;      
    st.local.b8   [%rd159+10], 49;                                                                                   │    st.local.b8   [%rd159+12], 0;      
  -------------------------------------------------------------------------------------------------------------------│    st.local.b8   [%rd159+11], 49;      
  -------------------------------------------------------------------------------------------------------------------│    st.local.b8   [%rd159+10], 0;      
    st.local.b8   [%rd159+9], 45;                                                                                    │    st.local.b8   [%rd159+9], 45;      
    st.local.b64  [%rd159+16], 2;                                                                                    │    st.local.b64  [%rd159+16], 2;      
    ld.b64  %rd3, [%rd158+16];                                                                                       │    ld.b64  %rd3, [%rd158+16];      
    mov.b64   %rd978, 0;                                                                                             │    mov.b64   %rd978, 0;      
    mov.b64   %rd161, _$_str;                                                                                        │    mov.b64   %rd161, _$_str;      
  $L__BB9_1:                              // %for.body.i.i.i.i                                                       │  $L__BB9_1:                              // %for.body.i.i.i.i      
+ +--17287 lines: =>This Inner Loop Header: Depth=1··································································│+ +--17287 lines: =>This Inner Loop Header: Depth=1·································································

@rupprecht
Collaborator

We also see some JAX test failures that bisect here, specifically testSort and testSortAgainstNumpy: https://github.com/jax-ml/jax/blob/3d52f5f87095189c3c619fb9daff3259a519aaad/tests/lax_test.py#L2528. I don't have any repro instructions or public buildbots to point to however.

jhuber6 added a commit that referenced this pull request Aug 22, 2025
@jhuber6
Contributor

jhuber6 commented Aug 22, 2025

Went ahead and reverted it since the bot has been red for over a day.

AlexMaclean added a commit to AlexMaclean/llvm-project that referenced this pull request Aug 23, 2025
@AlexMaclean
Member Author

Looks like this change accidentally removed b6e19b3. I've added this bit back with tests to make sure this doesn't happen again. I'd appreciate both of your review on #155063 when you have a minute!
