-
Notifications
You must be signed in to change notification settings - Fork 15k
[X86] Recognise VPMADD52L pattern with AVX512IFMA/AVXIFMA (#153787) #156714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Match (X * Y) + Z in combineAdd. If the target supports it and we can prove the multiply does not overflow 52 bits, rewrite using VPMADD52L.
@llvm/pr-subscribers-backend-x86 Author: Justin Riddell (Arghnews) Changes: Match (X * Y) + Z in combineAdd and rewrite it using VPMADD52L when safe. Have deliberately not put tests into the existing combine-vpmadd52.ll, as the flags make it a pain. Have just done the L (low-half) version for now at least; wanted to get feedback before continuing. @RKSimon Full diff: https://github.com/llvm/llvm-project/pull/156714.diff 2 Files Affected:
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 47cea933d0836..bd0ab5fe96630 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -57966,6 +57966,51 @@ static SDValue pushAddIntoCmovOfConsts(SDNode *N, const SDLoc &DL,
Cmov.getOperand(3));
}
+/// Fold add(mul(X, Y), Acc) -> VPMADD52L(Acc, X, Y).
+///
+/// VPMADD52L computes Acc + lo52(X[51:0] * Y[51:0]); the fold is only safe
+/// when X, Y and the product X * Y are all known to have their top 12 bits
+/// clear, i.e. everything fits in 52 bits, so the 52-bit multiply-accumulate
+/// produces the same value as the 64-bit mul + add.
+///
+/// Returns the VPMADD52L node, or an empty SDValue if the pattern does not
+/// match or the target cannot profitably use it.
+static SDValue matchIntegerMultiplyAdd(SDNode *N, SelectionDAG &DAG,
+                                       SDValue Op0, SDValue Op1,
+                                       const SDLoc &DL, EVT VT,
+                                       const X86Subtarget &Subtarget) {
+  assert(N->getOpcode() == ISD::ADD && "Expected an ADD node");
+  // AVX512IFMA requires AVX512; AVX-IFMA is VEX-encoded and must not be
+  // gated on AVX512, otherwise AVX-IFMA-only targets miss the fold.
+  if (!VT.isVector() || VT.getScalarType() != MVT::i64 ||
+      (!Subtarget.hasAVXIFMA() &&
+       !(Subtarget.hasIFMA() && Subtarget.hasAVX512())) ||
+      !DAG.getTargetLoweringInfo().isOperationLegalOrCustom(X86ISD::VPMADD52L,
+                                                            VT) ||
+      Op0.getValueType() != VT || Op1.getValueType() != VT)
+    return SDValue();
+
+  // True iff M is a MUL whose operands and result are all provably
+  // representable in 52 bits (top 12 bits known zero).
+  const APInt Top12Set = APInt::getHighBitsSet(64, 12);
+  auto IsNarrowMul = [&](SDValue M, SDValue &X, SDValue &Y) -> bool {
+    if (M.getOpcode() != ISD::MUL)
+      return false;
+    SDValue A = M.getOperand(0);
+    SDValue B = M.getOperand(1);
+    if (A.getValueType() != VT || B.getValueType() != VT ||
+        !DAG.MaskedValueIsZero(A, Top12Set) ||
+        !DAG.MaskedValueIsZero(B, Top12Set) ||
+        !DAG.MaskedValueIsZero(M, Top12Set))
+      return false;
+    X = A;
+    Y = B;
+    return true;
+  };
+
+  // ADD is commutative, so the multiply may be either operand; the other
+  // operand becomes the accumulator.
+  SDValue X, Y;
+  if (IsNarrowMul(Op0, X, Y))
+    return DAG.getNode(X86ISD::VPMADD52L, DL, VT, Op1, X, Y);
+  if (IsNarrowMul(Op1, X, Y))
+    return DAG.getNode(X86ISD::VPMADD52L, DL, VT, Op0, X, Y);
+  return SDValue();
+}
+
static SDValue combineAdd(SDNode *N, SelectionDAG &DAG,
TargetLowering::DAGCombinerInfo &DCI,
const X86Subtarget &Subtarget) {
@@ -58069,6 +58114,11 @@ static SDValue combineAdd(SDNode *N, SelectionDAG &DAG,
Op0.getOperand(0), Op0.getOperand(2));
}
+  // Try to fold (X * Y) + Acc into VPMADD52L when the product provably fits
+  // in 52 bits. LLVM style: UpperCamel variable name, no braces for a
+  // single-statement body.
+  if (SDValue V = matchIntegerMultiplyAdd(N, DAG, Op0, Op1, DL, VT, Subtarget))
+    return V;
+
return combineAddOrSubToADCOrSBB(N, DL, DAG);
}
diff --git a/llvm/test/CodeGen/X86/ifma-combine-vpmadd52.ll b/llvm/test/CodeGen/X86/ifma-combine-vpmadd52.ll
new file mode 100644
index 0000000000000..6a37b1b814cdc
--- /dev/null
+++ b/llvm/test/CodeGen/X86/ifma-combine-vpmadd52.ll
@@ -0,0 +1,111 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -O1 -mtriple=x86_64-unknown-unknown -mattr=+avx512dq | FileCheck %s --check-prefixes=X64
+
+; 67108863 == (1 << 26) - 1
+
+; All three inputs are masked to 26 bits, so the product fits in 52 bits and
+; the mul + add is expected to fold into a single 512-bit EVEX vpmadd52luq.
+define dso_local <8 x i64> @test_512_combine_evex(<8 x i64> noundef %0, <8 x i64> noundef %1, <8 x i64> noundef %2) local_unnamed_addr #0 {
+; X64-LABEL: test_512_combine_evex:
+; X64: # %bb.0:
+; X64-NEXT: vpbroadcastq {{.*#+}} zmm3 = [67108863,67108863,67108863,67108863,67108863,67108863,67108863,67108863]
+; X64-NEXT: vpandq %zmm3, %zmm0, %zmm0
+; X64-NEXT: vpandq %zmm3, %zmm1, %zmm1
+; X64-NEXT: vpandq %zmm3, %zmm2, %zmm2
+; X64-NOT: vpmul
+; X64-NOT: vpadd
+; X64-NEXT: vpmadd52luq %zmm1, %zmm2, %zmm0
+; X64-NEXT: retq
%4 = and <8 x i64> %0, splat (i64 67108863)
%5 = and <8 x i64> %1, splat (i64 67108863)
%6 = and <8 x i64> %2, splat (i64 67108863)
%7 = mul nuw nsw <8 x i64> %5, %4
%8 = add nuw nsw <8 x i64> %7, %6
ret <8 x i64> %8
}
+
+; NOTE(review): this test has no CHECK lines and a non-descriptive name —
+; rename and regenerate CHECKs with update_llc_test_checks.py before landing.
+; %8 multiplies a 52-bit-known value (%7) by a 26-bit value (%6), so its top
+; 12 bits are NOT provably zero; presumably this tests that the chained mul
+; does not fold — TODO confirm intent.
+define dso_local <8 x i64> @fff(<8 x i64> noundef %0, <8 x i64> noundef %1, <8 x i64> noundef %2) local_unnamed_addr #0 {
%4 = and <8 x i64> %0, splat (i64 67108863)
%5 = and <8 x i64> %1, splat (i64 67108863)
%6 = and <8 x i64> %2, splat (i64 67108863)
%7 = mul nuw nsw <8 x i64> %5, %4
%8 = mul nuw nsw <8 x i64> %7, %6
%9 = add nuw nsw <8 x i64> %8, %7
ret <8 x i64> %9
}
+
+; Operands are unmasked, so the product may exceed 52 bits: the fold must
+; not fire and the plain vpmullq + vpaddq sequence must remain.
+define dso_local noundef <8 x i64> @test_512_no_combine_evex(<8 x i64> noundef %0, <8 x i64> noundef %1, <8 x i64> noundef %2) local_unnamed_addr #0 {
+; X64-LABEL: test_512_no_combine_evex:
+; X64: # %bb.0:
+; X64-NOT: vpmadd52
+; X64-NEXT: vpmullq %zmm0, %zmm1, %zmm0
+; X64-NEXT: vpaddq %zmm2, %zmm0, %zmm0
+; X64-NEXT: retq
%4 = mul <8 x i64> %1, %0
%5 = add <8 x i64> %4, %2
ret <8 x i64> %5
}
+
+; 256-bit variant with AVX512IFMA+VL (#1): inputs masked to 26 bits, so the
+; mul + add folds to the EVEX-encoded ymm vpmadd52luq.
+define dso_local <4 x i64> @test_256_combine_evex(<4 x i64> noundef %0, <4 x i64> noundef %1, <4 x i64> noundef %2) local_unnamed_addr #1 {
+; X64-LABEL: test_256_combine_evex:
+; X64: # %bb.0:
+; X64-NEXT: vpbroadcastq {{.*#+}} ymm3 = [67108863,67108863,67108863,67108863]
+; X64-NEXT: vpand %ymm3, %ymm0, %ymm0
+; X64-NEXT: vpand %ymm3, %ymm1, %ymm1
+; X64-NEXT: vpand %ymm3, %ymm2, %ymm2
+; X64-NOT: vpmul
+; X64-NOT: vpadd
+; X64-NEXT: vpmadd52luq %ymm1, %ymm2, %ymm0
+; X64-NEXT: retq
%4 = and <4 x i64> %0, <i64 67108863, i64 67108863, i64 67108863, i64 67108863>
%5 = and <4 x i64> %1, <i64 67108863, i64 67108863, i64 67108863, i64 67108863>
%6 = and <4 x i64> %2, <i64 67108863, i64 67108863, i64 67108863, i64 67108863>
%7 = mul nuw nsw <4 x i64> %5, %4
%8 = add nuw nsw <4 x i64> %7, %6
ret <4 x i64> %8
}
+
+; Unmasked 256-bit operands: overflow cannot be ruled out, so no vpmadd52.
+define dso_local noundef <4 x i64> @test_256_no_combine_evex(<4 x i64> noundef %0, <4 x i64> noundef %1, <4 x i64> noundef %2) local_unnamed_addr #1 {
+; X64-LABEL: test_256_no_combine_evex:
+; X64: # %bb.0:
+; X64-NOT: vpmadd52
+; X64-NEXT: vpmullq %ymm0, %ymm1, %ymm0
+; X64-NEXT: vpaddq %ymm2, %ymm0, %ymm0
+; X64-NEXT: retq
%4 = mul <4 x i64> %1, %0
%5 = add <4 x i64> %4, %2
ret <4 x i64> %5
}
+
+; 256-bit variant with AVX-IFMA (#2): same masked pattern, but the fold must
+; select the VEX-encoded {vex} vpmadd52luq form.
+define dso_local <4 x i64> @test_256_combine_vex(<4 x i64> noundef %0, <4 x i64> noundef %1, <4 x i64> noundef %2) local_unnamed_addr #2 {
+; X64-LABEL: test_256_combine_vex:
+; X64: # %bb.0:
+; X64-NEXT: vpbroadcastq {{.*#+}} ymm3 = [67108863,67108863,67108863,67108863]
+; X64-NEXT: vpand %ymm3, %ymm0, %ymm0
+; X64-NEXT: vpand %ymm3, %ymm1, %ymm1
+; X64-NEXT: vpand %ymm3, %ymm2, %ymm2
+; X64-NOT: vpmul
+; X64-NOT: vpadd
+; X64-NEXT: {vex} vpmadd52luq %ymm1, %ymm2, %ymm0
+; X64-NEXT: retq
%4 = and <4 x i64> %0, <i64 67108863, i64 67108863, i64 67108863, i64 67108863>
%5 = and <4 x i64> %1, <i64 67108863, i64 67108863, i64 67108863, i64 67108863>
%6 = and <4 x i64> %2, <i64 67108863, i64 67108863, i64 67108863, i64 67108863>
%7 = mul nuw nsw <4 x i64> %5, %4
%8 = add nuw nsw <4 x i64> %7, %6
ret <4 x i64> %8
}
+
+; Unmasked operands on the AVX-IFMA configuration: no fold expected.
+define dso_local noundef <4 x i64> @test_256_no_combine_vex(<4 x i64> noundef %0, <4 x i64> noundef %1, <4 x i64> noundef %2) local_unnamed_addr #2 {
+; X64-LABEL: test_256_no_combine_vex:
+; X64: # %bb.0:
+; X64-NOT: vpmadd52
+; X64-NEXT: vpmullq %ymm0, %ymm1, %ymm0
+; X64-NEXT: vpaddq %ymm2, %ymm0, %ymm0
+; X64-NEXT: retq
%4 = mul <4 x i64> %1, %0
%5 = add <4 x i64> %4, %2
ret <4 x i64> %5
}
+
+; #0: 512-bit AVX512IFMA (min-legal-vector-width=512).
+; #1: 256-bit AVX512IFMA + AVX512VL (EVEX ymm encoding available).
+; #2: 256-bit AVX-IFMA (VEX encoding), no avx512ifma in target-features.
+attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable "min-legal-vector-width"="512" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+avx,+avx2,+avx512dq,+avx512f,+avx512ifma,+cmov,+crc32,+cx8,+evex512,+f16c,+fma,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "tune-cpu"="generic" }
+attributes #1 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable "min-legal-vector-width"="256" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+avx,+avx2,+avx512dq,+avx512f,+avx512ifma,+avx512vl,+cmov,+crc32,+cx8,+evex512,+f16c,+fma,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "tune-cpu"="generic" }
+attributes #2 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable "min-legal-vector-width"="256" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+avx,+avx2,+avx512dq,+avx512f,+avx512vl,+avxifma,+cmov,+crc32,+cx8,+evex512,+f16c,+fma,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "tune-cpu"="generic" }
|
✅ With the latest revision this PR passed the undef deprecator. |
Have addressed and implemented the feedback. Please re-review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please split the patch into two commits. The first commit is to add the test cases. The second adds the code change and testcase diff.
Changed all constants to splats Removed const_2pow51_times_2 as folds early Remove unnecessary acc masks Change numbered vars to more readable names Remove attributes and use RUN lines Remove dso_local/noundef/local_unnamed_addr
Address feedback: Removed ValueTracking.h (sorry think clion autoadded and I missed it) Removed leftover comment // if (0 Added !Subtarget.hasAVXIFMA guard to prevent missed application of this change on AVXIFMA 128/256bit Add assert message Change X86ISD::VPMADD52L call to reflect different X86 op order Add test CHECK lines from script
Have addressed feedback and implemented changes, have split it over 2 commits to try and make the diff easier too etc. (hope that's what you meant @XChy ) Appreciate the feedback from both. Apologies, I did try and follow conventions on things like numbered variable names in tests or attributes, I just searched for others existing in the repo which they do, and presumed it was therefore acceptable (but I guess it is a large repo and has evolved over time presumably etc.). The test file is much neater now. Do you guys have thoughts are there general preferences on some of these tests, that test for not doing the transform. Ie. using Setting back to review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Match (X * Y) + Z in combineAdd. If the target supports it and we don't overflow (i.e. we know the top 12 bits are unset), rewrite using VPMADD52L.
Have deliberately not put tests into the existing combine-vpmadd52.ll, as the flags make it a pain.
Have just done the L (low-half) version for now at least; wanted to get feedback before continuing. @RKSimon