[CodeGen][RISCV] Add support of RISCV nontemporal to vector predication instructions. #153033
…on instructions. Make RISC-V vector-predicated intrinsics (specifically load, store, gather, scatter, strided.load, and strided.store) aware of non-temporal metadata.
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified.

If you wish, you can add reviewers by using the "Reviewers" section on this page. If that is not working for you, it is probably because you do not have write permissions for the repository; in that case you can instead tag reviewers by name in a comment.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR with a comment saying "Ping". The common courtesy ping rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord, or on the forums.
@llvm/pr-subscribers-llvm-selectiondag

Author: None (daniel-trujillo-bsc)

Changes

This PR adds support for VP intrinsics to be aware of nontemporal metadata.

First-time contributor here. I hope these changes are simple enough not to be much of a pain to review, and I'm looking forward to hearing your feedback! I'm not a regular GitHub user, so I had to create a throwaway account for this, but you can write to my BSC email (in the commit and on the web: https://www.bsc.es/trujillo-daniel) to verify my identity.

Patch is 1.80 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/153033.diff

6 Files Affected:
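As a minimal illustration of what the patch enables, here is a sketch of the kind of IR the new test file exercises. The function name is hypothetical; `!0 = !{i32 1}` is the standard encoding of `!nontemporal` (the RISC-V-specific `!riscv-nontemporal-domain` variants used in the tests are omitted here, since their metadata definitions fall outside the truncated excerpt):

```llvm
; A vector-predicated load tagged as non-temporal. With this patch, the
; !nontemporal metadata is propagated into the MachineMemOperand, so RISC-V
; codegen can emit an ntl.* hint before the vle8.v instruction.
define <vscale x 1 x i8> @nt_vp_load_sketch(ptr %p, i32 zeroext %vl) {
  %x = call <vscale x 1 x i8> @llvm.vp.load.nxv1i8.p0(ptr %p, <vscale x 1 x i1> splat (i1 true), i32 %vl), !nontemporal !0
  ret <vscale x 1 x i8> %x
}

!0 = !{i32 1}
```

Before this change, the builder created the MMO with only `MachineMemOperand::MOLoad`/`MOStore`, so the hint was silently dropped for VP intrinsics.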
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index dc00db9daa3b6..3dab1b1e8712d 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -91,6 +91,7 @@ class TargetLowering;
class TargetMachine;
class TargetSubtargetInfo;
class Value;
+class VPIntrinsic;
template <typename T> class GenericSSAContext;
using SSAContext = GenericSSAContext<Function>;
@@ -1007,6 +1008,11 @@ class SelectionDAG {
llvm_unreachable("Unknown opcode");
}
+ static MachineMemOperand::Flags
+ getNonTemporalMemFlag(const VPIntrinsic &VPIntrin);
+
+ static MachineMemOperand::Flags getNonTemporalMemFlag(const MemSDNode &N);
+
/// Convert Op, which must be of integer type, to the
/// integer type VT, by either any-extending or truncating it.
LLVM_ABI SDValue getAnyExtOrTrunc(SDValue Op, const SDLoc &DL, EVT VT);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index bc2dbfb4cbaae..a21a9b518fcde 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -2476,10 +2476,13 @@ void DAGTypeLegalizer::SplitVecRes_Gather(MemSDNode *N, SDValue &Lo,
else
std::tie(IndexLo, IndexHi) = DAG.SplitVector(Ops.Index, dl);
+ const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+ MachineMemOperand::Flags MMOFlags = MachineMemOperand::MOLoad |
+ TLI.getTargetMMOFlags(*N) |
+ SelectionDAG::getNonTemporalMemFlag(*N);
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
- N->getPointerInfo(), MachineMemOperand::MOLoad,
- LocationSize::beforeOrAfterPointer(), Alignment, N->getAAInfo(),
- N->getRanges());
+ N->getPointerInfo(), MMOFlags, LocationSize::beforeOrAfterPointer(),
+ Alignment, N->getAAInfo(), N->getRanges());
if (auto *MGT = dyn_cast<MaskedGatherSDNode>(N)) {
SDValue PassThru = MGT->getPassThru();
@@ -4248,10 +4251,13 @@ SDValue DAGTypeLegalizer::SplitVecOp_Scatter(MemSDNode *N, unsigned OpNo) {
std::tie(IndexLo, IndexHi) = DAG.SplitVector(Ops.Index, DL);
SDValue Lo;
+ const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+ MachineMemOperand::Flags MMOFlags = MachineMemOperand::MOStore |
+ TLI.getTargetMMOFlags(*N) |
+ SelectionDAG::getNonTemporalMemFlag(*N);
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
- N->getPointerInfo(), MachineMemOperand::MOStore,
- LocationSize::beforeOrAfterPointer(), Alignment, N->getAAInfo(),
- N->getRanges());
+ N->getPointerInfo(), MMOFlags, LocationSize::beforeOrAfterPointer(),
+ Alignment, N->getAAInfo(), N->getRanges());
if (auto *MSC = dyn_cast<MaskedScatterSDNode>(N)) {
SDValue OpsLo[] = {Ch, DataLo, MaskLo, Ptr, IndexLo, Ops.Scale};
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 5ef1746333040..4e6d52846ae44 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -56,6 +56,7 @@
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalValue.h"
+#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Type.h"
#include "llvm/Support/Casting.h"
@@ -14055,6 +14056,19 @@ void SelectionDAG::copyExtraInfo(SDNode *From, SDNode *To) {
SDEI[To] = std::move(NEI);
}
+MachineMemOperand::Flags
+SelectionDAG::getNonTemporalMemFlag(const VPIntrinsic &VPIntrin) {
+ return VPIntrin.hasMetadata(LLVMContext::MD_nontemporal)
+ ? MachineMemOperand::MONonTemporal
+ : MachineMemOperand::MONone;
+}
+
+MachineMemOperand::Flags
+SelectionDAG::getNonTemporalMemFlag(const MemSDNode &N) {
+ return N.isNonTemporal() ? MachineMemOperand::MONonTemporal
+ : MachineMemOperand::MONone;
+}
+
#ifndef NDEBUG
static void checkForCyclesHelper(const SDNode *N,
SmallPtrSetImpl<const SDNode*> &Visited,
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 7aa1fadd10dfc..a21992af3ce42 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8438,8 +8438,12 @@ void SelectionDAGBuilder::visitVPLoad(
MemoryLocation ML = MemoryLocation::getAfter(PtrOperand, AAInfo);
bool AddToChain = !BatchAA || !BatchAA->pointsToConstantMemory(ML);
SDValue InChain = AddToChain ? DAG.getRoot() : DAG.getEntryNode();
+ const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+ MachineMemOperand::Flags MMOFlags =
+ MachineMemOperand::MOLoad | TLI.getTargetMMOFlags(VPIntrin) |
+ SelectionDAG::getNonTemporalMemFlag(VPIntrin);
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
- MachinePointerInfo(PtrOperand), MachineMemOperand::MOLoad,
+ MachinePointerInfo(PtrOperand), MMOFlags,
LocationSize::beforeOrAfterPointer(), *Alignment, AAInfo, Ranges);
LD = DAG.getLoadVP(VT, DL, InChain, OpValues[0], OpValues[1], OpValues[2],
MMO, false /*IsExpanding */);
@@ -8490,9 +8494,12 @@ void SelectionDAGBuilder::visitVPGather(
Alignment = DAG.getEVTAlign(VT.getScalarType());
unsigned AS =
PtrOperand->getType()->getScalarType()->getPointerAddressSpace();
+ MachineMemOperand::Flags MMOFlags =
+ MachineMemOperand::MOLoad | TLI.getTargetMMOFlags(VPIntrin) |
+ SelectionDAG::getNonTemporalMemFlag(VPIntrin);
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
- MachinePointerInfo(AS), MachineMemOperand::MOLoad,
- LocationSize::beforeOrAfterPointer(), *Alignment, AAInfo, Ranges);
+ MachinePointerInfo(AS), MMOFlags, LocationSize::beforeOrAfterPointer(),
+ *Alignment, AAInfo, Ranges);
SDValue Base, Index, Scale;
ISD::MemIndexType IndexType;
bool UniformBase = getUniformBase(PtrOperand, Base, Index, IndexType, Scale,
@@ -8530,8 +8537,12 @@ void SelectionDAGBuilder::visitVPStore(
Alignment = DAG.getEVTAlign(VT);
SDValue Ptr = OpValues[1];
SDValue Offset = DAG.getUNDEF(Ptr.getValueType());
+ const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+ MachineMemOperand::Flags MMOFlags =
+ MachineMemOperand::MOStore | TLI.getTargetMMOFlags(VPIntrin) |
+ SelectionDAG::getNonTemporalMemFlag(VPIntrin);
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
- MachinePointerInfo(PtrOperand), MachineMemOperand::MOStore,
+ MachinePointerInfo(PtrOperand), MMOFlags,
LocationSize::beforeOrAfterPointer(), *Alignment, AAInfo);
ST = DAG.getStoreVP(getMemoryRoot(), DL, OpValues[0], Ptr, Offset,
OpValues[2], OpValues[3], VT, MMO, ISD::UNINDEXED,
@@ -8553,9 +8564,12 @@ void SelectionDAGBuilder::visitVPScatter(
Alignment = DAG.getEVTAlign(VT.getScalarType());
unsigned AS =
PtrOperand->getType()->getScalarType()->getPointerAddressSpace();
+ MachineMemOperand::Flags MMOFlags =
+ MachineMemOperand::MOStore | TLI.getTargetMMOFlags(VPIntrin) |
+ SelectionDAG::getNonTemporalMemFlag(VPIntrin);
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
- MachinePointerInfo(AS), MachineMemOperand::MOStore,
- LocationSize::beforeOrAfterPointer(), *Alignment, AAInfo);
+ MachinePointerInfo(AS), MMOFlags, LocationSize::beforeOrAfterPointer(),
+ *Alignment, AAInfo);
SDValue Base, Index, Scale;
ISD::MemIndexType IndexType;
bool UniformBase = getUniformBase(PtrOperand, Base, Index, IndexType, Scale,
@@ -8596,9 +8610,13 @@ void SelectionDAGBuilder::visitVPStridedLoad(
bool AddToChain = !BatchAA || !BatchAA->pointsToConstantMemory(ML);
SDValue InChain = AddToChain ? DAG.getRoot() : DAG.getEntryNode();
unsigned AS = PtrOperand->getType()->getPointerAddressSpace();
+ const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+ MachineMemOperand::Flags MMOFlags =
+ MachineMemOperand::MOLoad | TLI.getTargetMMOFlags(VPIntrin) |
+ SelectionDAG::getNonTemporalMemFlag(VPIntrin);
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
- MachinePointerInfo(AS), MachineMemOperand::MOLoad,
- LocationSize::beforeOrAfterPointer(), *Alignment, AAInfo, Ranges);
+ MachinePointerInfo(AS), MMOFlags, LocationSize::beforeOrAfterPointer(),
+ *Alignment, AAInfo, Ranges);
SDValue LD = DAG.getStridedLoadVP(VT, DL, InChain, OpValues[0], OpValues[1],
OpValues[2], OpValues[3], MMO,
@@ -8619,9 +8637,13 @@ void SelectionDAGBuilder::visitVPStridedStore(
Alignment = DAG.getEVTAlign(VT.getScalarType());
AAMDNodes AAInfo = VPIntrin.getAAMetadata();
unsigned AS = PtrOperand->getType()->getPointerAddressSpace();
+ const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+ MachineMemOperand::Flags MMOFlags =
+ MachineMemOperand::MOStore | TLI.getTargetMMOFlags(VPIntrin) |
+ SelectionDAG::getNonTemporalMemFlag(VPIntrin);
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
- MachinePointerInfo(AS), MachineMemOperand::MOStore,
- LocationSize::beforeOrAfterPointer(), *Alignment, AAInfo);
+ MachinePointerInfo(AS), MMOFlags, LocationSize::beforeOrAfterPointer(),
+ *Alignment, AAInfo);
SDValue ST = DAG.getStridedStoreVP(
getMemoryRoot(), DL, OpValues[0], OpValues[1],
diff --git a/llvm/test/CodeGen/RISCV/nontemporal-vp-scalable.ll b/llvm/test/CodeGen/RISCV/nontemporal-vp-scalable.ll
new file mode 100644
index 0000000000000..4bc6313494d41
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/nontemporal-vp-scalable.ll
@@ -0,0 +1,40677 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv64 -mattr=+zihintntl,+f,+d,+zfh,+v < %s | FileCheck %s -check-prefix=CHECK-RV64V
+; RUN: llc -mtriple=riscv32 -mattr=+zihintntl,+f,+d,+zfh,+v < %s | FileCheck %s -check-prefix=CHECK-RV32V
+; RUN: llc -mtriple=riscv64 -mattr=+zihintntl,+f,+d,+zfh,+v,+c < %s | FileCheck %s -check-prefix=CHECK-RV64VC
+; RUN: llc -mtriple=riscv32 -mattr=+zihintntl,+f,+d,+zfh,+v,+c < %s | FileCheck %s -check-prefix=CHECK-RV32VC
+
+
+define <vscale x 1 x i8> @test_nontemporal_vp_load_nxv1i8_P1(ptr %p, i32 zeroext %vl) {
+; CHECK-RV64V-LABEL: test_nontemporal_vp_load_nxv1i8_P1:
+; CHECK-RV64V: # %bb.0:
+; CHECK-RV64V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64V-NEXT: ntl.p1
+; CHECK-RV64V-NEXT: vle8.v v8, (a0)
+; CHECK-RV64V-NEXT: ret
+;
+; CHECK-RV32V-LABEL: test_nontemporal_vp_load_nxv1i8_P1:
+; CHECK-RV32V: # %bb.0:
+; CHECK-RV32V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32V-NEXT: ntl.p1
+; CHECK-RV32V-NEXT: vle8.v v8, (a0)
+; CHECK-RV32V-NEXT: ret
+;
+; CHECK-RV64VC-LABEL: test_nontemporal_vp_load_nxv1i8_P1:
+; CHECK-RV64VC: # %bb.0:
+; CHECK-RV64VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64VC-NEXT: c.ntl.p1
+; CHECK-RV64VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV64VC-NEXT: ret
+;
+; CHECK-RV32VC-LABEL: test_nontemporal_vp_load_nxv1i8_P1:
+; CHECK-RV32VC: # %bb.0:
+; CHECK-RV32VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32VC-NEXT: c.ntl.p1
+; CHECK-RV32VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV32VC-NEXT: ret
+ %x = call <vscale x 1 x i8> @llvm.vp.load.nxv1i8.p0(ptr %p, <vscale x 1 x i1> splat(i1 true), i32 %vl), !nontemporal !0, !riscv-nontemporal-domain !1
+ ret <vscale x 1 x i8> %x
+}
+
+
+define <vscale x 1 x i8> @test_nontemporal_vp_load_nxv1i8_PALL(ptr %p, i32 zeroext %vl) {
+; CHECK-RV64V-LABEL: test_nontemporal_vp_load_nxv1i8_PALL:
+; CHECK-RV64V: # %bb.0:
+; CHECK-RV64V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64V-NEXT: ntl.pall
+; CHECK-RV64V-NEXT: vle8.v v8, (a0)
+; CHECK-RV64V-NEXT: ret
+;
+; CHECK-RV32V-LABEL: test_nontemporal_vp_load_nxv1i8_PALL:
+; CHECK-RV32V: # %bb.0:
+; CHECK-RV32V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32V-NEXT: ntl.pall
+; CHECK-RV32V-NEXT: vle8.v v8, (a0)
+; CHECK-RV32V-NEXT: ret
+;
+; CHECK-RV64VC-LABEL: test_nontemporal_vp_load_nxv1i8_PALL:
+; CHECK-RV64VC: # %bb.0:
+; CHECK-RV64VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64VC-NEXT: c.ntl.pall
+; CHECK-RV64VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV64VC-NEXT: ret
+;
+; CHECK-RV32VC-LABEL: test_nontemporal_vp_load_nxv1i8_PALL:
+; CHECK-RV32VC: # %bb.0:
+; CHECK-RV32VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32VC-NEXT: c.ntl.pall
+; CHECK-RV32VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV32VC-NEXT: ret
+ %x = call <vscale x 1 x i8> @llvm.vp.load.nxv1i8.p0(ptr %p, <vscale x 1 x i1> splat(i1 true), i32 %vl), !nontemporal !0, !riscv-nontemporal-domain !2
+ ret <vscale x 1 x i8> %x
+}
+
+
+define <vscale x 1 x i8> @test_nontemporal_vp_load_nxv1i8_S1(ptr %p, i32 zeroext %vl) {
+; CHECK-RV64V-LABEL: test_nontemporal_vp_load_nxv1i8_S1:
+; CHECK-RV64V: # %bb.0:
+; CHECK-RV64V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64V-NEXT: ntl.s1
+; CHECK-RV64V-NEXT: vle8.v v8, (a0)
+; CHECK-RV64V-NEXT: ret
+;
+; CHECK-RV32V-LABEL: test_nontemporal_vp_load_nxv1i8_S1:
+; CHECK-RV32V: # %bb.0:
+; CHECK-RV32V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32V-NEXT: ntl.s1
+; CHECK-RV32V-NEXT: vle8.v v8, (a0)
+; CHECK-RV32V-NEXT: ret
+;
+; CHECK-RV64VC-LABEL: test_nontemporal_vp_load_nxv1i8_S1:
+; CHECK-RV64VC: # %bb.0:
+; CHECK-RV64VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64VC-NEXT: c.ntl.s1
+; CHECK-RV64VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV64VC-NEXT: ret
+;
+; CHECK-RV32VC-LABEL: test_nontemporal_vp_load_nxv1i8_S1:
+; CHECK-RV32VC: # %bb.0:
+; CHECK-RV32VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32VC-NEXT: c.ntl.s1
+; CHECK-RV32VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV32VC-NEXT: ret
+ %x = call <vscale x 1 x i8> @llvm.vp.load.nxv1i8.p0(ptr %p, <vscale x 1 x i1> splat(i1 true), i32 %vl), !nontemporal !0, !riscv-nontemporal-domain !3
+ ret <vscale x 1 x i8> %x
+}
+
+
+define <vscale x 1 x i8> @test_nontemporal_vp_load_nxv1i8_ALL(ptr %p, i32 zeroext %vl) {
+; CHECK-RV64V-LABEL: test_nontemporal_vp_load_nxv1i8_ALL:
+; CHECK-RV64V: # %bb.0:
+; CHECK-RV64V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64V-NEXT: ntl.all
+; CHECK-RV64V-NEXT: vle8.v v8, (a0)
+; CHECK-RV64V-NEXT: ret
+;
+; CHECK-RV32V-LABEL: test_nontemporal_vp_load_nxv1i8_ALL:
+; CHECK-RV32V: # %bb.0:
+; CHECK-RV32V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32V-NEXT: ntl.all
+; CHECK-RV32V-NEXT: vle8.v v8, (a0)
+; CHECK-RV32V-NEXT: ret
+;
+; CHECK-RV64VC-LABEL: test_nontemporal_vp_load_nxv1i8_ALL:
+; CHECK-RV64VC: # %bb.0:
+; CHECK-RV64VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64VC-NEXT: c.ntl.all
+; CHECK-RV64VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV64VC-NEXT: ret
+;
+; CHECK-RV32VC-LABEL: test_nontemporal_vp_load_nxv1i8_ALL:
+; CHECK-RV32VC: # %bb.0:
+; CHECK-RV32VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32VC-NEXT: c.ntl.all
+; CHECK-RV32VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV32VC-NEXT: ret
+ %x = call <vscale x 1 x i8> @llvm.vp.load.nxv1i8.p0(ptr %p, <vscale x 1 x i1> splat(i1 true), i32 %vl), !nontemporal !0, !riscv-nontemporal-domain !4
+ ret <vscale x 1 x i8> %x
+}
+
+define <vscale x 1 x i8> @test_nontemporal_vp_load_nxv1i8_DEFAULT(ptr %p, i32 zeroext %vl) {
+; CHECK-RV64V-LABEL: test_nontemporal_vp_load_nxv1i8_DEFAULT:
+; CHECK-RV64V: # %bb.0:
+; CHECK-RV64V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64V-NEXT: ntl.all
+; CHECK-RV64V-NEXT: vle8.v v8, (a0)
+; CHECK-RV64V-NEXT: ret
+;
+; CHECK-RV32V-LABEL: test_nontemporal_vp_load_nxv1i8_DEFAULT:
+; CHECK-RV32V: # %bb.0:
+; CHECK-RV32V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32V-NEXT: ntl.all
+; CHECK-RV32V-NEXT: vle8.v v8, (a0)
+; CHECK-RV32V-NEXT: ret
+;
+; CHECK-RV64VC-LABEL: test_nontemporal_vp_load_nxv1i8_DEFAULT:
+; CHECK-RV64VC: # %bb.0:
+; CHECK-RV64VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64VC-NEXT: c.ntl.all
+; CHECK-RV64VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV64VC-NEXT: ret
+;
+; CHECK-RV32VC-LABEL: test_nontemporal_vp_load_nxv1i8_DEFAULT:
+; CHECK-RV32VC: # %bb.0:
+; CHECK-RV32VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32VC-NEXT: c.ntl.all
+; CHECK-RV32VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV32VC-NEXT: ret
+ %x = call <vscale x 1 x i8> @llvm.vp.load.nxv1i8.p0(ptr %p, <vscale x 1 x i1> splat(i1 true), i32 %vl), !nontemporal !0
+ ret <vscale x 1 x i8> %x
+}
+
+
+define void @test_nontemporal_vp_store_nxv1i8_P1(<vscale x 1 x i8> %val, ptr %p, i32 zeroext %vl) {
+; CHECK-RV64V-LABEL: test_nontemporal_vp_store_nxv1i8_P1:
+; CHECK-RV64V: # %bb.0:
+; CHECK-RV64V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64V-NEXT: ntl.p1
+; CHECK-RV64V-NEXT: vse8.v v8, (a0)
+; CHECK-RV64V-NEXT: ret
+;
+; CHECK-RV32V-LABEL: test_nontemporal_vp_store_nxv1i8_P1:
+; CHECK-RV32V: # %bb.0:
+; CHECK-RV32V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32V-NEXT: ntl.p1
+; CHECK-RV32V-NEXT: vse8.v v8, (a0)
+; CHECK-RV32V-NEXT: ret
+;
+; CHECK-RV64VC-LABEL: test_nontemporal_vp_store_nxv1i8_P1:
+; CHECK-RV64VC: # %bb.0:
+; CHECK-RV64VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64VC-NEXT: c.ntl.p1
+; CHECK-RV64VC-NEXT: vse8.v v8, (a0)
+; CHECK-RV64VC-NEXT: ret
+;
+; CHECK-RV32VC-LABEL: test_nontemporal_vp_store_nxv1i8_P1:
+; CHECK-RV32VC: # %bb.0:
+; CHECK-RV32VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32VC-NEXT: c.ntl.p1
+; CHECK-RV32VC-NEXT: vse8.v v8, (a0)
+; CHECK-RV32VC-NEXT: ret
+ call void @llvm.vp.store.nxv1i8.p0(<vscale x 1 x i8> %val, ptr %p, <vscale x 1 x i1> splat(i1 true), i32 %vl), !nontemporal !0, !riscv-nontemporal-domain !1
+ ret void
+}
+
+
+define void @test_nontemporal_vp_store_nxv1i8_PALL(<vscale x 1 x i8> %val, ptr %p, i32 zeroext %vl) {
+; CHECK-RV64V-LABEL: test_nontemporal_vp_store_nxv1i8_PALL:
+; CHECK-RV64V: # %bb.0:
+; CHECK-RV64V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64V-NEXT: ntl.pall
+; CHECK-RV64V-NEXT: vse8.v v8, (a0)
+; CHECK-RV64V-NEXT: ret
+;
+; CHECK-RV32V-LABEL: test_nontemporal_vp_store_nxv1i8_PALL:
+; CHECK-RV32V: # %bb.0:
+; CHECK-RV32V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32V-NEXT: ntl.pall
+; CHECK-RV32V-NEXT: vse8.v v8, (a0)
+; CHECK-RV32V-NEXT: ret
+;
+; CHECK-RV64VC-LABEL: test_nontemporal_vp_store_nxv1i8_PALL:
+; CHECK-RV64VC: # %bb.0:
+; CHECK-RV64VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64VC-NEXT: c.ntl.pall
+; CHECK-RV64VC-NEXT: vse8.v v8, (a0)
+; CHECK-RV64VC-NEXT: ret
+;
+; CHECK-RV32VC-LABEL: test_nontemporal_vp_store_nxv1i8_PALL:
+; CHECK-RV32VC: # %bb.0:
+; CHECK-RV32VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32VC-NEXT: c.ntl.pall
+; CHECK-RV32VC-NEXT: vse8.v v8, (a0)
+; CHECK-RV32VC-NEXT: ret
+ call void @llvm.vp.store.nxv1i8.p0(<vscale x 1 x i8> %val, ptr %p, <vscale x 1 x i1> splat(i1 true), i32 %vl), !nontemporal !0, !riscv-nontemporal-domain !2
+ ret void
+}
+
+
+define void @test_nontemporal_vp_store_nxv1i8_S1(<vscale x 1 x i8> %val, ptr %p, i32 zeroext %vl) {
+; CHECK-RV64V-LABEL: test_nontemporal_vp_store_nxv1i8_S1:
+; CHECK-RV64V: # %bb.0:
+; CHECK-RV64V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64V-NEXT: ntl.s1
+; CHECK-RV64V-NEXT: vse8.v v8, (a0)
+; CHECK-RV64V-NEXT: ret
+;
+; CHECK-RV32V-LABEL: test_nontemporal_vp_store_nxv1i8_S1:
+; CHECK-RV32V: # %bb.0:
+; CHECK-RV32V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32V-NEXT: ntl.s1
+; CHECK-RV32V-NEXT: vse8.v v8, (a0)
+; CHECK-RV32V-NEXT: ret
+;
+; CHECK-RV64VC-LABEL: test_nontemporal_vp_store_nxv1i8_S1:
+; CHECK-RV64VC: # %bb.0:
+; CHECK-RV64VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64VC-NEXT: c.ntl.s1
+; CHECK-RV64VC-NEXT: vse8.v v8, (a0)
+; CHECK-RV64VC-NEXT: ret
+;
+; CHECK-RV32VC-LABEL:...
[truncated]
@llvm/pr-subscribers-backend-risc-v

Author: None (daniel-trujillo-bsc)
+ %x = call <vscale x 1 x i8> @llvm.vp.load.nxv1i8.p0(ptr %p, <vscale x 1 x i1> splat(i1 true), i32 %vl), !nontemporal !0, !riscv-nontemporal-domain !2
+ ret <vscale x 1 x i8> %x
+}
+
+
+define <vscale x 1 x i8> @test_nontemporal_vp_load_nxv1i8_S1(ptr %p, i32 zeroext %vl) {
+; CHECK-RV64V-LABEL: test_nontemporal_vp_load_nxv1i8_S1:
+; CHECK-RV64V: # %bb.0:
+; CHECK-RV64V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64V-NEXT: ntl.s1
+; CHECK-RV64V-NEXT: vle8.v v8, (a0)
+; CHECK-RV64V-NEXT: ret
+;
+; CHECK-RV32V-LABEL: test_nontemporal_vp_load_nxv1i8_S1:
+; CHECK-RV32V: # %bb.0:
+; CHECK-RV32V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32V-NEXT: ntl.s1
+; CHECK-RV32V-NEXT: vle8.v v8, (a0)
+; CHECK-RV32V-NEXT: ret
+;
+; CHECK-RV64VC-LABEL: test_nontemporal_vp_load_nxv1i8_S1:
+; CHECK-RV64VC: # %bb.0:
+; CHECK-RV64VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64VC-NEXT: c.ntl.s1
+; CHECK-RV64VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV64VC-NEXT: ret
+;
+; CHECK-RV32VC-LABEL: test_nontemporal_vp_load_nxv1i8_S1:
+; CHECK-RV32VC: # %bb.0:
+; CHECK-RV32VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32VC-NEXT: c.ntl.s1
+; CHECK-RV32VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV32VC-NEXT: ret
+ %x = call <vscale x 1 x i8> @llvm.vp.load.nxv1i8.p0(ptr %p, <vscale x 1 x i1> splat(i1 true), i32 %vl), !nontemporal !0, !riscv-nontemporal-domain !3
+ ret <vscale x 1 x i8> %x
+}
+
+
+define <vscale x 1 x i8> @test_nontemporal_vp_load_nxv1i8_ALL(ptr %p, i32 zeroext %vl) {
+; CHECK-RV64V-LABEL: test_nontemporal_vp_load_nxv1i8_ALL:
+; CHECK-RV64V: # %bb.0:
+; CHECK-RV64V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64V-NEXT: ntl.all
+; CHECK-RV64V-NEXT: vle8.v v8, (a0)
+; CHECK-RV64V-NEXT: ret
+;
+; CHECK-RV32V-LABEL: test_nontemporal_vp_load_nxv1i8_ALL:
+; CHECK-RV32V: # %bb.0:
+; CHECK-RV32V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32V-NEXT: ntl.all
+; CHECK-RV32V-NEXT: vle8.v v8, (a0)
+; CHECK-RV32V-NEXT: ret
+;
+; CHECK-RV64VC-LABEL: test_nontemporal_vp_load_nxv1i8_ALL:
+; CHECK-RV64VC: # %bb.0:
+; CHECK-RV64VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64VC-NEXT: c.ntl.all
+; CHECK-RV64VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV64VC-NEXT: ret
+;
+; CHECK-RV32VC-LABEL: test_nontemporal_vp_load_nxv1i8_ALL:
+; CHECK-RV32VC: # %bb.0:
+; CHECK-RV32VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32VC-NEXT: c.ntl.all
+; CHECK-RV32VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV32VC-NEXT: ret
+ %x = call <vscale x 1 x i8> @llvm.vp.load.nxv1i8.p0(ptr %p, <vscale x 1 x i1> splat(i1 true), i32 %vl), !nontemporal !0, !riscv-nontemporal-domain !4
+ ret <vscale x 1 x i8> %x
+}
+
+define <vscale x 1 x i8> @test_nontemporal_vp_load_nxv1i8_DEFAULT(ptr %p, i32 zeroext %vl) {
+; CHECK-RV64V-LABEL: test_nontemporal_vp_load_nxv1i8_DEFAULT:
+; CHECK-RV64V: # %bb.0:
+; CHECK-RV64V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64V-NEXT: ntl.all
+; CHECK-RV64V-NEXT: vle8.v v8, (a0)
+; CHECK-RV64V-NEXT: ret
+;
+; CHECK-RV32V-LABEL: test_nontemporal_vp_load_nxv1i8_DEFAULT:
+; CHECK-RV32V: # %bb.0:
+; CHECK-RV32V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32V-NEXT: ntl.all
+; CHECK-RV32V-NEXT: vle8.v v8, (a0)
+; CHECK-RV32V-NEXT: ret
+;
+; CHECK-RV64VC-LABEL: test_nontemporal_vp_load_nxv1i8_DEFAULT:
+; CHECK-RV64VC: # %bb.0:
+; CHECK-RV64VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64VC-NEXT: c.ntl.all
+; CHECK-RV64VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV64VC-NEXT: ret
+;
+; CHECK-RV32VC-LABEL: test_nontemporal_vp_load_nxv1i8_DEFAULT:
+; CHECK-RV32VC: # %bb.0:
+; CHECK-RV32VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32VC-NEXT: c.ntl.all
+; CHECK-RV32VC-NEXT: vle8.v v8, (a0)
+; CHECK-RV32VC-NEXT: ret
+ %x = call <vscale x 1 x i8> @llvm.vp.load.nxv1i8.p0(ptr %p, <vscale x 1 x i1> splat(i1 true), i32 %vl), !nontemporal !0
+ ret <vscale x 1 x i8> %x
+}
+
+
+define void @test_nontemporal_vp_store_nxv1i8_P1(<vscale x 1 x i8> %val, ptr %p, i32 zeroext %vl) {
+; CHECK-RV64V-LABEL: test_nontemporal_vp_store_nxv1i8_P1:
+; CHECK-RV64V: # %bb.0:
+; CHECK-RV64V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64V-NEXT: ntl.p1
+; CHECK-RV64V-NEXT: vse8.v v8, (a0)
+; CHECK-RV64V-NEXT: ret
+;
+; CHECK-RV32V-LABEL: test_nontemporal_vp_store_nxv1i8_P1:
+; CHECK-RV32V: # %bb.0:
+; CHECK-RV32V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32V-NEXT: ntl.p1
+; CHECK-RV32V-NEXT: vse8.v v8, (a0)
+; CHECK-RV32V-NEXT: ret
+;
+; CHECK-RV64VC-LABEL: test_nontemporal_vp_store_nxv1i8_P1:
+; CHECK-RV64VC: # %bb.0:
+; CHECK-RV64VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64VC-NEXT: c.ntl.p1
+; CHECK-RV64VC-NEXT: vse8.v v8, (a0)
+; CHECK-RV64VC-NEXT: ret
+;
+; CHECK-RV32VC-LABEL: test_nontemporal_vp_store_nxv1i8_P1:
+; CHECK-RV32VC: # %bb.0:
+; CHECK-RV32VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32VC-NEXT: c.ntl.p1
+; CHECK-RV32VC-NEXT: vse8.v v8, (a0)
+; CHECK-RV32VC-NEXT: ret
+ call void @llvm.vp.store.nxv1i8.p0(<vscale x 1 x i8> %val, ptr %p, <vscale x 1 x i1> splat(i1 true), i32 %vl), !nontemporal !0, !riscv-nontemporal-domain !1
+ ret void
+}
+
+
+define void @test_nontemporal_vp_store_nxv1i8_PALL(<vscale x 1 x i8> %val, ptr %p, i32 zeroext %vl) {
+; CHECK-RV64V-LABEL: test_nontemporal_vp_store_nxv1i8_PALL:
+; CHECK-RV64V: # %bb.0:
+; CHECK-RV64V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64V-NEXT: ntl.pall
+; CHECK-RV64V-NEXT: vse8.v v8, (a0)
+; CHECK-RV64V-NEXT: ret
+;
+; CHECK-RV32V-LABEL: test_nontemporal_vp_store_nxv1i8_PALL:
+; CHECK-RV32V: # %bb.0:
+; CHECK-RV32V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32V-NEXT: ntl.pall
+; CHECK-RV32V-NEXT: vse8.v v8, (a0)
+; CHECK-RV32V-NEXT: ret
+;
+; CHECK-RV64VC-LABEL: test_nontemporal_vp_store_nxv1i8_PALL:
+; CHECK-RV64VC: # %bb.0:
+; CHECK-RV64VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64VC-NEXT: c.ntl.pall
+; CHECK-RV64VC-NEXT: vse8.v v8, (a0)
+; CHECK-RV64VC-NEXT: ret
+;
+; CHECK-RV32VC-LABEL: test_nontemporal_vp_store_nxv1i8_PALL:
+; CHECK-RV32VC: # %bb.0:
+; CHECK-RV32VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32VC-NEXT: c.ntl.pall
+; CHECK-RV32VC-NEXT: vse8.v v8, (a0)
+; CHECK-RV32VC-NEXT: ret
+ call void @llvm.vp.store.nxv1i8.p0(<vscale x 1 x i8> %val, ptr %p, <vscale x 1 x i1> splat(i1 true), i32 %vl), !nontemporal !0, !riscv-nontemporal-domain !2
+ ret void
+}
+
+
+define void @test_nontemporal_vp_store_nxv1i8_S1(<vscale x 1 x i8> %val, ptr %p, i32 zeroext %vl) {
+; CHECK-RV64V-LABEL: test_nontemporal_vp_store_nxv1i8_S1:
+; CHECK-RV64V: # %bb.0:
+; CHECK-RV64V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64V-NEXT: ntl.s1
+; CHECK-RV64V-NEXT: vse8.v v8, (a0)
+; CHECK-RV64V-NEXT: ret
+;
+; CHECK-RV32V-LABEL: test_nontemporal_vp_store_nxv1i8_S1:
+; CHECK-RV32V: # %bb.0:
+; CHECK-RV32V-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV32V-NEXT: ntl.s1
+; CHECK-RV32V-NEXT: vse8.v v8, (a0)
+; CHECK-RV32V-NEXT: ret
+;
+; CHECK-RV64VC-LABEL: test_nontemporal_vp_store_nxv1i8_S1:
+; CHECK-RV64VC: # %bb.0:
+; CHECK-RV64VC-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
+; CHECK-RV64VC-NEXT: c.ntl.s1
+; CHECK-RV64VC-NEXT: vse8.v v8, (a0)
+; CHECK-RV64VC-NEXT: ret
+;
+; CHECK-RV32VC-LABEL:...
[truncated]
Since I'm a new contributor and can't edit the reviewers, according to the docs I have to @mention. In this case, I think the most appropriate reviewer may be @topperc
@@ -2476,10 +2476,13 @@ void DAGTypeLegalizer::SplitVecRes_Gather(MemSDNode *N, SDValue &Lo,
   else
     std::tie(IndexLo, IndexHi) = DAG.SplitVector(Ops.Index, dl);

+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  MachineMemOperand::Flags MMOFlags = MachineMemOperand::MOLoad |
Can we just copy the flags from the original node like SplitVecRes_LOAD?
Thank you so much for the feedback!
This is a much cleaner (and I bet, correct) way of doing it, thanks, I'm sorry I missed it.
Nonetheless, I haven't been able to find anything similar for the changes in the SelectionDAGBuilder.cpp file.
I've fixed this in the type legalizer in the new commit (af7b616)
@@ -4248,10 +4251,13 @@ SDValue DAGTypeLegalizer::SplitVecOp_Scatter(MemSDNode *N, unsigned OpNo) {
   std::tie(IndexLo, IndexHi) = DAG.SplitVector(Ops.Index, DL);

   SDValue Lo;
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  MachineMemOperand::Flags MMOFlags = MachineMemOperand::MOStore |
Can we just copy the flags from the original node like SplitVecOp_STORE?
Same as the previous comment!
Test should be in the rvv directory with the other vector tests.
Moved! Thanks!
I initially thought it made sense to put them where the other nontemporal tests are, but since these are also RVV tests, it also makes sense to put them in rvv.
I've checked the existing tests (not new in this PR) nontemporal-scalable.ll and nontemporal.ll, and they might also be moved to rvv. Should I do it?
…tests into the rvv folder
@@ -1007,6 +1008,9 @@ class SelectionDAG {
     llvm_unreachable("Unknown opcode");
   }

+  static MachineMemOperand::Flags
+  getNonTemporalMemFlag(const VPIntrinsic &VPIntrin);
I think we should make this more like TargetLoweringBase::getLoadMemOperandFlags, TargetLoweringBase::getStoreMemOperandFlags, and TargetLoweringBase::getAtomicMemOperandFlags.
Okay, just to be sure I understood: are you suggesting removing this procedure and creating another in TargetLoweringBase like the ones you mention, and calling it instead of doing MachineMemOperand::MOLoad | TLI.getTargetMMOFlags(VPIntrin) | SelectionDAG::getNonTemporalMemFlag(VPIntrin)? In fact, those already compute the flags I computed, but also others. Maybe I can just use getLoadMemOperandFlags and getStoreMemOperandFlags?
Unfortunately, getLoadMemOperandFlags and getStoreMemOperandFlags operate on LoadInst and StoreInst. So I think we need a getVPIntrinMemOperandFlags. We'll need to distinguish when to set MOLoad vs. MOStore. We could decode the intrinsic ID in getVPIntrinMemOperandFlags or pass a bool IsStore.
Thanks! I've created the procedure and decoded the intrinsic there (I've added a safeguard to ensure the procedure is not used for untested intrinsics), and now the computation of the flags is much cleaner.
…gBase. Delete dead code from SelectionDAG after refactor
LGTM
I guess you need me to merge this?
I guess so! I can't merge it myself since I (rightfully) don't have permissions. But I don't know the procedure in this case, and in particular whether the other reviewers must provide feedback first.
@daniel-trujillo-bsc Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done!
This PR adds support for VP intrinsics to be aware of the nontemporal metadata information.
First-time contributor here. I hope these changes are simple enough not to be much of a pain to review, and I'm looking forward to hearing your feedback!
I'm not a GitHub user, so I had to create a throwaway account for this, but you can write to my BSC email (in the commit and in the web: https://www.bsc.es/trujillo-daniel) to verify my identity.