ElvisWang123
Contributor

@ElvisWang123 ElvisWang123 commented Jul 22, 2025

This patch implements getAddressComputationCost() in the RISCV TTI, which
makes gather/scatter with address calculation more expensive than
strided accesses.

Note that the only user of getAddressComputationCost() with a vector
type is VPWidenMemoryRecipe::computeCost(), so this patch changes some
LV tests.

I've checked the LV test changes, and they seem to fall into two
groups.

  • gather/scatter with a uniform vector pointer, which it seems could be
    optimized to masked.load.
  • accesses that could be optimized to strided load/store.

@llvmbot llvmbot added backend:RISC-V llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Jul 22, 2025
@llvmbot
Member

llvmbot commented Jul 22, 2025

@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Elvis Wang (ElvisWang123)

Changes

This patch adds the address computation cost to getGatherScatterOpCost(), modeled as add <base_addr>, <offset>.
This makes gather/scatter more expensive than strided memory access.

This patch is also preparation for generating strided memory recipes in the LV.

Note that some test changes show that some loops are no longer vectorized after this patch. That is fine, since they should use strided memory accesses, which had the same cost as gather/scatter before the patch.


Patch is 125.35 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/149955.diff

12 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+7-1)
  • (modified) llvm/test/Analysis/CostModel/RISCV/fixed-vector-gather.ll (+43-43)
  • (modified) llvm/test/Analysis/CostModel/RISCV/fixed-vector-scatter.ll (+43-43)
  • (modified) llvm/test/Analysis/CostModel/RISCV/gep.ll (+4-4)
  • (modified) llvm/test/Analysis/CostModel/RISCV/scalable-gather.ll (+41-41)
  • (modified) llvm/test/Analysis/CostModel/RISCV/scalable-scatter.ll (+41-41)
  • (modified) llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/blocks-with-dead-instructions.ll (+48-48)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/dead-ops-cost.ll (+29-29)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+8-72)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vf-will-not-generate-any-vector-insts.ll (+2-39)
  • (modified) llvm/test/Transforms/SLPVectorizer/RISCV/remarks-insert-into-small-vector.ll (+1-1)
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 56ead92187b04..a7bc80b721db7 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1085,6 +1085,12 @@ InstructionCost RISCVTTIImpl::getGatherScatterOpCost(
     return BaseT::getGatherScatterOpCost(Opcode, DataTy, Ptr, VariableMask,
                                          Alignment, CostKind, I);
 
+  // Gather/Scatter instruction will need to calculate the address of each
+  // element before accessing memory.
+  InstructionCost AddrCost = getArithmeticInstrCost(
+      Instruction::Add, Ptr->getType(), CostKind,
+      {TTI::OK_AnyValue, TTI::OP_None}, {TTI::OK_AnyValue, TTI::OP_None}, {});
+
   // Cost is proportional to the number of memory operations implied.  For
   // scalable vectors, we use an estimate on that number since we don't
   // know exactly what VL will be.
@@ -1093,7 +1099,7 @@ InstructionCost RISCVTTIImpl::getGatherScatterOpCost(
       getMemoryOpCost(Opcode, VTy.getElementType(), Alignment, 0, CostKind,
                       {TTI::OK_AnyValue, TTI::OP_None}, I);
   unsigned NumLoads = getEstimatedVLFor(&VTy);
-  return NumLoads * MemOpCost;
+  return AddrCost + NumLoads * MemOpCost;
 }
 
 InstructionCost RISCVTTIImpl::getExpandCompressMemoryOpCost(
diff --git a/llvm/test/Analysis/CostModel/RISCV/fixed-vector-gather.ll b/llvm/test/Analysis/CostModel/RISCV/fixed-vector-gather.ll
index 6eec7ed2f98ec..e01bfe4670314 100644
--- a/llvm/test/Analysis/CostModel/RISCV/fixed-vector-gather.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/fixed-vector-gather.ll
@@ -6,49 +6,49 @@
 
 define i32 @masked_gather() {
 ; CHECK-LABEL: 'masked_gather'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V8F64 = call <8 x double> @llvm.masked.gather.v8f64.v8p0(<8 x ptr> undef, i32 8, <8 x i1> undef, <8 x double> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4F64 = call <4 x double> @llvm.masked.gather.v4f64.v4p0(<4 x ptr> undef, i32 8, <4 x i1> undef, <4 x double> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2F64 = call <2 x double> @llvm.masked.gather.v2f64.v2p0(<2 x ptr> undef, i32 8, <2 x i1> undef, <2 x double> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V1F64 = call <1 x double> @llvm.masked.gather.v1f64.v1p0(<1 x ptr> undef, i32 8, <1 x i1> undef, <1 x double> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %V16F32 = call <16 x float> @llvm.masked.gather.v16f32.v16p0(<16 x ptr> undef, i32 4, <16 x i1> undef, <16 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V8F32 = call <8 x float> @llvm.masked.gather.v8f32.v8p0(<8 x ptr> undef, i32 4, <8 x i1> undef, <8 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4F32 = call <4 x float> @llvm.masked.gather.v4f32.v4p0(<4 x ptr> undef, i32 4, <4 x i1> undef, <4 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2F32 = call <2 x float> @llvm.masked.gather.v2f32.v2p0(<2 x ptr> undef, i32 4, <2 x i1> undef, <2 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V1F32 = call <1 x float> @llvm.masked.gather.v1f32.v1p0(<1 x ptr> undef, i32 4, <1 x i1> undef, <1 x float> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: %V32BF16 = call <32 x bfloat> @llvm.masked.gather.v32bf16.v32p0(<32 x ptr> undef, i32 2, <32 x i1> undef, <32 x bfloat> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %V16BF16 = call <16 x bfloat> @llvm.masked.gather.v16bf16.v16p0(<16 x ptr> undef, i32 2, <16 x i1> undef, <16 x bfloat> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V8BF16 = call <8 x bfloat> @llvm.masked.gather.v8bf16.v8p0(<8 x ptr> undef, i32 2, <8 x i1> undef, <8 x bfloat> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4BF16 = call <4 x bfloat> @llvm.masked.gather.v4bf16.v4p0(<4 x ptr> undef, i32 2, <4 x i1> undef, <4 x bfloat> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2BF16 = call <2 x bfloat> @llvm.masked.gather.v2bf16.v2p0(<2 x ptr> undef, i32 2, <2 x i1> undef, <2 x bfloat> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V1BF16 = call <1 x bfloat> @llvm.masked.gather.v1bf16.v1p0(<1 x ptr> undef, i32 2, <1 x i1> undef, <1 x bfloat> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: %V32F16 = call <32 x half> @llvm.masked.gather.v32f16.v32p0(<32 x ptr> undef, i32 2, <32 x i1> undef, <32 x half> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %V16F16 = call <16 x half> @llvm.masked.gather.v16f16.v16p0(<16 x ptr> undef, i32 2, <16 x i1> undef, <16 x half> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V8F16 = call <8 x half> @llvm.masked.gather.v8f16.v8p0(<8 x ptr> undef, i32 2, <8 x i1> undef, <8 x half> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4F16 = call <4 x half> @llvm.masked.gather.v4f16.v4p0(<4 x ptr> undef, i32 2, <4 x i1> undef, <4 x half> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2F16 = call <2 x half> @llvm.masked.gather.v2f16.v2p0(<2 x ptr> undef, i32 2, <2 x i1> undef, <2 x half> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V1F16 = call <1 x half> @llvm.masked.gather.v1f16.v1p0(<1 x ptr> undef, i32 2, <1 x i1> undef, <1 x half> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V8I64 = call <8 x i64> @llvm.masked.gather.v8i64.v8p0(<8 x ptr> undef, i32 8, <8 x i1> undef, <8 x i64> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4I64 = call <4 x i64> @llvm.masked.gather.v4i64.v4p0(<4 x ptr> undef, i32 8, <4 x i1> undef, <4 x i64> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2I64 = call <2 x i64> @llvm.masked.gather.v2i64.v2p0(<2 x ptr> undef, i32 8, <2 x i1> undef, <2 x i64> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V1I64 = call <1 x i64> @llvm.masked.gather.v1i64.v1p0(<1 x ptr> undef, i32 8, <1 x i1> undef, <1 x i64> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %V16I32 = call <16 x i32> @llvm.masked.gather.v16i32.v16p0(<16 x ptr> undef, i32 4, <16 x i1> undef, <16 x i32> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V8I32 = call <8 x i32> @llvm.masked.gather.v8i32.v8p0(<8 x ptr> undef, i32 4, <8 x i1> undef, <8 x i32> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4I32 = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> undef, i32 4, <4 x i1> undef, <4 x i32> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2I32 = call <2 x i32> @llvm.masked.gather.v2i32.v2p0(<2 x ptr> undef, i32 4, <2 x i1> undef, <2 x i32> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V1I32 = call <1 x i32> @llvm.masked.gather.v1i32.v1p0(<1 x ptr> undef, i32 4, <1 x i1> undef, <1 x i32> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: %V32I16 = call <32 x i16> @llvm.masked.gather.v32i16.v32p0(<32 x ptr> undef, i32 2, <32 x i1> undef, <32 x i16> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %V16I16 = call <16 x i16> @llvm.masked.gather.v16i16.v16p0(<16 x ptr> undef, i32 2, <16 x i1> undef, <16 x i16> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V8I16 = call <8 x i16> @llvm.masked.gather.v8i16.v8p0(<8 x ptr> undef, i32 2, <8 x i1> undef, <8 x i16> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4I16 = call <4 x i16> @llvm.masked.gather.v4i16.v4p0(<4 x ptr> undef, i32 2, <4 x i1> undef, <4 x i16> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2I16 = call <2 x i16> @llvm.masked.gather.v2i16.v2p0(<2 x ptr> undef, i32 2, <2 x i1> undef, <2 x i16> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V1I16 = call <1 x i16> @llvm.masked.gather.v1i16.v1p0(<1 x ptr> undef, i32 2, <1 x i1> undef, <1 x i16> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: %V64I8 = call <64 x i8> @llvm.masked.gather.v64i8.v64p0(<64 x ptr> undef, i32 1, <64 x i1> undef, <64 x i8> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: %V32I8 = call <32 x i8> @llvm.masked.gather.v32i8.v32p0(<32 x ptr> undef, i32 1, <32 x i1> undef, <32 x i8> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %V16I8 = call <16 x i8> @llvm.masked.gather.v16i8.v16p0(<16 x ptr> undef, i32 1, <16 x i1> undef, <16 x i8> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V8I8 = call <8 x i8> @llvm.masked.gather.v8i8.v8p0(<8 x ptr> undef, i32 1, <8 x i1> undef, <8 x i8> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4I8 = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> undef, i32 1, <4 x i1> undef, <4 x i8> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2I8 = call <2 x i8> @llvm.masked.gather.v2i8.v2p0(<2 x ptr> undef, i32 1, <2 x i1> undef, <2 x i8> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V1I8 = call <1 x i8> @llvm.masked.gather.v1i8.v1p0(<1 x ptr> undef, i32 1, <1 x i1> undef, <1 x i8> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V8F64 = call <8 x double> @llvm.masked.gather.v8f64.v8p0(<8 x ptr> undef, i32 8, <8 x i1> undef, <8 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V4F64 = call <4 x double> @llvm.masked.gather.v4f64.v4p0(<4 x ptr> undef, i32 8, <4 x i1> undef, <4 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2F64 = call <2 x double> @llvm.masked.gather.v2f64.v2p0(<2 x ptr> undef, i32 8, <2 x i1> undef, <2 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V1F64 = call <1 x double> @llvm.masked.gather.v1f64.v1p0(<1 x ptr> undef, i32 8, <1 x i1> undef, <1 x double> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %V16F32 = call <16 x float> @llvm.masked.gather.v16f32.v16p0(<16 x ptr> undef, i32 4, <16 x i1> undef, <16 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V8F32 = call <8 x float> @llvm.masked.gather.v8f32.v8p0(<8 x ptr> undef, i32 4, <8 x i1> undef, <8 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V4F32 = call <4 x float> @llvm.masked.gather.v4f32.v4p0(<4 x ptr> undef, i32 4, <4 x i1> undef, <4 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2F32 = call <2 x float> @llvm.masked.gather.v2f32.v2p0(<2 x ptr> undef, i32 4, <2 x i1> undef, <2 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V1F32 = call <1 x float> @llvm.masked.gather.v1f32.v1p0(<1 x ptr> undef, i32 4, <1 x i1> undef, <1 x float> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 48 for instruction: %V32BF16 = call <32 x bfloat> @llvm.masked.gather.v32bf16.v32p0(<32 x ptr> undef, i32 2, <32 x i1> undef, <32 x bfloat> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %V16BF16 = call <16 x bfloat> @llvm.masked.gather.v16bf16.v16p0(<16 x ptr> undef, i32 2, <16 x i1> undef, <16 x bfloat> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V8BF16 = call <8 x bfloat> @llvm.masked.gather.v8bf16.v8p0(<8 x ptr> undef, i32 2, <8 x i1> undef, <8 x bfloat> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V4BF16 = call <4 x bfloat> @llvm.masked.gather.v4bf16.v4p0(<4 x ptr> undef, i32 2, <4 x i1> undef, <4 x bfloat> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2BF16 = call <2 x bfloat> @llvm.masked.gather.v2bf16.v2p0(<2 x ptr> undef, i32 2, <2 x i1> undef, <2 x bfloat> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V1BF16 = call <1 x bfloat> @llvm.masked.gather.v1bf16.v1p0(<1 x ptr> undef, i32 2, <1 x i1> undef, <1 x bfloat> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 48 for instruction: %V32F16 = call <32 x half> @llvm.masked.gather.v32f16.v32p0(<32 x ptr> undef, i32 2, <32 x i1> undef, <32 x half> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %V16F16 = call <16 x half> @llvm.masked.gather.v16f16.v16p0(<16 x ptr> undef, i32 2, <16 x i1> undef, <16 x half> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V8F16 = call <8 x half> @llvm.masked.gather.v8f16.v8p0(<8 x ptr> undef, i32 2, <8 x i1> undef, <8 x half> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V4F16 = call <4 x half> @llvm.masked.gather.v4f16.v4p0(<4 x ptr> undef, i32 2, <4 x i1> undef, <4 x half> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2F16 = call <2 x half> @llvm.masked.gather.v2f16.v2p0(<2 x ptr> undef, i32 2, <2 x i1> undef, <2 x half> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V1F16 = call <1 x half> @llvm.masked.gather.v1f16.v1p0(<1 x ptr> undef, i32 2, <1 x i1> undef, <1 x half> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V8I64 = call <8 x i64> @llvm.masked.gather.v8i64.v8p0(<8 x ptr> undef, i32 8, <8 x i1> undef, <8 x i64> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V4I64 = call <4 x i64> @llvm.masked.gather.v4i64.v4p0(<4 x ptr> undef, i32 8, <4 x i1> undef, <4 x i64> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2I64 = call <2 x i64> @llvm.masked.gather.v2i64.v2p0(<2 x ptr> undef, i32 8, <2 x i1> undef, <2 x i64> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V1I64 = call <1 x i64> @llvm.masked.gather.v1i64.v1p0(<1 x ptr> undef, i32 8, <1 x i1> undef, <1 x i64> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %V16I32 = call <16 x i32> @llvm.masked.gather.v16i32.v16p0(<16 x ptr> undef, i32 4, <16 x i1> undef, <16 x i32> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V8I32 = call <8 x i32> @llvm.masked.gather.v8i32.v8p0(<8 x ptr> undef, i32 4, <8 x i1> undef, <8 x i32> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V4I32 = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> undef, i32 4, <4 x i1> undef, <4 x i32> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = call <2 x i32> @llvm.masked.gather.v2i32.v2p0(<2 x ptr> undef, i32 4, <2 x i1> undef, <2 x i32> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V1I32 = call <1 x i32> @llvm.masked.gather.v1i32.v1p0(<1 x ptr> undef, i32 4, <1 x i1> undef, <1 x i32> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 48 for instruction: %V32I16 = call <32 x i16> @llvm.masked.gather.v32i16.v32p0(<32 x ptr> undef, i32 2, <32 x i1> undef, <32 x i16> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %V16I16 = call <16 x i16> @llvm.masked.gather.v16i16.v16p0(<16 x ptr> undef, i32 2, <16 x i1> undef, <16 x i16> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V8I16 = call <8 x i16> @llvm.masked.gather.v8i16.v8p0(<8 x ptr> undef, i32 2, <8 x i1> undef, <8 x i16> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V4I16 = call <4 x i16> @llvm.masked.gather.v4i16.v4p0(<4 x ptr> undef, i32 2, <4 x i1> undef, <4 x i16> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2I16 = call <2 x i16> @llvm.masked.gather.v2i16.v2p0(<2 x ptr> undef, i32 2, <2 x i1> undef, <2 x i16> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V1I16 = call <1 x i16> @llvm.masked.gather.v1i16.v1p0(<1 x ptr> undef, i32 2, <1 x i1> undef, <1 x i16> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 96 for instruction: %V64I8 = call <64 x i8> @llvm.masked.gather.v64i8.v64p0(<64 x ptr> undef, i32 1, <64 x i1> undef, <64 x i8> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 48 for instruction: %V32I8 = call <32 x i8> @llvm.masked.gather.v32i8.v32p0(<32 x ptr> undef, i32 1, <32 x i1> undef, <32 x i8> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %V16I8 = call <16 x i8> @llvm.masked.gather.v16i8.v16p0(<16 x ptr> undef, i32 1, <16 x i1> undef, <16 x i8> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V8I8 = call <8 x i8> @llvm.masked.gather.v8i8.v8p0(<8 x ptr> undef, i32 1, <8 x i1> undef, <8 x i8> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V4I8 = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> undef, i32 1, <4 x i1> undef, <4 x i8> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2I8 = call <2 x i8> @llvm.masked.gather.v2i8.v2p0(<2 x ptr> undef, i32 1, <2 x i1> undef, <2 x i8> undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V1I8 = call <1 x i8> @llvm.masked.gather.v1i8.v1p0(<1 x ptr> undef, i32 1, <1 x i1> undef, <1 x i8> undef)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 28 for instruction: %V8F64.u = call <8 x double> @llvm.masked.gather.v8f64.v8p0(<8 x ptr> undef, i32 2, <8 x i1> undef, <8 x double> undef)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: %V4F64.u = call <4 x double> @llvm.masked.gather.v4f64.v4p0(<4 x ptr> undef, i32 2, <4 x i1> undef, <4 x double> undef)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %V2F64.u = call <2 x double> @llvm.masked.gather.v2f64.v2p0(<2 x ptr> undef, i32 2, <2 x i1> undef, <2 x double> undef)
diff --git a/llvm/test/Analysis/CostModel/RISCV/fixed-vector-scatter.ll b/llvm/test/Analysis/CostModel/RISCV/fixed-vector-scatter.ll
index 338683e12654c..bdab59e47d21d 100644
--- a/llvm/test/Analysis/CostModel/RISCV/fixed-vector-scatter.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/fixed-vector-scatter.ll
@@ -6,49 +6,49 @@
 
 define i32 @masked_scatter() {
 ; CHECK-LABEL: 'masked_scatter'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.scatter.v8f64.v8p0(<8 x double> undef, <8 x ptr> undef, i32 8, <8 x i1> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.masked.scatter.v4f64.v4p0(<4 x double> undef, <4 x ptr> undef, i32 8, <4 x i1> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: call void @llvm.masked.scatter.v2f64.v2p0(<2 x double> undef, <2 x ptr> undef, i32 8, <2 x i1> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.scatter.v1f64.v1p0(<1 x double> undef, <1 x ptr> undef, i32 8, <1 x i1> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: call void @llvm.masked.scatter.v16f32.v16p0(<16 x float> undef, <16 x ptr> undef, i32 4, <16 x i1> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.scatter.v8f32.v8p0(<8 x float> undef, <8 x ptr> undef, i32 4, <8 x i1> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.masked.scatter.v4f32.v4p0(<4 x float> undef, <4 x ptr> undef, i32 4, <4 x i1> undef)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: call void @llvm.masked.scatter.v2f32.v2p0(<2 x float> undef, <2 x ptr> undef, i32 4, <2 x i1> ...
[truncated]
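
The updated CHECK lines in fixed-vector-gather.ll are consistent with the new formula `AddrCost + NumLoads * MemOpCost` if the vector add on the pointer operand is costed per vector register. Below is a minimal sketch of that arithmetic. It rests on assumptions that the patch does not state: 64-bit pointers, 128 bits per vector register, a per-element memory cost of 1, and an add cost equal to the number of registers the pointer vector occupies. The names `addrCost`, `gatherCostBefore`, and `gatherCostAfter` are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>

// Assumptions for illustration only (inferred from the test numbers,
// not taken from the patch).
const unsigned PtrBits = 64;   // assumed pointer width
const unsigned VRegBits = 128; // assumed bits per vector register

// Assumed cost of one vector add on <NumElts x ptr>: the number of vector
// registers needed to hold the pointers, rounded up.
unsigned addrCost(unsigned NumElts) {
  assert(NumElts > 0 && "expects a non-empty vector");
  return (NumElts * PtrBits + VRegBits - 1) / VRegBits;
}

// Before the patch: NumLoads * MemOpCost, with MemOpCost assumed to be 1.
unsigned gatherCostBefore(unsigned NumElts) { return NumElts; }

// After the patch: AddrCost + NumLoads * MemOpCost.
// For <8 x ptr> this gives 4 + 8 = 12, as in the V8F64 CHECK line.
unsigned gatherCostAfter(unsigned NumElts) {
  return addrCost(NumElts) + gatherCostBefore(NumElts);
}
```

Under these assumptions the sketch gives 2, 3, 6, 12, 24, 48 and 96 for 1, 2, 4, 8, 16, 32 and 64 elements, which are the new values in the aligned-access CHECK lines shown in the diff.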

@@ -8,7 +8,7 @@
; YAML-NEXT: Function: test
; YAML-NEXT: Args:
; YAML-NEXT: - String: 'Stores SLP vectorized with cost '
; YAML-NEXT: - Cost: '-2'
Member

This should not be affected here; SLP already includes the cost of address calculation.

Contributor Author

Move the address computation cost to getAddressComputationCost().

@topperc topperc changed the title [RISCV][TTI] Add address computation cost for getGaterhScatterOpCost(). [RISCV][TTI] Add address computation cost for getGatherScatterOpCost(). Jul 23, 2025
@ElvisWang123 ElvisWang123 force-pushed the fix-gather/scatter-cost branch from 27ffbad to b833781 Compare July 23, 2025 04:07
@ElvisWang123 ElvisWang123 changed the title [RISCV][TTI] Add address computation cost for getGatherScatterOpCost(). [RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI. Jul 23, 2025
Contributor Author

@ElvisWang123 ElvisWang123 left a comment

Moved the address computation cost to getAddressComputationCost().

Some of the LV test changes seem preventable.
I will try to optimize gather/scatter with a uniform address if possible.

Contributor

@lukel97 lukel97 left a comment

I can't believe we haven't already implemented this, nice find.

; CHECK: [[VECTOR_BODY]]:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 2 x i1> [ zeroinitializer, %[[VECTOR_PH]] ], [ [[PREDPHI:%.*]], %[[VECTOR_BODY]] ]
; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 2 x ptr> @llvm.masked.gather.nxv2p0.nxv2p0(<vscale x 2 x ptr> [[BROADCAST_SPLAT4]], i32 8, <vscale x 2 x i1> [[TMP8]], <vscale x 2 x ptr> poison)
Contributor

As you say, I think this should only be adding the scalar cost here because it's uniform. Is checking that the SCEV isn't an addrec good enough? I think we can use !BaseT::isStridedAccess(Ptr) for that?

Contributor Author

Good point!
I think using !BaseT::isStridedAccess(Ptr) would require changing VPWidenMemoryRecipe::computeCost() and rewriting some functions in VPlan.
Currently VPWidenMemoryRecipe doesn't contain the SCEV of the pointer, and it is hard to infer from the pointer in the underlying instruction (getAddressAccessSCEV() needs LoopVectorizationLegality, PredicatedScalarEvolution and Loop, which are not available in the VPlan).

I think checking whether the address is uniform in VPWidenMemoryRecipe::computeCost() is better. If the address is uniform, query getAddressComputationCost() with the scalar type.

Contributor

Should we do that first before landing this PR? Otherwise I think this test diff is a regression: it should be profitable to vectorize this loop, but we're bailing now because the address computation cost is too high.

Contributor Author

Yes, I will fix VPWidenMemoryRecipe::computeCost() first to prevent the regression in this PR.

Contributor

@wangpc-pp wangpc-pp left a comment

Nice! LGTM!

Contributor Author

@ElvisWang123 ElvisWang123 left a comment

Based on the update to VPWidenMemoryRecipe::computeCost() in LV (#150371).

It looks like vectorize-force-tail-with-evl-gather-scatter won't be vectorized, since the cost of gather/scatter increases.

InstructionCost Cost = 0;

// If the address value is uniform across all lane, then the address can be
// calculated with scalar type and broacast.
Collaborator

Suggested change
// calculated with scalar type and broacast.
// calculated with scalar type and broadcast.

@ElvisWang123 ElvisWang123 force-pushed the fix-gather/scatter-cost branch from 2c80b3b to 00143e8 Compare July 29, 2025 04:58
@ElvisWang123 ElvisWang123 force-pushed the fix-gather/scatter-cost branch from 00143e8 to f37a84c Compare August 11, 2025 02:59
@ElvisWang123
Contributor Author

Rebased on #150371 to show how the tests change.

@alexey-bataev
Member

Rebase

@ElvisWang123 ElvisWang123 force-pushed the fix-gather/scatter-cost branch from f37a84c to 768c654 Compare August 12, 2025 02:17
ElvisWang123 added a commit that referenced this pull request Aug 14, 2025
This patch adds the cost kind to `getAddressComputationCost()` for #149955.

Note that this patch also removes all the default values in `getAddressComputationCost()`.
@ElvisWang123 ElvisWang123 force-pushed the fix-gather/scatter-cost branch from 768c654 to a382b45 Compare August 15, 2025 00:28
@ElvisWang123 ElvisWang123 force-pushed the fix-gather/scatter-cost branch from a382b45 to 4716215 Compare August 25, 2025 03:04
ElvisWang123 added a commit that referenced this pull request Aug 26, 2025
…or gather/scatter. (NFC) (#150371)

This patch queries `getAddressComputationCost()` with the scalar type if the
address is uniform. This makes the cost for gather/scatter more
accurate.

In the current LV, a non-consecutive VPWidenMemoryRecipe (gather/scatter)
accounts for the cost of address computation. But in some cases
the address is uniform across all lanes, so the address can be
calculated with the scalar type and broadcast.

I have a followup optimization that tries to convert gather/scatter with
uniform memory access to scalar load/store + broadcast (and select if
needed). With this optimization, we can remove this temporary change.

This patch is preparation for #149955 to prevent regressions.
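
The selection this commit message describes can be sketched roughly as follows. This is a toy model rather than the VPlan code: `ToyCosts`, `gatherScatterAddrCost`, and the cost values a caller passes in are hypothetical stand-ins for whatever `getAddressComputationCost()` returns for the scalar and vector pointer types.

```cpp
#include <cassert>

// Hypothetical stand-in for the two answers the TTI hook could give:
// one for a scalar pointer type, one for a vector-of-pointers type.
struct ToyCosts {
  unsigned ScalarAddr;
  unsigned VectorAddr;
};

// If every lane uses the same address, the address is computed once as a
// scalar and broadcast, so only the scalar cost is charged. Otherwise the
// per-lane (vector) address computation is charged.
unsigned gatherScatterAddrCost(bool IsUniformAddr, const ToyCosts &C) {
  assert(C.ScalarAddr <= C.VectorAddr &&
         "toy model assumes scalar address math is not more expensive");
  return IsUniformAddr ? C.ScalarAddr : C.VectorAddr;
}
```

With illustrative costs of 1 (scalar) and 4 (vector), a uniform address is charged 1 and a non-uniform one is charged 4, which is the difference that kept the uniform-address loops profitable to vectorize.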
@ElvisWang123 ElvisWang123 force-pushed the fix-gather/scatter-cost branch from 4716215 to 5db0a39 Compare August 26, 2025 04:09
@ElvisWang123
Contributor Author

After #150371 landed, the potential regression is gone (5db0a39)

Contributor

@lukel97 lukel97 left a comment

LGTM. I looked at the two loops in the diff and it does look like they were probably unprofitably vectorized. gather_scatter needs a vsll.vi to scale the offsets, and store_factor_4_reverse needs a vwmulsu.vx.
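
For context, an indexed access of the kind a gather performs has to scale its element indices to byte offsets before adding them to the base address, and that scaling is the per-lane work the address computation cost now accounts for. The following is a hypothetical scalar sketch of that step. It is not the `gather_scatter` test itself, and the function name and the 8-byte element size are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8, "sketch assumes 8-byte doubles");

// Loads Base[Idx[i]] into Out[i] using explicit byte offsets, mirroring the
// "shift the index, then add to the base" address computation of a gather.
void gatherDoubles(const double *Base, const std::uint64_t *Idx, double *Out,
                   std::size_t N) {
  assert(Base && Idx && Out);
  const unsigned char *Bytes = reinterpret_cast<const unsigned char *>(Base);
  for (std::size_t I = 0; I < N; ++I) {
    // Scale the element index to a byte offset (index * 8).
    std::uint64_t ByteOff = Idx[I] << 3;
    std::memcpy(&Out[I], Bytes + ByteOff, sizeof(double));
  }
}
```

In a vectorized version of such a loop, the shift applies to a whole vector of indices, which is where an instruction like the vsll.vi mentioned above would come from.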

This patch implements `getAddressComputationCost()` in the RISCV TTI, which
makes gather/scatter with address calculation more expensive than
strided accesses.

Note that the only user of `getAddressComputationCost()` with a vector
type is `VPWidenMemoryRecipe::computeCost()`, so this patch changes some
LV tests.

I've checked the LV test changes, and they seem to fall into two
groups.
 * gather/scatter with a uniform vector pointer, which it seems could be
 optimized to masked.load.
 * accesses that could be optimized to strided load/store.
@ElvisWang123 ElvisWang123 force-pushed the fix-gather/scatter-cost branch from d9b4254 to 0210fc9 Compare August 26, 2025 23:50
@ElvisWang123 ElvisWang123 merged commit dfd3833 into llvm:main Aug 27, 2025
9 checks passed
@ElvisWang123 ElvisWang123 deleted the fix-gather/scatter-cost branch August 27, 2025 00:40
@llvm-ci
Collaborator

llvm-ci commented Aug 27, 2025

LLVM Buildbot has detected a new failure on builder clang-riscv-gauntlet running on rise-worker-1 while building llvm at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/210/builds/1935

Here is the relevant piece of the build log for the reference
Step 4 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/rise-riscv-gauntlet-build.sh --jobs=32' (failure)
...
[1735/6087] Building C object MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jcapimin.c.o
[1736/6087] Building C object MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jcmainct.c.o
[1737/6087] Building CXX object MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CMakeFiles/CLAMR.dir/partition.cpp.o
[1738/6087] Building CXX object MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CMakeFiles/CLAMR.dir/clamr_cpuonly.cpp.o
[1739/6087] Building C object MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/djpeg.c.o
[1740/6087] Building C object MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jccolor.c.o
[1741/6087] Building CXX object MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CMakeFiles/CLAMR.dir/MallocPlus.cpp.o
[1742/6087] Building C object MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/CMakeFiles/miniGMG.dir/mg.c.o
[1743/6087] Building C object MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jcomapi.c.o
[1744/6087] Building C object MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o 
/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang --target=riscv64-linux-gnu --sysroot=/home/buildbot-worker/bbroot/clang-riscv-gauntlet/../rvsysroot -DNDEBUG  -march=rva23u64 -DSMALL_PROBLEM_SIZE    -O3 -DNDEBUG   -w -Werror=date-time -std=c99 -DDOUBLE -MD -MT MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o -MF MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o.d -o MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o -c /home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-test-suite/MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/linkCells.c
clang: ../../llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7071: VectorizationFactor llvm::LoopVectorizationPlanner::computeBestVF(): Assertion `(BestFactor.Width == LegacyVF.Width || BestPlan.hasEarlyExit() || planContainsAdditionalSimplifications(getPlanFor(BestFactor.Width), CostCtx, OrigLoop, BestFactor.Width) || planContainsAdditionalSimplifications( getPlanFor(LegacyVF.Width), CostCtx, OrigLoop, LegacyVF.Width)) && " VPlan cost model and legacy cost model disagreed"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang --target=riscv64-linux-gnu --sysroot=/home/buildbot-worker/bbroot/clang-riscv-gauntlet/../rvsysroot -DNDEBUG -march=rva23u64 -DSMALL_PROBLEM_SIZE -O3 -DNDEBUG -w -Werror=date-time -std=c99 -DDOUBLE -MD -MT MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o -MF MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o.d -o MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o -c /home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-test-suite/MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/linkCells.c
1.	<eof> parser at end of file
2.	Optimizer
3.	Running pass "function<eager-inv>(float2int,lower-constant-intrinsics,chr,loop(loop-rotate<header-duplication;no-prepare-for-lto>,loop-deletion),loop-distribute,inject-tli-mappings,loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>,infer-alignment,loop-load-elim,instcombine<max-iterations=1;no-verify-fixpoint>,simplifycfg<bonus-inst-threshold=1;forward-switch-cond;switch-range-to-icmp;switch-to-lookup;no-keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,slp-vectorizer,vector-combine,instcombine<max-iterations=1;no-verify-fixpoint>,loop-unroll<O3>,transform-warning,sroa<preserve-cfg>,infer-alignment,instcombine<max-iterations=1;no-verify-fixpoint>,loop-mssa(licm<allowspeculation>),alignment-from-assumptions,loop-sink,instsimplify,div-rem-pairs,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;speculate-unpredictables>)" on module "/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-test-suite/MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/linkCells.c"
4.	Running pass "loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>" on function "getNeighborBoxes"
 #0 0x000055d91e2669a6 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x4e539a6)
 #1 0x000055d91e263ef5 llvm::sys::RunSignalHandlers() (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x4e50ef5)
 #2 0x000055d91e1c91dd CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007f80a4a4def0 (/usr/lib/libc.so.6+0x3def0)
 #4 0x00007f80a4aa774c (/usr/lib/libc.so.6+0x9774c)
 #5 0x00007f80a4a4ddc0 raise (/usr/lib/libc.so.6+0x3ddc0)
 #6 0x00007f80a4a3557a abort (/usr/lib/libc.so.6+0x2557a)
 #7 0x00007f80a4a354e3 __assert_perror_fail (/usr/lib/libc.so.6+0x254e3)
 #8 0x000055d91f4c87b8 (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x60b57b8)
 #9 0x000055d91f4dd625 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x60ca625)
#10 0x000055d91f4e5f2b llvm::LoopVectorizePass::runImpl(llvm::Function&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x60d2f2b)
#11 0x000055d91f4e67e5 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x60d37e5)
#12 0x000055d91f4672bd llvm::detail::PassModel<llvm::Function, llvm::LoopVectorizePass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#13 0x000055d91dd2e1c8 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x491b1c8)
#14 0x000055d91d17d06d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) X86CodeGenPassBuilder.cpp:0:0
#15 0x000055d91dd321df llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x491f1df)
#16 0x000055d91d17d1dd llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) X86CodeGenPassBuilder.cpp:0:0
#17 0x000055d91dd2d1b8 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x491a1b8)
#18 0x000055d91ea23b3d (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&, clang::BackendConsumer*) BackendUtil.cpp:0:0
#19 0x000055d91ea1ae0c clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x5607e0c)
#20 0x000055d91ea303bf clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x561d3bf)
#21 0x000055d920683079 clang::ParseAST(clang::Sema&, bool, bool) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x7270079)
#22 0x000055d91ef94ab4 clang::FrontendAction::Execute() (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x5b81ab4)
#23 0x000055d91eef81ad clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x5ae51ad)
#24 0x000055d91f0945e8 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x5c815e8)
#25 0x000055d91cc21d21 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x380ed21)
#26 0x000055d91cc1da6f ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#27 0x000055d91ed41c79 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#28 0x000055d91e1c8e0e llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x4db5e0e)
Step 17 (rva23: llvm-test-suite build) failure: rva23: llvm-test-suite build (failure)
Step 19 (rva23-zvl1024b: llvm-test-suite build) failure: rva23-zvl1024b: llvm-test-suite build (failure)
...
[172/6087] Building C object MultiSource/Applications/JM/ldecod/CMakeFiles/ldecod.dir/erc_do_p.c.o
[173/6087] Building C object MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/cabac.c.o
[174/6087] Building C object MultiSource/Applications/JM/ldecod/CMakeFiles/ldecod.dir/vlc.c.o
[175/6087] Building C object MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/configfile.c.o
[176/6087] Building C object MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/fmo.c.o
[177/6087] Building C object MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/header.c.o
[178/6087] Building C object MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/img_chroma.c.o
[179/6087] Building C object MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/intrarefresh.c.o
[180/6087] Building C object MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/leaky_bucket.c.o
[181/6087] Building C object MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/explicit_gop.c.o
FAILED: MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/explicit_gop.c.o 
/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang --target=riscv64-linux-gnu --sysroot=/home/buildbot-worker/bbroot/clang-riscv-gauntlet/../rvsysroot -DNDEBUG  -march=rva23u64_zvl1024b -DSMALL_PROBLEM_SIZE    -O3 -DNDEBUG   -w -Werror=date-time -fcommon -D__USE_LARGEFILE64 -D_FILE_OFFSET_BITS=64 -MD -MT MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/explicit_gop.c.o -MF MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/explicit_gop.c.o.d -o MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/explicit_gop.c.o -c /home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-test-suite/MultiSource/Applications/JM/lencod/explicit_gop.c
clang: ../../llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7071: VectorizationFactor llvm::LoopVectorizationPlanner::computeBestVF(): Assertion `(BestFactor.Width == LegacyVF.Width || BestPlan.hasEarlyExit() || planContainsAdditionalSimplifications(getPlanFor(BestFactor.Width), CostCtx, OrigLoop, BestFactor.Width) || planContainsAdditionalSimplifications( getPlanFor(LegacyVF.Width), CostCtx, OrigLoop, LegacyVF.Width)) && " VPlan cost model and legacy cost model disagreed"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang --target=riscv64-linux-gnu --sysroot=/home/buildbot-worker/bbroot/clang-riscv-gauntlet/../rvsysroot -DNDEBUG -march=rva23u64_zvl1024b -DSMALL_PROBLEM_SIZE -O3 -DNDEBUG -w -Werror=date-time -fcommon -D__USE_LARGEFILE64 -D_FILE_OFFSET_BITS=64 -MD -MT MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/explicit_gop.c.o -MF MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/explicit_gop.c.o.d -o MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/explicit_gop.c.o -c /home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-test-suite/MultiSource/Applications/JM/lencod/explicit_gop.c
1.	<eof> parser at end of file
2.	Optimizer
3.	Running pass "function<eager-inv>(float2int,lower-constant-intrinsics,chr,loop(loop-rotate<header-duplication;no-prepare-for-lto>,loop-deletion),loop-distribute,inject-tli-mappings,loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>,infer-alignment,loop-load-elim,instcombine<max-iterations=1;no-verify-fixpoint>,simplifycfg<bonus-inst-threshold=1;forward-switch-cond;switch-range-to-icmp;switch-to-lookup;no-keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,slp-vectorizer,vector-combine,instcombine<max-iterations=1;no-verify-fixpoint>,loop-unroll<O3>,transform-warning,sroa<preserve-cfg>,infer-alignment,instcombine<max-iterations=1;no-verify-fixpoint>,loop-mssa(licm<allowspeculation>),alignment-from-assumptions,loop-sink,instsimplify,div-rem-pairs,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;speculate-unpredictables>)" on module "/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-test-suite/MultiSource/Applications/JM/lencod/explicit_gop.c"
4.	Running pass "loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>" on function "create_hierarchy"
 #0 0x0000562a8e2479a6 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x4e539a6)
 #1 0x0000562a8e244ef5 llvm::sys::RunSignalHandlers() (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x4e50ef5)
 #2 0x0000562a8e1aa1dd CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007f793ce4def0 (/usr/lib/libc.so.6+0x3def0)
 #4 0x00007f793cea774c (/usr/lib/libc.so.6+0x9774c)
 #5 0x00007f793ce4ddc0 raise (/usr/lib/libc.so.6+0x3ddc0)
 #6 0x00007f793ce3557a abort (/usr/lib/libc.so.6+0x2557a)
 #7 0x00007f793ce354e3 __assert_perror_fail (/usr/lib/libc.so.6+0x254e3)
 #8 0x0000562a8f4a97b8 (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x60b57b8)
 #9 0x0000562a8f4be625 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x60ca625)
#10 0x0000562a8f4c6f2b llvm::LoopVectorizePass::runImpl(llvm::Function&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x60d2f2b)
#11 0x0000562a8f4c77e5 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x60d37e5)
#12 0x0000562a8f4482bd llvm::detail::PassModel<llvm::Function, llvm::LoopVectorizePass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#13 0x0000562a8dd0f1c8 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x491b1c8)
#14 0x0000562a8d15e06d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) X86CodeGenPassBuilder.cpp:0:0
#15 0x0000562a8dd131df llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x491f1df)
#16 0x0000562a8d15e1dd llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) X86CodeGenPassBuilder.cpp:0:0
#17 0x0000562a8dd0e1b8 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x491a1b8)
#18 0x0000562a8ea04b3d (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&, clang::BackendConsumer*) BackendUtil.cpp:0:0
#19 0x0000562a8e9fbe0c clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x5607e0c)
#20 0x0000562a8ea113bf clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x561d3bf)
#21 0x0000562a90664079 clang::ParseAST(clang::Sema&, bool, bool) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x7270079)
#22 0x0000562a8ef75ab4 clang::FrontendAction::Execute() (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x5b81ab4)
#23 0x0000562a8eed91ad clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x5ae51ad)
#24 0x0000562a8f0755e8 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x5c815e8)
#25 0x0000562a8cc02d21 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x380ed21)
#26 0x0000562a8cbfea6f ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#27 0x0000562a8ed22c79 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#28 0x0000562a8e1a9e0e llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x4db5e0e)
Step 21 (rva23-mrvv-vec-bits: llvm-test-suite build) failure: rva23-mrvv-vec-bits: llvm-test-suite build (failure)
...
[1729/6087] Building C object MultiSource/Benchmarks/MiBench/automotive-bitcount/CMakeFiles/automotive-bitcount.dir/bitfiles.c.o
[1730/6087] Building C object MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/cdjpeg.c.o
[1731/6087] Building C object MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/eam.c.o
[1732/6087] Building C object MultiSource/Benchmarks/MiBench/automotive-bitcount/CMakeFiles/automotive-bitcount.dir/bitcnts.c.o
[1733/6087] Building C object MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jcapimin.c.o
[1734/6087] Building C object MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jcinit.c.o
[1735/6087] Building C object MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/CMakeFiles/miniGMG.dir/mg.c.o
[1736/6087] Building C object MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jcmainct.c.o
[1737/6087] Building C object MultiSource/Benchmarks/MiBench/consumer-jpeg/CMakeFiles/consumer-jpeg.dir/jcapistd.c.o
[1738/6087] Building C object MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o 
/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang --target=riscv64-linux-gnu --sysroot=/home/buildbot-worker/bbroot/clang-riscv-gauntlet/../rvsysroot -DNDEBUG  -march=rva23u64 -mrvv-vector-bits=zvl -DSMALL_PROBLEM_SIZE    -O3 -DNDEBUG   -w -Werror=date-time -std=c99 -DDOUBLE -MD -MT MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o -MF MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o.d -o MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o -c /home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-test-suite/MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/linkCells.c
clang: ../../llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7071: VectorizationFactor llvm::LoopVectorizationPlanner::computeBestVF(): Assertion `(BestFactor.Width == LegacyVF.Width || BestPlan.hasEarlyExit() || planContainsAdditionalSimplifications(getPlanFor(BestFactor.Width), CostCtx, OrigLoop, BestFactor.Width) || planContainsAdditionalSimplifications( getPlanFor(LegacyVF.Width), CostCtx, OrigLoop, LegacyVF.Width)) && " VPlan cost model and legacy cost model disagreed"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang --target=riscv64-linux-gnu --sysroot=/home/buildbot-worker/bbroot/clang-riscv-gauntlet/../rvsysroot -DNDEBUG -march=rva23u64 -mrvv-vector-bits=zvl -DSMALL_PROBLEM_SIZE -O3 -DNDEBUG -w -Werror=date-time -std=c99 -DDOUBLE -MD -MT MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o -MF MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o.d -o MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CMakeFiles/CoMD.dir/linkCells.c.o -c /home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-test-suite/MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/linkCells.c
1.	<eof> parser at end of file
2.	Optimizer
3.	Running pass "function<eager-inv>(float2int,lower-constant-intrinsics,chr,loop(loop-rotate<header-duplication;no-prepare-for-lto>,loop-deletion),loop-distribute,inject-tli-mappings,loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>,infer-alignment,loop-load-elim,instcombine<max-iterations=1;no-verify-fixpoint>,simplifycfg<bonus-inst-threshold=1;forward-switch-cond;switch-range-to-icmp;switch-to-lookup;no-keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,slp-vectorizer,vector-combine,instcombine<max-iterations=1;no-verify-fixpoint>,loop-unroll<O3>,transform-warning,sroa<preserve-cfg>,infer-alignment,instcombine<max-iterations=1;no-verify-fixpoint>,loop-mssa(licm<allowspeculation>),alignment-from-assumptions,loop-sink,instsimplify,div-rem-pairs,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;speculate-unpredictables>)" on module "/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-test-suite/MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/linkCells.c"
4.	Running pass "loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>" on function "getNeighborBoxes"
 #0 0x00005599c8f1b9a6 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x4e539a6)
 #1 0x00005599c8f18ef5 llvm::sys::RunSignalHandlers() (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x4e50ef5)
 #2 0x00005599c8e7e1dd CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007fb469e4def0 (/usr/lib/libc.so.6+0x3def0)
 #4 0x00007fb469ea774c (/usr/lib/libc.so.6+0x9774c)
 #5 0x00007fb469e4ddc0 raise (/usr/lib/libc.so.6+0x3ddc0)
 #6 0x00007fb469e3557a abort (/usr/lib/libc.so.6+0x2557a)
 #7 0x00007fb469e354e3 __assert_perror_fail (/usr/lib/libc.so.6+0x254e3)
 #8 0x00005599ca17d7b8 (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x60b57b8)
 #9 0x00005599ca192625 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x60ca625)
#10 0x00005599ca19af2b llvm::LoopVectorizePass::runImpl(llvm::Function&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x60d2f2b)
#11 0x00005599ca19b7e5 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x60d37e5)
#12 0x00005599ca11c2bd llvm::detail::PassModel<llvm::Function, llvm::LoopVectorizePass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#13 0x00005599c89e31c8 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x491b1c8)
#14 0x00005599c7e3206d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) X86CodeGenPassBuilder.cpp:0:0
#15 0x00005599c89e71df llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x491f1df)
#16 0x00005599c7e321dd llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) X86CodeGenPassBuilder.cpp:0:0
#17 0x00005599c89e21b8 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x491a1b8)
#18 0x00005599c96d8b3d (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&, clang::BackendConsumer*) BackendUtil.cpp:0:0
#19 0x00005599c96cfe0c clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x5607e0c)
#20 0x00005599c96e53bf clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x561d3bf)
#21 0x00005599cb338079 clang::ParseAST(clang::Sema&, bool, bool) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x7270079)
#22 0x00005599c9c49ab4 clang::FrontendAction::Execute() (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x5b81ab4)
#23 0x00005599c9bad1ad clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x5ae51ad)
#24 0x00005599c9d495e8 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x5c815e8)
#25 0x00005599c78d6d21 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x380ed21)
#26 0x00005599c78d2a6f ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#27 0x00005599c99f6c79 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#28 0x00005599c8e7de0e llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/home/buildbot-worker/bbroot/clang-riscv-gauntlet/llvm-project/build/stage1/bin/clang+0x4db5e0e)

@ElvisWang123
Contributor Author

ElvisWang123 commented Aug 27, 2025

Reverted. I will fix the assertion (cost models disagree) hit in llvm-test-suite and try to land again.

@ElvisWang123 ElvisWang123 restored the fix-gather/scatter-cost branch August 27, 2025 02:39
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 27, 2025
ElvisWang123 added a commit to ElvisWang123/llvm-project that referenced this pull request Aug 28, 2025
This patch checks whether the address is uniform in the legacy cost model,
to align it with the VPlan-based cost model after llvm#150371.

This patch fixes the llvm-test-suite assertion caused by the cost models
being misaligned after llvm#149955 on RISC-V.

I've tested this patch (on top of llvm#149955) on the llvm-test-suite
locally with the options that crashed, `rva23u64` and `rva23u64_zvl1024b`,
and it builds successfully.
ElvisWang123 added a commit that referenced this pull request Sep 2, 2025
…w/ uniform addr. (#155739)

This patch checks whether the address is uniform in the legacy cost model,
to align it with the VPlan-based cost model after #150371.

This patch fixes the llvm-test-suite assertion
(https://lab.llvm.org/buildbot/#/builders/210/builds/1935) caused by the
cost models being misaligned after #149955 on RISC-V.

I've tested this patch (on top of #149955) on the llvm-test-suite
locally with the options that crashed, `rva23u64` and `rva23u64_zvl1024b`,
and it builds successfully.

Since this fix changes LV, I think it would be better to create a separate
PR for it.
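
To illustrate why the two cost models have to treat uniform addresses the same way, here is a minimal toy sketch. It is not LLVM code: `ToyCandidate`, `toyBestVF`, and all cost numbers are hypothetical, and it assumes (for illustration only) that a uniform address needs no vector address computation. It shows only that a model that checks uniformity and one that does not can pick different vectorization factors for the same loop.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy type; not one of LLVM's actual classes.
struct ToyCandidate {
  unsigned VF;       // candidate vectorization factor
  uint64_t BodyCost; // assumed cost of everything except address computation
};

// Returns the candidate VF with the lowest cost per lane.
// ModelChecksUniform says whether this toy model skips the address cost
// when the address is uniform (an assumption made for this sketch).
unsigned toyBestVF(const std::vector<ToyCandidate> &Cands, bool AddrIsUniform,
                   bool ModelChecksUniform) {
  assert(!Cands.empty() && "need at least one candidate");
  const uint64_t AddrCost = 2; // assumed cost of a vector address computation
  unsigned BestVF = 0;
  uint64_t BestCost = 0;
  for (const ToyCandidate &C : Cands) {
    const bool SkipAddr = AddrIsUniform && ModelChecksUniform;
    const uint64_t Cost = C.BodyCost + (SkipAddr ? 0 : AddrCost);
    // Compare Cost / VF against BestCost / BestVF without floating point.
    if (BestVF == 0 || Cost * BestVF < BestCost * C.VF) {
      BestVF = C.VF;
      BestCost = Cost;
    }
  }
  return BestVF;
}
```

With the candidates `{2, 4}` and `{4, 9}` and a uniform address, the model that checks uniformity picks VF 2 while the one that does not picks VF 4. That kind of mismatch is what the "VPlan cost model and legacy cost model disagreed" assertion guards against.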
ElvisWang123 added a commit that referenced this pull request Sep 2, 2025
… TTI. #149955" (#156386)

This patch implements `getAddressComputationCost()` in the RISCV TTI,
which makes gather/scatter with address calculation more expensive than
the strided cost.

Note that the only user of `getAddressComputationCost()` with a vector
type is `VPWidenMemoryRecipe::computeCost()`, so this patch changes some
LV tests.

I've checked the test changes in LV, and they seem to fall into two
groups:
 * gather/scatter with a uniform vector pointer, which seems optimizable
 to masked.load.
 * accesses that can be optimized to strided load/store.
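
The idea of charging gather/scatter for its address computation can be sketched with a toy model. This is not the code from `RISCVTargetTransformInfo.cpp`: `AccessKind`, `ToyCosts`, `toyAddressComputationCost`, `toyMemoryAccessCost`, and every number are hypothetical. It only assumes, as the description states, that gather/scatter needs an extra vector `add <base_addr>, <offset>` that a strided access does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified cost model; names and numbers are illustrative
// assumptions and are not taken from the RISC-V TTI implementation.
enum class AccessKind { Strided, GatherScatter };

struct ToyCosts {
  uint64_t MemOpPerLane = 1;  // assumed cost of accessing one element
  uint64_t VectorAddrAdd = 1; // assumed cost of one vector add of base + offset
};

// Cost of computing the addresses for one widened memory access.
uint64_t toyAddressComputationCost(AccessKind Kind, const ToyCosts &C) {
  // Only gather/scatter needs a vector of per-lane addresses.
  return Kind == AccessKind::GatherScatter ? C.VectorAddrAdd : 0;
}

// Total cost of one widened memory access at vectorization factor VF.
uint64_t toyMemoryAccessCost(AccessKind Kind, unsigned VF, const ToyCosts &C) {
  assert(VF > 0 && "VF must be positive");
  // Both kinds touch VF elements; they differ only in the address cost.
  return C.MemOpPerLane * VF + toyAddressComputationCost(Kind, C);
}
```

Under these assumed numbers the two access kinds cost the same without the address term, so adding it is what makes the strided form strictly cheaper.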

----
After #155739 landed, the assertion (cost models misaligned) is fixed.
I've tested llvm-test-suite with rva23u64 and rva23u64_zvl1024b locally,
and no assertion occurred.
Contributor

@fhahn fhahn left a comment


Looks like there's another crash: https://llvm.godbolt.org/z/7dzxxzYcK

Labels
backend:RISC-V llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms vectorizers

8 participants