AMDGPU/GlobalISel: Switch a few tests to new-reg-bank-select #153174

petar-avramovic · 2025-08-12T12:30:43Z

No description provided.

petar-avramovic · 2025-08-12T12:31:01Z

This stack of pull requests is managed by Graphite.

llvmbot · 2025-08-12T12:33:55Z

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Petar Avramovic (petar-avramovic)

Changes

Patch is 74.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/153174.diff

35 Files Affected:

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/add_shl.ll (+4-4)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.i1.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-asserts.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/br-constant-invalid-sgpr-copy.ll (+10-6)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/combine-shift-imm-chain.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/combine-shift-of-shifted-logic.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.ll (+192-203)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.gfx.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.end.cf.i32.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.if.break.i32.ll (+6-8)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.implicit.ptr.buffer.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.memcpy.inline.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-add.s32.mir (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.readfirstlane.mir (+1-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.s.getpc.mir (+1-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-bitcast.mir (+1-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-brcond.mir (+27-20)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-build-vector.mir (+48-53)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-default.mir (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-fadd.mir (+1-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-frame-index.mir (+1-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ptr-add.mir (+1-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-split-scalar-load-metadata.mir (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sub.mir (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-uitofp.mir (+1-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/shlN_add.ll (+4-4)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/trunc.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/v_bfe_i32.ll (+6-6)
(modified) llvm/test/CodeGen/AMDGPU/allow-check.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/amdgcn-cs-chain-intrinsic-dyn-vgpr-w32.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/bitop3.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/invalid-addrspacecast.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/twoaddr-constrain.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/wait-before-stores-with-scope_sys.ll (+1-1)

diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/add_shl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/add_shl.ll
index b68df4fbbbb9e..59036338eaf15 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/add_shl.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/add_shl.ll
@@ -1,8 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -global-isel -mtriple=amdgcn-amd-mesa3d -mcpu=fiji < %s | FileCheck -check-prefix=VI %s
-; RUN: llc -global-isel -mtriple=amdgcn-amd-mesa3d -mcpu=gfx900 < %s | FileCheck -check-prefix=GFX9 %s
-; RUN: llc -global-isel -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1010 < %s | FileCheck -check-prefix=GFX10 %s
-; RUN: llc -global-isel -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 < %s | FileCheck -check-prefix=GFX10 %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-mesa3d -mcpu=fiji < %s | FileCheck -check-prefix=VI %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-mesa3d -mcpu=gfx900 < %s | FileCheck -check-prefix=GFX9 %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1010 < %s | FileCheck -check-prefix=GFX10 %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-mesa3d -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 < %s | FileCheck -check-prefix=GFX10 %s
 
 ; ===================================================================================
 ; V_ADD_LSHL_U32
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.i1.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.i1.ll
index 74422a1962344..25d70002a7a8e 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.i1.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.i1.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize64 < %s | FileCheck -check-prefix=WAVE64 %s
-; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 < %s | FileCheck -check-prefix=WAVE32 %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize64 < %s | FileCheck -check-prefix=WAVE64 %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 < %s | FileCheck -check-prefix=WAVE32 %s
 
 define i32 @s_andn2_i1_vcc(i32 %arg0, i32 %arg1) {
 ; WAVE64-LABEL: s_andn2_i1_vcc:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-asserts.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-asserts.ll
index cdcc3a4f27071..fae3a75101ee5 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-asserts.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-asserts.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1031 < %s | FileCheck %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1031 < %s | FileCheck %s
 
 define hidden <2 x i64> @icmp_v2i32_sext_to_v2i64(<2 x i32> %arg) {
 ; CHECK-LABEL: icmp_v2i32_sext_to_v2i64:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/br-constant-invalid-sgpr-copy.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/br-constant-invalid-sgpr-copy.ll
index 439ffbac960b8..22324e62c2ab5 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/br-constant-invalid-sgpr-copy.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/br-constant-invalid-sgpr-copy.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -global-isel -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefix=WAVE64 %s
-; RUN: llc -global-isel -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1031 < %s | FileCheck -check-prefix=WAVE32 %s
+; RUN: llc -global-isel -new-reg-bank-select -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefix=WAVE64 %s
+; RUN: llc -global-isel -new-reg-bank-select -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1031 < %s | FileCheck -check-prefix=WAVE32 %s
 
 ; This was mishandling the constant true and false values used as a
 ; scalar branch condition.
@@ -76,7 +76,8 @@ define void @br_undef() {
 ; WAVE64-NEXT:  .LBB2_1: ; %bb0
 ; WAVE64-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; WAVE64-NEXT:    ; implicit-def: $sgpr4
-; WAVE64-NEXT:    s_and_b32 s4, s4, 1
+; WAVE64-NEXT:    s_mov_b32 s5, 1
+; WAVE64-NEXT:    s_and_b32 s4, s4, s5
 ; WAVE64-NEXT:    s_cmp_lg_u32 s4, 0
 ; WAVE64-NEXT:    s_cbranch_scc1 .LBB2_1
 ; WAVE64-NEXT:  ; %bb.2: ; %.exit5
@@ -88,7 +89,8 @@ define void @br_undef() {
 ; WAVE32-NEXT:  .LBB2_1: ; %bb0
 ; WAVE32-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; WAVE32-NEXT:    ; implicit-def: $sgpr4
-; WAVE32-NEXT:    s_and_b32 s4, s4, 1
+; WAVE32-NEXT:    s_mov_b32 s5, 1
+; WAVE32-NEXT:    s_and_b32 s4, s4, s5
 ; WAVE32-NEXT:    s_cmp_lg_u32 s4, 0
 ; WAVE32-NEXT:    s_cbranch_scc1 .LBB2_1
 ; WAVE32-NEXT:  ; %bb.2: ; %.exit5
@@ -110,7 +112,8 @@ define void @br_poison() {
 ; WAVE64-NEXT:  .LBB3_1: ; %bb0
 ; WAVE64-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; WAVE64-NEXT:    ; implicit-def: $sgpr4
-; WAVE64-NEXT:    s_and_b32 s4, s4, 1
+; WAVE64-NEXT:    s_mov_b32 s5, 1
+; WAVE64-NEXT:    s_and_b32 s4, s4, s5
 ; WAVE64-NEXT:    s_cmp_lg_u32 s4, 0
 ; WAVE64-NEXT:    s_cbranch_scc1 .LBB3_1
 ; WAVE64-NEXT:  ; %bb.2: ; %.exit5
@@ -122,7 +125,8 @@ define void @br_poison() {
 ; WAVE32-NEXT:  .LBB3_1: ; %bb0
 ; WAVE32-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; WAVE32-NEXT:    ; implicit-def: $sgpr4
-; WAVE32-NEXT:    s_and_b32 s4, s4, 1
+; WAVE32-NEXT:    s_mov_b32 s5, 1
+; WAVE32-NEXT:    s_and_b32 s4, s4, s5
 ; WAVE32-NEXT:    s_cmp_lg_u32 s4, 0
 ; WAVE32-NEXT:    s_cbranch_scc1 .LBB3_1
 ; WAVE32-NEXT:  ; %bb.2: ; %.exit5
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-shift-imm-chain.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-shift-imm-chain.ll
index 2d3088f3edb72..917cdb3f49a26 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-shift-imm-chain.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-shift-imm-chain.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -global-isel -mtriple=amdgcn < %s | FileCheck %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn < %s | FileCheck %s
 
 define amdgpu_cs i32 @test_shl_1(i32 inreg %arg1) {
 ; CHECK-LABEL: test_shl_1:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-shift-of-shifted-logic.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-shift-of-shifted-logic.ll
index 5532443c0dfc8..914a26b2fb525 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-shift-of-shifted-logic.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-shift-of-shifted-logic.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -global-isel -mtriple=amdgcn < %s | FileCheck %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn < %s | FileCheck %s
 
 define amdgpu_cs i32 @test_shl_and_1(i32 inreg %arg1) {
 ; CHECK-LABEL: test_shl_and_1:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.ll
index a8a75cd2ffaa8..dd01112d97a18 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
-; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 < %s | FileCheck -check-prefix=GFX10 %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 < %s | FileCheck -check-prefix=GFX10 %s
 
 ; This file contains various tests that have divergent i1s used outside of
 ; the loop. These are lane masks is sgpr and need to have correct value in
@@ -13,30 +13,27 @@ define void @divergent_i1_phi_used_outside_loop(float %val, float %pre.cond.val,
 ; GFX10-LABEL: divergent_i1_phi_used_outside_loop:
 ; GFX10:       ; %bb.0: ; %entry
 ; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT:    s_mov_b32 s4, 0
 ; GFX10-NEXT:    v_cmp_lt_f32_e64 s5, 1.0, v1
-; GFX10-NEXT:    v_mov_b32_e32 v1, s4
-; GFX10-NEXT:    ; implicit-def: $sgpr6
+; GFX10-NEXT:    s_mov_b32 s4, 0
+; GFX10-NEXT:    s_mov_b32 s6, 0
 ; GFX10-NEXT:    ; implicit-def: $sgpr7
 ; GFX10-NEXT:  .LBB0_1: ; %loop
 ; GFX10-NEXT:    ; =>This Inner Loop Header: Depth=1
-; GFX10-NEXT:    v_cvt_f32_u32_e32 v4, v1
-; GFX10-NEXT:    s_xor_b32 s8, s5, -1
-; GFX10-NEXT:    v_add_nc_u32_e32 v1, 1, v1
-; GFX10-NEXT:    v_cmp_gt_f32_e32 vcc_lo, v4, v0
+; GFX10-NEXT:    v_cvt_f32_u32_e32 v1, s6
+; GFX10-NEXT:    s_mov_b32 s8, exec_lo
+; GFX10-NEXT:    s_add_i32 s6, s6, 1
+; GFX10-NEXT:    s_xor_b32 s8, s5, s8
+; GFX10-NEXT:    v_cmp_gt_f32_e32 vcc_lo, v1, v0
 ; GFX10-NEXT:    s_or_b32 s4, vcc_lo, s4
 ; GFX10-NEXT:    s_andn2_b32 s7, s7, exec_lo
-; GFX10-NEXT:    s_and_b32 s5, exec_lo, s5
-; GFX10-NEXT:    s_andn2_b32 s6, s6, exec_lo
-; GFX10-NEXT:    s_or_b32 s7, s7, s5
+; GFX10-NEXT:    s_and_b32 s9, exec_lo, s5
 ; GFX10-NEXT:    s_mov_b32 s5, s8
-; GFX10-NEXT:    s_and_b32 s9, exec_lo, s7
-; GFX10-NEXT:    s_or_b32 s6, s6, s9
+; GFX10-NEXT:    s_or_b32 s7, s7, s9
 ; GFX10-NEXT:    s_andn2_b32 exec_lo, exec_lo, s4
 ; GFX10-NEXT:    s_cbranch_execnz .LBB0_1
 ; GFX10-NEXT:  ; %bb.2: ; %exit
 ; GFX10-NEXT:    s_or_b32 exec_lo, exec_lo, s4
-; GFX10-NEXT:    v_cndmask_b32_e64 v0, 0, 1.0, s6
+; GFX10-NEXT:    v_cndmask_b32_e64 v0, 0, 1.0, s7
 ; GFX10-NEXT:    flat_store_dword v[2:3], v0
 ; GFX10-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX10-NEXT:    s_setpc_b64 s[30:31]
@@ -63,43 +60,44 @@ define void @divergent_i1_phi_used_outside_loop_larger_loop_body(float %val, ptr
 ; GFX10-LABEL: divergent_i1_phi_used_outside_loop_larger_loop_body:
 ; GFX10:       ; %bb.0: ; %entry
 ; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT:    s_mov_b32 s4, -1
-; GFX10-NEXT:    ; implicit-def: $sgpr6
-; GFX10-NEXT:    v_mov_b32_e32 v0, s4
 ; GFX10-NEXT:    s_andn2_b32 s5, s4, exec_lo
-; GFX10-NEXT:    s_and_b32 s4, exec_lo, -1
-; GFX10-NEXT:    s_or_b32 s4, s5, s4
+; GFX10-NEXT:    s_and_b32 s6, exec_lo, exec_lo
+; GFX10-NEXT:    s_mov_b32 s4, -1
+; GFX10-NEXT:    s_or_b32 s7, s5, s6
+; GFX10-NEXT:    ; implicit-def: $sgpr5
 ; GFX10-NEXT:    s_branch .LBB1_2
 ; GFX10-NEXT:  .LBB1_1: ; %loop.cond
 ; GFX10-NEXT:    ; in Loop: Header=BB1_2 Depth=1
-; GFX10-NEXT:    s_or_b32 exec_lo, exec_lo, s4
-; GFX10-NEXT:    v_add_co_u32 v1, s4, v1, 4
-; GFX10-NEXT:    v_add_nc_u32_e32 v0, 1, v0
-; GFX10-NEXT:    v_add_co_ci_u32_e64 v2, s4, 0, v2, s4
-; GFX10-NEXT:    s_andn2_b32 s7, s5, exec_lo
-; GFX10-NEXT:    s_and_b32 s8, exec_lo, s6
-; GFX10-NEXT:    v_cmp_le_i32_e32 vcc_lo, 10, v0
-; GFX10-NEXT:    s_or_b32 s4, s7, s8
-; GFX10-NEXT:    s_cbranch_vccz .LBB1_4
+; GFX10-NEXT:    s_or_b32 exec_lo, exec_lo, s7
+; GFX10-NEXT:    s_add_i32 s4, s4, 1
+; GFX10-NEXT:    v_add_co_u32 v1, vcc_lo, v1, 4
+; GFX10-NEXT:    s_cmp_ge_i32 s4, 10
+; GFX10-NEXT:    v_add_co_ci_u32_e32 v2, vcc_lo, 0, v2, vcc_lo
+; GFX10-NEXT:    s_cselect_b32 s8, 1, 0
+; GFX10-NEXT:    s_andn2_b32 s7, s6, exec_lo
+; GFX10-NEXT:    s_and_b32 s9, exec_lo, s5
+; GFX10-NEXT:    s_or_b32 s7, s7, s9
+; GFX10-NEXT:    s_cmp_lg_u32 s8, 0
+; GFX10-NEXT:    s_cbranch_scc0 .LBB1_4
 ; GFX10-NEXT:  .LBB1_2: ; %loop.start
 ; GFX10-NEXT:    ; =>This Inner Loop Header: Depth=1
-; GFX10-NEXT:    s_mov_b32 s5, s4
-; GFX10-NEXT:    s_andn2_b32 s4, s6, exec_lo
-; GFX10-NEXT:    s_and_b32 s6, exec_lo, s5
-; GFX10-NEXT:    s_or_b32 s6, s4, s6
-; GFX10-NEXT:    s_and_saveexec_b32 s4, s5
+; GFX10-NEXT:    s_mov_b32 s6, s7
+; GFX10-NEXT:    s_andn2_b32 s5, s5, exec_lo
+; GFX10-NEXT:    s_and_b32 s7, exec_lo, s7
+; GFX10-NEXT:    s_or_b32 s5, s5, s7
+; GFX10-NEXT:    s_and_saveexec_b32 s7, s6
 ; GFX10-NEXT:    s_cbranch_execz .LBB1_1
 ; GFX10-NEXT:  ; %bb.3: ; %is.eq.zero
 ; GFX10-NEXT:    ; in Loop: Header=BB1_2 Depth=1
-; GFX10-NEXT:    global_load_dword v5, v[1:2], off
-; GFX10-NEXT:    s_andn2_b32 s6, s6, exec_lo
+; GFX10-NEXT:    global_load_dword v0, v[1:2], off
+; GFX10-NEXT:    s_andn2_b32 s5, s5, exec_lo
 ; GFX10-NEXT:    s_waitcnt vmcnt(0)
-; GFX10-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v5
-; GFX10-NEXT:    s_and_b32 s7, exec_lo, vcc_lo
-; GFX10-NEXT:    s_or_b32 s6, s6, s7
+; GFX10-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX10-NEXT:    s_and_b32 s8, exec_lo, vcc_lo
+; GFX10-NEXT:    s_or_b32 s5, s5, s8
 ; GFX10-NEXT:    s_branch .LBB1_1
 ; GFX10-NEXT:  .LBB1_4: ; %exit
-; GFX10-NEXT:    v_cndmask_b32_e64 v0, 0, 1.0, s5
+; GFX10-NEXT:    v_cndmask_b32_e64 v0, 0, 1.0, s6
 ; GFX10-NEXT:    flat_store_dword v[3:4], v0
 ; GFX10-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX10-NEXT:    s_setpc_b64 s[30:31]
@@ -135,29 +133,26 @@ define void @divergent_i1_xor_used_outside_loop(float %val, float %pre.cond.val,
 ; GFX10-LABEL: divergent_i1_xor_used_outside_loop:
 ; GFX10:       ; %bb.0: ; %entry
 ; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT:    s_mov_b32 s4, 0
 ; GFX10-NEXT:    v_cmp_lt_f32_e64 s5, 1.0, v1
-; GFX10-NEXT:    v_mov_b32_e32 v1, s4
-; GFX10-NEXT:    ; implicit-def: $sgpr6
+; GFX10-NEXT:    s_mov_b32 s4, 0
+; GFX10-NEXT:    s_mov_b32 s6, 0
 ; GFX10-NEXT:    ; implicit-def: $sgpr7
 ; GFX10-NEXT:  .LBB2_1: ; %loop
 ; GFX10-NEXT:    ; =>This Inner Loop Header: Depth=1
-; GFX10-NEXT:    v_cvt_f32_u32_e32 v4, v1
-; GFX10-NEXT:    s_xor_b32 s5, s5, -1
-; GFX10-NEXT:    v_add_nc_u32_e32 v1, 1, v1
-; GFX10-NEXT:    v_cmp_gt_f32_e32 vcc_lo, v4, v0
+; GFX10-NEXT:    v_cvt_f32_u32_e32 v1, s6
+; GFX10-NEXT:    s_mov_b32 s8, exec_lo
+; GFX10-NEXT:    s_add_i32 s6, s6, 1
+; GFX10-NEXT:    s_xor_b32 s5, s5, s8
+; GFX10-NEXT:    v_cmp_gt_f32_e32 vcc_lo, v1, v0
 ; GFX10-NEXT:    s_or_b32 s4, vcc_lo, s4
 ; GFX10-NEXT:    s_andn2_b32 s7, s7, exec_lo
 ; GFX10-NEXT:    s_and_b32 s8, exec_lo, s5
-; GFX10-NEXT:    s_andn2_b32 s6, s6, exec_lo
 ; GFX10-NEXT:    s_or_b32 s7, s7, s8
-; GFX10-NEXT:    s_and_b32 s8, exec_lo, s7
-; GFX10-NEXT:    s_or_b32 s6, s6, s8
 ; GFX10-NEXT:    s_andn2_b32 exec_lo, exec_lo, s4
 ; GFX10-NEXT:    s_cbranch_execnz .LBB2_1
 ; GFX10-NEXT:  ; %bb.2: ; %exit
 ; GFX10-NEXT:    s_or_b32 exec_lo, exec_lo, s4
-; GFX10-NEXT:    v_cndmask_b32_e64 v0, 0, 1.0, s6
+; GFX10-NEXT:    v_cndmask_b32_e64 v0, 0, 1.0, s7
 ; GFX10-NEXT:    flat_store_dword v[2:3], v0
 ; GFX10-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX10-NEXT:    s_setpc_b64 s[30:31]
@@ -184,23 +179,20 @@ define void @divergent_i1_xor_used_outside_loop_twice(float %val, float %pre.con
 ; GFX10-LABEL: divergent_i1_xor_used_outside_loop_twice:
 ; GFX10:       ; %bb.0: ; %entry
 ; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT:    s_mov_b32 s4, 0
 ; GFX10-NEXT:    v_cmp_lt_f32_e64 s5, 1.0, v1
-; GFX10-NEXT:    v_mov_b32_e32 v1, s4
+; GFX10-NEXT:    s_mov_b32 s4, 0
+; GFX10-NEXT:    s_mov_b32 s7, 0
 ; GFX10-NEXT:    ; implicit-def: $sgpr6
-; GFX10-NEXT:    ; implicit-def: $sgpr7
 ; GFX10-NEXT:  .LBB3_1: ; %loop
 ; GFX10-NEXT:    ; =>This Inner Loop Header: Depth=1
-; GFX10-NEXT:    v_cvt_f32_u32_e32 v6, v1
-; GFX10-NEXT:    s_xor_b32 s5, s5, -1
-; GFX10-NEXT:    v_add_nc_u32_e32 v1, 1, v1
-; GFX10-NEXT:    v_cmp_gt_f32_e32 vcc_lo, v6, v0
+; GFX10-NEXT:    v_cvt_f32_u32_e32 v1, s7
+; GFX10-NEXT:    s_mov_b32 s8, exec_lo
+; GFX10-NEXT:    s_add_i32 s7, s7, 1
+; GFX10-NEXT:    s_xor_b32 s5, s5, s8
+; GFX10-NEXT:    v_cmp_gt_f32_e32 vcc_lo, v1, v0
 ; GFX10-NEXT:    s_or_b32 s4, vcc_lo, s4
-; GFX10-NEXT:    s_andn2_b32 s7, s7, exec_lo
-; GFX10-NEXT:    s_and_b32 s8, exec_lo, s5
 ; GFX10-NEXT:    s_andn2_b32 s6, s6, exec_lo
-; GFX10-NEXT:    s_or_b32 s7, s7, s8
-; GFX10-NEXT:    s_and_b32 s8, exec_lo, s7
+; GFX10-NEXT:    s_and_b32 s8, exec_lo, s5
 ; GFX10-NEXT:    s_or_b32 s6, s6, s8
 ; GFX10-NEXT:    s_andn2_b32 exec_lo, exec_lo, s4
 ; GFX10-NEXT:    s_cbranch_execnz .LBB3_1
@@ -247,66 +239,64 @@ define void @divergent_i1_xor_used_outside_loop_larger_loop_body(i32 %num.elts,
 ; GFX10:       ; %bb.0: ; %entry
 ; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX10-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX10-NEXT:    s_mov_b32 s5, 0
-; GFX10-NEXT:    s_mov_b32 s6, -1
-; GFX10-NEXT:    s_and_saveexec_b32 s4, vcc_lo
+; GFX10-NEXT:    s_mov_b32 s6, exec_lo
+; GFX10-NEXT:    s_mov_b32 s8, 0
+; GFX10-NEXT:    s_and_saveexec_b32 s7, vcc_lo
 ; GFX10-NEXT:    s_cbranch_execz .LBB4_6
 ; GFX10-NEXT:  ; %bb.1: ; %loop.start.preheader
-; GFX10-NEXT:    v_mov_b32_e32 v5, s5
-; GFX10-NEXT:    ; implicit-def: $sgpr6
-; GFX10-NEXT:    ; implicit-def: $sgpr8
+; GFX10-NEXT:    s_mov_b32 s4, 0
+; GFX10-NEXT:    ; implicit-def: $sgpr10
+; GFX10-NEXT:    ; implicit-def: $sgpr11
 ; GFX10-NEXT:    ; implicit-def: $sgpr9
-; GFX10-NEXT:    ; implicit-def: $sgpr7
 ; GFX10-NEXT:    s_branch .LBB4_3
 ; GFX10-NEXT:  .LBB4_2: ; %Flow
 ; GFX10-NEXT:    ; in Loop: Header=BB4_3 Depth=1
-; GFX10-NEXT:    s_or_b32 exec_lo, exec_lo, s10
-; GFX10-NEXT:    s_xor_b32 s10, s9, -1
-; GFX10-NEXT:    s_and_b32 s11, exec_lo, s8
-; GFX10-NEXT:    s_or_b32 s5, s11, s5
-; GFX10-NEXT:    s_andn2_b32 s7, s7, exec_lo
-; GFX10-NEXT:    s_and_b32 s10, exec_lo, s10
-; GFX10-NEXT:    s_andn2_b32 s6, s6, exec_lo
-; GFX10-NEXT:    s_or_b32 s7, s7, s10
-; GFX10-NEXT:    s_and_b32 s10, exec_lo, s7
-; GFX10-NEXT:    s_or_b32 s6, s6, s10
-; GFX10-NEXT:    s_andn2_b32 exec_lo, exec_lo, s5
+; GFX10-NEXT:    s_or_b32 exec_lo, exec_lo, s5
+; GFX10-NEXT:    s_xor_b32 s5, s11, exec_lo
+; GFX10-NEXT:    s_and_b32 s12, exec_lo, s10
+; GFX10-NEXT:    s_or_b32 s8, s12, s8
+; GFX10-NEXT:    s_andn2_b32 s9, s9, exec_lo
+; GFX10-NEXT:    s_and_b32 s5, exec_lo, s5
+; GFX10-NEXT:    s_or_b32 s9, s9, s5
+; GFX10-NEXT:    s_andn2_b32 exec_lo, exec_lo, s8
 ; GFX10-NEXT:    s_cbranch_execz .LBB4_5
 ; GFX10-NEXT:  .LBB4_3: ; %loop.start
 ; GFX10-NEXT:    ; =>This Inner Loop Header: Depth=1
-; GFX10-NEXT:    v_ashrrev_i32_e32 v6, 31, v5
-; GFX10-NEXT:    s_andn2_b32 s9, s9, exec_lo
-; GFX10-NEXT:    s_and_b32 s10, exec_lo, -1
-; GFX10-NEXT:    s_andn2_b32 s8, s8, exec_lo
-; GFX10-NEXT:    s_or_b32 s9, s9, s10
-; GFX10-NEXT:    v_lshlrev_b64 v[6:7], 2, v[5:6]
-; GFX10-NEXT:    s_or_b32 s8, s8, s10
-; GFX10-NEXT:    v_add_co_u32 v6, vcc_lo, v1, v6
-; GFX10-NEXT:    v_add_co_ci_u32_e32 v7, vcc_lo, v2, v7, vcc_lo
-; GFX10-NEXT:    global_load_dword v6, v[6:7], off
+; GFX10-NEXT:    s_ashr_i32 s5, s4, 31
+; GFX10-NEXT:    s_andn2_b32 s10, s10, exec_lo
+; GFX10-NEXT:    s_lshl_b64 s[12:13], s[4:5], 2
+; GFX10-NEXT:    s_andn2_b32 s5, s11, exec_lo
+; GFX10-NEXT:    v_mov_b32_e32 v5, s12
+; GFX10-NEXT:    v_mov_b32_e32 v6, s13
+; GFX10-NEXT:    s_and_b32 s11, exec_lo, exec_lo
+; GFX10-NEXT:    s_and_b32 s12, exec_lo, exec_lo
+; GFX10-NEXT:    s_or_b32 s11, s5, s11
+; GFX10-NEXT:    v_add_co_u32 v5, vcc_lo, v1, v5
+; GFX10-NEXT:    v_add_co_ci_u32_e32 v6, vcc_lo, v2, v6, vcc_lo
+; GFX10-NEXT:    s_or_b32 s10, s10, s12
+; GFX10-NEXT:    global_load_dword v5, v[5:6], off
 ; GFX10-NEXT:    s_waitcnt vmcnt(0)
-; GFX10-NEXT:    v_cmp_ne_u32_e32 vcc_lo, 0, v6
-; GFX10-NEXT:    s_and_saveexec_b32 s10, vcc_lo
+; GFX10-NEXT:    v_cmp_ne_u32_e32 vcc_lo, 0, v5
+; GFX10-NEXT:    s_and_saveexec_b32 s5, vcc_lo
 ; GFX10-NEXT:    s_cbranch_execz .LBB4_2
 ; GFX10-NEXT:  ; %bb.4: ; %loop.cond
 ; GFX10-NEXT:    ; in Loop: Header=BB4_3 Depth=1
-; GFX10-NEXT:    v_add_nc_u32_e32 v6, 1, v5
-; GFX10-NEXT:    v_cmp_lt_i32_e32 vcc_lo, v5, v0
-; GFX10-NEXT:    s_andn2_b32 s9, s9, exec_lo
-; GFX10-NEXT:    s_and_b32 s11, exec_lo, 0
-; GFX10-NEXT:    s_andn2_b32 s8, s8, exec_lo
-; GFX10-NEXT:    v_mov_b32_e32 v5, v6
-; GFX10-NEXT:    s_and_b32 s12, exec_lo, vcc_lo
-; GFX10-NEXT:    s_or_b32 s9, s9, s11
-; GFX10-NEXT:    s_or_b32 s8, s8, s12
+; GFX10-NEXT:    v_cmp_lt_i32_e32 vcc_lo, s4, v0
+; GFX10-NEXT:    s_andn2_b32 s11, s11, exec_lo
+; GFX10-NEXT:    s_and_b32 s12, exec_lo, 0
+; GFX10-NEXT:    s_andn2_b32 s10, s10, exec_lo
+; GFX10-NEXT:    s_add_i32 s4, s4, 1
+; GFX10-NEXT:    s_and_b32 s13, exec_lo, vcc_lo
+; GFX10-NEXT:    s_or_b32 s11, s11, s12
+; GFX10-NEXT:    s_or_b32 s10, s10, s13
 ; GFX10-NEXT:    s_branch .LBB4_2
 ; GFX10-NEXT:  .LBB4_5: ; %loop.exit.guard
-; GFX10-NEXT:    s_or_b32 exec_lo, exec_lo, s5
-; GFX10-NEXT:    s_andn2_b32 s5, -1, exec_lo
-; GFX10-NEXT:    s_and_b32 s6, exec_lo, s6
-; GFX10-N...
[truncated]
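The RUN-line part of this change is mechanical. In each `.ll` test visible in the truncated diff above, `-new-reg-bank-select` is inserted immediately after `-global-isel`, and the affected CHECK lines were presumably regenerated with `utils/update_llc_test_checks.py`, which the NOTE headers name as their generator. The following minimal Python sketch shows only that RUN-line rewrite. The helper names `add_new_rbs_flag` and `switch_test` are hypothetical and are not part of LLVM's `utils/` scripts. The sketch assumes single-space-separated `llc` RUN lines like the ones shown, and it does not cover the `.mir` tests, whose diffs are cut off here.

```python
def add_new_rbs_flag(line: str) -> str:
    """Return `line` with -new-reg-bank-select inserted after -global-isel.

    Only lit RUN lines that invoke llc are touched. Any other line, or a RUN
    line that already carries the flag, is returned unchanged.
    """
    if "RUN:" not in line or " llc " not in line:
        return line
    if "-new-reg-bank-select" in line:
        return line  # already switched; keeps the rewrite idempotent
    tokens = line.split(" ")  # split on single spaces so the join is lossless
    for i, tok in enumerate(tokens):
        # Match -global-isel or -global-isel=<v>, but not e.g. -global-isel-abort=2.
        if tok == "-global-isel" or tok.startswith("-global-isel="):
            tokens.insert(i + 1, "-new-reg-bank-select")
            return " ".join(tokens)
    return line  # llc RUN line that does not use GlobalISel


def switch_test(text: str) -> str:
    """Apply add_new_rbs_flag to every line of a test file's text."""
    return "\n".join(add_new_rbs_flag(l) for l in text.split("\n"))
```

Applied to the first `add_shl.ll` RUN line, this produces the `+` line from the diff. The FileCheck assertions would still need to be regenerated separately, because codegen output changes under the new register bank selection. For example, `br-constant-invalid-sgpr-copy.ll` now materializes the constant into `s5` before the `s_and_b32`.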

petar-avramovic · 2025-08-12T13:01:49Z

Merge activity

Aug 12, 1:01 PM UTC: A user started a stack merge that includes this pull request via Graphite.
Aug 12, 1:03 PM UTC: @petar-avramovic merged this pull request with Graphite.

AMDGPU/GlobalISel: Switch a few tests to new-reg-bank-select

8e9ae94

This was referenced Aug 12, 2025

AMDGPU/GlobalISel: Add regbanklegalize rules for ptr-add #153175

Merged

AMDGPU/GlobalISel: Import D16 load patterns and add combines for them #153178

Open

AMDGPU/GlobalISel: Add regbanklegalize rules for load and store #153176

Open

petar-avramovic requested review from arsenm, nhaehnle and Pierre-vh August 12, 2025 12:33

petar-avramovic marked this pull request as ready for review August 12, 2025 12:33

llvmbot added backend:AMDGPU llvm:globalisel labels Aug 12, 2025

arsenm approved these changes Aug 12, 2025

petar-avramovic merged commit f88be47 into main Aug 12, 2025
14 checks passed

petar-avramovic deleted the users/petar-avramovic/switch-tests branch August 12, 2025 13:03