Commit 6b4b3e2

[AMDGPU] SIRemoveShortExecBranches should not remove branches exiting loops
Summary:
Check that an s_cbranch_execz is not a loop exit before removing it. Without this check, the pass was generating infinite loops.

Reviewers: cdevadas, arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits, dstuttard, foad

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72997
1 parent e53a9d9 commit 6b4b3e2
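To see why removing the execz branch can hang the GPU, here is a minimal toy model in C++. It is a hypothetical illustration, not LLVM or AMDGPU code: `LaneMask`, `valuCompare` and `runRegion` are invented names. It assumes a 64-lane wave, a uniform loop nested in a divergent region, and a loop exit taken by an `s_cbranch_vccnz`-style branch on a VCC produced by a VALU compare. Because a VALU compare only sets bits for lanes active in EXEC, VCC stays 0 when EXEC is 0, so the exit is never taken unless the `s_cbranch_execz` in front of the region skips it.

```cpp
#include <cstdint>

// Hypothetical toy model of one 64-lane wavefront (not LLVM code).
using LaneMask = std::uint64_t;

// A VALU compare (like v_cmp_*) only writes VCC bits for lanes that are
// active in EXEC. Here every active lane agrees on `predicate` (a uniform
// condition), so the result is either EXEC itself or 0.
LaneMask valuCompare(LaneMask exec, bool predicate) {
  return predicate ? exec : LaneMask{0};
}

// Runs a uniform loop with `tripCount` iterations inside a divergent region.
// Returns the number of iterations executed, 0 if the region was skipped,
// or -1 if `cap` iterations pass without the loop exit being taken
// (i.e. the real hardware loop would spin forever).
int runRegion(LaneMask exec, bool keepExeczBranch, int tripCount, int cap) {
  // s_cbranch_execz: jump over the whole region when no lane is active.
  if (keepExeczBranch && exec == 0) {
    return 0;
  }
  for (int iter = 1; iter <= cap; ++iter) {
    LaneMask vcc = valuCompare(exec, iter >= tripCount);
    if (vcc != 0) {  // s_cbranch_vccnz out of the loop
      return iter;
    }
  }
  return -1;  // exit never taken: with EXEC == 0, VCC is always 0
}
```

With all lanes active, `runRegion(~LaneMask{0}, true, 4, 1000)` runs four iterations; with EXEC = 0 it returns 0 when the execz branch is kept and -1 when it has been removed.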

3 files changed, 7 additions, 6 deletions

llvm/lib/Target/AMDGPU/SIRemoveShortExecBranches.cpp

Lines changed: 3 additions & 4 deletions
@@ -88,10 +88,9 @@ bool SIRemoveShortExecBranches::mustRetainExeczBranch(
 for (MachineBasicBlock::const_iterator I = MBB.begin(), E = MBB.end();
 I != E; ++I) {
 // When a uniform loop is inside non-uniform control flow, the branch
-// leaving the loop might be an S_CBRANCH_VCCNZ, which is never taken
-// when EXEC = 0. We should skip the loop lest it becomes infinite.
-if (I->getOpcode() == AMDGPU::S_CBRANCH_VCCNZ ||
-I->getOpcode() == AMDGPU::S_CBRANCH_VCCZ)
+// leaving the loop might never be taken when EXEC = 0.
+// Hence we should retain cbranch out of the loop lest it become infinite.
+if (I->isConditionalBranch())
 return true;

 if (TII->hasUnwantedEffectsWhenEXECEmpty(*I))
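The behavioral change in this hunk can be sketched with a simplified, hypothetical instruction model. `Opcode`, `Instr`, `isConditionalBranch`, `mustRetainOld` and `mustRetainNew` are illustrative names, not the pass's API; the real code walks `MachineInstr`s and calls `MachineInstr::isConditionalBranch()` and `TII->hasUnwantedEffectsWhenEXECEmpty()`. Everything in `mustRetainExeczBranch` outside the lines shown in the diff is omitted. The point is that a skipped region whose loop exits through a non-VCC conditional branch (for example `s_cbranch_execnz`) previously let the execz branch be removed, and now forces it to be kept.

```cpp
#include <vector>

// Hypothetical, simplified model for illustration only.
enum class Opcode {
  VALU, SALU, S_BRANCH,
  S_CBRANCH_VCCNZ, S_CBRANCH_VCCZ,
  S_CBRANCH_SCC0, S_CBRANCH_SCC1, S_CBRANCH_EXECNZ
};

struct Instr {
  Opcode op;
  // Stands in for TII->hasUnwantedEffectsWhenEXECEmpty(*I).
  bool unwantedEffectsWhenExecEmpty = false;
};

// Toy analogue of MachineInstr::isConditionalBranch().
bool isConditionalBranch(const Instr &I) {
  switch (I.op) {
  case Opcode::S_CBRANCH_VCCNZ:
  case Opcode::S_CBRANCH_VCCZ:
  case Opcode::S_CBRANCH_SCC0:
  case Opcode::S_CBRANCH_SCC1:
  case Opcode::S_CBRANCH_EXECNZ:
    return true;
  default:
    return false;
  }
}

// Before the patch: only VCC-based branches forced the execz branch to stay.
bool mustRetainOld(const std::vector<Instr> &Region) {
  for (const Instr &I : Region) {
    if (I.op == Opcode::S_CBRANCH_VCCNZ || I.op == Opcode::S_CBRANCH_VCCZ)
      return true;
    if (I.unwantedEffectsWhenExecEmpty)
      return true;
  }
  return false;
}

// After the patch: any conditional branch in the skipped region does.
bool mustRetainNew(const std::vector<Instr> &Region) {
  for (const Instr &I : Region) {
    if (isConditionalBranch(I))
      return true;
    if (I.unwantedEffectsWhenExecEmpty)
      return true;
  }
  return false;
}
```

For a region such as `{VALU, S_CBRANCH_EXECNZ}`, `mustRetainOld` returns false (so the execz branch would have been removed) while `mustRetainNew` returns true. Checking the generic "is a conditional branch" property, rather than listing opcodes, means a loop exit using any branch kind is covered.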

llvm/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll

Lines changed: 2 additions & 1 deletion
@@ -32,6 +32,7 @@ define amdgpu_ps void @main(i32, float) {
 ; CHECK-NEXT: s_and_b64 s[8:9], s[8:9], exec
 ; CHECK-NEXT: s_or_b64 s[4:5], s[4:5], s[8:9]
 ; CHECK-NEXT: s_andn2_b64 exec, exec, s[2:3]
+; CHECK-NEXT: s_cbranch_execz BB0_6
 ; CHECK-NEXT: BB0_3: ; %loop
 ; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT: s_or_b64 s[6:7], s[6:7], exec
@@ -49,7 +50,7 @@ define amdgpu_ps void @main(i32, float) {
 ; CHECK-NEXT: s_add_i32 s0, s0, 1
 ; CHECK-NEXT: s_xor_b64 s[6:7], exec, -1
 ; CHECK-NEXT: s_branch BB0_1
-; CHECK-NEXT: ; %bb.6: ; %Flow2
+; CHECK-NEXT: BB0_6: ; %Flow2
 ; CHECK-NEXT: s_or_b64 exec, exec, s[2:3]
 ; CHECK-NEXT: v_mov_b32_e32 v1, 0
 ; CHECK-NEXT: s_and_saveexec_b64 s[0:1], s[4:5]

llvm/test/CodeGen/AMDGPU/valu-i1.ll

Lines changed: 2 additions & 1 deletion
@@ -13,14 +13,15 @@ declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
 ; SI-NEXT: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, 0
 ; SI-NEXT: s_and_saveexec_b64 [[SAVE1:s\[[0-9]+:[0-9]+\]]], vcc
 ; SI-NEXT: s_xor_b64 [[SAVE2:s\[[0-9]+:[0-9]+\]]], exec, [[SAVE1]]
+; SI-NEXT: s_cbranch_execz [[FLOW_BB:BB[0-9]+_[0-9]+]]

 ; SI-NEXT: ; %bb.{{[0-9]+}}: ; %LeafBlock3
 ; SI: s_mov_b64 s[{{[0-9]:[0-9]}}], -1
 ; SI: s_and_saveexec_b64
 ; SI-NEXT: s_cbranch_execnz

 ; v_mov should be after exec modification
-; SI: ; %bb.{{[0-9]+}}:
+; SI: [[FLOW_BB]]:
 ; SI-NEXT: s_or_saveexec_b64 [[SAVE3:s\[[0-9]+:[0-9]+\]]], [[SAVE2]]
 ; SI-NEXT: s_xor_b64 exec, exec, [[SAVE3]]
 ;
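In both tests the flow block changes from a `; %bb.N:` line to a real `BBn_m:` label. This appears to be because the block is now the target of the retained `s_cbranch_execz`, whereas before it was reached only by fallthrough and was printed as a comment. In valu-i1.ll, the FileCheck pattern `[[FLOW_BB:BB[0-9]+_[0-9]+]]` captures the branch target and the later `[[FLOW_BB]]:` requires that same label. The toy C++ emulation below illustrates that define-then-use check. It is hypothetical (`captureFlowBB` and `labelDefinedLater` are invented names) and is not how FileCheck is implemented; it only handles this one pattern.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical toy emulation of the FLOW_BB pattern variable.
// "Define" step: capture the target label of an s_cbranch_execz.
std::optional<std::string> captureFlowBB(const std::string &line) {
  static const std::regex def(R"(s_cbranch_execz (BB[0-9]+_[0-9]+))");
  std::smatch m;
  if (std::regex_search(line, m, def))
    return m[1].str();
  return std::nullopt;
}

// "Use" step: after the capture, some later line must contain "<label>:".
// Like a plain FileCheck CHECK line, this is a substring match.
bool labelDefinedLater(const std::vector<std::string> &asmLines) {
  std::optional<std::string> flowBB;
  for (const std::string &line : asmLines) {
    if (!flowBB) {
      flowBB = captureFlowBB(line);  // stays empty until the branch is seen
      continue;
    }
    if (line.find(*flowBB + ":") != std::string::npos)
      return true;
  }
  return false;
}
```

Fed lines resembling the new output (an `s_cbranch_execz BB0_2` followed later by `BB0_2: ; %Flow`), `labelDefinedLater` returns true; fed the old shape, with no execz branch and only a `; %bb.2:` comment, it returns false.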
