Skip to content

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Sep 4, 2025

My goal here was to eliminate the use of getGPRsForTailCall.
I don't think it makes sense to use here, considering this function
handles other non-tail call cases. It seems to have been used to
find a non-callee saved register in a roundabout way. We can be more
precise by letting LiveRegUnits figure out the CSR set for the current
function.

There are a few things I find confusing about this function; the API
isn't right. The callers ideally would be maintaining LiveRegUnits
in the context where they need the free register. It also doesn't
provide a RegisterClass to use. Also, this should work for any
block and this shouldn't need to special case this set of return
opcodes. I'm also not sure why there is a special case for RIP. It was introduced in
fe5e5dc but doesn't appear to
be tested.

There are a few codegen test changes. One of them I think is a correct
improvement since the old code didn't consider undef uses as available.
The others I think are just to allocation order changes, since we're now
using the broader GPR_*NOSP classes.

@arsenm arsenm added the backend:X86 label Sep 4, 2025 — with Graphite App
Copy link
Contributor Author

arsenm commented Sep 4, 2025

@llvmbot
Copy link
Member

llvmbot commented Sep 4, 2025

@llvm/pr-subscribers-backend-x86

Author: Matt Arsenault (arsenm)

Changes

My goal here was to eliminate the use of getGPRsForTailCall.
I don't think it makes sense to use here, considering this function
handles other non-tail call cases. It seems to have been used to
find a non-callee saved register in a roundabout way. We can be more
precise by letting LiveRegUnits figure out the CSR set for the current
function.

There are a few things I find confusing about this function; the API
isn't right. The callers ideally would be maintaining LiveRegUnits
in the context where they need the free register. It also doesn't
provide a RegisterClass to use. Also, this should work for any
block and this shouldn't need to special case this set of return
opcodes.

There are a few codegen test changes. One of them I think is a correct
improvement since the old code didn't consider undef uses as available.
The others I think are just to allocation order changes, since we're now
using the broader GPR_*NOSP classes.


Full diff: https://github.com/llvm/llvm-project/pull/156817.diff

5 Files Affected:

  • (modified) llvm/lib/Target/X86/X86RegisterInfo.cpp (+13-15)
  • (modified) llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/lvi-hardening-ret.ll (+6-6)
  • (modified) llvm/test/CodeGen/X86/pr40289-64bit.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr40289.ll (+1-1)
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.cpp b/llvm/lib/Target/X86/X86RegisterInfo.cpp
index b79e508df7c97..3f4955f28e68b 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.cpp
+++ b/llvm/lib/Target/X86/X86RegisterInfo.cpp
@@ -1002,8 +1002,6 @@ unsigned X86RegisterInfo::findDeadCallerSavedReg(
   if (MF->callsEHReturn())
     return 0;
 
-  const TargetRegisterClass &AvailableRegs = *getGPRsForTailCall(*MF);
-
   if (MBBI == MBB.end())
     return 0;
 
@@ -1025,20 +1023,20 @@ unsigned X86RegisterInfo::findDeadCallerSavedReg(
   case X86::TCRETURNmi64:
   case X86::EH_RETURN:
   case X86::EH_RETURN64: {
-    SmallSet<uint16_t, 8> Uses;
-    for (MachineOperand &MO : MBBI->operands()) {
-      if (!MO.isReg() || MO.isDef())
-        continue;
-      Register Reg = MO.getReg();
-      if (!Reg)
-        continue;
-      for (MCRegAliasIterator AI(Reg, this, true); AI.isValid(); ++AI)
-        Uses.insert(*AI);
+    LiveRegUnits LRU(*this);
+    LRU.addLiveOuts(MBB);
+    LRU.stepBackward(*MBBI);
+
+    // FIXME: Why do we need to special case this register? Is it missing from
+    // return implicit uses?
+    LRU.removeReg(X86::RIP);
+
+    const TargetRegisterClass &RC =
+        Is64Bit ? X86::GR64_NOSPRegClass : X86::GR32_NOSPRegClass;
+    for (MCRegister Reg : RC) {
+      if (LRU.available(Reg))
+        return Reg;
     }
-
-    for (auto CS : AvailableRegs)
-      if (!Uses.count(CS) && CS != X86::RIP && CS != X86::RSP && CS != X86::ESP)
-        return CS;
   }
   }
 
diff --git a/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll b/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll
index ad24608d338ad..d6d4db3509103 100644
--- a/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll
+++ b/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll
@@ -81,7 +81,7 @@ define i32 @csr6_alloc16(ptr %argv) {
 ; LIN-NEXT:    .cfi_def_cfa_offset 32
 ; LIN-NEXT:    pop2 %rbp, %r15
 ; LIN-NEXT:    .cfi_def_cfa_offset 16
-; LIN-NEXT:    popq %rcx
+; LIN-NEXT:    popq %rax
 ; LIN-NEXT:    .cfi_def_cfa_offset 8
 ; LIN-NEXT:    retq
 ;
@@ -116,7 +116,7 @@ define i32 @csr6_alloc16(ptr %argv) {
 ; LIN-PPX-NEXT:    .cfi_def_cfa_offset 32
 ; LIN-PPX-NEXT:    pop2p %rbp, %r15
 ; LIN-PPX-NEXT:    .cfi_def_cfa_offset 16
-; LIN-PPX-NEXT:    popq %rcx
+; LIN-PPX-NEXT:    popq %rax
 ; LIN-PPX-NEXT:    .cfi_def_cfa_offset 8
 ; LIN-PPX-NEXT:    retq
 ;
@@ -180,7 +180,7 @@ define i32 @csr6_alloc16(ptr %argv) {
 ; WIN-NEXT:    pop2 %rbp, %rbx
 ; WIN-NEXT:    pop2 %r13, %r12
 ; WIN-NEXT:    pop2 %r15, %r14
-; WIN-NEXT:    popq %rcx
+; WIN-NEXT:    popq %rax
 ; WIN-NEXT:    .seh_endepilogue
 ; WIN-NEXT:    retq
 ; WIN-NEXT:    .seh_endproc
@@ -211,7 +211,7 @@ define i32 @csr6_alloc16(ptr %argv) {
 ; WIN-PPX-NEXT:    pop2p %rbp, %rbx
 ; WIN-PPX-NEXT:    pop2p %r13, %r12
 ; WIN-PPX-NEXT:    pop2p %r15, %r14
-; WIN-PPX-NEXT:    popq %rcx
+; WIN-PPX-NEXT:    popq %rax
 ; WIN-PPX-NEXT:    .seh_endepilogue
 ; WIN-PPX-NEXT:    retq
 ; WIN-PPX-NEXT:    .seh_endproc
diff --git a/llvm/test/CodeGen/X86/lvi-hardening-ret.ll b/llvm/test/CodeGen/X86/lvi-hardening-ret.ll
index faa8bff8f0943..954985a3798b7 100644
--- a/llvm/test/CodeGen/X86/lvi-hardening-ret.ll
+++ b/llvm/test/CodeGen/X86/lvi-hardening-ret.ll
@@ -41,9 +41,9 @@ entry:
   %add = add nsw i32 %0, %1
   ret i32 %add
 ; CHECK-NOT:  retq
-; CHECK:      popq %rcx
+; CHECK:      popq %rsi
 ; CHECK-NEXT: lfence
-; CHECK-NEXT: jmpq *%rcx
+; CHECK-NEXT: jmpq *%rsi
 }
 
 ; Function Attrs: noinline nounwind optnone uwtable
@@ -52,9 +52,9 @@ define dso_local preserve_mostcc void @preserve_most() #0 {
 entry:
   ret void
 ; CHECK-NOT:  retq
-; CHECK:      popq %rax
+; CHECK:      popq %r11
 ; CHECK-NEXT: lfence
-; CHECK-NEXT: jmpq *%rax
+; CHECK-NEXT: jmpq *%r11
 }
 
 ; Function Attrs: noinline nounwind optnone uwtable
@@ -63,9 +63,9 @@ define dso_local preserve_allcc void @preserve_all() #0 {
 entry:
   ret void
 ; CHECK-NOT:  retq
-; CHECK:      popq %rax
+; CHECK:      popq %r11
 ; CHECK-NEXT: lfence
-; CHECK-NEXT: jmpq *%rax
+; CHECK-NEXT: jmpq *%r11
 }
 
 define { i64, i128 } @ret_i64_i128() #0 {
diff --git a/llvm/test/CodeGen/X86/pr40289-64bit.ll b/llvm/test/CodeGen/X86/pr40289-64bit.ll
index 58da5258a6703..a83674ac81998 100644
--- a/llvm/test/CodeGen/X86/pr40289-64bit.ll
+++ b/llvm/test/CodeGen/X86/pr40289-64bit.ll
@@ -6,5 +6,5 @@ define cc 92 < 9 x i64 > @clobber() {
   ret < 9 x i64 > undef
   ; CHECK-LABEL: clobber:
   ; CHECK-NOT: popq %rsp
-  ; CHECK: addq $8, %rsp
+  ; CHECK: pushq %rax
 }
diff --git a/llvm/test/CodeGen/X86/pr40289.ll b/llvm/test/CodeGen/X86/pr40289.ll
index 851b23c002bdb..21e50931b40f2 100644
--- a/llvm/test/CodeGen/X86/pr40289.ll
+++ b/llvm/test/CodeGen/X86/pr40289.ll
@@ -6,5 +6,5 @@ define < 3 x i32 > @clobber() {
   ret < 3 x i32 > undef
   ; CHECK-LABEL: clobber:
   ; CHECK-NOT: popl %esp
-  ; CHECK: addl $4, %esp
+  ; CHECK: popl %eax
 }

@arsenm arsenm marked this pull request as ready for review September 4, 2025 07:17
My goal here was to eliminate the use of getGPRsForTailCall.
I don't think it makes sense to use here, considering this function
handles other non-tail call cases. It seems to have been used to
find a non-callee saved register in a roundabout way. We can be more
precise by letting LiveRegUnits figure out the CSR set for the current
function.

There are a few things I find confusing about this function; the API
isn't right. The callers ideally would be maintaining LiveRegUnits
in the context where they need the free register. It also doesn't
provide a RegisterClass to use. Also, this should work for any
block and this shouldn't need to special case this set of return
opcodes.

There are a few codegen test changes. One of them I think is a correct
improvement since the old code didn't consider undef uses as available.
The others I think are just to allocation order changes, since we're now
using the broader GPR_*NOSP classes.
@arsenm arsenm force-pushed the users/arsenm/x86/use-liveregunits-findDeadCallerSavedReg branch from 73e2257 to c831a2a Compare September 4, 2025 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants