justinfargnoli (Contributor) commented Aug 19, 2025

ToT lit currently assumes that a given ptxas version supports all capabilities of prior ptxas releases. This approach was flexible enough to support the removal of 32-bit address compilation from ptxas in CUDA 12.1, but it struggles with the removal of compilation for Volta and earlier architectures in CUDA 13.0.

To deal with this, this PR refactors how lit defines the set of features available for a given ptxas version. It invokes ptxas not just to get its version, but also to get the list of supported SMs, the supported PTX ISA versions, and whether 32-bit compilation is supported.

This approach should be flexible enough to handle future changes to ptxas's support matrix. One obvious downside is that it relies on parsing ptxas's stdout, which is inherently unstable. But, IMO, this is something that we can fix as needed.
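As a rough illustration of the idea (not the code in this PR), the sketch below turns already-captured ptxas capability text into lit feature names. The feature spellings `ptxas-sm_*`, `ptxas-isa-v*`, and `ptxas-32` are taken from the updated RUN lines in the patch; the function name `ptxas_features`, its parameters, and the shape of the input text are assumptions made for the example, and no particular ptxas flag or output format is implied.

```python
import re


def ptxas_features(sm_text, isa_text, supports_32bit):
    """Build lit feature names from captured ptxas capability text.

    Hypothetical helper: the inputs are plain strings that are assumed to
    mention SM names (e.g. "sm_90a") and PTX ISA versions (e.g. "7.8").
    """
    # The plain "ptxas" feature means "some ptxas is available".
    features = {"ptxas"}

    # One feature per supported SM, e.g. "ptxas-sm_80".
    for sm in re.findall(r"\bsm_\d+[a-z]?\b", sm_text):
        features.add(f"ptxas-{sm}")

    # One feature per supported PTX ISA version, e.g. "ptxas-isa-v7.8".
    for major, minor in re.findall(r"\b(\d+)\.(\d+)\b", isa_text):
        features.add(f"ptxas-isa-v{major}.{minor}")

    # 32-bit compilation is a separate capability.
    if supports_32bit:
        features.add("ptxas-32")

    return features
```

Because each capability becomes its own feature, a ptxas release that drops an SM or an ISA version simply stops advertising that feature, and tests guarded by it are skipped without any version comparison.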

@justinfargnoli justinfargnoli requested a review from Copilot August 19, 2025 23:30

github-actions bot commented Aug 19, 2025

✅ With the latest revision this PR passed the Python code formatter.

@justinfargnoli justinfargnoli self-assigned this Aug 20, 2025
@justinfargnoli justinfargnoli requested a review from Copilot August 20, 2025 23:57

@justinfargnoli justinfargnoli changed the title from "[lit] Refactor ptxas usage" to "[lit] Refactor available ptxas features" Aug 21, 2025
justinfargnoli (Contributor, Author) commented:

Note: I've only tested this PR on public CUDA 13.0 and internal ToT ptxas.

Trying to see if I can trigger a build with the public build bot.

justinfargnoli (Contributor, Author) commented:

> Trying to see if I can trigger a build with the public build bot.

https://lab.llvm.org/buildbot/#/buildrequests/5249048?redirect_to_build=true

@justinfargnoli justinfargnoli marked this pull request as ready for review August 21, 2025 18:11
llvmbot (Member) commented Aug 21, 2025

@llvm/pr-subscribers-backend-nvptx

Author: Justin Fargnoli (justinfargnoli)

Patch is 154.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154439.diff

172 Files Affected:

  • (modified) llvm/test/CodeGen/NVPTX/access-non-generic.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/activemask.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/addrspacecast.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/alias.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/annotations.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/applypriority.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/arithmetic-int.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/async-copy.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/atomics-sm60.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/atomics-sm70.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/atomics-sm90.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/atomics-with-scope.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/b52037.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/barrier.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/bf16-instructions.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/bmsk.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/bswap.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/byval-const-global.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/calling-conv.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/cluster-dim.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll (+6-6)
  • (modified) llvm/test/CodeGen/NVPTX/clusterlaunchcontrol.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm70.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm90.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/combine-mad.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/combine-min-max.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/common-linkage.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/compare-int.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/convert-call-to-indirect.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/convert-fp.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/convert-int-sm20.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/convert-sm100.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/convert-sm100a.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/convert-sm80.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/convert-sm89.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/convert-sm90.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-s2g-sm100.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-1cta.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-2cta.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-cta-sm100.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-cta-sm100a.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-cta-sm90.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-gather4.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-im2colw.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-im2colw128.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-prefetch-sm100a.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-prefetch.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-reduce.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-s2g-scatter4.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-s2g.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/discard.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/distributed-shared-cluster.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/dynamic_stackalloc.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/elect.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/f16-abs.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/f16-ex2.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/f16-instructions.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/f16x2-instructions.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/f32-ex2.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/f32-lg2.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/f32x2-instructions.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/fabs-intrinsics.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/fence-cluster.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/fence-nocluster.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/fence-proxy-tensormap.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/fexp2.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/flog2.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/fma-disable.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/fminimum-fmaximum.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/fns.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/fold-movs.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/fp-contract-f32x2.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/global-addrspace.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/global-ordering.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/griddepcontrol.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/i16x2-instructions.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/idioms.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/indirect_byval.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test1.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test2.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test3.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/intrinsic-old.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/intrinsics-sm90.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/intrinsics.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/kernel-param-align.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/ld-addrspace.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/ld-generic.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/ld-st-addrrspace.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/ldg-invariant-256.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-256-addressing-invariant.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-256-addressing.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-scalars.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-sm-70.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-sm-90.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-vectors-256.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/local-stack-frame.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/managed.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/match.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins-sm53-ptx42.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins-sm80-ptx70-autoupgrade.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins-sm80-ptx70.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins-sm86-ptx72-autoupgrade.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins-sm86-ptx72.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/mbarrier.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/nanosleep.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/nofunc.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/noreturn.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/packed-aggr.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/param-overalign.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/pr126337.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/pr13291-i1-store.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/prefetch-inferas-test.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/prefetch.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/reduction-intrinsics.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/redux-sync-f32.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/redux-sync.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/reg-types.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/setmaxnreg-sm100a.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/setmaxnreg.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/sext-setcc.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/shfl-sync-p.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/shfl-sync.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/short-ptr.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/simple-call.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/st-addrspace.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/st-generic.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/st-param-imm.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/st_bulk.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/stacksaverestore.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/surf-tex.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/symbol-naming.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/szext.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/tanhf.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-alloc.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-commit.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-cp.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-fence.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-ld.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-shift.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-st.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/trunc-setcc.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/trunc-tofp.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/unreachable.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/vaargs.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/variadics-backend.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/vector-compare.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/vector-select.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/vote.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/weak-global.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/wgmma-sm90a-fence.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx60-sm70.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx63-sm72.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx63-sm75.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx64-sm70.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx65-sm75.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx71-sm80.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx78-sm90.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx86-sm100a.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx86-sm101a.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx86-sm120a.py (+1-1)
  • (modified) llvm/test/lit.cfg.py (+66-64)
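
The rewritten `%if` guards in the hunks shown here appear to follow a regular pattern: an `-arch=sm_XX` passed to `%ptxas-verify` adds `ptxas-sm_XX`, a `-mattr=+ptxNN` adds `ptxas-isa-vN.N`, a 32-bit `-mtriple=nvptx` adds `ptxas-32`, and a command with none of these keeps the plain `ptxas` guard. The sketch below encodes that inferred pattern. The helper name `ptxas_condition` is hypothetical, the rule is a reading of the visible part of a truncated patch, and nothing here is taken from the PR's own tooling.

```python
import re


def ptxas_condition(run_command):
    """Derive a lit %if condition from the command part of a RUN line.

    Hypothetical helper encoding the pattern seen in the visible hunks.
    """
    parts = []

    # The SM requirement comes from the -arch flag given to %ptxas-verify.
    arch = re.search(r"%ptxas-verify\s+-arch=(sm_\d+[a-z]?)", run_command)
    if arch:
        parts.append(f"ptxas-{arch.group(1)}")

    # -mattr=+ptx78 means PTX ISA 7.8 (two digits: major, then minor).
    isa = re.search(r"-mattr=\+ptx(\d)(\d)\b", run_command)
    if isa:
        parts.append(f"ptxas-isa-v{isa.group(1)}.{isa.group(2)}")

    # A triple of nvptx (not nvptx64) needs 32-bit ptxas support.
    if re.search(r"-mtriple=nvptx(?!64)", run_command):
        parts.append("ptxas-32")

    # With no specific requirement, any ptxas will do.
    return " && ".join(parts) if parts else "ptxas"
```

For example, the 32-bit `async-copy.ll` command (`-mtriple=nvptx -mcpu=sm_80 -mattr=+ptx70` piped to `%ptxas-verify -arch=sm_80`) maps to `ptxas-sm_80 && ptxas-isa-v7.0 && ptxas-32`, which matches the guard in the patch.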
diff --git a/llvm/test/CodeGen/NVPTX/access-non-generic.ll b/llvm/test/CodeGen/NVPTX/access-non-generic.ll
index 601a35288f54d..be8d00a10108a 100644
--- a/llvm/test/CodeGen/NVPTX/access-non-generic.ll
+++ b/llvm/test/CodeGen/NVPTX/access-non-generic.ll
@@ -2,7 +2,7 @@
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s --check-prefix PTX
 ; RUN: opt -mtriple=nvptx-- < %s -S -passes=infer-address-spaces | FileCheck %s --check-prefix IR
 ; RUN: opt -mtriple=nvptx64-- < %s -S -passes=infer-address-spaces | FileCheck %s --check-prefix IR
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
 
 @array = internal addrspace(3) global [10 x float] zeroinitializer, align 4
diff --git a/llvm/test/CodeGen/NVPTX/activemask.ll b/llvm/test/CodeGen/NVPTX/activemask.ll
index aa3c5819d7f91..96df904290e68 100644
--- a/llvm/test/CodeGen/NVPTX/activemask.ll
+++ b/llvm/test/CodeGen/NVPTX/activemask.ll
@@ -1,5 +1,5 @@
 ; RUN: llc < %s -mtriple=nvptx64 -O2 -mcpu=sm_52 -mattr=+ptx62 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_52 -mattr=+ptx62 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v6.2 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_52 -mattr=+ptx62 | %ptxas-verify %}
 
 declare i32 @llvm.nvvm.activemask()
 
diff --git a/llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll b/llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll
index 00b17896d2c9e..1b9dc6ab1e122 100644
--- a/llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll
+++ b/llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll
@@ -1,8 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | FileCheck %s -check-prefixes=NOPTRCONV
 ; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 --nvptx-short-ptr | FileCheck %s -check-prefixes=PTRCONV
-; RUN: %if ptxas-12.8 %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
-; RUN: %if ptxas-12.8 %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 --nvptx-short-ptr | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 --nvptx-short-ptr | %ptxas-verify -arch=sm_90 %}
 
 ; ALL-LABEL: conv_shared_cluster_to_generic
 define i32 @conv_shared_cluster_to_generic(ptr addrspace(7) %ptr) {
diff --git a/llvm/test/CodeGen/NVPTX/addrspacecast.ll b/llvm/test/CodeGen/NVPTX/addrspacecast.ll
index 86008a1b70058..546d22ca0e691 100644
--- a/llvm/test/CodeGen/NVPTX/addrspacecast.ll
+++ b/llvm/test/CodeGen/NVPTX/addrspacecast.ll
@@ -1,7 +1,7 @@
 ; RUN: llc -O0 < %s -mtriple=nvptx -mcpu=sm_20 | FileCheck %s -check-prefixes=ALL,CLS32
 ; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s -check-prefixes=ALL,NOPTRCONV,CLS64
 ; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 --nvptx-short-ptr | FileCheck %s -check-prefixes=ALL,PTRCONV,CLS64
-; RUN: %if ptxas && !ptxas-12.0 %{ llc -O0 < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc -O0 < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 --nvptx-short-ptr | %ptxas-verify %}
 
diff --git a/llvm/test/CodeGen/NVPTX/alias.ll b/llvm/test/CodeGen/NVPTX/alias.ll
index 01761c21ab103..4b22df225e33f 100644
--- a/llvm/test/CodeGen/NVPTX/alias.ll
+++ b/llvm/test/CodeGen/NVPTX/alias.ll
@@ -1,5 +1,5 @@
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx64 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx64 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v6.4 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx64 | %ptxas-verify %}
 
 define i32 @a() { ret i32 0 }
 @b = internal alias i32 (), ptr @a
diff --git a/llvm/test/CodeGen/NVPTX/annotations.ll b/llvm/test/CodeGen/NVPTX/annotations.ll
index 5360e8988777b..e4aa0552e8420 100644
--- a/llvm/test/CodeGen/NVPTX/annotations.ll
+++ b/llvm/test/CodeGen/NVPTX/annotations.ll
@@ -1,6 +1,6 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_20 | FileCheck %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
 
 @texture = internal addrspace(1) global i64 0, align 8
diff --git a/llvm/test/CodeGen/NVPTX/applypriority.ll b/llvm/test/CodeGen/NVPTX/applypriority.ll
index 23b1bda9a32bf..9ac52c8bfe188 100644
--- a/llvm/test/CodeGen/NVPTX/applypriority.ll
+++ b/llvm/test/CodeGen/NVPTX/applypriority.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74| FileCheck --check-prefixes=CHECK-PTX64 %s
-; RUN: %if ptxas-11.4 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74| %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.4 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74| %ptxas-verify -arch=sm_80 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll b/llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll
index ce71d3a78c0de..e88d0396f0858 100644
--- a/llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll
+++ b/llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll
@@ -1,6 +1,6 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_20 -fp-contract=fast | FileCheck %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 -fp-contract=fast | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 -fp-contract=fast | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 -fp-contract=fast | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -fp-contract=fast | %ptxas-verify %}
 
 ;; These tests should run for all targets
diff --git a/llvm/test/CodeGen/NVPTX/arithmetic-int.ll b/llvm/test/CodeGen/NVPTX/arithmetic-int.ll
index 1fbfd0a987d7a..9e41e9e240902 100644
--- a/llvm/test/CodeGen/NVPTX/arithmetic-int.ll
+++ b/llvm/test/CodeGen/NVPTX/arithmetic-int.ll
@@ -1,6 +1,6 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_20 | FileCheck %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
 
 ;; These tests should run for all targets
diff --git a/llvm/test/CodeGen/NVPTX/async-copy.ll b/llvm/test/CodeGen/NVPTX/async-copy.ll
index cefb8ede9fa58..97b6b5d4d0097 100644
--- a/llvm/test/CodeGen/NVPTX/async-copy.ll
+++ b/llvm/test/CodeGen/NVPTX/async-copy.ll
@@ -1,7 +1,7 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_80 -mattr=+ptx70 | FileCheck -check-prefixes=CHECK,CHECK_PTX32 %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx70 | FileCheck -check-prefixes=CHECK,CHECK_PTX64 %s
-; RUN: %if ptxas-11.0 && ! ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
-; RUN: %if ptxas-11.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.0 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
 
 declare void @llvm.nvvm.cp.async.wait.group(i32)
 
diff --git a/llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll b/llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll
index 94b3f0a2e1c3e..88fae7a3f78a0 100644
--- a/llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll
+++ b/llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll
@@ -1,7 +1,7 @@
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_30 | FileCheck %s --check-prefixes=ALL,SM30
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_60 | FileCheck %s --check-prefixes=ALL,SM60
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 | %ptxas-verify %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
 
 ; CHECK-LABEL: fadd_double
 define void @fadd_double(ptr %0, double %1) {
diff --git a/llvm/test/CodeGen/NVPTX/atomics-sm60.ll b/llvm/test/CodeGen/NVPTX/atomics-sm60.ll
index 2e11323d1b3e1..5a7a1823cb2a0 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-sm60.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-sm60.ll
@@ -1,7 +1,7 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_60 | FileCheck %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_60 | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
 
 ; CHECK-LABEL: .func test(
 define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, double %d) {
diff --git a/llvm/test/CodeGen/NVPTX/atomics-sm70.ll b/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
index f710d7f883a1b..e1a69d2e3db20 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
@@ -2,9 +2,9 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx63 | FileCheck %s --check-prefixes=CHECK
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_70 -mattr=+ptx63 | FileCheck %s --check-prefixes=CHECK64
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx62 | FileCheck %s --check-prefixes=CHECKPTX62
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx63 | %ptxas-verify -arch=sm_70 %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_70 -mattr=+ptx63 | %ptxas-verify -arch=sm_70 %}
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx62 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 && ptxas-isa-v6.3 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx63 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 && ptxas-isa-v6.3 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_70 -mattr=+ptx63 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 && ptxas-isa-v6.2 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx62 | %ptxas-verify -arch=sm_70 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/atomics-sm90.ll b/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
index f96fd30019025..79e12025ba614 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
@@ -2,9 +2,9 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_90 -mattr=+ptx78 | FileCheck %s --check-prefixes=CHECK
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | FileCheck %s --check-prefixes=CHECK64
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_86 -mattr=+ptx71 | FileCheck %s --check-prefixes=CHECKPTX71
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_86 -mattr=+ptx71 | %ptxas-verify -arch=sm_86 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_86 && ptxas-isa-v7.1 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_86 -mattr=+ptx71 | %ptxas-verify -arch=sm_86 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/atomics-with-scope.ll b/llvm/test/CodeGen/NVPTX/atomics-with-scope.ll
index e6636d706b49d..9e30519b31cc3 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-with-scope.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-with-scope.ll
@@ -1,7 +1,7 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_60 | FileCheck %s -check-prefixes=CHECK,CHECK32
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_60 | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
 
 ; CHECK-LABEL: .func test_atomics_scope(
 define void @test_atomics_scope(ptr %fp, float %f,
diff --git a/llvm/test/CodeGen/NVPTX/b52037.ll b/llvm/test/CodeGen/NVPTX/b52037.ll
index b6317dfb28597..268a8972ebd22 100644
--- a/llvm/test/CodeGen/NVPTX/b52037.ll
+++ b/llvm/test/CodeGen/NVPTX/b52037.ll
@@ -4,7 +4,7 @@
 ; https://bugs.llvm.org/show_bug.cgi?id=52037 for the gory details.
 ;
 ; RUN: llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_70 -O3 -o - %s | FileCheck %s
-; RUN: %if ptxas %{ llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_70 -O3 -o - %s | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 %{ llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_70 -O3 -o - %s | %ptxas-verify -arch=sm_70 %}
 
 ; CHECK-LABEL: .visible .entry barney(
 ; CHECK-NOT:  .local{{.*}}__local_depot
diff --git a/llvm/test/CodeGen/NVPTX/barrier.ll b/llvm/test/CodeGen/NVPTX/barrier.ll
index a3b0d21f098f2..f209bdd0cfae7 100644
--- a/llvm/test/CodeGen/NVPTX/barrier.ll
+++ b/llvm/test/CodeGen/NVPTX/barrier.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx60 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx60 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v6.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx60 | %ptxas-verify %}
 
 declare void @llvm.nvvm.bar.warp.sync(i32)
 declare void @llvm.nvvm.barrier.cta.sync.aligned.all(i32)
diff --git a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
index aee58a044a986..835e09b9a38e0 100644
--- a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
@@ -3,9 +3,9 @@
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | FileCheck --check-prefixes=CHECK,SM80 %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 -denormal-fp-math-f32=preserve-sign | FileCheck --check-prefixes=CHECK,SM80-FTZ %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | FileCheck --check-prefixes=CHECK,SM90 %s
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 -denormal-fp-math-f32=preserve-sign | %ptxas-verify -arch=sm_80 %}
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 -denormal-fp-math-f32=preserve-sign | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll b/llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll
index e1d4ef1073a78..60a5abf03e19f 100644
--- a/llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | FileCheck --check-prefixes=CHECK %s
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
 
 target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
 
diff --git a/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll b/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
index b4641d01eb927..4ea8ffc727b56 100644
--- a/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
@@ -1,8 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | FileCheck --check-prefixes=CHECK,SM80 %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | FileCheck --check-prefixes=CHECK,SM90 %s
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
 
 target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
 
diff --git a/llvm/test/CodeGen/NVPTX/bmsk.ll b/llvm/test/CodeGen/NVPTX/bmsk.ll
index d5b278657bd52..b9404f2a160cd 100644
--- a/llvm/test/CodeGen/NVPTX/bmsk.ll
+++ b/llvm/test/CodeGen/NVPTX/bmsk.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -o - < %s -mcpu=sm_70 -mattr=+ptx76 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mcpu=sm_70 -mattr=+ptx76 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 && ptxas-isa-v7.6 %{ llc < %s -mcpu=sm_70 -mattr=+ptx76 | %ptxas-verify -arch=sm_70 %}
 
 target triple = "nvptx64-unknown-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/bswap.ll b/llvm/test/CodeGen/NVPTX/bswap.ll
index 0d1d6da4ba2b6..b913b9a03c553 100644
--- a/llvm/test/CodeGen/NVPTX/bswap.ll
+++ b/llvm/test/CodeGen/NVPTX/bswap.ll
@@ -1,9 +1,9 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx70 | FileCheck -check-prefixes CHECK,PTX70 %s
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
-; RUN: %if ptxas-11.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx70 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v7.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx70 | %ptxas-verify %}
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx71 | FileCheck -check-prefixes CHECK,PTX71 %s
-; RUN: %if ptxas-11.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx71 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx71 | %ptxas-verify %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll b/llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll
index 9988d5b122cc1..e7f71f4ad52ea 100644
--- a/llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll
+++ b/llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mcpu=sm_70 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mcpu=sm_70 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 %{ llc < %s -mcpu=sm_70 | %ptxas-verify -arch=sm_70 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/byval-const-global.ll b/llvm/test/CodeGen/NVPTX/byval-const-global.ll
index b4934e1a94d1b..81e7edfd8602e 100644
--- a/llvm/test/CodeGen/NVPTX/byval-const-global.ll
+++ b/llvm/test/CodeGen/NVPTX/byval-const-global.ll
@@ -1,6 +1,6 @@
...
[truncated]

Member

@AlexMaclean AlexMaclean left a comment

This will be very useful. Several times I've run into versions of ptxas that don't work with a given %ptxas-verify configuration. I would consider tweaking the naming of these a little. I'd recommend removing the "v" in the ISA features and renaming "32" -> "ptr32". I'd also lean towards using () or {} to represent these (e.g. ptxas(sm_90), ptxas(isa-8.4)), but that is more a matter of personal taste, so if you or others disagree I'm happy to leave it as is.

@justinfargnoli
Contributor Author

I'd also lean towards using () or {} to represent these (i.e. ptxas(sm_90), ptxas(isa-8.4)) but that is more a matter of personal taste so if you or others disagree I'm happy to leave as is.

It looks like lit doesn't accept (), {}, or <> in feature names, so I went back to using - as the separator.

@justinfargnoli justinfargnoli enabled auto-merge (squash) August 26, 2025 19:46
@AlexMaclean AlexMaclean disabled auto-merge August 26, 2025 19:49
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR refactors how lit determines available ptxas features by directly querying ptxas capabilities instead of assuming backward compatibility. The refactoring addresses limitations with the previous approach when CUDA removes support for older architectures.

Key changes:

  • Replace version-based feature detection with dynamic capability querying
  • Query ptxas for supported SM architectures, PTX ISA versions, and 32-bit compilation support
  • Update test conditions across hundreds of test files to use specific capability checks
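The capability-based approach summarized above can be sketched roughly as follows. This is a minimal illustration, not the code in llvm/test/lit.cfg.py: the helper name `ptxas_features` is hypothetical, the capability listing is assumed to have been captured from ptxas already as plain text containing `sm_XX` tokens, and the `ptxas-ptr32` feature name is only the spelling suggested in review. The `ptxas-sm_XX` and `ptxas-isa-vX.Y` spellings follow the RUN lines in this diff.

```python
import re
from typing import Iterable, Set


def ptxas_features(arch_listing: str,
                   isa_versions: Iterable[str],
                   supports_32bit: bool) -> Set[str]:
    """Turn already-captured ptxas capability data into lit feature names.

    arch_listing is assumed to be free-form text that mentions each
    supported architecture as an sm_XX token (an assumption about the
    output format, not a documented guarantee).
    """
    features = {"ptxas"}
    # One feature per supported SM, e.g. "ptxas-sm_80".
    for sm in re.findall(r"\bsm_\d+[a-z]?\b", arch_listing):
        features.add(f"ptxas-{sm}")
    # One feature per supported PTX ISA version, e.g. "ptxas-isa-v7.1".
    for version in isa_versions:
        features.add(f"ptxas-isa-v{version}")
    # Hypothetical name for 32-bit address support.
    if supports_32bit:
        features.add("ptxas-ptr32")
    return features


if __name__ == "__main__":
    listing = "sm_75\nsm_80\nsm_90a\n"
    for feature in sorted(ptxas_features(listing, ["7.1", "7.8"], False)):
        print(feature)
```

With features defined this way, a guard such as `%if ptxas-sm_80 && ptxas-isa-v7.1` only fires when the installed ptxas reports both capabilities, rather than when its version number is merely high enough.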

Reviewed Changes

Copilot reviewed 174 out of 174 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| llvm/test/lit.cfg.py | Core refactoring from version-based to capability-based ptxas feature detection |
| llvm/test/CodeGen/NVPTX/*.ll | Updated test conditions to use new capability-based features like ptxas-sm_XX and ptxas-isa-X.Y |
Comments suppressed due to low confidence (4)

@justinfargnoli justinfargnoli requested a review from Artem-B August 26, 2025 20:48
Member

@AlexMaclean AlexMaclean left a comment

Nice, LGTM

@justinfargnoli justinfargnoli merged commit d77cf57 into llvm:main Aug 28, 2025
9 checks passed
@vvereschaka
Contributor

Hi @justinfargnoli ,

https://lab.llvm.org/buildbot/#/builders/54/builds/12220

llvm-lit.py: C:\buildbot\as-builder-8\llvm-nvptx-nvidia-win\llvm-project\llvm\utils\lit\lit\llvm\config.py:57: note: using lit tools: C:\Program Files\Git\usr\bin
llvm-lit.py: C:\buildbot\as-builder-8\llvm-nvptx-nvidia-win\llvm-project\llvm\utils\lit\lit\TestingConfig.py:157: fatal: unable to parse config file 'C:\\buildbot\\as-builder-8\\llvm-nvptx-nvidia-win\\llvm-project\\llvm\\test/lit.cfg.py', traceback: Traceback (most recent call last):
  File "C:\buildbot\as-builder-8\llvm-nvptx-nvidia-win\llvm-project\llvm\utils\lit\lit\TestingConfig.py", line 145, in load_from_path
    exec(compile(data, path, "exec"), cfg_globals, None)
  File "C:\buildbot\as-builder-8\llvm-nvptx-nvidia-win\llvm-project\llvm\test/lit.cfg.py", line 438, in <module>
    enable_ptxas(ptxas_executable)
  File "C:\buildbot\as-builder-8\llvm-nvptx-nvidia-win\llvm-project\llvm\test/lit.cfg.py", line 430, in enable_ptxas
    if ptxas_supports_address_size_32(ptxas_executable):
  File "C:\buildbot\as-builder-8\llvm-nvptx-nvidia-win\llvm-project\llvm\test/lit.cfg.py", line 407, in ptxas_supports_address_size_32
    raise RuntimeError(f"Unexpected ptxas output: {result.stderr}")
RuntimeError: Unexpected ptxas output: 

The same failure occurs on https://lab.llvm.org/buildbot/#/builders/155

Would you take care of it?
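The traceback above shows the config load aborting because a capability probe raised on ptxas output it did not recognize (here, an empty stderr). One possible way to make such a probe more tolerant is sketched below. This is not the change in #155912, whose contents are not shown here: the helper name `probe_succeeds` is hypothetical, and deciding purely on the exit code is an assumption about what would be acceptable for these tests.

```python
import subprocess
import sys


def probe_succeeds(cmd, input_text=""):
    """Return True if cmd exits with status 0, False otherwise.

    Unlike a probe that pattern-matches stderr and raises on anything
    unexpected, this never raises because of odd or empty output, so a
    surprising tool response disables a feature instead of breaking the
    whole lit configuration.
    """
    try:
        result = subprocess.run(
            cmd,
            input=input_text,
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing executable, permission problem, or a hung tool.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in commands; a real probe would invoke ptxas instead.
    print(probe_succeeds([sys.executable, "-c", "raise SystemExit(0)"]))
    print(probe_succeeds([sys.executable, "-c", "raise SystemExit(3)"]))
```

The trade-off is that a genuinely broken ptxas would silently disable tests rather than fail loudly, so a warning note from the lit config may still be worthwhile.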

@justinfargnoli
Contributor Author

Will do

@justinfargnoli
Contributor Author

@vvereschaka I put up #155912 as a fix-forward for the issue. If that doesn't work for whatever reason, I also opened #155914 to revert the change.

@vvereschaka
Contributor

@justinfargnoli Got it, thank you. I'm checking #155912 now and will let you know the results.

@llvm-ci
Collaborator

llvm-ci commented Aug 28, 2025

LLVM Buildbot has detected a new failure on builder llvm-nvptx64-nvidia-ubuntu running on as-builder-7 while building llvm at step 6 "test-build-unified-tree-check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/160/builds/23833

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-llvm) failure: test (failure)
******************** TEST 'LLVM :: DebugInfo/NVPTX/debug-ptx-symbols.ll' FAILED ********************
Exit Code: 255

Command Output (stderr):
--
/home/buildbot/worker/as-builder-7/llvm-nvptx64-nvidia-ubuntu/build/bin/llc < /home/buildbot/worker/as-builder-7/llvm-nvptx64-nvidia-ubuntu/llvm-project/llvm/test/DebugInfo/NVPTX/debug-ptx-symbols.ll -mcpu=sm_60 | /home/buildbot/worker/as-builder-7/llvm-nvptx64-nvidia-ubuntu/build/bin/FileCheck /home/buildbot/worker/as-builder-7/llvm-nvptx64-nvidia-ubuntu/llvm-project/llvm/test/DebugInfo/NVPTX/debug-ptx-symbols.ll # RUN: at line 1
+ /home/buildbot/worker/as-builder-7/llvm-nvptx64-nvidia-ubuntu/build/bin/llc -mcpu=sm_60
+ /home/buildbot/worker/as-builder-7/llvm-nvptx64-nvidia-ubuntu/build/bin/FileCheck /home/buildbot/worker/as-builder-7/llvm-nvptx64-nvidia-ubuntu/llvm-project/llvm/test/DebugInfo/NVPTX/debug-ptx-symbols.ll
/home/buildbot/worker/as-builder-7/llvm-nvptx64-nvidia-ubuntu/build/bin/llc < /home/buildbot/worker/as-builder-7/llvm-nvptx64-nvidia-ubuntu/llvm-project/llvm/test/DebugInfo/NVPTX/debug-ptx-symbols.ll -mcpu=sm_60 | /usr/local/cuda/bin/ptxas -c - # RUN: at line 2
+ /home/buildbot/worker/as-builder-7/llvm-nvptx64-nvidia-ubuntu/build/bin/llc -mcpu=sm_60
+ /usr/local/cuda/bin/ptxas -c -
ptxas fatal   : SM version specified by .target is higher than default SM version assumed

--

********************


@llvm-ci
Collaborator

llvm-ci commented Aug 28, 2025

LLVM Buildbot has detected a new failure on builder llvm-nvptx-nvidia-ubuntu running on as-builder-7 while building llvm at step 6 "test-build-unified-tree-check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/180/builds/23977

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-llvm) failure: test (failure)
******************** TEST 'LLVM :: DebugInfo/NVPTX/debug-ptx-symbols.ll' FAILED ********************
Exit Code: 255

Command Output (stderr):
--
/home/buildbot/worker/as-builder-7/llvm-nvptx-nvidia-ubuntu/build/bin/llc < /home/buildbot/worker/as-builder-7/llvm-nvptx-nvidia-ubuntu/llvm-project/llvm/test/DebugInfo/NVPTX/debug-ptx-symbols.ll -mcpu=sm_60 | /home/buildbot/worker/as-builder-7/llvm-nvptx-nvidia-ubuntu/build/bin/FileCheck /home/buildbot/worker/as-builder-7/llvm-nvptx-nvidia-ubuntu/llvm-project/llvm/test/DebugInfo/NVPTX/debug-ptx-symbols.ll # RUN: at line 1
+ /home/buildbot/worker/as-builder-7/llvm-nvptx-nvidia-ubuntu/build/bin/llc -mcpu=sm_60
+ /home/buildbot/worker/as-builder-7/llvm-nvptx-nvidia-ubuntu/build/bin/FileCheck /home/buildbot/worker/as-builder-7/llvm-nvptx-nvidia-ubuntu/llvm-project/llvm/test/DebugInfo/NVPTX/debug-ptx-symbols.ll
/home/buildbot/worker/as-builder-7/llvm-nvptx-nvidia-ubuntu/build/bin/llc < /home/buildbot/worker/as-builder-7/llvm-nvptx-nvidia-ubuntu/llvm-project/llvm/test/DebugInfo/NVPTX/debug-ptx-symbols.ll -mcpu=sm_60 | /usr/local/cuda/bin/ptxas -c - # RUN: at line 2
+ /home/buildbot/worker/as-builder-7/llvm-nvptx-nvidia-ubuntu/build/bin/llc -mcpu=sm_60
+ /usr/local/cuda/bin/ptxas -c -
ptxas fatal   : SM version specified by .target is higher than default SM version assumed

--

********************

