
Refactor Blackwell #27537

Open · wants to merge 1 commit into base: 4.x

Conversation

@johnnynunez (Contributor) commented Jul 13, 2025

In CUDA 13:

  • 10.0 is B100/B200, same for aarch64 (GB200)
  • 10.3 is GB300
  • 11.0 is Thor with the new OpenRM driver (moves to SBSA)
  • 12.0 is RTX / RTX PRO
  • 12.1 is Spark GB10

Thor was moved from 10.1 to 11.0, and Spark is 12.1.
Related patch: pytorch/pytorch#156176
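The mapping above can be written down as data. A small sketch (product names and grouping are taken from this PR description, not from any official table):

```python
# Compute capability -> product mapping for CUDA 13, as listed in this
# PR description (illustrative only).
BLACKWELL_FAMILY = {
    "10.0": "B100/B200 (same for aarch64 GB200)",
    "10.3": "GB300",
    "11.0": "Thor (new OpenRM driver, moves to SBSA)",
    "12.0": "RTX / RTX PRO",
    "12.1": "Spark GB10",
}

# CMake lists are semicolon-separated, so the corresponding arch string is:
arch_bin = ";".join(sorted(BLACKWELL_FAMILY))
print(arch_bin)  # -> 10.0;10.3;11.0;12.0;12.1
```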

```diff
@@ -109,7 +109,7 @@ macro(ocv_initialize_nvidia_device_generations)
   set(_arch_ampere "8.0;8.6")
   set(_arch_lovelace "8.9")
   set(_arch_hopper "9.0")
-  set(_arch_blackwell "10.0;12.0")
+  set(_arch_blackwell "10.0;10.3;11.0;12.0;12.1")
```
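For context, each entry in that CMake list becomes its own `-gencode` pair when the build invokes nvcc, which is what the size concern below is about. A rough Python sketch of that expansion (a simplification, not the actual OpenCV macro):

```python
def gencode_flags(arch_bin: str) -> list[str]:
    """Expand a semicolon-separated arch list (CUDA_ARCH_BIN style)
    into nvcc -gencode flags. Simplified sketch of the CMake logic."""
    flags = []
    for cc in arch_bin.split(";"):
        sm = cc.replace(".", "")  # "10.3" -> "103"
        flags.append(f"-gencode arch=compute_{sm},code=sm_{sm}")
    return flags

for flag in gencode_flags("10.0;10.3;11.0;12.0;12.1"):
    print(flag)  # five separate kernel builds, one per entry
```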
Contributor:

Hm.. each option here is yet another kernel build. Longer build and fatter binary. I would say that we do not need all possible combinations if the target is defined by the arch name. "12.0;12.1" most probably should be just 12.1. The same for the 10.x variations, if pure 10.0 devices do not exist and 10.3 is the minimal production version.

Contributor Author:

Hello,
In CUDA 13:

  • 10.0 is B100/B200, same for aarch64 (GB200)
  • 10.3 is GB300
  • 11.0 is Thor with the new OpenRM driver (moves to SBSA)
  • 12.0 is RTX / RTX PRO
  • 12.1 is Spark GB10

Feel free to change it, or recommend the most optimized codegen.

Contributor Author:

Orin is moving to SBSA (deleting the old nvgpu driver) in Q1 2026. I don't know if it will maintain the same codegen.

Contributor Author:

There is a new option: adding the `f` suffix (e.g. `10.0f`) applies the optimizations to the whole Blackwell family.

Contributor:

@johnnynunez Do you have any official documentation mentioning the new compute capabilities and the f option?

Contributor Author:

> @johnnynunez Do you have any official documentation mentioning the new compute capabilities and the f option?

https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/
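Per that blog post, CUDA 12.9 introduced family-specific targets: compiling for e.g. `sm_100f` yields one binary usable across the 10.x family instead of one cubin per minor version. A hedged sketch of the flag construction (the helper is hypothetical; the flag shape follows the linked post):

```python
def family_gencode(cc: str) -> str:
    """Build an nvcc -gencode pair using the family-specific 'f' suffix
    (CUDA 12.9+). Hypothetical helper; one such target covers the whole
    major family, e.g. 10.0f covers 10.0 and 10.3 devices."""
    sm = cc.replace(".", "")
    return f"-gencode arch=compute_{sm}f,code=sm_{sm}f"

print(family_gencode("10.0"))  # -> -gencode arch=compute_100f,code=sm_100f
```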

@cudawarped (Contributor) commented Jul 14, 2025:

> Hm.. each option here is yet another kernel build. Longer build and fatter binary. I would say, that we do not need all possible combinations, if the target is defined by the arch name. "12.0;12.1" most probably should be just 12.1. The same for 10.x variations, if pure 10.0 devices do not exist and 10.3 is minimal production version.

@asmorkalov @johnnynunez @tomoaki0705 Firstly, I don't really know what the use case for specifying the CUDA architecture is. That said, if someone explicitly selects it, I don't see an issue in compiling for all architectures, because the user should be aware of the implications of their choice.

I think where the size is an issue is when the CUDA_ARCH_BIN and CUDA_ARCH_PTX are not specified and all supported architectures from all generations are built. In a future PR we could relax this and just build major architectures (5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0 and 12.0), equivalent to the CMake all-major flag.

A better alternative might be to make the default CUDA_GENERATION=Auto, which I think makes much more sense. If a user hasn't explicitly specified the architecture, this is probably what they want. We could then have an additional option equivalent to the existing default (CUDA_GENERATION=All), or one for building compatibility with all major architectures (CUDA_GENERATION=All-Major).
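A `CUDA_GENERATION=Auto` default would presumably detect the local GPUs' compute capabilities, e.g. via `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` (supported by recent nvidia-smi versions). A sketch of the parsing side only (the helper is an assumption for illustration, not OpenCV code):

```python
def arch_from_nvidia_smi(output: str) -> str:
    """Turn the output of
        nvidia-smi --query-gpu=compute_cap --format=csv,noheader
    (one capability per GPU, one per line) into a deduplicated,
    numerically sorted arch string. Hypothetical helper."""
    caps = {line.strip() for line in output.splitlines() if line.strip()}
    return ";".join(sorted(caps, key=float))

# Example: a machine with two RTX GPUs and one Hopper card.
print(arch_from_nvidia_smi("12.0\n12.0\n9.0\n"))  # -> 9.0;12.0
```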

@asmorkalov self-assigned this Jul 14, 2025

@asmorkalov (Contributor):

cc @cudawarped

@asmorkalov added this to the 4.13.0 milestone Jul 14, 2025
@cudawarped (Contributor):
> In CUDA 13:
>
> * 10.0 is b100/b200 same for aarch64 (gb200)
> * 10.3 is GB300
> * 11.0 is Thor with new OpenRm driver (moves to SBSA)
> * 12.0 is RTX/RTX PRO
> * 12.1 is Spark GB10

It looks like Nvidia have stuck to this. Output from nvcc from CUDA 13.0 below.

```
$ nvcc --list-gpu-code
sm_75
sm_80
sm_86
sm_87
sm_88
sm_89
sm_90
sm_100
sm_110
sm_103
sm_120
sm_121
```

@johnnynunez (Contributor Author):
> It looks like Nvidia have stuck to this. Output from nvcc from CUDA 13.0 below. […]

I joined nvidia this month… haha

@cudawarped (Contributor):

> I joined nvidia this month… haha

Congratulations 🥳

3 participants