Refactor Blackwell #27537
Conversation
@@ -109,7 +109,7 @@ macro(ocv_initialize_nvidia_device_generations)
 set(_arch_ampere "8.0;8.6")
 set(_arch_lovelace "8.9")
 set(_arch_hopper "9.0")
-set(_arch_blackwell "10.0;12.0")
+set(_arch_blackwell "10.0;10.3;11.0;12.0;12.1")
Hm.. each option here is yet another kernel build: longer builds and a fatter binary. I would say that we do not need all possible combinations if the target is defined by the arch name. "12.0;12.1" most probably should be just 12.1. The same for the 10.x variations, if pure 10.0 devices do not exist and 10.3 is the minimal production version.
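To make the cost concrete, here is a minimal sketch (not OpenCV's actual detection code) of why each entry is one more kernel build: every value in the list becomes its own -gencode clause, i.e. one more SASS image embedded in the fatbin.

```cmake
# Sketch only: each CUDA_ARCH_BIN entry expands to one -gencode clause.
set(__cuda_arch_bin "10.0;10.3;11.0;12.0;12.1")
string(REPLACE "." "" __cuda_arch_bin "${__cuda_arch_bin}")  # -> 100;103;110;120;121
foreach(__arch IN LISTS __cuda_arch_bin)
  list(APPEND __nvcc_flags -gencode arch=compute_${__arch},code=sm_${__arch})
endforeach()
message(STATUS "NVCC flags: ${__nvcc_flags}")  # five SASS targets for Blackwell alone
```

Dropping redundant entries (e.g. keeping only the minimal production variant per family) shrinks build time and binary size roughly linearly with the number of targets.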
Hello,
In CUDA 13:
10.0 is B100/B200, the same for aarch64 (GB200)
10.3 is GB300
11.0 is Thor with the new OpenRM driver (moves to SBSA)
12.0 is RTX / RTX PRO
12.1 is Spark (GB10)
Feel free to change it, or recommend the most optimized codegen here.
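For review convenience, the same table written out with the device names from this thread as comments; this is equivalent to the one-line set() in the diff, since CMake joins the quoted values with semicolons:

```cmake
set(_arch_blackwell
    "10.0"   # B100/B200, GB200 (aarch64)
    "10.3"   # GB300
    "11.0"   # Thor (OpenRM driver, SBSA)
    "12.0"   # RTX / RTX PRO
    "12.1")  # Spark (GB10)
```

Anyone who only needs one of these boards can already build a single target with the existing cache options, e.g. configuring with -DCUDA_ARCH_BIN=11.0 for Thor only.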
Orin is moving to SBSA (dropping the old nvgpu driver) in Q1 2026. I don't know if it will maintain the same codegen.
There is a new option: appending an `f` suffix (e.g. 10.0f) targets the family-specific architecture and picks up optimizations for the whole Blackwell family.
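If that pans out, a minimal sketch of what the table could look like, assuming nvcc from CUDA 12.9+/13.0 accepts family-specific targets via the `f` suffix and that OpenCV's architecture parsing were extended to pass it through (both assumptions to verify against NVIDIA's documentation):

```cmake
# Hypothetical: one family-specific entry per Blackwell family instead of
# per-chip SASS; with nvcc this would correspond to something like
#   -gencode arch=compute_100f,code=sm_100f
# The "10.0f"/"12.0f" syntax is assumed and not currently parsed by OpenCV.
set(_arch_blackwell "10.0f;12.0f")
```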
@johnnynunez Do you have any official documentation mentioning the new compute capabilities and the f option?
> Hm.. each option here is yet another kernel build: longer builds and a fatter binary. I would say that we do not need all possible combinations if the target is defined by the arch name. "12.0;12.1" most probably should be just 12.1. The same for the 10.x variations, if pure 10.0 devices do not exist and 10.3 is the minimal production version.
@asmorkalov @johnnynunez @tomoaki0705 Firstly, I don't really know what the use case for explicitly specifying the CUDA architecture is. That said, if someone explicitly selects it, I don't see the issue in compiling for all of the listed architectures, because the user should be aware of the implications of their choice.
I think where the size is an issue is when CUDA_ARCH_BIN and CUDA_ARCH_PTX are not specified and all supported architectures from all generations are built. In a future PR we could relax this and build just the major architectures (5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0 and 12.0), equivalent to CMake's all-major flag.
A better alternative might be to make the default CUDA_GENERATION=Auto, which I think makes much more sense: if a user hasn't explicitly specified the architecture, this is probably what they want. We could then have an additional option equivalent to the existing default (CUDA_GENERATION=All) or one for building compatibility with all major architectures (CUDA_GENERATION=All-Major), sketched below.
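A sketch of that last idea, mirroring the _arch_* tables above (the All-Major name and exact wiring are assumptions, equivalent in spirit to CMake's CMAKE_CUDA_ARCHITECTURES=all-major):

```cmake
# Hypothetical All-Major generation: one SASS target per major architecture.
set(_arch_all_major "5.0;6.0;7.0;8.0;9.0;10.0;11.0;12.0")
if(CUDA_GENERATION STREQUAL "All-Major")
  set(__cuda_arch_bin "${_arch_all_major}")
endif()
```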
cc @cudawarped
It looks like NVIDIA has stuck to this. Output from nvcc from CUDA 13.0 below.
[screenshot: nvcc output from CUDA 13.0 listing the supported compute capabilities]
I joined NVIDIA this month… haha
Congratulations 🥳
In CUDA 13:
Thor was moved from 10.1 to 11.0 and Spark is 12.1.
Related patch: pytorch/pytorch#156176