Guard rocm_smi.h include with a header check #158771

dsashidh · 2025-07-21T19:50:17Z

This PR fixes a build issue when building with USE_ROCM enabled on systems that don't have the rocm_smi.h header installed.

The include of <rocm_smi/rocm_smi.h> in intrad_node_comm.cpp is now conditionally compiled based on a new CMake check. If the header's found, CMake defines HAS_ROCM_SMI which enables the include. Otherwise the file builds without it.

This makes the build more robust and compatible with different ROCm setups.

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

pytorch-bot · 2025-07-21T19:50:21Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158771

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

ghstack-mergeability-check and Check labels failing with 'Resource not accessible by integration'

✅ No Failures

As of commit 07b5dbe with merge base 7a08755 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

dsashidh · 2025-07-21T19:50:42Z

@pytorchbot label "topic: not user facing"

jeffdaily · 2025-07-21T23:20:50Z

Isn't this header file inclusion necessary for the source file to compile successful? If not, then just remove the #include statement entirely. The rocm_smi.h header comes from the rocm-smi-lib apt package.

dsashidh · 2025-07-22T15:37:43Z

Isn't this header file inclusion necessary for the source file to compile successful? If not, then just remove the #include statement entirely. The rocm_smi.h header comes from the rocm-smi-lib apt package.

Hi @jeffdaily I saw the conversation between you and @trixirt. Based on that discussion intra_node__comm.cpp doesn't compile with USE_ROCM=ON unless both rocm_smi.h and librocm_smi64 are available. Since rocm-smi-lib isn't always isnstalled by default I kept the include but wrapped it in a check so it's only added when the header is present.

Also in my latest commit I updated the block that uses rsmi_init() and RSMI_STATUS_SUCCESS so that it's guarded with the same HAS_ROCM_SMI macro. The original version was causing a test failure when the header wasn't included. This way the code's only compiled if the header is available.

jeffdaily · 2025-07-22T22:20:27Z

Shouldn't we instead change the cmake procedure to hard fail if rocm-smi isn't installed? The compile might succeed with this PR when rocm-smi header is missing, but you're also unsafely turning a build issue into a runtime issue if someone tries to use the feature that needed rsmi_init() to work properly.

…issing

dsashidh · 2025-08-05T17:40:00Z

Shouldn't we instead change the cmake procedure to hard fail if rocm-smi isn't installed? The compile might succeed with this PR when rocm-smi header is missing, but you're also unsafely turning a build issue into a runtime issue if someone tries to use the feature that needed rsmi_init() to work properly.

Hi @jeffdaily , thanks for the earlier feedback. I went ahead and updated the PR to follow your recommendation.

Instead of conditionally compiling the #include <rocm_smi/rocm_smi.h> based on a CMake check, I've now moved that check up into the top-level CMakeLists.txt and changed it to fail hard during configuration if the header isn't present. This avoids the issue you pointed out.

I also switched from checking the USE_ROCM CMake variable to checking ENV{USE_ROCM} since the CMake variable wasn't defined early enough in the top-level CMakeLists.txt. Using the environment variables ensures the check runs reliably when I build with USE_ROCM=1.

Please let me know if this looks good or if you'd prefer the check be somewhere else. Thank you for your recommendations.

Add header check for rocm_smi.h to avoid build errors

ca54895

pytorch-bot bot added module: rocm AMD GPU support for Pytorch oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (c10d) release notes category labels Jul 21, 2025

pytorch-bot bot added the topic: not user facing topic category label Jul 21, 2025

pytorchbot added the open source label Jul 21, 2025

Guard rsmi ussage with HAS_ROCM_SMI to avoid build errors

3e76a0f

soulitzer requested review from jeffdaily and malfet July 23, 2025 18:54

soulitzer added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jul 23, 2025

Move rocm_smi.h check to top-level CMakeLists.txt and fail early if m…

07b5dbe

…issing

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Guard rocm_smi.h include with a header check #158771

Guard rocm_smi.h include with a header check #158771

Uh oh!

dsashidh commented Jul 21, 2025 •

edited by pytorch-bot bot

Loading

Uh oh!

pytorch-bot bot commented Jul 21, 2025 •

edited

Loading

Uh oh!

dsashidh commented Jul 21, 2025

Uh oh!

jeffdaily commented Jul 21, 2025

Uh oh!

dsashidh commented Jul 22, 2025

Uh oh!

jeffdaily commented Jul 22, 2025

Uh oh!

dsashidh commented Aug 5, 2025

Uh oh!

Uh oh!

Guard rocm_smi.h include with a header check #158771

Are you sure you want to change the base?

Guard rocm_smi.h include with a header check #158771

Uh oh!

Conversation

dsashidh commented Jul 21, 2025 • edited by pytorch-bot bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented Jul 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158771

❗ 1 Active SEVs

✅ No Failures

Uh oh!

dsashidh commented Jul 21, 2025

Uh oh!

jeffdaily commented Jul 21, 2025

Uh oh!

dsashidh commented Jul 22, 2025

Uh oh!

jeffdaily commented Jul 22, 2025

Uh oh!

dsashidh commented Aug 5, 2025

Uh oh!

Uh oh!

dsashidh commented Jul 21, 2025 •

edited by pytorch-bot bot

Loading

pytorch-bot bot commented Jul 21, 2025 •

edited

Loading