[SymmMem] Install NVSHMEM wheel in CI docker #157411
base: gh/kwen2501/188/base
@@ -135,3 +135,6 @@ do
 esac
 shift
 done
+
+# Install NVSHMEM wheel which is a build-time dependency for torch since 2.8
+python3 -mpip install nvidia-nvshmem-cu12
Interesting, why don't we build it like other nvidia libraries and use the pip bundle instead?
Closing in favor of #157453
@kwen2501 Actually, it looks like we need more CMake changes, as our current build is broken on the older hardware. RIP
…atest for 2.8RC (#157453)
Fixed our bad builds of nvshmem (we were not building or testing before) and also updates to the latest version. The newest version has critical support for things that would actually make it useful, like bfloat16 and float16 support.
This is a proper fix for: #157411
Pull Request resolved: #157453
Approved by: https://github.com/kwen2501, https://github.com/atalman
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Successfully rebased
…atest for 2.8RC (#157453)
Fixed our bad builds of nvshmem (we were not building or testing before) and also updates to the latest version. The newest version has critical support for things that would actually make it useful, like bfloat16 and float16 support.
This is a proper fix for: #157411
Pull Request resolved: #157453
Approved by: https://github.com/kwen2501, https://github.com/atalman
(cherry picked from commit a6fab82)
Stack from ghstack (oldest at bottom):
The 2.8 RC1 and nightly builds were not compiled with NVSHMEM because the build environment on the CI machines does not have NVSHMEM installed.
This PR pip installs the NVSHMEM wheel in the CI docker image.
Also add nvidia-nvshmem-cu12 to PYTORCH_EXTRA_INSTALL_REQUIREMENTS of CUDA 12.9.
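For illustration only, the install step could be made idempotent so that rebuilding an image layer does not reinstall the wheel. This is a minimal sketch, not what the PR does: the helper names `pip_package_installed` and `ensure_nvshmem_wheel` are hypothetical, and the only assumption taken from the PR is the package name nvidia-nvshmem-cu12.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of this PR.

# Succeeds if PACKAGE is already installed for the given interpreter.
# `pip show` exits non-zero when the package is not found.
pip_package_installed() {
  local python_bin="$1" package="$2"
  "$python_bin" -m pip show "$package" >/dev/null 2>&1
}

# Installs the NVSHMEM wheel only when it is missing.
# The interpreter defaults to python3, matching the command in the diff.
ensure_nvshmem_wheel() {
  local python_bin="${1:-python3}"
  local package="nvidia-nvshmem-cu12"
  if pip_package_installed "$python_bin" "$package"; then
    echo "$package already installed"
  else
    "$python_bin" -m pip install "$package"
  fi
}

# Example call from a docker install script (left commented out so that
# sourcing this file has no side effects):
# ensure_nvshmem_wheel python3
```

The unconditional `python3 -mpip install` in the diff is simpler and is already a no-op when the requirement is satisfied; the guard mainly makes the intent explicit in the build log.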