Use device agnostic APIs for RNG #159021
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159021
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit a8ebf2c with merge base 4d5b3f2.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull Request Overview
This PR updates test files to use device-agnostic APIs for random number generator (RNG) seeding instead of hardcoded CUDA APIs, making the code more portable across different device types. It also adds proper type annotations to fix mypy linting issues.
- Replace `torch.cuda.manual_seed()` with `torch.get_device_module(self.device_type).manual_seed()`
- Add comprehensive type annotations to all test methods and helper functions
- Fix a missing manual seed initialization in the distribute region test
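For illustration, a minimal sketch of the seeding pattern the PR adopts, assuming a PyTorch version that provides `torch.get_device_module` (the `device_type` string and seed value below are placeholders; the tests take them from `self.device_type` and the test logic):

```python
import torch

def seed_rng(device_type: str, seed: int) -> None:
    """Seed the RNG for the active backend without hardcoding torch.cuda.

    torch.get_device_module(device_type) resolves to the matching backend
    module (e.g. torch.cuda or torch.xpu), each of which exposes manual_seed.
    """
    torch.get_device_module(device_type).manual_seed(seed)

if torch.cuda.is_available():
    seed_rng("cuda", 42)  # equivalent to the old torch.cuda.manual_seed(42)
```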
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| test/distributed/tensor/test_random_ops.py | Adds type annotations to all methods and adds a device-agnostic manual seed call for the distribute region test |
| test/distributed/tensor/parallel/test_tp_random_state.py | Replaces the CUDA-specific manual_seed with the device-agnostic API and adds comprehensive type annotations |
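As a rough sketch of what the annotation changes look like (the class and method names below are hypothetical stand-ins, not the actual names in these test files):

```python
import unittest

import torch

class RandomOpsExample(unittest.TestCase):
    # Hypothetical stand-in for the real DTensor test classes; in the actual
    # tests, device_type is supplied by the distributed test framework.
    device_type: str = "cuda"

    def _seed_all(self, seed: int) -> None:  # annotated helper
        torch.get_device_module(self.device_type).manual_seed(seed)

    def test_rng_state(self) -> None:  # explicit "-> None" satisfies mypy
        self._seed_all(0)
```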
```diff
@@ -130,6 +131,7 @@ def test_meta_tensor_init(self):
        )

        # Test 2: disable the distribute region for RNG
+       torch.get_device_module(self.device_type).manual_seed(self.rank)
```
This change feels suspicious because it adds a new set seed where there previously wasn't one. Also it missed changing a cuda-specific seed setting in the same test further above.
@wconstab: //Also it missed changing a cuda-specific seed setting in the same test further above.
Fixed this with commit a8ebf2c.
//This change feels suspicious because it adds a new set seed where there previously wasn't one
We observed that, with the latest RNG changes, a uniform distribution was not achieved after setting `random._rng_tracker.distribute_region_enabled = False`. Setting the manual seed helped to mitigate that. We also verified this fix on CUDA devices to make sure it does not break any functionality.
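For context, a rough sketch of the scenario described above; the import path for DTensor's random module is an assumption and may differ across PyTorch versions, and `rank`/`device_type` are placeholders for `self.rank`/`self.device_type` in the tests:

```python
import torch
import torch.distributed.tensor._random as random  # assumed import path

rank = 0  # placeholder; the tests use self.rank
device_type = "cuda"  # placeholder; the tests use self.device_type

# Test 2 (sketch): disabling the distribute region stops DTensor from
# coordinating per-rank RNG offsets, so each rank falls back to its own
# local generator state. The tracker is created lazily, hence the guard.
if random._rng_tracker is not None:
    random._rng_tracker.distribute_region_enabled = False

# Without an explicit seed at this point, the local generator state was
# observed to produce a non-uniform distribution; seeding each rank with
# its own rank restores well-defined per-rank RNG state.
torch.get_device_module(device_type).manual_seed(rank)
```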
MOTIVATION
To make the test files more device agnostic, this PR uses device-agnostic APIs to set the manual seed rather than hardcoded CUDA APIs. We have also verified that this does not break any of the existing CUDA functionality.
CHANGES
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @EikanWang @kwen2501 @ankurneog