Skip to content

Use device agnostic APIs for RNG #159021

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Use device agnostic APIs for RNG #159021

wants to merge 1 commit into from

Conversation

amathewc
Copy link
Contributor

@amathewc amathewc commented Jul 24, 2025

MOTIVATION

In order to make the test files more device agnostic, use device agnostic APIs to set the manual seed rather than hard coded CUDA APIs. We have also verified that this does not break any of the existing CUDA functionality.

CHANGES

  • In test_tp_random_state.py , use torch.get_device_module(self.device_type).manual_seed(dp_rank) instead of torch.cuda.manual_seed(dp_rank)
  • In test_random_ops.py, use torch.get_device_module(self.device_type).manual_seed(self.rank) to initialize manual seed so as to properly acheive uniform distribution when random._rng_tracker.distribute_region_enabled is set to False.
  • Fix Lint (MYPY) issues. Properly annotate all functions with return type.

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @EikanWang @kwen2501 @ankurneog

Copy link

pytorch-bot bot commented Jul 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159021

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d9f0ed4 with merge base 6834911 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue topic: not user facing topic category labels Jul 24, 2025
@EikanWang EikanWang requested review from Copilot and XilunWu July 28, 2025 03:50
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates test files to use device-agnostic APIs for random number generator (RNG) seeding instead of hardcoded CUDA APIs, making the code more portable across different device types. It also adds proper type annotations to fix MYPY linting issues.

  • Replace torch.cuda.manual_seed() with torch.get_device_module(self.device_type).manual_seed()
  • Add comprehensive type annotations to all test methods and helper functions
  • Fix a missing manual seed initialization in the distribute region test

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
test/distributed/tensor/test_random_ops.py Adds type annotations to all methods and adds device-agnostic manual seed call for distribute region test
test/distributed/tensor/parallel/test_tp_random_state.py Replaces CUDA-specific manual_seed with device-agnostic API and adds comprehensive type annotations

@EikanWang EikanWang requested a review from kwen2501 July 28, 2025 03:52
The manual_seed call was earlier placed before dtesnor.uniform_().
Setting the seed before creatring the tensor to ensure reproducible
initialization.

Signed-off-by: Aby Mathew C <aby.mathew.c@intel.com>
@amathewc
Copy link
Contributor Author

@kwen2501 @XilunWu : Could you help in merging this ?

@amathewc
Copy link
Contributor Author

amathewc commented Aug 6, 2025

@kwen2501 @XilunWu : Please help with the merging

@@ -130,6 +131,7 @@ def test_meta_tensor_init(self):
)

# Test 2: disable the distribute region for RNG
torch.get_device_module(self.device_type).manual_seed(self.rank)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change feels suspicious because it adds a new set seed where there previously wasn't one. Also it missed changing a cuda-specific seed setting in the same test further above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
oncall: distributed Add this issue/PR to distributed oncall triage queue open source topic: not user facing topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants