Enable sample nightly PT2 benchmark on B200 #158011
Closed
huydhn wants to merge 15 commits

huydhn (Contributor) commented Jul 10, 2025

Signed-off-by: Huy Do <huydhn@gmail.com>

pytorch-bot bot commented Jul 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158011

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 25 Pending

As of commit 1fe577b with merge base 4175453:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: releng release notes category label Jul 10, 2025
nWEIdia (Collaborator) commented Jul 10, 2025

The nvidia-ml-py package is pinned: nvidia-ml-py==11.525.84

Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
@huydhn huydhn requested a review from malfet July 11, 2025 17:24
@huydhn huydhn marked this pull request as ready for review July 11, 2025 17:24
@huydhn huydhn requested a review from a team as a code owner July 11, 2025 17:24
nWEIdia (Collaborator) left a comment

Thank you for the great work! I see the job https://github.com/pytorch/pytorch/actions/runs/16211997928/job/45775083219 was successful after running a lot of benchmark tests.

atalman (Contributor) left a comment

lgtm

Signed-off-by: Huy Do <huydhn@gmail.com>
huydhn added 3 commits July 11, 2025 22:50
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
nWEIdia (Collaborator) commented Jul 29, 2025

@huydhn Have you had a chance to look into why TorchBench still took 12+ hours?

huydhn added 6 commits July 29, 2025 15:57
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
@huydhn huydhn changed the title Enable nightly PT2 benchmark on B200 Enable sample nightly PT2 benchmark on B200 Aug 1, 2025
huydhn (Contributor, Author) commented Aug 1, 2025

@pytorchbot merge -f 'Unblock _linux_test workflow on b200`

Per the discussion with @nWEIdia, I will go ahead and land this change because it unblocks _linux_test workflow on b200. The TorchBench benchmark job itself still doesn't work https://github.com/pytorch/pytorch/actions/runs/16615101382/job/47008262828, but at least it runs (and times out). I will deal with this in another PR.


pytorch-bot bot commented Aug 1, 2025

❌ 🤖 pytorchbot command failed:

Got EOF while in a quoted string
Try `@pytorchbot --help` for more info.
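
The failure above comes from the merge message being opened with a single quote and closed with a backtick, so the quoted string never ends. A minimal sketch of the same class of error, using Python's standard `shlex` module as a stand-in: `parse_bot_command` is a hypothetical helper, not pytorchbot's actual parser, and the exact error text differs from the bot's.

```python
import shlex


def parse_bot_command(comment_line: str) -> list[str]:
    """Hypothetical helper: tokenize a bot command with shell-style quoting.

    This is an illustration only; it is NOT how pytorchbot is implemented.
    """
    prefix = "@pytorchbot "
    if not comment_line.startswith(prefix):
        raise ValueError("not a pytorchbot command")
    # shlex.split raises ValueError when a quote is opened but never closed.
    return shlex.split(comment_line[len(prefix):])


# Opening single quote, closing backtick: the quoted string is unterminated.
bad = "@pytorchbot merge -f 'Unblock _linux_test workflow on b200`"
# Matching single quotes: the message parses as one argument.
good = "@pytorchbot merge -f 'Unblock _linux_test workflow on b200'"

try:
    parse_bot_command(bad)
    bad_error = None
except ValueError as exc:
    bad_error = str(exc)

good_tokens = parse_bot_command(good)
print(bad_error)
print(good_tokens)
```

With matching quotes, the whole merge message arrives as a single argument to `-f`, which is the form used in the retried command.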

huydhn (Contributor, Author) commented Aug 1, 2025

@pytorchbot merge -f 'Unblock _linux_test workflow on b200'

Per the discussion with @nWEIdia, I will go ahead and land this change because it unblocks _linux_test workflow on b200. The TorchBench benchmark job itself still doesn't work https://github.com/pytorch/pytorch/actions/runs/16615101382/job/47008262828, but at least it runs (and times out). I will deal with this in another PR.

pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


facebook-github-bot pushed a commit to pytorch/benchmark that referenced this pull request Aug 2, 2025
Summary:
Per the discussion with nWEIdia, this resumes the work on pytorch/pytorch#157870 to enable PT2 benchmark on B200

### Testing

https://github.com/pytorch/pytorch/actions/runs/16615101382

X-link: pytorch/pytorch#158011
Approved by: https://github.com/nWEIdia, https://github.com/atalman

Reviewed By: yangw-dev

Differential Revision: D79490413

fbshipit-source-id: 0202249d62bd3011a1e60925b2f8442ddfdcbff7
@nWEIdia nWEIdia mentioned this pull request Aug 4, 2025