Introduce a new API torch.accelerator.get_mem_info #156812

Open
wants to merge 22 commits into
base: gh/guangyey/163/base

@guangyey guangyey requested review from eqy and syed-ahmed as code owners June 25, 2025 11:18

pytorch-bot bot commented Jun 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156812

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 3 Unrelated Failures

As of commit ab5c4b2 with merge base 178515d (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@guangyey guangyey changed the title Introduce a new API torch.accelerator.get_mem_info [WIP] Introduce a new API torch.accelerator.get_mem_info Jun 25, 2025
guangyey added 4 commits June 25, 2025 18:48
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@guangyey guangyey added the release notes: python_frontend python frontend release notes category label Jul 7, 2025
@guangyey guangyey changed the title [WIP] Introduce a new API torch.accelerator.get_mem_info Introduce a new API torch.accelerator.get_mem_info Jul 7, 2025
guangyey added 2 commits July 7, 2025 16:13
[ghstack-poisoned]
[ghstack-poisoned]
@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Jul 8, 2025
guangyey added 8 commits July 8, 2025 16:22
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@guangyey guangyey requested a review from albanD July 21, 2025 02:33
[ghstack-poisoned]
@albanD
Collaborator

albanD commented Jul 21, 2025

Isn't this duplicated with other existing APIs?

@guangyey
Collaborator Author

Unlike the torch.accelerator.memory_stats family of APIs, which only collect memory statistics for the current process, torch.accelerator.get_memory_info reports a (free, total) pair for the current device. The reported free memory reflects usage across all processes, not just the current one.
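
To make the distinction concrete, here is a minimal sketch that uses the existing CUDA calls torch.cuda.mem_get_info and torch.cuda.memory_reserved rather than the API proposed in this PR. The helper names split_usage and snapshot_cuda are hypothetical, and the "used elsewhere" figure is only a rough estimate, because it also includes driver and context overhead that the caching allocator does not track.

```python
def split_usage(free, total, reserved_here):
    """Split device-wide usage into 'everything' and 'not this process'.

    free, total   -- device-wide values, as in a (free, total) pair
    reserved_here -- bytes held by the current process's caching allocator
    """
    used_all = total - free
    # Clamp at zero: the two readings are taken at slightly different times.
    used_elsewhere = max(used_all - reserved_here, 0)
    return {"used_all_processes": used_all, "used_elsewhere": used_elsewhere}


def snapshot_cuda(index=0):
    """Take one reading on a CUDA device (assumes a CUDA build of PyTorch)."""
    import torch  # imported lazily so split_usage works without torch installed

    free, total = torch.cuda.mem_get_info(index)   # device-wide view
    reserved = torch.cuda.memory_reserved(index)   # current process only
    return split_usage(free, total, reserved)
```

If another process is holding memory on the same device, used_elsewhere grows while the per-process statistics stay unchanged, which is the gap a (free, total) query is meant to cover.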

guangyey added 2 commits July 22, 2025 09:55
[ghstack-poisoned]
[ghstack-poisoned]
@Stonepia
Contributor

Currently, the transformers code contains hard-coded CUDA paths like the one below:

        if device.type == "cuda":
            index = device.index if device.index is not None else torch.cuda.current_device()
            device_memory = torch.cuda.mem_get_info(index)[0]

Adding another if xpu ... branch with the same logic would be ugly, so having this unified API would be great.
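
For illustration, a backend-agnostic version of that snippet could look like the sketch below. It does not use the API proposed in this PR; instead it assumes that the backend module named by device.type (for example torch.cuda or torch.xpu) exposes both current_device() and mem_get_info(). The helper name free_memory_bytes and its torch_module parameter are hypothetical.

```python
def free_memory_bytes(device, torch_module):
    """Return free bytes on `device` without one if-branch per backend.

    device       -- object with .type (e.g. "cuda", "xpu") and .index
    torch_module -- the imported torch module, passed in explicitly so the
                    helper can be exercised with a stand-in during testing
    """
    # Look the backend up by name instead of hard-coding torch.cuda.
    backend = getattr(torch_module, device.type)
    index = device.index if device.index is not None else backend.current_device()
    free, _total = backend.mem_get_info(index)
    return free
```

With PyTorch installed, this would be called as free_memory_bytes(torch.device("cuda"), torch). A single unified accelerator call would make even this lookup unnecessary.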

[ghstack-poisoned]
[ghstack-poisoned]
@guangyey
Collaborator Author

guangyey commented Aug 5, 2025

@albanD Do you think introducing this API is reasonable?

[ghstack-poisoned]
[ghstack-poisoned]
Labels
ciflow/xpu Run XPU CI tasks open source release notes: python_frontend python frontend release notes category

4 participants