Introduce a new API torch.accelerator.get_mem_info #156812

guangyey · 2025-06-25T11:18:11Z

Stack from ghstack (oldest at bottom):

Motivation

torch.cuda.mem_get_info and torch.xpu.mem_get_info are widely used in other popular repos, such as

This PR introduces a unified API torch.accelerator.get_memory_info to cover this scenario.

pytorch-bot · 2025-06-25T11:18:15Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156812

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 3 Unrelated Failures

As of commit ab5c4b2 with merge base 178515d:

NEW FAILURES - The following jobs have failed:

xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 1, 6, linux.idc.xpu) (gh)
'test/test_transformers.py::TestSDPAXpuOnlyXPU::test_scaled_dot_product_fused_attention_mask_vs_math_fused_kernel1_float16_batch_size_4_n_head_32_q_size_32_kv_size_32_head_dim_128_mask_type_causal_train_False_xpu_float16'
xpu / win-vs2022-xpu-2025_0-py3 / build (gh)
ninja: build stopped: subcommand failed

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 4, 6, linux.idc.xpu) (gh) (similar failure)
'test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoXPU::test_comprehensive_nn_functional_interpolate_trilinear_xpu_float32'
xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 6, 6, linux.idc.xpu) (gh) (similar failure)
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_copy_non_blocking_is_pinned_xpu

BROKEN TRUNK - The following job failed, but the failure was already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 3, 6, linux.idc.xpu) (gh) (trunk failure)
'test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoXPU::test_comprehensive_nn_functional_interpolate_trilinear_xpu_float64'

This comment was automatically generated by Dr. CI and updates every 15 minutes.

albanD · 2025-07-21T20:29:56Z

Isn't this duplicated with other existing APIs?

guangyey · 2025-07-22T01:40:31Z

Unlike the torch.accelerator.memory_stats family of APIs, which collects memory statistics only for the current process, torch.accelerator.get_memory_info reports a (free, total) pair for the current device. The reported free memory reflects usage across all processes, not just the current one.
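To make the distinction concrete, here is a rough sketch of how the two kinds of numbers could be combined. The helper name memory_breakdown and its parameters are hypothetical and not part of this PR: free and total stand for the device-wide pair described above, and reserved_by_this_process stands for a per-process figure taken from the memory statistics APIs. The result is only an estimate, since allocations made outside the caching allocator are not attributed to the current process.

```python
def memory_breakdown(free: int, total: int, reserved_by_this_process: int) -> dict:
    """Split device memory usage into "this process" and "everything else".

    free, total: device-wide byte counts, as in the (free, total) pair.
    reserved_by_this_process: bytes held by the current process's allocator
    (hypothetical input; obtained separately from per-process statistics).
    """
    # Everything that is not free is in use by some process on the device.
    used_by_all = total - free
    # Whatever this process does not account for is attributed to others.
    # Clamp at zero because the two numbers are sampled at different times.
    used_by_others = max(used_by_all - reserved_by_this_process, 0)
    return {
        "used_by_all": used_by_all,
        "used_by_this_process": reserved_by_this_process,
        "used_by_others": used_by_others,
    }
```

For example, with 6 GiB free out of 16 GiB and 4 GiB reserved by the current process, the sketch attributes roughly 6 GiB to other processes, which per-process statistics alone could not reveal.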

Stonepia · 2025-07-24T08:24:43Z

Currently, the transformers code has hard-coded CUDA paths like the one below:

        if device.type == "cuda":
            index = device.index if device.index is not None else torch.cuda.current_device()
            device_memory = torch.cuda.mem_get_info(index)[0]

Duplicating this logic in another if xpu ... branch would be ugly, so having this unified API would be great.
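A minimal sketch of what a device-agnostic version of that snippet could look like. The helper name free_memory_bytes is hypothetical, and the sketch assumes that torch.accelerator.get_memory_info accepts an optional device index and returns the (free, total) pair described earlier in this thread; the signature in the merged PR may differ. When the unified API is absent, it falls back to the per-backend mem_get_info (torch.cuda or torch.xpu). The torch module is passed in as a parameter only so the sketch can be exercised without an accelerator.

```python
def free_memory_bytes(torch_mod, device_type, index=None):
    """Return the free device memory in bytes.

    torch_mod: the imported torch module (passed in to keep this testable).
    device_type: backend name such as "cuda" or "xpu".
    index: device index, or None for the current device.
    """
    accel = getattr(torch_mod, "accelerator", None)
    # Prefer the unified API when this build of torch provides it.
    # ASSUMPTION: get_memory_info(index) returns (free, total).
    if accel is not None and hasattr(accel, "get_memory_info"):
        free, _total = accel.get_memory_info(index)
        return free

    # Fallback: the backend-specific call that the snippet above hard-codes.
    backend = getattr(torch_mod, device_type)
    if index is None:
        index = backend.current_device()
    free, _total = backend.mem_get_info(index)
    return free
```

A caller would then write free_memory_bytes(torch, device.type, device.index) in place of the if device.type == "cuda" branch.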

guangyey · 2025-08-05T17:13:14Z

@albanD Do you think introducing this API is reasonable?

guangyey requested review from eqy and syed-ahmed as code owners June 25, 2025 11:18

This was referenced Jun 25, 2025

Add DeviceAllocator as the base device allocator #138222

Closed

Add unified memory APIs for torch.accelerator #152932

Closed

Add UT for torch.accelerator memory-related API #155200

Closed

[WIP] Generalize device caching allocator #151298

Draft

guangyey changed the title ~~Introduce a new API torch.accelerator.get_mem_info~~ [WIP] Introduce a new API torch.accelerator.get_mem_info Jun 25, 2025

guangyey requested review from EikanWang and gujinghui as code owners June 25, 2025 11:30

pytorchbot added the open source label Jun 25, 2025

guangyey added 4 commits June 25, 2025 18:48

Update

95d0309

[ghstack-poisoned]

Update

ed3f309

[ghstack-poisoned]

Update

db70a7e

[ghstack-poisoned]

Update

4088c18

[ghstack-poisoned]

guangyey added the release notes: python_frontend python frontend release notes category label Jul 7, 2025

guangyey changed the title ~~[WIP] Introduce a new API torch.accelerator.get_mem_info~~ Introduce a new API torch.accelerator.get_mem_info Jul 7, 2025

guangyey added 2 commits July 7, 2025 16:13

Update

31b9279

[ghstack-poisoned]

Update

6d7cc6d

[ghstack-poisoned]

guangyey added the ciflow/xpu Run XPU CI tasks label Jul 8, 2025

guangyey requested review from jeffdaily and jithunnair-amd as code owners July 8, 2025 10:03

guangyey added 8 commits July 8, 2025 16:22

Update

109f798

[ghstack-poisoned]

Update

e9467fb

[ghstack-poisoned]

Update

3cd42a4

[ghstack-poisoned]

Update

c9f6133

[ghstack-poisoned]

Update

bfe9db9

[ghstack-poisoned]

Update

4e7474b

[ghstack-poisoned]

Update

d950af0

[ghstack-poisoned]

Update

87ad13f

[ghstack-poisoned]

Update

ecfd210

[ghstack-poisoned]

guangyey requested a review from albanD July 21, 2025 02:33

Update

acd57ab

[ghstack-poisoned]

guangyey added 2 commits July 22, 2025 09:55

Update

6c537c7

[ghstack-poisoned]

Update

1e09ac1

[ghstack-poisoned]

Stonepia mentioned this pull request Jul 24, 2025

[XPU] Model get OOM when loading models huggingface/transformers#39627

Closed

guangyey added 2 commits August 1, 2025 21:49

Update

3485cfe

[ghstack-poisoned]

Update

3ed6a5e

[ghstack-poisoned]

guangyey added 2 commits August 6, 2025 00:36

Update

bb349e2

[ghstack-poisoned]

Update

ab5c4b2

[ghstack-poisoned]
