
Add DeviceAllocator as the base device allocator #138222

Closed · wants to merge 95 commits

Conversation

guangyey (Collaborator) commented Oct 17, 2024

Stack from ghstack (oldest at bottom):

Motivation

In line with the [RFC] A device-agnostic Python device memory related API design for stream-based accelerators (#134978), several memory-related APIs are widely used in popular repositories such as HuggingFace, leading to [a lot of if-else conditional code](https://github.com/search?q=repo%3Ahuggingface%2Faccelerate%20torch.cuda.empty_cache&type=code). We would like to introduce a generic API set under the torch.accelerator namespace to generalize these use cases.

| Device-specific memory APIs (`torch.xxx.foo`) | Device-agnostic memory APIs (`torch.accelerator.foo`) |
|---|---|
| `torch.xxx.empty_cache` | `torch.accelerator.empty_cache` |
| `torch.xxx.reset_peak_memory_stats` | `torch.accelerator.reset_peak_memory_stats` |
| `torch.xxx.reset_accumulated_memory_stats` | `torch.accelerator.reset_accumulated_memory_stats` |
| `torch.xxx.memory_stats` | `torch.accelerator.memory_stats` |
| `torch.xxx.memory_allocated` | `torch.accelerator.memory_allocated` |
| `torch.xxx.max_memory_allocated` | `torch.accelerator.max_memory_allocated` |
| `torch.xxx.memory_reserved` | `torch.accelerator.memory_reserved` |
| `torch.xxx.max_memory_reserved` | `torch.accelerator.max_memory_reserved` |
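
For example, downstream code that currently branches per backend could collapse into a single device-agnostic call. A minimal sketch, assuming a PyTorch build that ships the torch.accelerator memory APIs proposed in this stack:

```python
import torch

# Before: per-backend branching, as commonly seen in downstream libraries today
if torch.cuda.is_available():
    torch.cuda.empty_cache()
elif hasattr(torch, "xpu") and torch.xpu.is_available():
    torch.xpu.empty_cache()

# After: one device-agnostic call for whichever accelerator is active
torch.accelerator.empty_cache()
```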

Solution

This design follows a similar pattern to `HostAllocator`. We're introducing a base class `DeviceAllocator`, from which `CUDAAllocator` and `XPUAllocator` will inherit. This allows us to provide a unified call path like `torch.accelerator.empty_cache()` -> `GetDeviceAllocator(allocator)->empty_cache()`.
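
As a rough illustration of that call path, here is a minimal Python-flavored sketch of the pattern. The real `DeviceAllocator`, `CUDAAllocator`, and `XPUAllocator` are C++ classes inside PyTorch; names such as `_allocators` and `get_device_allocator` below are purely illustrative:

```python
from abc import ABC, abstractmethod


class DeviceAllocator(ABC):
    """Base interface every accelerator's caching allocator implements (sketch)."""

    @abstractmethod
    def empty_cache(self) -> None: ...

    @abstractmethod
    def memory_stats(self, device_index: int) -> dict: ...


class CUDAAllocator(DeviceAllocator):
    def empty_cache(self) -> None:
        print("releasing cached CUDA blocks")  # placeholder for the real caching allocator

    def memory_stats(self, device_index: int) -> dict:
        return {}


class XPUAllocator(DeviceAllocator):
    def empty_cache(self) -> None:
        print("releasing cached XPU blocks")  # placeholder for the real caching allocator

    def memory_stats(self, device_index: int) -> dict:
        return {}


# Each backend registers its allocator for its device type; the generic
# torch.accelerator.* entry points then dispatch through this registry.
_allocators: dict[str, DeviceAllocator] = {
    "cuda": CUDAAllocator(),
    "xpu": XPUAllocator(),
}


def get_device_allocator(device_type: str) -> DeviceAllocator:
    return _allocators[device_type]


def empty_cache(device_type: str) -> None:
    # Mirrors torch.accelerator.empty_cache() -> GetDeviceAllocator(...)->empty_cache()
    get_device_allocator(device_type).empty_cache()


empty_cache("cuda")  # prints "releasing cached CUDA blocks"
```

The point of the shared base class is that the device-agnostic layer only needs to know about `DeviceAllocator`; each backend registers its own allocator, so new accelerators can plug in without touching the generic code.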

cc @albanD @EikanWang

pytorch-bot bot commented Oct 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138222

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 5 Pending, 2 Unrelated Failures

As of commit 5521326 with merge base 178515d:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangyey added a commit that referenced this pull request Oct 17, 2024
guangyey marked this pull request as draft October 17, 2024 15:06
guangyey changed the title from "Add CachingDeviceAllocatorInterface as the base device allocator" to "[WIP] Add CachingDeviceAllocatorInterface as the base device allocator" Oct 17, 2024

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

github-actions bot added the Stale label Dec 16, 2024
guangyey added a commit that referenced this pull request Mar 4, 2025
guangyey added the topic: improvements and topic: not user facing labels Mar 18, 2025
guangyey added several commits that referenced this pull request between Mar 18 and Mar 21, 2025
guangyey (Collaborator, Author) commented Aug 6, 2025

@pytorchbot merge -i

pytorchmergebot pushed a commit that referenced this pull request Aug 6, 2025
# Motivation
The following APIs will be put under torch.accelerator:
- empty_cache
- max_memory_allocated
- max_memory_reserved
- memory_allocated
- memory_reserved
- memory_stats
- reset_accumulated_memory_stats
- reset_peak_memory_stats
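
For instance, once these land, peak-memory bookkeeping in a script no longer needs to name the backend. A hedged usage sketch, assuming an accelerator is present and that these functions mirror their torch.cuda counterparts:

```python
import torch

device = torch.accelerator.current_accelerator()  # e.g. cuda or xpu
torch.accelerator.reset_peak_memory_stats()

x = torch.randn(1024, 1024, device=device)
y = x @ x  # some accelerator work

print("currently allocated:", torch.accelerator.memory_allocated())
print("peak allocated:", torch.accelerator.max_memory_allocated())
torch.accelerator.empty_cache()
```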

Pull Request resolved: #152932
Approved by: https://github.com/albanD
ghstack dependencies: #138222
pytorchmergebot pushed a commit that referenced this pull request Aug 6, 2025
jithunnair-amd (Collaborator) commented:

@pytorchbot revert -c nosignal -m "Broke ROCm periodic runs on MI300 e.g. https://github.com/pytorch/pytorch/actions/runs/16764977800/job/47470050573"

cc @guangyey If this revert doesn't go through because it's part of a stack, please forward-fix the issue.

pytorchmergebot (Collaborator) commented:

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added commits that referenced this pull request Aug 7, 2025
pytorchmergebot (Collaborator) commented:

@guangyey your PR has been successfully reverted.

guangyey added the ciflow/periodic and ciflow/periodic-rocm-mi300 labels Aug 8, 2025
guangyey (Collaborator, Author) commented Aug 8, 2025

@jithunnair-amd I think the failure was not introduced by this PR; I see the same failure (pynvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found, raised in /var/lib/jenkins/pytorch/test/distributed/test_c10d_nccl.py, line 689, in test_extra_cuda_context) on the previous commit.
Anyway, let's add the ciflow/periodic-rocm-mi300 label to see what CI says, and I will reland this PR once that CI passes.

guangyey removed the ciflow/periodic label Aug 8, 2025
guangyey (Collaborator, Author) commented Aug 8, 2025

Hi @jithunnair-amd, the ciflow/periodic-rocm-mi300 run has passed on this PR; see https://github.com/pytorch/pytorch/actions/runs/16825034603/job/47659415983?pr=138222
I would like to reland this PR.

pytorchmergebot (Collaborator) commented:

Starting merge as part of PR stack under #155200

pytorchmergebot pushed commits that referenced this pull request Aug 8, 2025
hinriksnaer pushed a commit to hinriksnaer/pytorch that referenced this pull request Aug 8, 2025
Pull Request resolved: pytorch#138222
Approved by: https://github.com/albanD, https://github.com/Camyll
hinriksnaer pushed additional commits to hinriksnaer/pytorch that referenced this pull request Aug 8, 2025
Labels
- ci-no-td (Do not run TD on this PR)
- ciflow/periodic-rocm-mi300 (Trigger "distributed" config CI on ROCm MI300)
- ciflow/rocm (Trigger "default" config CI on ROCm)
- ciflow/trunk (Trigger trunk jobs on your pull request)
- ciflow/xpu (Run XPU CI tasks)
- Merged
- module: accelerator (Issues related to the shared accelerator API)
- no-stale
- open source
- Reverted
- Stale
- topic: improvements (topic category)
- topic: not user facing (topic category)