Add HostAllocator as the unified parent class #151431

Closed
wants to merge 11 commits

Conversation

Collaborator

@guangyey guangyey commented Apr 16, 2025

Stack from ghstack (oldest at bottom):

Motivation

This PR introduces a unified parent class HostAllocator with the following goals:

  1. Enable backend-specific host allocator registration, including support for out-of-tree backends.
  2. Provide a unified and extensible API surface for host memory management across all backends, especially accelerators.

The new interface includes the following methods (see the sketch after this list):

  • at::getHostAllocator()->allocate
  • at::getHostAllocator()->empty_cache
  • at::getHostAllocator()->record_event
  • at::getHostAllocator()->get_stats
  • at::getHostAllocator()->reset_accumulated_stats
  • at::getHostAllocator()->reset_peak_stats

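For illustration, a rough sketch of what the unified parent class could look like, based on the method list above. The exact base class, export macros, and signatures here are assumptions rather than the merged code, and the registration helper name is illustrative:

```cpp
#include <cstdint>
#include <c10/core/Allocator.h>
#include <c10/core/Stream.h>

namespace at {

// Aggregate counters for the pinned-memory cache (the real HostStats carries
// more detail; this placeholder just keeps the sketch self-contained).
struct HostStats {};

struct HostAllocator : public c10::Allocator {
  // allocate(nbytes) is inherited from c10::Allocator and returns pinned host memory.

  // Mark the block at `ptr` (with allocator context `ctx`) as in use by `stream`,
  // so it is not recycled until the stream's pending work completes.
  virtual bool record_event(void* ptr, void* ctx, c10::Stream stream) = 0;

  // Release all cached pinned blocks back to the system.
  virtual void empty_cache() = 0;

  // Cache introspection and bookkeeping.
  virtual HostStats get_stats() = 0;
  virtual void reset_accumulated_stats() = 0;
  virtual void reset_peak_stats() = 0;
};

// Per-backend registration (usable by out-of-tree backends) and lookup.
void setHostAllocator(c10::DeviceType device_type, HostAllocator* allocator, uint8_t priority = 0);
HostAllocator* getHostAllocator(c10::DeviceType device_type);

} // namespace at
```

Concrete backends (CUDA, XPU, out-of-tree devices) would subclass this and register themselves, so at::getHostAllocator(device_type) returns the right implementation.
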
Additional Context

We plan to deprecate legacy APIs such as at::cuda::CachingHostAllocator_emptyCache and recommend users migrate to the new backend-specific API, for example:

at::getHostAllocator(at::kCUDA)->empty_cache();

This refactor will help standardize host memory management across devices and simplify backend integration in the future.
Another key improvement I plan to make is moving the is_pinned functionality into the HostAllocator class, which enables centralized pinned memory verification through calls like at::getHostAllocator(at::kCUDA)->is_pinned(ptr).
Benefits include:

  • Consistent host memory handling across all device backends
  • Decouples pinned memory functionality from AcceleratorHooksInterface in a more modular way
  • Clearer separation between device memory allocation and pinned host memory management

This architecture makes the system more maintainable and extensible for future device support.
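For a concrete feel, here is a minimal usage sketch on CUDA. The record_event signature and the is_pinned call reflect the interface list and the planned follow-up described above, so treat them as assumptions rather than final code:

```cpp
#include <ATen/ATen.h>
#include <c10/cuda/CUDAStream.h>

void staging_buffer_example(const at::Tensor& gpu_tensor) {
  auto* host_alloc = at::getHostAllocator(at::kCUDA);

  // Allocate pinned host memory through the unified interface.
  at::DataPtr staging = host_alloc->allocate(gpu_tensor.nbytes());

  // ... launch an asynchronous device-to-host copy into staging.get() ...

  // Tell the allocator the block is in use by the current stream, so it is
  // not recycled before the copy finishes.
  c10::Stream stream = c10::cuda::getCurrentCUDAStream().unwrap();
  host_alloc->record_event(staging.get(), staging.get_context(), stream);

  // Planned follow-up: centralized pinned-memory verification.
  // bool pinned = host_alloc->is_pinned(staging.get());

  // Later, return cached blocks to the system if host memory is tight.
  host_alloc->empty_cache();
}
```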

cc @albanD @EikanWang

pytorch-bot bot commented Apr 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151431

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit aa2a8fd with merge base 107121d:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@guangyey guangyey added the ciflow/xpu, ciflow/trunk, and release notes: cpp labels on Apr 16, 2025
@guangyey guangyey added the module: accelerator label on Apr 16, 2025
@guangyey guangyey requested a review from albanD April 16, 2025 10:50
@@ -343,7 +345,7 @@ struct CachingHostAllocatorImpl {
TORCH_CHECK_NOT_IMPLEMENTED(false, "Not implemented for copy_data");
}

-HostStats getStats() {
+HostStats get_stats() {
Collaborator

I'm not sure about renaming all these functions. CamelCase is pretty standard in our C++ codebase, so I think it's best to avoid the churn of having to change call sites!

Collaborator

@guangyey It's not necessary to rename all these functions, right?

Collaborator Author

@guangyey guangyey Apr 17, 2025

OK. I’ve changed the function names in CachingHostAllocatorImpl back to their original CamelCase style. However, since the APIs in HostAllocator will be public, I don’t want users to have to remember which APIs use CamelCase and which use snake_case.
To ensure a consistent and user-friendly API experience, I suggest we keep using snake_case for HostAllocator, aligning with the style already used in its parent Allocator. This would promote better consistency and usability for users. What do you think?


-TORCH_XPU_API bool CachingHostAllocator_recordEvent(
+inline TORCH_XPU_API bool CachingHostAllocator_recordEvent(
Collaborator

Why isn't this function taking a c10::Stream directly? Also, I don't see it being used anywhere within this repo.

Collaborator Author

This is a device-specific API that will be deprecated in the next PR. We would like to recommend that users use the unified API at::getHostAllocator(at::kXPU)->record_event(...), which accepts a c10::Stream as an input parameter.
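As a rough before/after sketch of that migration (the ptr/ctx arguments are placeholders, and the XPU stream helper plus the exact record_event signature are assumptions here):

```cpp
#include <ATen/ATen.h>
#include <c10/xpu/XPUStream.h>

void record_host_block_in_use(void* ptr, void* ctx) {
  // Old, backend-specific entry point (slated for deprecation in a follow-up PR):
  // at::xpu::CachingHostAllocator_recordEvent(ptr, ctx, c10::xpu::getCurrentXPUStream());

  // New, unified entry point taking a generic c10::Stream:
  c10::Stream stream = c10::xpu::getCurrentXPUStream().unwrap();
  at::getHostAllocator(at::kXPU)->record_event(ptr, ctx, stream);
}
```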

Collaborator Author

It is used in the torch-xpu-ops repo for CopyKernel to ensure the correctness of non-blocking copies.

@@ -18,25 +18,38 @@ namespace at::cuda {
// call between host and device, and passed the corresponding context from the
// allocation. This is currently invoked by at::native::copy_kernel_cuda.
//
-TORCH_CUDA_CPP_API c10::Allocator* getCachingHostAllocator();
+inline TORCH_CUDA_CPP_API at::HostAllocator* getCachingHostAllocator() {
Collaborator

Why do we call this a "caching" host allocator in these APIs rather than just getHostAllocator?
From what I can tell, all accesses to this just need a host allocator.

Do you think we could simplify the naming here by removing the "caching" concept from the user APIs?

Collaborator Author

The code change here is just for backward compatibility. I plan to deprecate these device-specific APIs in this file in the next PR and recommend that users switch to the new unified APIs instead.
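For context, the backward-compatibility shim presumably just forwards to the unified lookup; a sketch of that shape (the body here is my reading of the inline declaration in the diff, not necessarily the merged code):

```cpp
#include <ATen/ATen.h>  // for at::getHostAllocator / at::HostAllocator (header path may differ)

namespace at::cuda {

// Legacy entry point kept inline for backward compatibility; new code should
// call at::getHostAllocator(at::kCUDA) directly.
inline at::HostAllocator* getCachingHostAllocator() {
  return at::getHostAllocator(at::kCUDA);
}

} // namespace at::cuda
```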

at::HostAllocator* allocator,
uint8_t priority = 0);

TORCH_API at::HostAllocator* getHostAllocator(at::DeviceType device_type);
Collaborator

Do we really need a per-device lookup here?
Can we rely on the fact that there is a single accelerator right now?

Collaborator

Sounds reasonable. @guangyey, can the default accelerator tell us which device type it should be now?

Collaborator Author

@guangyey guangyey Apr 17, 2025

I think a per-device lookup design is safer.
From an implementation perspective, I recall that devices like MTIA or PrivateUser1 can co-exist with a CUDA-enabled binary, since getAccelerator() is resolved at runtime rather than at build time for them.
In practice, users can combine getAccelerator() with getHostAllocator() to get the single-accelerator behavior, for example:
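A small sketch of that combination (assuming at::getAccelerator(bool checked) from ATen/DeviceAccelerator.h returns an optional device type, as I recall):

```cpp
#include <ATen/ATen.h>
#include <ATen/DeviceAccelerator.h>

at::HostAllocator* current_accelerator_host_allocator() {
  // With checked=true this throws if no accelerator is available; otherwise it
  // yields the device type of the single active accelerator (CUDA, XPU, MTIA, ...).
  c10::DeviceType acc = at::getAccelerator(/*checked=*/true).value();
  return at::getHostAllocator(acc);
}
```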

Collaborator

I guess it is, ok!

Collaborator

@albanD albanD left a comment

Ok ok!
Thanks for the clarifications!

@guangyey guangyey moved this to Approved in PyTorch Intel Apr 18, 2025
@guangyey
Collaborator Author

@pytorchbot merge -i

@guangyey
Collaborator Author

"These failures are irrelevant."

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 4 checks: xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 2, 4, linux.idc.xpu), xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 3, 4, linux.idc.xpu), xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 1, 4, linux.idc.xpu), xpu / linux-jammy-xpu-2025.0-py3.9 / test (default, 4, 4, linux.idc.xpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@github-project-automation github-project-automation bot moved this from Approved to Done in PyTorch Intel Apr 18, 2025
pytorchmergebot pushed a commit that referenced this pull request Apr 22, 2025
# Motivation
This PR aims to deprecate the legacy host allocator APIs and recommend that users adopt the unified `getHostAllocator(device_type)` APIs, such as:
```cpp
at::getHostAllocator(device_type)->allocate(...);
at::getHostAllocator(device_type)->empty_cache();
at::getHostAllocator(device_type)->record_event(...);
at::getHostAllocator(device_type)->get_stats();
at::getHostAllocator(device_type)->reset_accumulated_stats();
at::getHostAllocator(device_type)->reset_peak_stats();
```

# Additional Context
TODO:
- [ ] Move is_pinned from `AcceleratorHooksInterface` to `HostAllocator`
- [ ] Deprecate `getPinnedMemoryAllocator` inside `AcceleratorHooksInterface` and recommend using `getHostAllocator` instead.

Pull Request resolved: #151437
Approved by: https://github.com/EikanWang, https://github.com/albanD
ghstack dependencies: #151403, #151431
Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025
@github-actions github-actions bot deleted the gh/guangyey/137/head branch May 28, 2025 02:18
Labels
ciflow/trunk · ciflow/xpu · Merged · module: accelerator · open source · release notes: cpp
Projects
Status: Done

5 participants