Skip to content

Add is_pinned to host allocator #151439

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 36 commits into
base: gh/guangyey/139/base
Choose a base branch
from

Conversation

guangyey
Copy link
Collaborator

@guangyey guangyey commented Apr 16, 2025

Stack from ghstack (oldest at bottom):

Motivation

This PR aims to add the is_pinned functionality into the HostAllocator class, which enables centralized pinned memory verification through calls like at::getHostAllocator(at::kCUDA)->is_pinned(ptr).
Benefits include:

  • Consistent host memory handling across all device backends
  • Group similar functionalities together to enhance code modularity.

This architecture makes the system more maintainable and extensible for future device support.

Solution

This PR introduces a new pair of API init and is_initialized in HostAllocator to avoid early runtime call in is_pinned.

Additional Context

It's difficult to deprecate isPinnedPtr in AcceleratorHooksInterface because some backends (such as mps, hpu, privateuser1) may not register their own host allocator using the REGISTER_HOST_ALLOCATOR mechanism, which was introduced in #151431.

Copy link

pytorch-bot bot commented Apr 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151439

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 9553383 with merge base 94da452 (image):

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangyey added a commit that referenced this pull request Apr 16, 2025
ghstack-source-id: 05370d5
Pull Request resolved: #151439
@guangyey guangyey changed the title Move is_pinned to host allocator [WIP] Move is_pinned to host allocator Apr 16, 2025
@guangyey guangyey added ciflow/xpu Run XPU CI tasks ciflow/trunk Trigger trunk jobs on your pull request module: cpp Related to C++ API release notes: cpp release notes category and removed module: cpp Related to C++ API labels Apr 16, 2025
bool is_pinned(void* ptr) override {
// First check if driver is broken/missing, in which case PyTorch CPU
// functionalities should still work, we should report `false` here.
if (!at::cuda::is_available()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this call itself cause driver initialization?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will call cudaDeviceCount that will initialize driver runtime.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we check for something like is_initialized? Because if driver is not initialized, ptr is obviously not pinned?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I will try to improve it.

Copy link
Collaborator Author

@guangyey guangyey Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malfet The logic is already handled at

if (!init_[static_cast<int8_t>(opt_device_type.value())].test_once()) {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malfet May I know if I have addressed your comments

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malfet I introduce one pair API init&is_initialized to imporve the functionality of is_pinned like

  bool is_pinned(const void* ptr) const override {
    if (!is_initialized()) {
      // If the host allocator is not initialized, we assume that the pointer
      // isn't pinned to avoid early runtime call.
      return false;
    }
    return impl_->is_pinned(ptr);
  }

[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@guangyey guangyey added the ciflow/rocm Trigger "default" config CI on ROCm label Apr 22, 2025
Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025
ghstack-source-id: ebb95ba
Pull Request resolved: pytorch/pytorch#151439
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@guangyey guangyey changed the title [WIP] Move is_pinned to host allocator Move is_pinned to host allocator Apr 24, 2025
@guangyey guangyey requested a review from albanD April 24, 2025 08:59
[ghstack-poisoned]
@guangyey guangyey changed the title Move is_pinned to host allocator Add is_pinned to host allocator Apr 29, 2025
guangyey added a commit that referenced this pull request Apr 29, 2025
ghstack-source-id: ee38ba6
Pull Request resolved: #151439
@guangyey guangyey changed the title Add is_pinned to host allocator [WIP] Add is_pinned to host allocator Apr 29, 2025
@guangyey guangyey changed the title [WIP] Add is_pinned to host allocator Add is_pinned to host allocator Apr 29, 2025
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
guangyey added a commit that referenced this pull request May 6, 2025
ghstack-source-id: 66caae7
Pull Request resolved: #151439
[ghstack-poisoned]
guangyey added 4 commits June 11, 2025 11:19
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
Copy link
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Aug 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/mps Run MPS tests (subset of trunk) ciflow/rocm Trigger "default" config CI on ROCm ciflow/trunk Trigger trunk jobs on your pull request ciflow/xpu Run XPU CI tasks open source release notes: cpp release notes category Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants