Add beginnings of torch::stable::accelerator #159679

Open: wants to merge 10 commits into base gh/mikaylagawarecki/332/base

Conversation

mikaylagawarecki
Contributor

@mikaylagawarecki mikaylagawarecki commented Aug 1, 2025

Adds

  • torch::stable::accelerator::DeviceGuard: a std::unique_ptr to DeviceGuardOpaque, mostly copied from the below (but made generic)

    https://github.com/pytorch/pytorch/blob/50eac811a68e63e96ad56c11c983bfe298a0bb8a/torch/csrc/inductor/aoti_runtime/utils_cuda.h#L30-L46

    class AOTICudaGuard {
     public:
      AOTICudaGuard(int32_t device_index) : guard_(nullptr, delete_cuda_guard) {
        CUDAGuardHandle ptr = nullptr;
        AOTI_TORCH_ERROR_CODE_CHECK(
            aoti_torch_create_cuda_guard(device_index, &ptr));
        guard_.reset(ptr);
      }

      void set_index(int32_t device_index) {
        AOTI_TORCH_ERROR_CODE_CHECK(
            aoti_torch_cuda_guard_set_index(guard_.get(), device_index));
      }

     private:
      std::unique_ptr<CUDAGuardOpaque, DeleterFnPtr> guard_;
    };

    • constructor DeviceGuard(DeviceIndex) (this matches AOTI but differs from the actual c10::DeviceGuard constructor, which takes a Device)
    • set_index(DeviceIndex)
  • torch::stable::accelerator::Stream: a std::shared_ptr to StreamOpaque

    • constructor Stream(StreamHandle stream) (similar to torch::stable::Tensor)
    • id() -> StreamId
  • getCurrentStream(DeviceIndex device_index) -> stable::accelerator::Stream

Stack from ghstack (oldest at bottom):


pytorch-bot bot commented Aug 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159679

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit 43994c1 with merge base 556e2a7 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mikaylagawarecki added a commit that referenced this pull request Aug 1, 2025
ghstack-source-id: 1530b24
Pull Request resolved: #159679
Contributor

github-actions bot commented Aug 1, 2025

Attention! One of PyTorch's C-stable API files was changed

You MUST NOT change existing function declarations in this file, as this header defines a stable C ABI. If you need to change the signature of a function, introduce a new v2 version of the function and modify code generation to target the new version.


Caused by:

mikaylagawarecki added a commit that referenced this pull request Aug 1, 2025
ghstack-source-id: 2f2fa7b
Pull Request resolved: #159679
@mikaylagawarecki mikaylagawarecki changed the title Add beginning of torch::stable::accelerator Add beginnings of torch::stable::accelerator Aug 1, 2025
mikaylagawarecki added a commit that referenced this pull request Aug 4, 2025
ghstack-source-id: fae00fc
Pull Request resolved: #159679
mikaylagawarecki added a commit that referenced this pull request Aug 4, 2025
ghstack-source-id: 083b8bc
Pull Request resolved: #159679

using DeviceIndex = int8_t;
using StreamId = int64_t;
class DeviceGuard {
Contributor Author

@mikaylagawarecki mikaylagawarecki Aug 4, 2025


Something that I'm not sure about: do the copy/move semantics need to match the real device guard, although this is just a wrapper holding a std::unique_ptr to DeviceGuardOpaque?

~DeviceGuard() = default;
/// Copy is disallowed
DeviceGuard(const DeviceGuard&) = delete;
DeviceGuard& operator=(const DeviceGuard&) = delete;
/// Move is disallowed, as DeviceGuard does not have an uninitialized state,
/// which is required for moves on types with nontrivial destructors.
DeviceGuard(DeviceGuard&& other) = delete;
DeviceGuard& operator=(DeviceGuard&& other) = delete;

Contributor

It doesn't need to copy the existing DeviceGuard. For example, I agree with deleting the default constructor for this API. We can disallow copy and move if it's easiest to maintain anyway.

mikaylagawarecki added a commit that referenced this pull request Aug 4, 2025
ghstack-source-id: afc07a8
Pull Request resolved: #159679
mikaylagawarecki added a commit that referenced this pull request Aug 4, 2025
ghstack-source-id: 23b8be1
Pull Request resolved: #159679
mikaylagawarecki added a commit that referenced this pull request Aug 4, 2025
ghstack-source-id: 655868b
Pull Request resolved: #159679
mikaylagawarecki added a commit that referenced this pull request Aug 4, 2025
ghstack-source-id: 94eaf15
Pull Request resolved: #159679
@albanD
Copy link
Collaborator

albanD commented Aug 4, 2025

FYI @guangyey @EikanWang in case you have some feedback here.

mikaylagawarecki added a commit that referenced this pull request Aug 4, 2025
ghstack-source-id: 2ede272
Pull Request resolved: #159679
Collaborator

@guangyey guangyey left a comment

Thanks.

@mikaylagawarecki mikaylagawarecki marked this pull request as ready for review August 5, 2025 17:31
if torch.cuda.is_available():
    extra_compile_args["cxx"].append("-DUSE_CUDA")
    extension = CUDAExtension

Contributor

Hmmm maybe this would be a good call to move the CUDA stuff into its own C++ file and build them separately..

Contributor Author

@mikaylagawarecki mikaylagawarecki Aug 8, 2025

Hmmm, I don't see the issue with the current approach, so gonna keep it as is unless there's something specific you're concerned about! :)

@@ -1590,3 +1594,53 @@ AOTITorchError aoti_torch_call_dispatcher(
}
});
}

AOTITorchError aoti_torch_create_device_guard(
Contributor
Comment on lines +1643 to +1644
c10::Stream* stream_ptr = new c10::Stream(stream);
*ret_stream = reinterpret_cast<StreamHandle>(stream_ptr);
Contributor

Creating a new stream on the heap is going to leak memory; we should be able to assign the stream from line 1642 into the pointer (though I don't know the cast semantics).

Contributor

Ahhh, after reading the rest of the code, I see what the intended user flow is. That said, I'm not sure we want the semantics of get_current_stream to create a new stream on the heap (this is likely not expected from the user's POV), and if we do end up sticking with this, we need to loudly document it.

I think we do not want to mess with the memory of the stream at all...cc @albanD for thoughts

Contributor Author

@mikaylagawarecki mikaylagawarecki Aug 8, 2025

Hmm looks like @albanD had no thoughts :b

I'm not sure what to do here, this one looks different from

AOTITorchError aoti_torch_get_current_cuda_stream(
    int32_t device_index,
    void** ret_stream) {
  AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE({
    *(cudaStream_t*)(ret_stream) = at::cuda::getCurrentCUDAStream(device_index);
  });
}

As that just directly writes out the raw cudaStream_t. However, the actual getCurrentStream returns a c10::Stream, so I'm trying to match those semantics in the stable ABI (so in the future, users can add other stream methods on a stream object):

c10::Stream getCurrentStream(c10::DeviceIndex device_index) {
  const auto device_type = getAccelerator(true).value();
  c10::impl::VirtualGuardImpl impl(device_type);
  return impl.getStream({device_type, device_index});
}

I think at::accelerator::getCurrentStream returns c10::Stream by value, which is why we need to create a new object on the heap if we want to return it to the caller and transfer ownership to the stable::Stream.

I can add a comment for sure, but would be curious how else we can tweak this in a way that's more user/memory friendly, would it make more sense to just return the .id() directly?



// Construct a stable::Stream from a StreamHandle
// Steals ownership from the StreamHandle
explicit Stream(StreamHandle stream)
Contributor

Ah, I see, so the expected use case is:

  • user calls getCurrentStream which creates a Stream
  • then they can call id on it

@janeyx99
Contributor

janeyx99 commented Aug 6, 2025

Thanks for taking this on! I've left comments; I think my two biggest questions are:

  1. How are we sure DeviceGuard is working as expected haha (add a test for set_index; I'm also not sure how to add something to ensure we're not leaking memory)
  2. How do we want to represent Stream, and what is the story of its memory handling?

stack[0] = from(res);
}

int64_t test_stream(int8_t device_index) {
Collaborator

schema being int means the arg is int64_t


} // namespace

using DeviceIndex = int8_t; // this is from c10/core/Device.h
Collaborator

We should NOT rely on this one at the shim level.
For this part of the world, let's make it an int32_t for consistency with the existing shim.
We will have to change the tensor.h version.


DeviceGuard guard(device_index);
int currentDevice;
cudaError_t err = cudaGetDevice(&currentDevice);
Collaborator

Check that, before the guard, you are on a different device than this one, because if there is only 1 GPU, I'm not sure this will ever do anything.

Contributor Author

The test that exercises this from python is decorated with @deviceCountAtLeast(2) so this situation won't happen I think :)

mikaylagawarecki added a commit that referenced this pull request Aug 11, 2025
ghstack-source-id: 02220eb
Pull Request resolved: #159679
mikaylagawarecki added a commit to mikaylagawarecki/pytorch that referenced this pull request Aug 11, 2025
ghstack-source-id: 02220eb
Pull Request resolved: pytorch#159679