
CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856) #13895


Merged
10 commits merged into ggml-org:master from Yangxiaoz:IntegrateCUDA
May 31, 2025

Conversation

Yangxiaoz
Contributor

  1. Add an "integrated" field to ggml_cuda_device_info to distinguish whether a device is an integrated GPU or a discrete GPU.

  2. Adjust the function ggml_backend_cuda_device_supports_buft to account for this new field.

As mentioned in #13856, some logic adjustments may be required for integrated GPUs in the CUDA backend.

The adjustment is based on the following : https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html#memory-types-table

[Image: memory types table from the CUDA for Tegra application note]
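For illustration, a minimal sketch (not the upstream code) of how such a flag can be derived from the CUDA runtime; cudaDeviceProp::integrated is the property this change builds on:

#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // cudaDeviceProp::integrated is non-zero for an iGPU that shares
        // physical memory with the host (e.g. Jetson), zero for a dGPU
        printf("device %d: %s\n", dev, prop.integrated ? "iGPU" : "dGPU");
    }
    return 0;
}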

@Yangxiaoz
Contributor Author

Hi @JohannesGaessler. Based on your helpful suggestions from our last discussion, I've updated the logic. Would you mind reviewing this revised version to confirm that it meets the requirements? Thanks for your time and advice.

@github-actions github-actions bot added labels: Nvidia GPU (issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning) on May 29, 2025
@JohannesGaessler
Collaborator

Did you confirm that the code works correctly both when GGML_CUDA_ENABLE_UNIFIED_MEMORY is and isn't set?

Yangxiaoz and others added 3 commits May 30, 2025 10:30
Adjusted code indentation

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Fixed incorrect setting of variable types

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Adjusted the judgment logic

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
@hjc4869
Contributor

hjc4869 commented May 30, 2025

Should we check something like canMapHostMemory / canUseHostPointerForRegisteredMem in addition to integrated before letting the GPU directly access buffers allocated by cudaMallocHost? The integrated flag doesn't seem to cover all the prerequisites of such access.
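The properties mentioned here are all plain cudaDeviceProp fields and can be inspected directly; a small standalone query, for reference:

#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    // canMapHostMemory: the device can map pinned host memory into its
    // address space; canUseHostPointerForRegisteredMem: the device can
    // dereference the *host* pointer of registered memory directly
    printf("canMapHostMemory=%d canUseHostPointerForRegisteredMem=%d\n",
           prop.canMapHostMemory, prop.canUseHostPointerForRegisteredMem);
    return 0;
}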

@Yangxiaoz
Contributor Author

Thank you very much for your professional revisions to my code. As this is my first contribution, I apologize for any oversights on my part. @JohannesGaessler

Did you confirm that the code works correctly both when GGML_CUDA_ENABLE_UNIFIED_MEMORY is and isn't set?

Regarding the point you raised: I believe toggling GGML_CUDA_ENABLE_UNIFIED_MEMORY on or off shouldn't impact this particular change, as it only affects the device-memory allocation path in ggml_cuda_device_malloc, which is unrelated to the host-memory (pinned memory) logic, as shown in: related_source_code. If my understanding is flawed here, I'd greatly appreciate your correction. Thank you for your guidance.
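To make the point concrete, a simplified sketch of the allocation split being referred to (an assumption about the upstream structure, not a copy of it): the toggle only selects how device memory is allocated, while pinned host buffers come from cudaMallocHost either way.

#include <cuda_runtime.h>

// sketch: GGML_CUDA_ENABLE_UNIFIED_MEMORY only changes the device path
static cudaError_t device_malloc_sketch(void ** ptr, size_t size) {
#if defined(GGML_CUDA_ENABLE_UNIFIED_MEMORY)
    return cudaMallocManaged(ptr, size);  // managed/unified memory
#else
    return cudaMalloc(ptr, size);         // plain device memory
#endif
}

int main() {
    void * dev  = nullptr;
    void * host = nullptr;
    device_malloc_sketch(&dev, 1 << 20);
    cudaMallocHost(&host, 1 << 20);  // pinned host buffer, unaffected by the flag
    cudaFreeHost(host);
    cudaFree(dev);
    return 0;
}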


However, I've observed that enabling CUDA Graph on Jetson devices triggers an assertion failure related to buft. I'm currently diagnosing and addressing this issue, so this branch may not yet be ready for integration.

@Yangxiaoz
Contributor Author

Should we check something like canMapHostMemory / canUseHostPointerForRegisteredMem in addition to integrated before letting the GPU directly access buffer allocated by cudaMallocHost? The integrated flag doesn't seem to cover all the prerequisites of such access.

I did consider this aspect previously. Based on my understanding, mapping or registering host memory falls under the purview of the buffer_from_host_ptr functionality. While it intersects conceptually with host_buffer, the two represent distinct mechanisms, as we can see:

struct ggml_backend_dev_caps {
    // asynchronous operations
    bool async;
    // pinned host buffer
    bool host_buffer;
    // creating buffers from host ptr
    bool buffer_from_host_ptr;
    // event synchronization
    bool events;
};

One thing to note: a CUDA device's ability to map/register host pointers (pageable memory) may additionally depend on I/O coherency support, whereas accessing pinned host buffers doesn't share this requirement.
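To illustrate the distinction (illustrative only, not ggml code): host_buffer corresponds to pinned memory allocated by the CUDA runtime, while buffer_from_host_ptr corresponds to registering an existing, possibly pageable, allocation.

#include <cuda_runtime.h>
#include <cstdlib>

int main() {
    size_t size = 1 << 20;

    // "host_buffer": pinned memory, page-locked from the start
    void * pinned = nullptr;
    cudaMallocHost(&pinned, size);

    // "buffer_from_host_ptr": take an existing pageable allocation and
    // register it so the device can access it
    void * pageable = malloc(size);
    cudaHostRegister(pageable, size, cudaHostRegisterDefault);

    cudaHostUnregister(pageable);
    free(pageable);
    cudaFreeHost(pinned);
    return 0;
}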

The current implementation appears to lack support for buffer_from_host_ptr on all CUDA devices. Enabling it on integrated GPUs would likely necessitate significant code modifications, and I don't have a good idea for adding that support yet.

Of course, my understanding of the above concepts may not be entirely accurate. If I have fallen into some misunderstanding, please point it out and discuss it with me.

@Yangxiaoz Yangxiaoz changed the title from "CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda #13856" to "CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856)" on May 30, 2025
@Yangxiaoz
Contributor Author

Hello, @JohannesGaessler. I apologize for bothering you again. I've pushed a new commit (bd21613) to this PR to fix the assert issue encountered when running CUDA Graph on an integrated GPU device.

For this latest commit, I have tested and verified it on both device types: a dGPU (NVIDIA RTX 4060) and an iGPU (Jetson Orin). I also tested with the CUDA Graph and GGML_CUDA_ENABLE_UNIFIED_MEMORY flags both enabled and disabled. All tests now pass. Could you please check the PR again and see if anything else needs attention?

@JohannesGaessler
Collaborator

Should we check something like canMapHostMemory / canUseHostPointerForRegisteredMem in addition to integrated before letting the GPU directly access buffer allocated by cudaMallocHost? The integrated flag doesn't seem to cover all the prerequisites of such access.

In the context of ggml, a "CUDA host buffer" is always allocated as pinned memory. To my understanding, if canMapHostMemory is false, then the allocation of pinned memory will fail and integrated && ggml_backend_buft_is_cuda_host(buft) will evaluate to false.
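Reduced to its boolean core, the check being discussed looks roughly like this (a sketch with plain flags standing in for ggml's buffer-type predicates, not the actual backend code):

#include <cassert>

// device buffers are always usable; a pinned host buffer additionally
// qualifies when the GPU is integrated (shared physical memory)
static bool supports_buft(bool is_cuda_buft, bool is_cuda_host_buft, bool integrated) {
    return is_cuda_buft || (integrated && is_cuda_host_buft);
}

int main() {
    assert( supports_buft(true,  false, false));  // dGPU, device buffer
    assert(!supports_buft(false, true,  false));  // dGPU, pinned host buffer
    assert( supports_buft(false, true,  true));   // iGPU, pinned host buffer
    return 0;
}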

Yangxiaoz and others added 3 commits May 30, 2025 19:18
Add a defensive security assert

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Adjusted the support judgment logic.

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
@Yangxiaoz
Contributor Author

Hi, @JohannesGaessler. After testing the latest suggested commit on my Jetson device, I found that the assert is triggered, as we can see:
[Image: screenshot of the triggered assertion]

So I reverted this change: GGML_ASSERT(!integrated || prop.canUseHostPointerForRegisteredMem);

I seem to have found the answer based on this post: link.

On a dGPU this flag is 1, while on Arm SoCs such as Jetson it is 0. It seems that one must use cudaHostGetDevicePointer to obtain the device-side pointer.
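For reference, a minimal sketch of that pattern (an assumption using a mapped pinned allocation, not the PR's actual code): on platforms where canUseHostPointerForRegisteredMem is 0, kernels must be given the alias returned by cudaHostGetDevicePointer rather than the host pointer itself.

#include <cuda_runtime.h>

int main() {
    // pinned, mapped host allocation
    void * host = nullptr;
    cudaHostAlloc(&host, 1 << 20, cudaHostAllocMapped);

    // device-side alias of the same physical memory
    void * dev = nullptr;
    cudaHostGetDevicePointer(&dev, host, 0);

    // ... launch kernels with 'dev' instead of 'host' ...

    cudaFreeHost(host);
    return 0;
}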

@Yangxiaoz
Contributor Author

For the latest commit 63db683, I have tested it on my Jetson, and it passes with the GGML_CUDA_ENABLE_UNIFIED_MEMORY flag both enabled and disabled.

Add parentheses to enforce operator precedence​

Co-authored-by: Diego Devesa <slarengh@gmail.com>
Fix CI bug: add a space

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
@Yangxiaoz Yangxiaoz requested a review from slaren May 31, 2025 00:05
@JohannesGaessler JohannesGaessler merged commit eb39499 into ggml-org:master May 31, 2025
46 checks passed
@Yangxiaoz Yangxiaoz deleted the IntegrateCUDA branch May 31, 2025 07:00