
CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856) #13895


Merged
10 commits merged into ggml-org:master from Yangxiaoz:IntegrateCUDA
May 31, 2025

Conversation

Yangxiaoz
Contributor

  1. Add an "integrated" field to ggml_cuda_device_info to distinguish whether a device is an integrated GPU or a discrete GPU.

  2. Adjust the function ggml_backend_cuda_device_supports_buft to account for this new field.

As mentioned in #13856, some logic adjustments may be required for integrated GPUs in the CUDA backend.

The adjustment is based on the following : https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html#memory-types-table

[Image: memory types table from the CUDA for Tegra application note]
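For illustration, a minimal sketch (not the upstream code) of how such a flag can be derived from the CUDA runtime; cudaDeviceProp::integrated is the property this change builds on:

#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // cudaDeviceProp::integrated is non-zero for an iGPU that shares
        // physical memory with the host (e.g. Jetson), zero for a dGPU
        printf("device %d: %s\n", dev, prop.integrated ? "iGPU" : "dGPU");
    }
    return 0;
}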

@Yangxiaoz
Contributor Author

Hi @JohannesGaessler. Based on your helpful suggestions from our last discussion, I've updated the logic. Would you mind reviewing this revised version to confirm that it meets the requirements? Thanks for your time and advice.

@github-actions github-actions bot added labels: Nvidia GPU (issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning) on May 29, 2025
@JohannesGaessler
Collaborator

Did you confirm that the code works correctly both when GGML_CUDA_ENABLE_UNIFIED_MEMORY is and isn't set?

Yangxiaoz and others added 3 commits May 30, 2025 10:30
Adjusted code indentation

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Fixed incorrect setting of variable types

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Adjusted the judgment logic

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
@hjc4869
Contributor

hjc4869 commented May 30, 2025

Should we check something like canMapHostMemory / canUseHostPointerForRegisteredMem in addition to integrated before letting the GPU directly access buffers allocated by cudaMallocHost? The integrated flag doesn't seem to cover all the prerequisites of such access.
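The properties mentioned here are all plain cudaDeviceProp fields and can be inspected directly; a small standalone query, for reference:

#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    // canMapHostMemory: the device can map pinned host memory into its
    // address space; canUseHostPointerForRegisteredMem: the device can
    // dereference the *host* pointer of registered memory directly
    printf("canMapHostMemory=%d canUseHostPointerForRegisteredMem=%d\n",
           prop.canMapHostMemory, prop.canUseHostPointerForRegisteredMem);
    return 0;
}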

@Yangxiaoz
Contributor Author

Thank you very much for your professional revisions to my code. As this is my first contribution, I apologize for any oversights on my part. @JohannesGaessler

Did you confirm that the code works correctly both when GGML_CUDA_ENABLE_UNIFIED_MEMORY is and isn't set?

Regarding the point you raised: I believe toggling GGML_CUDA_ENABLE_UNIFIED_MEMORY on or off shouldn't impact this particular change, as it only affects the device-memory allocation path in ggml_cuda_device_malloc, which is unrelated to the host-memory (pinned memory) logic, as shown in: related_source_code. If my understanding is flawed here, I'd greatly appreciate your correction. Thank you for your guidance.
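To make the point concrete, a simplified sketch of the allocation split being referred to (an assumption about the upstream structure, not a copy of it): the toggle only selects how device memory is allocated, while pinned host buffers come from cudaMallocHost either way.

#include <cuda_runtime.h>

// sketch: GGML_CUDA_ENABLE_UNIFIED_MEMORY only changes the device path
static cudaError_t device_malloc_sketch(void ** ptr, size_t size) {
#if defined(GGML_CUDA_ENABLE_UNIFIED_MEMORY)
    return cudaMallocManaged(ptr, size);  // managed/unified memory
#else
    return cudaMalloc(ptr, size);         // plain device memory
#endif
}

int main() {
    void * dev  = nullptr;
    void * host = nullptr;
    device_malloc_sketch(&dev, 1 << 20);
    cudaMallocHost(&host, 1 << 20);  // pinned host buffer, unaffected by the flag
    cudaFreeHost(host);
    cudaFree(dev);
    return 0;
}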


However, I've observed that enabling CUDA Graph on Jetson devices triggers an assertion failure related to buft. I'm currently diagnosing and addressing this issue, so this branch may not yet be ready for integration.

@Yangxiaoz
Contributor Author

Should we check something like canMapHostMemory / canUseHostPointerForRegisteredMem in addition to integrated before letting the GPU directly access buffer allocated by cudaMallocHost? The integrated flag doesn't seem to cover all the prerequisites of such access.

I did consider this aspect previously. Based on my understanding, mapping or registering host memory falls under the purview of the buffer_from_host_ptr functionality. While it intersects conceptually with host_buffer, the two represent distinct mechanisms, as we can see:

struct ggml_backend_dev_caps {
    // asynchronous operations
    bool async;
    // pinned host buffer
    bool host_buffer;
    // creating buffers from host ptr
    bool buffer_from_host_ptr;
    // event synchronization
    bool events;
};

One thing to note: a CUDA device's ability to map/register host pointers (pageable memory) may additionally depend on I/O coherency support, whereas accessing pinned host buffers doesn't share this requirement.
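To illustrate the distinction (illustrative only, not ggml code): host_buffer corresponds to pinned memory allocated by the CUDA runtime, while buffer_from_host_ptr corresponds to registering an existing, possibly pageable, allocation.

#include <cuda_runtime.h>
#include <cstdlib>

int main() {
    size_t size = 1 << 20;

    // "host_buffer": pinned memory, page-locked from the start
    void * pinned = nullptr;
    cudaMallocHost(&pinned, size);

    // "buffer_from_host_ptr": take an existing pageable allocation and
    // register it so the device can access it
    void * pageable = malloc(size);
    cudaHostRegister(pageable, size, cudaHostRegisterDefault);

    cudaHostUnregister(pageable);
    free(pageable);
    cudaFreeHost(pinned);
    return 0;
}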

The current implementation appears to lack support for buffer_from_host_ptr on all CUDA devices. Enabling it on integrated GPUs would likely necessitate significant code modifications, and I don't have a good idea for adding that support yet.

Of course, my understanding of the above concepts may not be entirely accurate. If I have fallen into some misunderstanding, please point it out and discuss it with me.

@Yangxiaoz Yangxiaoz changed the title from "CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda #13856" to "CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856)" on May 30, 2025
@Yangxiaoz
Contributor Author

Hello, @JohannesGaessler. I apologize for bothering you again. I've pushed a new commit (bd21613) to this PR to fix the assert issue encountered when running CUDA Graph on an integrated GPU device.

For this latest commit, I have tested and verified it on both device types: a dGPU (NVIDIA RTX 4060) and an iGPU (Jetson Orin). I also tested with the CUDA Graph and GGML_CUDA_ENABLE_UNIFIED_MEMORY flags both enabled and disabled. All tests now pass. Could you please check the PR again and see if anything else needs attention?

@JohannesGaessler
Collaborator

Should we check something like canMapHostMemory / canUseHostPointerForRegisteredMem in addition to integrated before letting the GPU directly access buffer allocated by cudaMallocHost? The integrated flag doesn't seem to cover all the prerequisites of such access.

In the context of ggml, a "CUDA host buffer" is always allocated as pinned memory. To my understanding, if canMapHostMemory is false, then the allocation of pinned memory will fail and integrated && ggml_backend_buft_is_cuda_host(buft) will evaluate to false.
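Reduced to its boolean core, the check being discussed looks roughly like this (a sketch with plain flags standing in for ggml's buffer-type predicates, not the actual backend code):

#include <cassert>

// device buffers are always usable; a pinned host buffer additionally
// qualifies when the GPU is integrated (shared physical memory)
static bool supports_buft(bool is_cuda_buft, bool is_cuda_host_buft, bool integrated) {
    return is_cuda_buft || (integrated && is_cuda_host_buft);
}

int main() {
    assert( supports_buft(true,  false, false));  // dGPU, device buffer
    assert(!supports_buft(false, true,  false));  // dGPU, pinned host buffer
    assert( supports_buft(false, true,  true));   // iGPU, pinned host buffer
    return 0;
}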

Yangxiaoz and others added 3 commits May 30, 2025 19:18
Add a defensive security assert

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Adjusted the support judgment logic.

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
@Yangxiaoz
Contributor Author

Hi, @JohannesGaessler. After testing the latest suggested commit on my Jetson device, I found that the assert is triggered, as we can see:
[Image: screenshot of the triggered assertion]

So I reverted this change: GGML_ASSERT(!integrated || prop.canUseHostPointerForRegisteredMem);

I seem to have found the answer based on this post: link.

On a dGPU this flag is 1, while on Arm SoCs such as Jetson it is 0. It seems that one must use cudaHostGetDevicePointer to obtain the device-side pointer.
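For reference, a minimal sketch of that pattern (an assumption using a mapped pinned allocation, not the PR's actual code): on platforms where canUseHostPointerForRegisteredMem is 0, kernels must be given the alias returned by cudaHostGetDevicePointer rather than the host pointer itself.

#include <cuda_runtime.h>

int main() {
    // pinned, mapped host allocation
    void * host = nullptr;
    cudaHostAlloc(&host, 1 << 20, cudaHostAllocMapped);

    // device-side alias of the same physical memory
    void * dev = nullptr;
    cudaHostGetDevicePointer(&dev, host, 0);

    // ... launch kernels with 'dev' instead of 'host' ...

    cudaFreeHost(host);
    return 0;
}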

@Yangxiaoz
Contributor Author

For the latest commit 63db683, I have tested it on my Jetson, and it passes with the GGML_CUDA_ENABLE_UNIFIED_MEMORY flag both enabled and disabled.

Add parentheses to enforce operator precedence​

Co-authored-by: Diego Devesa <slarengh@gmail.com>
Fix CI bug: add a space

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
@Yangxiaoz Yangxiaoz requested a review from slaren May 31, 2025 00:05
@JohannesGaessler JohannesGaessler merged commit eb39499 into ggml-org:master May 31, 2025
46 checks passed
@Yangxiaoz Yangxiaoz deleted the IntegrateCUDA branch May 31, 2025 07:00