CUDA: add a prop in ggml_cuda_device_info to distinguish iGPU from dGPU in CUDA (#13856) #13895
Conversation
…it is an integrated GPU or a discrete GPU; 2. Adjust the function "ggml_backend_cuda_device_supports_buft" for this new feature
Hi @JohannesGaessler. Based on your helpful suggestions from our last discussion, I've updated the logic. Would you mind reviewing this revised version to confirm that it meets the requirements? Thanks for your time and advice.
Did you confirm that the code works correctly both when 'GGML_CUDA_ENABLE_UNIFIED_MEMORY' is enabled and when it is disabled?
Adjusted code indentation Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Fixed incorrect setting of variable types Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Adjusted the judgment logic Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Should we check something like 'canMapHostMemory' / 'canUseHostPointerForRegisteredMem' in addition to 'integrated'?
Thank you very much for your professional modifications to my code. As this is my first contribution, I apologize for any oversight on my part. @JohannesGaessler
Regarding the point you raised, I believe toggling 'GGML_CUDA_ENABLE_UNIFIED_MEMORY' works correctly in both states. However, I've observed that enabling CUDA Graph on Jetson devices triggers an assertion failure related to 'buft'. I'm currently diagnosing and addressing this issue, so this branch may not yet be ready for integration.
I did consider this aspect previously. Based on my understanding, mapping or registering host memory would fall under the purview of the 'buffer_from_host_ptr' functionality. While it intersects conceptually with 'host_buffer', the two represent distinct mechanisms, as we can see:

```c
struct ggml_backend_dev_caps {
    // asynchronous operations
    bool async;
    // pinned host buffer
    bool host_buffer;
    // creating buffers from host ptr
    bool buffer_from_host_ptr;
    // event synchronization
    bool events;
};
```

One thing to note: a device's ability to map/register host pointers (pageable memory) may require 'I/O Coherency' support, whereas accessing pinned host buffers doesn't share this requirement. The current implementation appears to lack support for 'buffer_from_host_ptr' on all CUDA devices, and enabling it on integrated GPUs would likely necessitate significant code modifications; I don't have a good idea of how to add this support yet. Of course, my understanding of the above concepts may not be entirely accurate. If I have fallen into some misunderstanding, please discuss it with me.
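For concreteness, here is a minimal standalone sketch that queries the device attributes discussed in this thread via the CUDA runtime API; the 'cudaDeviceProp' field names are real, but how they map onto the ggml caps above is only an illustration:

```cpp
// Minimal sketch: query the CUDA device attributes discussed above.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        fprintf(stderr, "failed to query device %d\n", device);
        return 1;
    }
    // 1 on SoCs such as Jetson (iGPU), 0 on discrete GPUs
    printf("integrated:                        %d\n", prop.integrated);
    // can the device map pinned host memory into its address space?
    printf("canMapHostMemory:                  %d\n", prop.canMapHostMemory);
    // can the device dereference a registered host pointer directly?
    printf("canUseHostPointerForRegisteredMem: %d\n", prop.canUseHostPointerForRegisteredMem);
    return 0;
}
```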
…valuate_and_capture_cuda_graph()'
Hello, @JohannesGaessler. I apologize for bothering you again. I've pushed a new commit (bd21613) to this PR to fix the assert issue encountered when running CUDA Graph on an integrated GPU device. For this latest commit, I have tested and verified it on both device types: dGPU (NVIDIA RTX 4060) and iGPU (Jetson Orin). I also tested with the 'cuda_Graph' and 'GGML_CUDA_ENABLE_UNIFIED_MEMORY' flags both enabled and disabled. All tests now pass. Could you please check the PR now and see if anything else needs attention?
In the context of ggml, a "CUDA host buffer" is always allocated as pinned memory. To my understanding, if …
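For illustration, a pinned allocation in CUDA looks roughly like this (a minimal sketch; ggml's actual host-buffer allocation path may differ):

```cpp
// Minimal sketch of a pinned ("page-locked") host allocation.
#include <cstddef>
#include <cuda_runtime.h>

void * alloc_pinned(size_t size) {
    void * ptr = nullptr;
    // cudaHostAllocMapped additionally makes the allocation mappable
    // into the device address space (requires canMapHostMemory == 1)
    if (cudaHostAlloc(&ptr, size, cudaHostAllocMapped) != cudaSuccess) {
        return nullptr;
    }
    return ptr; // release with cudaFreeHost(ptr)
}
```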
Add a defensive assert Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Adjusted the support judgment logic. Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Hi, @JohannesGaessler. After testing the latest suggested commit on my Jetson device, I found that the assert would be triggered, so I reverted this change. I seem to have found the answer based on this post: link. On a dGPU this flag ('canUseHostPointerForRegisteredMem') is 1, while on an Arm SoC such as Jetson it is 0. It seems 'cudaHostGetDevicePointer' must be used to get the device pointer.
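For reference, the register-then-map pattern implied by that post looks roughly like this (a minimal sketch, not the PR's actual code):

```cpp
// Minimal sketch: make pageable host memory visible to the GPU.
// On devices where canUseHostPointerForRegisteredMem == 0 (e.g. Jetson),
// the original host pointer is not valid on the device; the alias
// returned by cudaHostGetDevicePointer must be used instead.
#include <cstddef>
#include <cuda_runtime.h>

void * register_and_map(void * host_ptr, size_t size) {
    if (cudaHostRegister(host_ptr, size, cudaHostRegisterMapped) != cudaSuccess) {
        return nullptr;
    }
    void * device_ptr = nullptr;
    if (cudaHostGetDevicePointer(&device_ptr, host_ptr, 0) != cudaSuccess) {
        cudaHostUnregister(host_ptr);
        return nullptr;
    }
    return device_ptr; // may differ from host_ptr on Jetson-class devices
}
```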
For the latest commit 63db683, I have tested it on my Jetson, and it passed with the 'GGML_CUDA_ENABLE_UNIFIED_MEMORY' flag both enabled and disabled.
Add parentheses to enforce operator precedence Co-authored-by: Diego Devesa <slarengh@gmail.com>
Fix CI bug: add a space Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
add "integrated" in ggml_cuda_device_info for distinguish whether it is Intergrate_gpu or discrete_gpu
Adjust the function "ggml_backend_cuda_device_supports_buft" for this new feature (a sketch of the intended logic follows below)
As mentioned in #13856, some logical adjustments may be required for integrated GPUs in CUDA
The adjustment is based on the following: https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html#memory-types-table
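To make the intended adjustment concrete, here is a sketch of the kind of check 'ggml_backend_cuda_device_supports_buft' could perform once the 'integrated' flag is available; the types below are simplified stand-ins for illustration, not the actual ggml structures:

```cpp
// Simplified stand-in types; the real ggml structs differ.
struct cuda_device_info {
    int  device;     // CUDA device id
    bool integrated; // true when cudaDeviceProp::integrated == 1 (iGPU)
};

enum class buft_kind { cuda_device, cuda_host, other };

// On an iGPU the CPU and GPU share the same physical memory, so a pinned
// host buffer can be treated as directly usable by the device, whereas a
// dGPU only supports its own device buffers here.
static bool supports_buft(const cuda_device_info & info, buft_kind buft) {
    switch (buft) {
        case buft_kind::cuda_device: return true;
        case buft_kind::cuda_host:   return info.integrated;
        default:                     return false;
    }
}
```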