Name and Version
version: 5574 (093e3f1)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-bench
Command line
build/bin/llama-bench -m /dataset/models/gguf/TheDrummer_Fallen-Gemma3-27B-v1-Q8_0.gguf --rpc 192.168.1.17:50052
Problem description & steps to reproduce
When attempting to run a benchmark split across multiple devices, in order to compare against offloading on a single card, llama-bench attempts to fully offload the model onto the RPC device and ignores the local devices.
(Gemma 3 does fully offload on the host device; this was just testing the speeds of various configurations. The model does not fully fit on the client device.)
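For reference, I would have expected to be able to force a split between the local GPU and the RPC device with the tensor-split flag, something like the following (a sketch of the intent only; the 1/1 ratio is an arbitrary example, not a tested invocation):

build/bin/llama-bench -m /dataset/models/gguf/TheDrummer_Fallen-Gemma3-27B-v1-Q8_0.gguf --rpc 192.168.1.17:50052 -ts 1/1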
First Bad Commit
No response
Relevant log output
Accepted client connection, free_mem=23854841856, total_mem=25422659584
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 27371.91 MiB on device 0: cudaMalloc failed: out of memory
[alloc_buffer] size: 28701528704 -> failed
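For context, the failure is consistent with the full model simply not fitting on the RPC device (my own arithmetic from the numbers above):

requested buffer: 28701528704 bytes / 2^20 ≈ 27371.9 MiB
reported free_mem: 23854841856 bytes / 2^20 ≈ 22749.8 MiB

So a full offload onto the RPC device alone can never succeed; the bug is that llama-bench tries it anyway instead of splitting the layers with the local devices.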