Misc. bug: llama-bench improper tensor split #13972

Open
@yggdrasil75

Description

Name and Version

version: 5574 (093e3f1)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-bench

Command line

build/bin/llama-bench -m /dataset/models/gguf/TheDrummer_Fallen-Gemma3-27B-v1-Q8_0.gguf --rpc 192.168.1.17:50052

Problem description & steps to reproduce

When running a benchmark split across multiple devices (to compare against offloading to a single card), llama-bench attempts to fully offload the model onto the RPC device and ignores the local devices.

(Gemma 3 does fully offload on the host device; this was just comparing the speed of various configurations. It does not fully offload on the client device.)
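A possible workaround, if the automatic split is being ignored, is to force an explicit ratio with llama-bench's `-ts`/`--tensor-split` flag. This is an untested sketch: the 0.5/0.5 ratio is illustrative and should be adjusted to the two cards' free VRAM; the model path and RPC address are copied from the report.

```shell
# Untested workaround sketch: request an explicit tensor split between the
# local device and the RPC device instead of relying on the automatic split.
# The -ts ratios are illustrative; tune them to each card's free VRAM.
build/bin/llama-bench \
  -m /dataset/models/gguf/TheDrummer_Fallen-Gemma3-27B-v1-Q8_0.gguf \
  --rpc 192.168.1.17:50052 \
  -ts 0.5/0.5
```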

First Bad Commit

No response

Relevant log output

Accepted client connection, free_mem=23854841856, total_mem=25422659584
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 27371.91 MiB on device 0: cudaMalloc failed: out of memory
[alloc_buffer] size: 28701528704 -> failed
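The numbers in the log are consistent with the reported behavior: the buffer the RPC client is asked to allocate (the whole model) is larger than the free memory it advertised at connection time, which is what one would expect if no split is applied. A quick sanity check using the two values from the log:

```shell
# Values copied from the log above.
req=28701528704        # [alloc_buffer] size requested on the RPC client
free_mem=23854841856   # free_mem reported when the client connected

echo "requested MiB: $(( req / 1048576 ))"   # ~27372 MiB
echo "free MiB:      $(( free_mem / 1048576 ))"

# The request exceeds the client's free memory, so cudaMalloc fails.
[ "$req" -gt "$free_mem" ] && echo "requested allocation exceeds client free memory"
```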
