Misc. bug: llama-bench improper tensor split #13972

Open
@yggdrasil75

Description

Name and Version

version: 5574 (093e3f1)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-bench

Command line

build/bin/llama-bench -m /dataset/models/gguf/TheDrummer_Fallen-Gemma3-27B-v1-Q8_0.gguf --rpc 192.168.1.17:50052

Problem description & steps to reproduce

When running a benchmark split across multiple devices (to compare against offloading to a single card), llama-bench attempts to fully offload the model onto the RPC device and ignores the local devices.

(Gemma 3 does fully offload on the host device; this was just comparing the speed of various configurations. It does not fully offload on the client device.)
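A possible workaround, if the automatic split is being ignored, is to force an explicit ratio with llama-bench's `-ts`/`--tensor-split` flag. This is an untested sketch: the 0.5/0.5 ratio is illustrative and should be adjusted to the two cards' free VRAM; the model path and RPC address are copied from the report.

```shell
# Untested workaround sketch: request an explicit tensor split between the
# local device and the RPC device instead of relying on the automatic split.
# The -ts ratios are illustrative; tune them to each card's free VRAM.
build/bin/llama-bench \
  -m /dataset/models/gguf/TheDrummer_Fallen-Gemma3-27B-v1-Q8_0.gguf \
  --rpc 192.168.1.17:50052 \
  -ts 0.5/0.5
```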

First Bad Commit

No response

Relevant log output

Accepted client connection, free_mem=23854841856, total_mem=25422659584
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 27371.91 MiB on device 0: cudaMalloc failed: out of memory
[alloc_buffer] size: 28701528704 -> failed
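The numbers in the log are consistent with the reported behavior: the buffer the RPC client is asked to allocate (the whole model) is larger than the free memory it advertised at connection time, which is what one would expect if no split is applied. A quick sanity check using the two values from the log:

```shell
# Values copied from the log above.
req=28701528704        # [alloc_buffer] size requested on the RPC client
free_mem=23854841856   # free_mem reported when the client connected

echo "requested MiB: $(( req / 1048576 ))"   # ~27372 MiB
echo "free MiB:      $(( free_mem / 1048576 ))"

# The request exceeds the client's free memory, so cudaMalloc fails.
[ "$req" -gt "$free_mem" ] && echo "requested allocation exceeds client free memory"
```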
