Misc. bug: Build 6278 Vulkan crashes: llama-bench and llama-server both affected #15678

@kidVTP

Description

Name and Version

./llama-cli --version
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce GTX 1080 Ti (NVIDIA) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
ggml_vulkan: 1 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
load_backend: loaded Vulkan backend from H:\llama.cpp_vulkan\ggml-vulkan.dll
load_backend: loaded CPU backend from H:\llama.cpp_vulkan\ggml-cpu-haswell.dll
version: 6278 (34bdbbd)

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-bench, llama-server

Command line

----------------------------------------------------
## Command Used for llama-bench
----------------------------------------------------
@echo off
H:\llama.cpp_vulkan\llama-bench.exe --model .\gpt-oss-20b-UD-Q4_K_XL.gguf ^
  --threads 24 --main-gpu 1 ^
  --n-gpu-layers 99 ^
  --n-prompt 512,2048 --n-gen 128,256,512 ^
  --mmap 0 --flash-attn 1 --split-mode layer

pause

----------------------------------------------------
## Command Used for llama-server
----------------------------------------------------
@echo off
set GGML_VK_VISIBLE_DEVICES=0,2
H:\llama.cpp_vulkan\llama-server.exe --model .\gpt-oss-20b-mxfp4.gguf ^
 --alias gtp-4.1 ^
 --threads -1 --main-gpu 1 ^
 --n-gpu-layers 99 --tensor-split 25,75 ^
 --temp 1.0 --min-p 0.0 --top-p 1.0 --top-k 40 --repeat-penalty 1.0 ^
 --port 8181 --host 127.0.0.1 ^
 --ctx-size 0 -fa --metrics --seed 42 ^
 --reasoning-format none --chat-template-kwargs "{\"reasoning_effort\":\"high\"}"
 
pause
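Since both scripts end with `pause`, any error message printed just before the process dies can be lost once the console window closes. A sketch of the same llama-bench invocation with console output redirected to a file so the crash message survives for the report (`vk_crash.log` is an illustrative name, not from the original scripts):

```shell
@echo off
REM Same llama-bench invocation as above, with stdout and stderr
REM redirected so any crash message is preserved in a file.
REM "vk_crash.log" is an illustrative file name.
H:\llama.cpp_vulkan\llama-bench.exe --model .\gpt-oss-20b-UD-Q4_K_XL.gguf ^
  --threads 24 --main-gpu 1 ^
  --n-gpu-layers 99 ^
  --n-prompt 512,2048 --n-gen 128,256,512 ^
  --mmap 0 --flash-attn 1 --split-mode layer > vk_crash.log 2>&1
pause
```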

Problem description & steps to reproduce

In build 6278 and newer, llama-bench exits unexpectedly after printing the table header, without running any benchmarks; the same command works correctly in build 6277.

In build 6278 and newer, llama-server exits abruptly on receiving a simple "hello" message from Open WebUI; the same setup works correctly in build 6277.

Environment

  • OS: Windows
  • Build: 6278 (34bdbbd) - broken, 6277 (74f52f7) - working
  • Compiler: clang version 19.1.5 for x86_64-pc-windows-msvc
  • GPUs:
    • NVIDIA GeForce GTX 1080 Ti
    • NVIDIA GeForce RTX 4070 (main GPU)

Command Used for llama-bench

H:\llama.cpp_vulkan\llama-bench.exe --model .\gpt-oss-20b-UD-Q4_K_XL.gguf ^
  --threads 24 --main-gpu 1 ^
  --n-gpu-layers 99 ^
  --n-prompt 512,2048 --n-gen 128,256,512 ^
  --mmap 0 --flash-attn 1 --split-mode layer

In build 6278, the program:

  1. Successfully initializes backends (RPC, Vulkan, CPU)
  2. Detects GPUs correctly
  3. Displays the table header
  4. Immediately exits with "Press any key to continue . . ." without running any benchmarks

Working Output (Build 6277)

| model                          |       size |     params | backend    | ngl | threads |   main_gpu | fa | mmap |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ---------: | -: | ---: | --------------: | -------------------: |
| gpt-oss 20B Q4_K - Medium      |  11.04 GiB |    20.91 B | RPC,Vulkan |  99 |      24 |          1 |  1 |    0 |           pp512 |       606.65 ± 11.03 |
| gpt-oss 20B Q4_K - Medium      |  11.04 GiB |    20.91 B | RPC,Vulkan |  99 |      24 |          1 |  1 |    0 |          pp2048 |       562.76 ± 17.86 |
| gpt-oss 20B Q4_K - Medium      |  11.04 GiB |    20.91 B | RPC,Vulkan |  99 |      24 |          1 |  1 |    0 |           tg128 |         82.04 ± 1.57 |
| gpt-oss 20B Q4_K - Medium      |  11.04 GiB |    20.91 B | RPC,Vulkan |  99 |      24 |          1 |  1 |    0 |           tg256 |         82.53 ± 1.53 |
| gpt-oss 20B Q4_K - Medium      |  11.04 GiB |    20.91 B | RPC,Vulkan |  99 |      24 |          1 |  1 |    0 |           tg512 |         80.57 ± 0.94 |

Broken Output (Build 6278)

| model                          |       size |     params | backend    | ngl | threads |   main_gpu | fa | mmap |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ---------: | -: | ---: | --------------: | -------------------: |
Press any key to continue . . .

Additional Information

  • The model file (gpt-oss-20b-UD-Q4_K_XL.gguf) is the same in both tests
  • Backend initialization appears to work correctly in both versions
  • GPU detection is identical in both builds

First Bad Commit

No response
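One way to find it: since build 6277 (74f52f7, working) and build 6278 (34bdbbd, broken) are already known, a `git bisect` between them narrows the regression to a single commit. (If the two build numbers correspond to consecutive commits, 34bdbbd itself would be the first bad commit.) A sketch; each step requires rebuilding and re-running the failing llama-bench command:

```shell
# Bisect llama.cpp between the known-good and known-bad builds.
# 74f52f7 = build 6277 (working), 34bdbbd = build 6278 (broken), per this report.
git bisect start
git bisect bad 34bdbbd
git bisect good 74f52f7
# git now checks out a candidate commit; rebuild, re-run the failing
# llama-bench command, then mark the result:
#   git bisect good   # benchmark ran to completion
#   git bisect bad    # exited after the table header
# Repeat until git prints the first bad commit, then clean up:
git bisect reset
```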

Relevant log output

Labels

Vulkan: Issues specific to the Vulkan backend
bug: Something isn't working
