Name and Version
llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Instinct MI300X VF, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
version: 5201 (85f36e5e)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Problem description & steps to reproduce
llama-server -m UD-IQ1_S/MAI-DS-R1-UD-IQ1_S-00001-of-00004.gguf -c 32768 -b 8192 -ub 4096 -ngl 999 -to 3600 -a MAI-DS-R1-UD-IQ1_S --no-mmap -t 1 -nkvo -fa
With a small context window and small prompts it seems to work, but if I use a large context it behaves as if it were running on the CPU: I get about 20-30 tokens/s on small prompts, but otherwise it just hangs for hours.
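To show the difference, here is roughly how I check whether the slowdown scales with prompt size (a sketch; it assumes the server is listening on the default 127.0.0.1:8080 and uses the native /completion endpoint):

# Hypothetical timing check (assumes the default 127.0.0.1:8080 listen address).
# Build a long filler prompt, then time a single request against the /completion
# endpoint; the filler is plain ASCII, so it is safe to embed in the JSON body.
LONG_PROMPT=$(python3 -c "print('lorem ipsum ' * 2000)")
time curl -s http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d "{\"prompt\": \"$LONG_PROMPT\", \"n_predict\": 64}" > /dev/null

The per-request prompt-eval and eval timings in the server log make it easy to compare the small-prompt and large-prompt cases.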
Without -fa I have to use much smaller -c, -b, and -ub values to avoid maxing out VRAM; it then handles larger prompts, but only at 7-10 tokens/second.
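For comparison, the non-FA run looks roughly like this (the -c/-b/-ub values here are illustrative examples of "much smaller", not the exact ones I used):

# Hypothetical non-FA invocation; context/batch sizes shown are examples only,
# small enough to avoid exhausting VRAM on this setup.
llama-server -m UD-IQ1_S/MAI-DS-R1-UD-IQ1_S-00001-of-00004.gguf \
  -c 8192 -b 2048 -ub 512 -ngl 999 -to 3600 \
  -a MAI-DS-R1-UD-IQ1_S --no-mmap -t 1 -nkvo

For reference, the script below is how I build llama.cpp with the HIP backend: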
#!/bin/bash
cd ~/llama.cpp
git pull origin master
#rm -rf build
mkdir -p build
cd build
HIPCXX="$(hipconfig -l)/clang" \
HIP_PATH="$(hipconfig -R)" \
cmake -S .. -B . \
-DGGML_HIP=ON \
-DGGML_HIP_ROCWMMA_FATTN=ON \
-DAMDGPU_TARGETS=gfx942 \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr/local \
-DBUILD_SHARED_LIBS=ON \
-DLLAMA_CURL=ON \
&& cmake --build . --config Release -j8 \
&& sudo cmake --install .
sudo ldconfig
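After installing, I sanity-check that the HIP backend was actually picked up; the device line in the version output above comes from this:

# Should report the ROCm device (AMD Instinct MI300X), as in the output
# pasted under "Name and Version" above.
llama-server --version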