vulkan: support SET_ROWS #14587

Merged: 2 commits into ggml-org:master on Jul 12, 2025

Conversation

jeffbolznv (Collaborator)

Add variants of the copy_to_quant shader that do the SET_ROWS operation. Change these shaders to spread the work across the workgroup. The memory access pattern is probably not great (one thread per quant block), but should be fine for now.
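For illustration, here is a minimal sketch of the shape of such a kernel, assuming hypothetical bindings and push constants and an f32-to-f16 copy only (the real shaders are variants of copy_to_quant and handle one quant block per thread):

```glsl
#version 450
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require

// Hypothetical SET_ROWS sketch: scatter source rows into dst at the row
// indices supplied in a separate index buffer. Here one element per thread;
// the quantized variants would process one quant block per thread instead.
layout(local_size_x = 256) in;

layout(binding = 0) readonly  buffer Src { float     src[];  };
layout(binding = 1) readonly  buffer Idx { uint      rows[]; }; // dst row for each src row
layout(binding = 2) writeonly buffer Dst { float16_t dst[];  };

layout(push_constant) uniform P {
    uint ne00;   // elements per row
    uint nrows;  // number of source rows
} p;

void main() {
    const uint i   = gl_GlobalInvocationID.x;
    const uint row = i / p.ne00;
    const uint col = i % p.ne00;
    if (row >= p.nrows) {
        return;
    }
    // Spreading the work across the workgroup: consecutive threads handle
    // consecutive columns of the same row, then move on to the next row.
    dst[rows[row] * p.ne00 + col] = float16_t(src[row * p.ne00 + col]);
}
```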

jeffbolznv requested a review from 0cc4m on July 9, 2025 at 02:58.
The github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Jul 9, 2025.
jeffbolznv (Collaborator, Author)

Prompt processing performance is a few percent slower with LLAMA_SET_ROWS=1. I'll look into it soon, but I don't think it needs to block merging this.

Larger workgroups for non-quant types.
Set "norepeat" (there is manual repeat logic).
Use fastmod.
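(For context: "fastmod" here means replacing per-element division/modulo by a divisor that is uniform across the dispatch with a multiply-shift using host-precomputed magic constants. Below is a sketch of the general Granlund-Montgomery/Lemire-style technique, with hypothetical names; mp and L would be precomputed on the host for each divisor d:)

```glsl
// Shader-side helpers. The host picks L = ceil(log2(d)) and a magic
// multiplier mp such that fastdiv(n, mp, L) == n / d for all n in range.
uint fastdiv(uint n, uint mp, uint L) {
    uint msbs, lsbs;
    umulExtended(n, mp, msbs, lsbs); // full 32x32 -> 64-bit product
    return (msbs + n) >> L;         // high word, corrected and shifted
}

uint fastmod(uint n, uint mp, uint L, uint d) {
    return n - fastdiv(n, mp, L) * d; // a % d == a - (a / d) * d
}
```

This trades one integer division per element for a widening multiply, an add, and a shift, which is typically much cheaper on GPUs.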
jeffbolznv (Collaborator, Author)

I did some optimizations for set_rows. It's still maybe 1% slower than the default path, but pretty close now.

0cc4m (Collaborator) left a review:

LGTM

0cc4m merged commit b3ad3a0 into ggml-org:master on Jul 12, 2025
48 checks passed
ggerganov (Member)

@jeffbolznv @0cc4m Do you know why some of the F16 tests slightly exceed the NMSE limit on my RTX 2060:

cmake .. -DGGML_VULKAN=ON
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.43.0") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native 
-- Found Vulkan: /usr/lib/x86_64-linux-gnu/libvulkan.so (found version "1.4.313") found components: glslc glslangValidator 
-- Vulkan found
-- GL_KHR_cooperative_matrix supported by glslc
-- GL_NV_cooperative_matrix2 supported by glslc
-- GL_EXT_integer_dot_product supported by glslc
-- GL_EXT_bfloat16 supported by glslc
-- Including Vulkan backend
-- ggml version: 0.0.5876
-- ggml commit:  0c1df14b5
-- Found CURL: /usr/lib/x86_64-linux-gnu/libcurl.so (found version "8.5.0")  
-- Configuring done (1.2s)
-- Generating done (0.1s)
-- Build files have been written to: /home/ggerganov/development/github/llama.cpp/build-vulkan-new

make -j && ./bin/test-backend-ops
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 2060 SUPER (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
Testing 2 devices
Backend 1/2: Vulkan0
  Device description: NVIDIA GeForce RTX 2060 SUPER
  Device memory: 8192 MB (8192 MB free)

[REGLU] NMSE = 0.000000313 > 0.000000100   REGLU(type=f16,ne_a=[128,2,2,2],v=0,swapped=0): FAIL
[REGLU] NMSE = 0.000000257 > 0.000000100   REGLU(type=f16,ne_a=[5,7,11,13],v=0,swapped=0): FAIL
[REGLU] NMSE = 0.000000281 > 0.000000100   REGLU(type=f16,ne_a=[128,2,2,2],v=0,swapped=1): FAIL
[REGLU] NMSE = 0.000000233 > 0.000000100   REGLU(type=f16,ne_a=[5,7,11,13],v=0,swapped=1): FAIL
[REGLU] NMSE = 0.000000284 > 0.000000100   REGLU(type=f16,ne_a=[128,2,2,2],v=0,split): FAIL
[REGLU] NMSE = 0.000000264 > 0.000000100   REGLU(type=f16,ne_a=[5,7,11,13],v=0,split): FAIL
[GEGLU] NMSE = 0.000000235 > 0.000000100   GEGLU(type=f16,ne_a=[128,2,2,2],v=0,swapped=0): FAIL
[GEGLU] NMSE = 0.000000257 > 0.000000100   GEGLU(type=f16,ne_a=[5,7,11,13],v=0,swapped=0): FAIL
[GEGLU] NMSE = 0.000000204 > 0.000000100   GEGLU(type=f16,ne_a=[128,2,2,2],v=0,swapped=1): FAIL
[GEGLU] NMSE = 0.000000241 > 0.000000100   GEGLU(type=f16,ne_a=[5,7,11,13],v=0,swapped=1): FAIL
[GEGLU] NMSE = 0.000000256 > 0.000000100   GEGLU(type=f16,ne_a=[128,2,2,2],v=0,split): FAIL
[GEGLU] NMSE = 0.000000245 > 0.000000100   GEGLU(type=f16,ne_a=[5,7,11,13],v=0,split): FAIL
[SWIGLU] NMSE = 0.000000234 > 0.000000100   SWIGLU(type=f16,ne_a=[128,2,2,2],v=0,swapped=0): FAIL
[SWIGLU] NMSE = 0.000000285 > 0.000000100   SWIGLU(type=f16,ne_a=[5,7,11,13],v=0,swapped=0): FAIL
[SWIGLU] NMSE = 0.000000259 > 0.000000100   SWIGLU(type=f16,ne_a=[128,2,2,2],v=0,swapped=1): FAIL
[SWIGLU] NMSE = 0.000000276 > 0.000000100   SWIGLU(type=f16,ne_a=[5,7,11,13],v=0,swapped=1): FAIL
[SWIGLU] NMSE = 0.000000264 > 0.000000100   SWIGLU(type=f16,ne_a=[128,2,2,2],v=0,split): FAIL
[SWIGLU] NMSE = 0.000000267 > 0.000000100   SWIGLU(type=f16,ne_a=[5,7,11,13],v=0,split): FAIL
[GEGLU_ERF] NMSE = 0.000000381 > 0.000000100   GEGLU_ERF(type=f16,ne_a=[128,2,2,2],v=0,swapped=0): FAIL
[GEGLU_ERF] NMSE = 0.000000257 > 0.000000100   GEGLU_ERF(type=f16,ne_a=[5,7,11,13],v=0,swapped=0): FAIL
[GEGLU_ERF] NMSE = 0.000000232 > 0.000000100   GEGLU_ERF(type=f16,ne_a=[128,2,2,2],v=0,swapped=1): FAIL
[GEGLU_ERF] NMSE = 0.000000279 > 0.000000100   GEGLU_ERF(type=f16,ne_a=[5,7,11,13],v=0,swapped=1): FAIL
[GEGLU_ERF] NMSE = 0.000000203 > 0.000000100   GEGLU_ERF(type=f16,ne_a=[128,2,2,2],v=0,split): FAIL
[GEGLU_ERF] NMSE = 0.000000261 > 0.000000100   GEGLU_ERF(type=f16,ne_a=[5,7,11,13],v=0,split): FAIL
[GEGLU_QUICK] NMSE = 0.000000287 > 0.000000100   GEGLU_QUICK(type=f16,ne_a=[128,2,2,2],v=0,swapped=0): FAIL
[GEGLU_QUICK] NMSE = 0.000000253 > 0.000000100   GEGLU_QUICK(type=f16,ne_a=[5,7,11,13],v=0,swapped=0): FAIL
[GEGLU_QUICK] NMSE = 0.000000224 > 0.000000100   GEGLU_QUICK(type=f16,ne_a=[128,2,2,2],v=0,swapped=1): FAIL
[GEGLU_QUICK] NMSE = 0.000000246 > 0.000000100   GEGLU_QUICK(type=f16,ne_a=[5,7,11,13],v=0,swapped=1): FAIL
[GEGLU_QUICK] NMSE = 0.000000308 > 0.000000100   GEGLU_QUICK(type=f16,ne_a=[128,2,2,2],v=0,split): FAIL
[GEGLU_QUICK] NMSE = 0.000000255 > 0.000000100   GEGLU_QUICK(type=f16,ne_a=[5,7,11,13],v=0,split): FAIL
[MUL] NMSE = 0.000000105 > 0.000000100   MUL(type=f16,ne=[1,1,8,1],nr=[1,1,1,1]): FAIL
[DIV] NMSE = 0.000000127 > 0.000000100   DIV(type=f16,ne=[1,1,8,1],nr=[1,1,1,1]): FAIL
[ADD] NMSE = 0.000000298 > 0.000000100   ADD(type=f16,ne=[1,1,1,1],nr=[32,1,1,1]): FAIL
[SUB] NMSE = 0.000000439 > 0.000000100   SUB(type=f16,ne=[1,1,1,1],nr=[32,1,1,1]): FAIL
[DIV] NMSE = 0.000000374 > 0.000000100   DIV(type=f16,ne=[1,1,1,1],nr=[32,1,1,1]): FAIL
[ADD] NMSE = 0.000000157 > 0.000000100   ADD(type=f16,ne=[1,1,320,320],nr=[1,1,1,1]): FAIL
[SUB] NMSE = 0.000000162 > 0.000000100   SUB(type=f16,ne=[1,1,320,320],nr=[1,1,1,1]): FAIL
[MUL] NMSE = 0.000000275 > 0.000000100   MUL(type=f16,ne=[1,1,320,320],nr=[1,1,1,1]): FAIL
[DIV] NMSE = 0.000000307 > 0.000000100   DIV(type=f16,ne=[1,1,320,320],nr=[1,1,1,1]): FAIL
[ADD] NMSE = 0.000000269 > 0.000000100   ADD(type=f16,ne=[10,5,1,1],nr=[1,1,1,1]): FAIL
[SUB] NMSE = 0.000000181 > 0.000000100   SUB(type=f16,ne=[10,5,1,1],nr=[1,1,1,1]): FAIL
[MUL] NMSE = 0.000000236 > 0.000000100   MUL(type=f16,ne=[10,5,1,1],nr=[1,1,1,1]): FAIL
[DIV] NMSE = 0.000000373 > 0.000000100   DIV(type=f16,ne=[10,5,1,1],nr=[1,1,1,1]): FAIL
[ADD] NMSE = 0.000000157 > 0.000000100   ADD(type=f16,ne=[10,5,4,1],nr=[1,1,1,1]): FAIL
[SUB] NMSE = 0.000000157 > 0.000000100   SUB(type=f16,ne=[10,5,4,1],nr=[1,1,1,1]): FAIL
[MUL] NMSE = 0.000000319 > 0.000000100   MUL(type=f16,ne=[10,5,4,1],nr=[1,1,1,1]): FAIL
[DIV] NMSE = 0.000000340 > 0.000000100   DIV(type=f16,ne=[10,5,4,1],nr=[1,1,1,1]): FAIL
[ADD] NMSE = 0.000000134 > 0.000000100   ADD(type=f16,ne=[10,5,4,3],nr=[1,1,1,1]): FAIL
[SUB] NMSE = 0.000000159 > 0.000000100   SUB(type=f16,ne=[10,5,4,3],nr=[1,1,1,1]): FAIL
[MUL] NMSE = 0.000000265 > 0.000000100   MUL(type=f16,ne=[10,5,4,3],nr=[1,1,1,1]): FAIL
[DIV] NMSE = 0.000000299 > 0.000000100   DIV(type=f16,ne=[10,5,4,3],nr=[1,1,1,1]): FAIL
[ADD] NMSE = 0.000000139 > 0.000000100   ADD(type=f16,ne=[10,5,4,3],nr=[2,1,1,1]): FAIL
[SUB] NMSE = 0.000000166 > 0.000000100   SUB(type=f16,ne=[10,5,4,3],nr=[2,1,1,1]): FAIL
[MUL] NMSE = 0.000000268 > 0.000000100   MUL(type=f16,ne=[10,5,4,3],nr=[2,1,1,1]): FAIL
[DIV] NMSE = 0.000000323 > 0.000000100   DIV(type=f16,ne=[10,5,4,3],nr=[2,1,1,1]): FAIL
[ADD] NMSE = 0.000000161 > 0.000000100   ADD(type=f16,ne=[10,5,4,3],nr=[1,2,1,1]): FAIL
[SUB] NMSE = 0.000000162 > 0.000000100   SUB(type=f16,ne=[10,5,4,3],nr=[1,2,1,1]): FAIL
[MUL] NMSE = 0.000000266 > 0.000000100   MUL(type=f16,ne=[10,5,4,3],nr=[1,2,1,1]): FAIL
[DIV] NMSE = 0.000000291 > 0.000000100   DIV(type=f16,ne=[10,5,4,3],nr=[1,2,1,1]): FAIL
[ADD] NMSE = 0.000000129 > 0.000000100   ADD(type=f16,ne=[10,5,4,3],nr=[1,1,2,1]): FAIL
[SUB] NMSE = 0.000000170 > 0.000000100   SUB(type=f16,ne=[10,5,4,3],nr=[1,1,2,1]): FAIL
[MUL] NMSE = 0.000000268 > 0.000000100   MUL(type=f16,ne=[10,5,4,3],nr=[1,1,2,1]): FAIL
[DIV] NMSE = 0.000000326 > 0.000000100   DIV(type=f16,ne=[10,5,4,3],nr=[1,1,2,1]): FAIL
[ADD] NMSE = 0.000000157 > 0.000000100   ADD(type=f16,ne=[10,5,4,3],nr=[1,1,1,2]): FAIL
[SUB] NMSE = 0.000000148 > 0.000000100   SUB(type=f16,ne=[10,5,4,3],nr=[1,1,1,2]): FAIL
[MUL] NMSE = 0.000000275 > 0.000000100   MUL(type=f16,ne=[10,5,4,3],nr=[1,1,1,2]): FAIL
[DIV] NMSE = 0.000000292 > 0.000000100   DIV(type=f16,ne=[10,5,4,3],nr=[1,1,1,2]): FAIL
[ADD] NMSE = 0.000000163 > 0.000000100   ADD(type=f16,ne=[10,5,4,3],nr=[1,1,2,2]): FAIL
[SUB] NMSE = 0.000000165 > 0.000000100   SUB(type=f16,ne=[10,5,4,3],nr=[1,1,2,2]): FAIL
[MUL] NMSE = 0.000000282 > 0.000000100   MUL(type=f16,ne=[10,5,4,3],nr=[1,1,2,2]): FAIL
[DIV] NMSE = 0.000000320 > 0.000000100   DIV(type=f16,ne=[10,5,4,3],nr=[1,1,2,2]): FAIL
[ADD] NMSE = 0.000000163 > 0.000000100   ADD(type=f16,ne=[10,5,4,3],nr=[1,2,2,2]): FAIL
[SUB] NMSE = 0.000000169 > 0.000000100   SUB(type=f16,ne=[10,5,4,3],nr=[1,2,2,2]): FAIL
[MUL] NMSE = 0.000000277 > 0.000000100   MUL(type=f16,ne=[10,5,4,3],nr=[1,2,2,2]): FAIL
[DIV] NMSE = 0.000000305 > 0.000000100   DIV(type=f16,ne=[10,5,4,3],nr=[1,2,2,2]): FAIL
[ADD] NMSE = 0.000000159 > 0.000000100   ADD(type=f16,ne=[10,5,4,3],nr=[2,2,2,2]): FAIL
[SUB] NMSE = 0.000000169 > 0.000000100   SUB(type=f16,ne=[10,5,4,3],nr=[2,2,2,2]): FAIL
[MUL] NMSE = 0.000000282 > 0.000000100   MUL(type=f16,ne=[10,5,4,3],nr=[2,2,2,2]): FAIL
[DIV] NMSE = 0.000000307 > 0.000000100   DIV(type=f16,ne=[10,5,4,3],nr=[2,2,2,2]): FAIL
[ADD] NMSE = 0.000000159 > 0.000000100   ADD(type=f16,ne=[1280,1,1,1],nr=[1,1,1,1]): FAIL
[SUB] NMSE = 0.000000174 > 0.000000100   SUB(type=f16,ne=[1280,1,1,1],nr=[1,1,1,1]): FAIL
[MUL] NMSE = 0.000000262 > 0.000000100   MUL(type=f16,ne=[1280,1,1,1],nr=[1,1,1,1]): FAIL
[DIV] NMSE = 0.000000312 > 0.000000100   DIV(type=f16,ne=[1280,1,1,1],nr=[1,1,1,1]): FAIL
[ADD] NMSE = 0.000000161 > 0.000000100   ADD(type=f16,ne=[1280,1,1,1],nr=[1,16,16,1]): FAIL
[SUB] NMSE = 0.000000160 > 0.000000100   SUB(type=f16,ne=[1280,1,1,1],nr=[1,16,16,1]): FAIL
[MUL] NMSE = 0.000000272 > 0.000000100   MUL(type=f16,ne=[1280,1,1,1],nr=[1,16,16,1]): FAIL
[DIV] NMSE = 0.000000308 > 0.000000100   DIV(type=f16,ne=[1280,1,1,1],nr=[1,16,16,1]): FAIL
[ADD] NMSE = 0.000000159 > 0.000000100   ADD(type=f16,ne=[1280,16,16,1],nr=[1,1,1,1]): FAIL
[SUB] NMSE = 0.000000160 > 0.000000100   SUB(type=f16,ne=[1280,16,16,1],nr=[1,1,1,1]): FAIL
[MUL] NMSE = 0.000000273 > 0.000000100   MUL(type=f16,ne=[1280,16,16,1],nr=[1,1,1,1]): FAIL
[DIV] NMSE = 0.000000308 > 0.000000100   DIV(type=f16,ne=[1280,16,16,1],nr=[1,1,1,1]): FAIL
[ADD] NMSE = 0.000000159 > 0.000000100   ADD(type=f16,ne=[1280,1,1,1],nr=[1,256,1,1]): FAIL
[SUB] NMSE = 0.000000160 > 0.000000100   SUB(type=f16,ne=[1280,1,1,1],nr=[1,256,1,1]): FAIL
[MUL] NMSE = 0.000000277 > 0.000000100   MUL(type=f16,ne=[1280,1,1,1],nr=[1,256,1,1]): FAIL
[DIV] NMSE = 0.000000308 > 0.000000100   DIV(type=f16,ne=[1280,1,1,1],nr=[1,256,1,1]): FAIL
[ADD] NMSE = 0.000000161 > 0.000000100   ADD(type=f16,ne=[1,1,1280,1],nr=[16,16,1,1]): FAIL
[SUB] NMSE = 0.000000159 > 0.000000100   SUB(type=f16,ne=[1,1,1280,1],nr=[16,16,1,1]): FAIL
[MUL] NMSE = 0.000000272 > 0.000000100   MUL(type=f16,ne=[1,1,1280,1],nr=[16,16,1,1]): FAIL
[DIV] NMSE = 0.000000306 > 0.000000100   DIV(type=f16,ne=[1,1,1280,1],nr=[16,16,1,1]): FAIL
[ADD] NMSE = 0.000000161 > 0.000000100   ADD(type=f16,ne=[16,16,1280,1],nr=[1,1,1,1]): FAIL
[SUB] NMSE = 0.000000160 > 0.000000100   SUB(type=f16,ne=[16,16,1280,1],nr=[1,1,1,1]): FAIL
[MUL] NMSE = 0.000000273 > 0.000000100   MUL(type=f16,ne=[16,16,1280,1],nr=[1,1,1,1]): FAIL
[DIV] NMSE = 0.000000306 > 0.000000100   DIV(type=f16,ne=[16,16,1280,1],nr=[1,1,1,1]): FAIL
[ADD] NMSE = 0.000000159 > 0.000000100   ADD(type=f16,ne=[1,1,1920,1],nr=[16,16,1,1]): FAIL
[SUB] NMSE = 0.000000161 > 0.000000100   SUB(type=f16,ne=[1,1,1920,1],nr=[16,16,1,1]): FAIL
[MUL] NMSE = 0.000000275 > 0.000000100   MUL(type=f16,ne=[1,1,1920,1],nr=[16,16,1,1]): FAIL
[DIV] NMSE = 0.000000305 > 0.000000100   DIV(type=f16,ne=[1,1,1920,1],nr=[16,16,1,1]): FAIL
[ADD] NMSE = 0.000000158 > 0.000000100   ADD(type=f16,ne=[1,1,2560,1],nr=[16,16,1,1]): FAIL
[SUB] NMSE = 0.000000160 > 0.000000100   SUB(type=f16,ne=[1,1,2560,1],nr=[16,16,1,1]): FAIL
[MUL] NMSE = 0.000000273 > 0.000000100   MUL(type=f16,ne=[1,1,2560,1],nr=[16,16,1,1]): FAIL
[DIV] NMSE = 0.000000308 > 0.000000100   DIV(type=f16,ne=[1,1,2560,1],nr=[16,16,1,1]): FAIL
[ADD] NMSE = 0.000000159 > 0.000000100   ADD(type=f16,ne=[1,1,1280,1],nr=[32,32,1,1]): FAIL
[SUB] NMSE = 0.000000158 > 0.000000100   SUB(type=f16,ne=[1,1,1280,1],nr=[32,32,1,1]): FAIL
[MUL] NMSE = 0.000000273 > 0.000000100   MUL(type=f16,ne=[1,1,1280,1],nr=[32,32,1,1]): FAIL
[DIV] NMSE = 0.000000307 > 0.000000100   DIV(type=f16,ne=[1,1,1280,1],nr=[32,32,1,1]): FAIL
[ADD] NMSE = 0.000000158 > 0.000000100   ADD(type=f16,ne=[1,1,1920,1],nr=[32,32,1,1]): FAIL
[SUB] NMSE = 0.000000160 > 0.000000100   SUB(type=f16,ne=[1,1,1920,1],nr=[32,32,1,1]): FAIL
[MUL] NMSE = 0.000000274 > 0.000000100   MUL(type=f16,ne=[1,1,1920,1],nr=[32,32,1,1]): FAIL
[DIV] NMSE = 0.000000306 > 0.000000100   DIV(type=f16,ne=[1,1,1920,1],nr=[32,32,1,1]): FAIL
[ADD] NMSE = 0.000000159 > 0.000000100   ADD(type=f16,ne=[1,1,640,1],nr=[32,32,1,1]): FAIL
[SUB] NMSE = 0.000000164 > 0.000000100   SUB(type=f16,ne=[1,1,640,1],nr=[32,32,1,1]): FAIL
[MUL] NMSE = 0.000000272 > 0.000000100   MUL(type=f16,ne=[1,1,640,1],nr=[32,32,1,1]): FAIL
[DIV] NMSE = 0.000000304 > 0.000000100   DIV(type=f16,ne=[1,1,640,1],nr=[32,32,1,1]): FAIL
[ADD] NMSE = 0.000000159 > 0.000000100   ADD(type=f16,ne=[5120,1,1,1],nr=[1,256,1,1]): FAIL
[SUB] NMSE = 0.000000159 > 0.000000100   SUB(type=f16,ne=[5120,1,1,1],nr=[1,256,1,1]): FAIL
[MUL] NMSE = 0.000000274 > 0.000000100   MUL(type=f16,ne=[5120,1,1,1],nr=[1,256,1,1]): FAIL
[DIV] NMSE = 0.000000307 > 0.000000100   DIV(type=f16,ne=[5120,1,1,1],nr=[1,256,1,1]): FAIL
[ADD] NMSE = 0.000000163 > 0.000000100   ADD(type=f16,ne=[640,1,1,1],nr=[1,1,1,1]): FAIL
[SUB] NMSE = 0.000000134 > 0.000000100   SUB(type=f16,ne=[640,1,1,1],nr=[1,1,1,1]): FAIL
[MUL] NMSE = 0.000000285 > 0.000000100   MUL(type=f16,ne=[640,1,1,1],nr=[1,1,1,1]): FAIL
[DIV] NMSE = 0.000000284 > 0.000000100   DIV(type=f16,ne=[640,1,1,1],nr=[1,1,1,1]): FAIL
  Backend Vulkan0: FAIL
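
(For reference: test-backend-ops compares each backend's result against the CPU backend and scores it with a normalized mean squared error, presumably along the lines of the usual definition, with $a$ the backend output and $b$ the reference:)

```latex
\mathrm{NMSE}(a, b) = \frac{\sum_i (a_i - b_i)^2}{\sum_i b_i^2}
```

So the failures above are relative errors on the order of 3e-7 against a 1e-7 threshold: small but consistent excesses, of the size one would expect from f16 rounding differences.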

jeffbolznv (Collaborator, Author), quoting ggerganov:

Do you know why some of the F16 tests slightly exceed the NMSE limit on my RTX 2060:

Last time we saw something like this, the cause was that the device didn't automatically round to nearest even. But we have shader variants that force that, and this device/driver should support it. I'll see if I can reproduce it.

jeffbolznv (Collaborator, Author)

Looks like it affected different shaders last time, so we don't force RTNE in these. I'll add variants that do.
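
(A sketch of how a shader variant can force round-to-nearest-even for 16-bit float operations through SPIR-V float controls, assuming the driver exposes SPV_KHR_float_controls; the numeric literals are the SPIR-V capability and execution-mode IDs:)

```glsl
#extension GL_EXT_spirv_intrinsics : enable

// Capability 4467 = RoundingModeRTE, execution mode 4462 = RoundingModeRTE,
// final operand 16 = the float bit width the mode applies to.
spirv_execution_mode(capabilities = [4467], 4462, 16);
```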
