
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D #13813


Merged
merged 5 commits into from
Jun 4, 2025

Conversation

etasnadi
Contributor

  • ggml-vulkan: adds support for op CONV_TRANSPOSE_1D

  • test-backend-ops: adds additional tests for CONV_TRANSPOSE_1D

  • test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D
@github-actions github-actions bot added the testing, Vulkan, and ggml labels May 26, 2025
Number of additional tests reduced to 108.
@etasnadi
Contributor Author

etasnadi commented May 27, 2025

The Ubuntu 22 runner stopped because test-backend-ops crashes with a segfault. I've observed this behavior on an unmodified build before as well, so I don't know whether it is related to my modifications or whether they merely make it more likely to happen.

Edit: when the app crashes with a segfault, it is always during exit, after the tests have executed.

@jeffbolznv
Collaborator

It's showing failures in some of the conv_transpose_1d tests, e.g.:

29:   CONV_TRANSPOSE_1D(ne_input=[2173,1,1,1],ne_kernel=[1337,1,1,1],s0=1,p0=0,d0=1): [CONV_TRANSPOSE_1D] NMSE = 0.459058786 > 0.000000100 FAIL
29:   CONV_TRANSPOSE_1D(ne_input=[2173,1,1,1],ne_kernel=[1337,1,1,1],s0=2,p0=0,d0=1): [CONV_TRANSPOSE_1D] NMSE = 0.222932697 > 0.000000100 FAIL
29:   CONV_TRANSPOSE_1D(ne_input=[2173,1,1,1],ne_kernel=[1337,1,1,1],s0=3,p0=0,d0=1): [CONV_TRANSPOSE_1D] NMSE = 0.167964696 > 0.000000100 FAIL

Then it appears to crash on the first test after the conv_transpose_1d tests, which does seem like it's related to this change somehow.
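For context on what these failure lines mean: test-backend-ops compares the backend's output against a CPU reference using a normalized mean squared error. A minimal Python sketch of that metric, assuming the usual NMSE definition (sum of squared differences normalized by the energy of the reference), is:

```python
def nmse(reference, backend, eps=1e-12):
    # Normalized mean squared error: sum of squared differences,
    # normalized by the energy of the reference output.
    # (Sketch of the usual definition, not taken from the ggml source.)
    num = sum((a - b) ** 2 for a, b in zip(reference, backend))
    den = sum(a * a for a in reference)
    return num / (den + eps)

# Matching outputs give an NMSE near zero, well under the 1e-7 threshold.
assert nmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) < 1e-7
```

Values like 0.459 are far above the 1e-7 threshold, i.e. the backend output is substantially wrong, not merely imprecise.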

@etasnadi
Contributor Author

It's showing failures in some of the conv_transpose_1d tests, e.g.:

29:   CONV_TRANSPOSE_1D(ne_input=[2173,1,1,1],ne_kernel=[1337,1,1,1],s0=1,p0=0,d0=1): [CONV_TRANSPOSE_1D] NMSE = 0.459058786 > 0.000000100 FAIL
29:   CONV_TRANSPOSE_1D(ne_input=[2173,1,1,1],ne_kernel=[1337,1,1,1],s0=2,p0=0,d0=1): [CONV_TRANSPOSE_1D] NMSE = 0.222932697 > 0.000000100 FAIL
29:   CONV_TRANSPOSE_1D(ne_input=[2173,1,1,1],ne_kernel=[1337,1,1,1],s0=3,p0=0,d0=1): [CONV_TRANSPOSE_1D] NMSE = 0.167964696 > 0.000000100 FAIL

Then it appears to crash on the first test after the conv_transpose_1d tests, which does seem like it's related to this change somehow.

That's interesting. I could finally reproduce these errors with llvmpipe, but not with the discrete GPU. I need time to investigate what the root cause is.

@jeffbolznv
Collaborator

Wild guess - any chance of uninitialized shared memory?

@etasnadi
Contributor Author

Wild guess - any chance of uninitialized shared memory?

It could be, but there is also a chance that the code is correct and llvmpipe itself is the root cause of the failing tests.

I executed parameter sweeps over Cin, K, and L, and it turns out that if one parameter (or a combination of them) is large enough, the test fails. Even Cin=20000, L=64, K=4 fails, which is suspicious because Cin does not influence the logic much; Cin=20000, L=64, K=3 passes.

Argmax also segfaults on llvmpipe on my computer for some reason.
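For readers unfamiliar with the op being swept here: a minimal single-channel Python reference for a 1-D transposed convolution, assuming the standard semantics for s0 (stride), p0 (padding), and d0 (dilation) rather than anything taken from the ggml source, might look like:

```python
def conv_transpose_1d(x, w, s0=1, p0=0, d0=1):
    # Single-channel reference sketch, assuming the usual
    # transposed-convolution output length:
    #   out_len = (L-1)*s0 + d0*(K-1) + 1 - 2*p0
    L, K = len(x), len(w)
    out = [0.0] * ((L - 1) * s0 + d0 * (K - 1) + 1 - 2 * p0)
    for i in range(L):
        for k in range(K):
            # Each input element scatters into the output through
            # every kernel tap.
            j = i * s0 + k * d0 - p0
            if 0 <= j < len(out):
                out[j] += x[i] * w[k]
    return out

assert conv_transpose_1d([1.0, 2.0], [1.0, 1.0]) == [1.0, 3.0, 2.0]
```

With multiple input channels, the per-output work grows with Cin*K, which is one plausible (but unconfirmed) reason the sweeps fail only once the parameters are large enough.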

etasnadi added 2 commits May 28, 2025 23:06
* Removes extra whitespaces.

* Adds int64->int32 casts to prevent possible warnings.
@etasnadi
Contributor Author

etasnadi commented May 28, 2025

Wild guess - any chance of uninitialized shared memory?

No, I was not aware that llvmpipe simply skips loop iterations after reaching a certain limit, so I reduced the maximum test problem size to 1337x13 (input x kernel) in commit 2813cf4 in order to pass the tests on llvmpipe. Llvmpipe starts to trim loops after ~30k iterations, so this number should be safe, and it is also more realistic than my previous edge cases. 5120x512 also worked for me with 128 threads, resulting in ~20k iterations per thread.
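The iteration counts above can be reproduced with simple arithmetic, assuming the total work (input elements times kernel elements) is split evenly across the workgroup's threads:

```python
def iters_per_thread(input_len, kernel_len, threads=128):
    # Rough per-thread iteration estimate, assuming an even split
    # of input_len * kernel_len total iterations across threads.
    return input_len * kernel_len // threads

# The 5120x512 case with 128 threads lands right at ~20k iterations,
assert iters_per_thread(5120, 512) == 20480
# while the reduced 1337x13 problem stays well under the ~30k
# llvmpipe trimming limit even for a single thread.
assert 1337 * 13 < 30000
```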

@etasnadi
Contributor Author

etasnadi commented Jun 3, 2025

@slaren @0cc4m @jeffbolznv any plans to launch the pipelines to see if this can be merged finally?

@@ -9964,6 +10029,8 @@ static bool ggml_backend_vk_device_supports_op(ggml_backend_dev_t dev, const ggm
case GGML_OP_COUNT_EQUAL:
case GGML_OP_IM2COL:
case GGML_OP_TIMESTEP_EMBEDDING:
case GGML_OP_CONV_TRANSPOSE_1D:
return op->src[0]->type == GGML_TYPE_F32 && op->src[1]->type == GGML_TYPE_F32;
Collaborator


I think the CI crash happens here - these other ops shouldn't fall through to this logic.
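One plausible failure mode (an illustration, not taken from the ggml source): grouped switch cases share a single return, so any op in the group with only one source tensor would hit a null src[1] dereference in the shared two-source F32 check. A Python sketch of the hazard, with illustrative names:

```python
class Tensor:
    def __init__(self, type_):
        self.type = type_

class Op:
    def __init__(self, srcs):
        self.src = srcs  # unary ops only populate src[0]

F32 = "f32"

def supports_op_shared_check(op):
    # Mirrors the shared fall-through logic: assumes both sources exist,
    # so a unary op (src[1] is None) would crash here.
    return op.src[0].type == F32 and op.src[1].type == F32

def supports_op_guarded(op):
    # Only inspect src[1] when it is actually present.
    if op.src[1] is None:
        return op.src[0].type == F32
    return op.src[0].type == F32 and op.src[1].type == F32

unary = Op([Tensor(F32), None])          # stand-in for a single-source op
binary = Op([Tensor(F32), Tensor(F32)])  # stand-in for CONV_TRANSPOSE_1D

assert supports_op_guarded(unary) and supports_op_guarded(binary)
```

Whether this is the exact mechanism of the CI crash is speculation; the point is that the other cases should not fall through into a check that dereferences src[1].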

Contributor Author


Yes, it's definitely an issue.

Contributor Author


Thanks for pointing this out, fixed in commit 4e04f45. The snippet did indeed cause the crash, and it now seems to be working locally:

  5635/5635 tests passed
  Backend Vulkan0: OK

@0cc4m
Collaborator

0cc4m commented Jun 4, 2025

Thank you!

@0cc4m 0cc4m merged commit 0d39844 into ggml-org:master Jun 4, 2025
46 checks passed