mixed inference in stable-diffusion #671

Open
zhouwg opened this issue May 6, 2025 · 8 comments

zhouwg commented May 6, 2025

It seems stable-diffusion.cpp does not currently support mixed inference (running one compute graph across several backends):

https://github.com/leejet/stable-diffusion.cpp/blob/master/ggml_extend.hpp#L1230-L1262

The backend scheduler feature used in llama.cpp should be added to stable-diffusion.cpp to support mixed inference.
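
To make the request concrete, here is a toy model of what a backend scheduler does: assign each op to the first backend that supports it, then cut the graph into runs of consecutive ops on the same backend. This is only a sketch of the idea, not ggml's implementation; the `Backend` class, the op lists, and both function names are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    # Hypothetical stand-in for a compute backend (NPU, GPU, CPU).
    name: str
    supported_ops: frozenset


def assign_backends(ops, backends):
    """For each op, pick the first backend (in priority order) that supports it."""
    assignment = []
    for op in ops:
        for backend in backends:
            if op in backend.supported_ops:
                assignment.append(backend.name)
                break
        else:
            # No backend, not even the CPU fallback, can run this op.
            raise ValueError(f"no backend supports {op}")
    return assignment


def split_graph(ops, assignment):
    """Group consecutive ops assigned to the same backend into splits."""
    splits = []
    for op, name in zip(ops, assignment):
        if splits and splits[-1][0] == name:
            splits[-1][1].append(op)
        else:
            splits.append((name, [op]))
    return splits


# Illustrative op support only; real backends report this per op and tensor.
npu = Backend("npu", frozenset({"MUL_MAT", "ADD"}))
cpu = Backend("cpu", frozenset({"MUL_MAT", "ADD", "SOFT_MAX", "GROUP_NORM"}))

ops = ["MUL_MAT", "ADD", "GROUP_NORM", "MUL_MAT", "SOFT_MAX"]
splits = split_graph(ops, assign_backends(ops, [npu, cpu]))
for name, group in splits:
    print(name, group)
```

Every boundary between two splits is a point where tensors have to be copied between backends, which is one reason offloading only part of a graph can end up slow.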

rmatif commented May 21, 2025

@zhouwg

Can you try this commit dd3d6b0 with this patch ggml-org/ggml@8606b82 ?

I managed to get OpenCL working by offloading some compute to the CPU. However, the performance isn't great. I was wondering whether QNN is worthwhile.

zhouwg commented May 21, 2025

Awesome progress. I'll try them later and report back.

zhouwg changed the title from "mixed inference in stabled-diffusion" to "mixed inference in stable-diffusion" on May 22, 2025

zhouwg commented May 22, 2025

I merged your patch into this branch: https://github.com/kantv-ai/kantv/tree/stablediffusion-mixed-inference

The ggml-backend.cpp in this branch is copied 1:1 from your patch ggml-org/ggml@8606b82.

The generated APK crashes at launch:

05-22 14:47:12.482  1000  1000 F DEBUG   : signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
05-22 14:47:12.482  1000  1000 F DEBUG   :     x0  0000000000000000  x1  000000000000039f  x2  0000000000000006  x3  0000007fdeefac00
05-22 14:47:12.482  1000  1000 F DEBUG   :     x4  000000000000000a  x5  000000000000000a  x6  000000000000000a  x7  7f7f7f7f7f7f7f7f
05-22 14:47:12.482  1000  1000 F DEBUG   :     x8  00000000000000f0  x9  00000077fa87f338  x10 ffffff80fffffb9f  x11 0000000000000001
05-22 14:47:12.482  1000  1000 F DEBUG   :     x12 000000775b0133e8  x13 000000775b012fe4  x14 0000000000000000  x15 0000000000000000
05-22 14:47:12.482  1000  1000 F DEBUG   :     x16 00000077fa965e50  x17 00000077fa94ffc0  x18 0000007826824000  x19 000000000000039f
05-22 14:47:12.482  1000  1000 F DEBUG   :     x20 000000000000039f  x21 00000000ffffffff  x22 0000007fdeefad88  x23 0000007826023a80
05-22 14:47:12.482  1000  1000 F DEBUG   :     x24 00000077fa9678a0  x25 0000000000000083  x26 0000000000000000  x27 00000074fa80db60
05-22 14:47:12.482  1000  1000 F DEBUG   :     x28 0000000000000000  x29 0000007fdeefac80
05-22 14:47:12.482  1000  1000 F DEBUG   :     lr  00000077fa8eba68  sp  0000007fdeefabe0  pc  00000077fa8eba98  pst 0000000000001000
05-22 14:47:12.482  1000  1000 F DEBUG   : 95 total frames
05-22 14:47:12.482  1000  1000 F DEBUG   : backtrace:
05-22 14:47:12.482  1000  1000 F DEBUG   :       #00 pc 0000000000092a98  /apex/com.android.runtime/lib64/bionic/libc.so (abort+172) (BuildId: a3cde331295ff116d9c0d5e2198af1eb)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #01 pc 00000000006cb980  /data/app/~~jXHgdZCxiTVHk4JYGJxX3g==/com.kantvai.kantvplayer-5pI2V7Y-s3VlFUf5rYMR6Q==/lib/arm64/libggml-jni.so (ggml_abort+224) (BuildId: 8c779e0de56b48951362bfaeb01d01bf4031a6a3)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #02 pc 00000000006e4be8  /data/app/~~jXHgdZCxiTVHk4JYGJxX3g==/com.kantvai.kantvplayer-5pI2V7Y-s3VlFUf5rYMR6Q==/lib/arm64/libggml-jni.so (BuildId: 8c779e0de56b48951362bfaeb01d01bf4031a6a3)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #03 pc 00000000006e4dcc  /data/app/~~jXHgdZCxiTVHk4JYGJxX3g==/com.kantvai.kantvplayer-5pI2V7Y-s3VlFUf5rYMR6Q==/lib/arm64/libggml-jni.so (ggml_backend_sched_alloc_graph+44) (BuildId: 8c779e0de56b48951362bfaeb01d01bf4031a6a3)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #04 pc 00000000009f6168  /data/app/~~jXHgdZCxiTVHk4JYGJxX3g==/com.kantvai.kantvplayer-5pI2V7Y-s3VlFUf5rYMR6Q==/lib/arm64/libggml-jni.so (BuildId: 8c779e0de56b48951362bfaeb01d01bf4031a6a3)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #05 pc 00000000009f4800  /data/app/~~jXHgdZCxiTVHk4JYGJxX3g==/com.kantvai.kantvplayer-5pI2V7Y-s3VlFUf5rYMR6Q==/lib/arm64/libggml-jni.so (whisper_init_state+5472) (BuildId: 8c779e0de56b48951362bfaeb01d01bf4031a6a3)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #06 pc 00000000009f92c4  /data/app/~~jXHgdZCxiTVHk4JYGJxX3g==/com.kantvai.kantvplayer-5pI2V7Y-s3VlFUf5rYMR6Q==/lib/arm64/libggml-jni.so (whisper_init_from_file_with_params+68) (BuildId: 8c779e0de56b48951362bfaeb01d01bf4031a6a3)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #07 pc 0000000000b46050  /data/app/~~jXHgdZCxiTVHk4JYGJxX3g==/com.kantvai.kantvplayer-5pI2V7Y-s3VlFUf5rYMR6Q==/lib/arm64/libggml-jni.so (whisper_asr_init+1088) (BuildId: 8c779e0de56b48951362bfaeb01d01bf4031a6a3)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #08 pc 0000000000b41f88  /data/app/~~jXHgdZCxiTVHk4JYGJxX3g==/com.kantvai.kantvplayer-5pI2V7Y-s3VlFUf5rYMR6Q==/lib/arm64/libggml-jni.so (Java_kantvai_ai_ggmljava_asr_1init+248) (BuildId: 8c779e0de56b48951362bfaeb01d01bf4031a6a3)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #09 pc 000000000052c570  /apex/com.android.art/lib64/libart.so (art_quick_generic_jni_trampoline+144) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #10 pc 0000000000516040  /apex/com.android.art/lib64/libart.so (art_quick_invoke_static_stub+640) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #11 pc 0000000000513974  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+2476) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #12 pc 000000000066a78c  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+16624) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #13 pc 000000000052ebd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #14 pc 000000000000220c  <anonymous:7821923000> (com.kantvai.kantvplayer.app.IApplication.initGlobal+0)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #15 pc 000000000050b03c  /apex/com.android.art/lib64/libart.so (art::interpreter::ExecuteSwitch(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+68) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #16 pc 000000000050b1f4  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+384) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #17 pc 0000000000513ba8  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+3040) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #18 pc 000000000066a78c  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+16624) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #19 pc 000000000052ebd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #20 pc 0000000000003e60  <anonymous:7821923000> (com.kantvai.kantvplayer.app.IApplication.onCreate+0)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #21 pc 000000000050b03c  /apex/com.android.art/lib64/libart.so (art::interpreter::ExecuteSwitch(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+68) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #22 pc 000000000050b1f4  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+384) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #23 pc 0000000000513ba8  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+3040) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #24 pc 000000000066a78c  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+16624) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #25 pc 000000000052ebd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #26 pc 000000000027455c  /system/framework/framework.jar (android.app.Instrumentation.callApplicationOnCreate+0)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #27 pc 000000000050b03c  /apex/com.android.art/lib64/libart.so (art::interpreter::ExecuteSwitch(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+68) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #28 pc 000000000050b1f4  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+384) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #29 pc 0000000000513ba8  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+3040) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #30 pc 000000000066a78c  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+16624) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #31 pc 000000000052ebd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #32 pc 00000000001ed130  /system/framework/framework.jar (android.app.ActivityThread.handleBindApplication+0)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #33 pc 000000000050b03c  /apex/com.android.art/lib64/libart.so (art::interpreter::ExecuteSwitch(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+68) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #34 pc 000000000050b1f4  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+384) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #35 pc 0000000000513ba8  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+3040) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #36 pc 000000000066a78c  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+16624) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #37 pc 000000000052ebd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #38 pc 00000000001e9f78  /system/framework/framework.jar (android.app.ActivityThread.-$$Nest$mhandleBindApplication+0)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #39 pc 000000000050b03c  /apex/com.android.art/lib64/libart.so (art::interpreter::ExecuteSwitch(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+68) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #40 pc 000000000050b1f4  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+384) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #41 pc 0000000000513ba8  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+3040) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #42 pc 000000000066a78c  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+16624) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #43 pc 000000000052ebd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #44 pc 00000000001e4bc0  /system/framework/framework.jar (android.app.ActivityThread$H.handleMessage+0)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #45 pc 000000000050b03c  /apex/com.android.art/lib64/libart.so (art::interpreter::ExecuteSwitch(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+68) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #46 pc 000000000050b1f4  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+384) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #47 pc 0000000000513ba8  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+3040) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #48 pc 000000000066a78c  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+16624) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #49 pc 000000000052ebd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #50 pc 000000000026c7a8  /system/framework/framework.jar (android.os.Handler.dispatchMessage+0)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #51 pc 000000000050b03c  /apex/com.android.art/lib64/libart.so (art::interpreter::ExecuteSwitch(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+68) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #52 pc 000000000050b1f4  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+384) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #53 pc 0000000000513ba8  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+3040) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #54 pc 000000000066a78c  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+16624) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #55 pc 000000000052ebd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #56 pc 0000000000295538  /system/framework/framework.jar (android.os.Looper.loopOnce+0)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #57 pc 000000000050b03c  /apex/com.android.art/lib64/libart.so (art::interpreter::ExecuteSwitch(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+68) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #58 pc 000000000050b1f4  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+384) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #59 pc 0000000000513ba8  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+3040) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #60 pc 000000000066a78c  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+16624) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #61 pc 000000000052ebd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #62 pc 0000000000295ea4  /system/framework/framework.jar (android.os.Looper.loop+0)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #63 pc 000000000050b03c  /apex/com.android.art/lib64/libart.so (art::interpreter::ExecuteSwitch(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+68) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #64 pc 000000000050b1f4  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+384) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #65 pc 0000000000513ba8  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+3040) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #66 pc 000000000066a78c  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+16624) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #67 pc 000000000052ebd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #68 pc 00000000001f1ce4  /system/framework/framework.jar (android.app.ActivityThread.main+0)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #69 pc 000000000050b03c  /apex/com.android.art/lib64/libart.so (art::interpreter::ExecuteSwitch(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+68) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #70 pc 000000000050b1f4  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+384) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #71 pc 000000000050c8fc  /apex/com.android.art/lib64/libart.so (artQuickToInterpreterBridge+532) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #72 pc 000000000052c698  /apex/com.android.art/lib64/libart.so (art_quick_to_interpreter_bridge+88) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #73 pc 0000000000516040  /apex/com.android.art/lib64/libart.so (art_quick_invoke_static_stub+640) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #74 pc 0000000000536644  /apex/com.android.art/lib64/libart.so (_jobject* art::InvokeMethod<(art::PointerSize)8>(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jobject*, _jobject*, unsigned long)+1360) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #75 pc 0000000000533b70  /apex/com.android.art/lib64/libart.so (art::Method_invoke(_JNIEnv*, _jobject*, _jobject*, _jobjectArray*) (.__uniq.165753521025965369065708152063621506277)+36) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #76 pc 000000000052c570  /apex/com.android.art/lib64/libart.so (art_quick_generic_jni_trampoline+144) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #77 pc 0000000000515d74  /apex/com.android.art/lib64/libart.so (art_quick_invoke_stub+612) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #78 pc 0000000000513858  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+2192) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #79 pc 000000000066a78c  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+16624) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #80 pc 000000000052ebd8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #81 pc 000000000023153c  /system/framework/framework.jar (com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run+0)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #82 pc 000000000050b03c  /apex/com.android.art/lib64/libart.so (art::interpreter::ExecuteSwitch(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+68) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #83 pc 000000000050b1f4  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.4330668573328256514)+384) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #84 pc 000000000050c8fc  /apex/com.android.art/lib64/libart.so (artQuickToInterpreterBridge+532) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #85 pc 000000000052c698  /apex/com.android.art/lib64/libart.so (art_quick_to_interpreter_bridge+88) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #86 pc 0000000000950fe8  /system/framework/arm64/boot-framework.oat (com.android.internal.os.ZygoteInit.main+5080) (BuildId: f7ee8ce11eb0e28b54f4e698cfa5390d44f2a457)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #87 pc 0000000000516040  /apex/com.android.art/lib64/libart.so (art_quick_invoke_static_stub+640) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #88 pc 0000000000512b14  /apex/com.android.art/lib64/libart.so (art::JValue art::InvokeWithVarArgs<art::ArtMethod*>(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, art::ArtMethod*, std::__va_list)+880) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #89 pc 00000000006ca6e0  /apex/com.android.art/lib64/libart.so (art::JValue art::InvokeWithVarArgs<_jmethodID*>(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, std::__va_list)+32) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #90 pc 00000000006c9e44  /apex/com.android.art/lib64/libart.so (art::JNI<true>::CallStaticVoidMethodV(_JNIEnv*, _jclass*, _jmethodID*, std::__va_list)+164) (BuildId: fdc75c6487454d43ea65547d9c5c2d23)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #91 pc 00000000000e142c  /system/lib64/libandroid_runtime.so (_JNIEnv::CallStaticVoidMethod(_jclass*, _jmethodID*, ...)+108) (BuildId: b91287f090cf9f694ddd2f1a515321dd)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #92 pc 00000000000f7968  /system/lib64/libandroid_runtime.so (android::AndroidRuntime::start(char const*, android::Vector<android::String8> const&, bool)+1008) (BuildId: b91287f090cf9f694ddd2f1a515321dd)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #93 pc 00000000000026fc  /system/bin/app_process64 (main+1596) (BuildId: 898d44f285b2a1d48b8b89b7b42d15c0)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #94 pc 000000000008c420  /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+120) (BuildId: a3cde331295ff116d9c0d5e2198af1eb)
05-22 14:47:12.490  3547  1010 D OplusAmsUtilsFeatrue: detectExceptionsForOIDT type:0

zhouwg:$ ./android-ndk-r28/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-addr2line -Cfe ./android/kantvplayer/build/intermediates/cxx/Debug/3s1k6l44/obj/arm64-v8a/libggml-jni.so 00000000006cb980
ggml_abort
/home/zhouwg/kantvai/kantv/core/ggml/llamacpp/ggml/src/ggml.c:185
zhouwg:$ ./android-ndk-r28/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-addr2line -Cfe ./android/kantvplayer/build/intermediates/cxx/Debug/3s1k6l44/obj/arm64-v8a/libggml-jni.so 00000000006e4be8
ggml_backend_sched_split_graph(ggml_backend_sched*, ggml_cgraph*)
/home/zhouwg/kantvai/kantv/core/ggml/llamacpp/ggml/src/ggml-backend.cpp:1102
zhouwg:$ ./android-ndk-r28/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-addr2line -Cfe ./android/kantvplayer/build/intermediates/cxx/Debug/3s1k6l44/obj/arm64-v8a/libggml-jni.so 00000000006e4dcc
ggml_backend_sched_alloc_graph
/home/zhouwg/kantvai/kantv/core/ggml/llamacpp/ggml/src/ggml-backend.cpp:1634
zhouwg:$ ./android-ndk-r28/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-addr2line -Cfe ./android/kantvplayer/build/intermediates/cxx/Debug/3s1k6l44/obj/arm64-v8a/libggml-jni.so 00000000009f6168
whisper_sched_graph_init(whisper_sched&, std::__ndk1::vector<ggml_backend*, std::__ndk1::allocator<ggml_backend*>>, std::__ndk1::function<ggml_cgraph* ()>&&)
/home/zhouwg/kantvai/kantv/core/ggml/whispercpp/whisper.cpp:614

The backtrace shows the abort inside ggml_backend_sched_split_graph, reached from whisper_sched_graph_init via ggml_backend_sched_alloc_graph, so whisper.cpp already uses the backend scheduler and supports mixed inference. This seems to be because a full-time AI expert (danbev) focuses on https://github.com/ggml-org/whisper.cpp.

On the master branch, whisper.cpp with the ggml-hexagon backend works as expected (although ggml-hexagon is currently slower than the default ggml backend):

Image
Image
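
The manual llvm-addr2line lookups earlier in this comment can be batched. Below is a minimal sketch, assuming the tombstone frame format shown in the log above (`#NN pc ADDRESS  PATH ...`); `frames_for_library` and `addr2line_command` are hypothetical helper names, and the paths in the usage are the ones from this thread.

```python
import re

# Matches one backtrace frame: frame index, pc offset, and mapped file path.
FRAME_RE = re.compile(r"#(\d+) pc ([0-9a-f]+)\s+(\S+)")


def frames_for_library(log_text, lib_name):
    """Return (frame_index, pc) pairs for frames that fall inside lib_name."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        index, pc, path = match.groups()
        if path.endswith("/" + lib_name):
            frames.append((int(index), pc))
    return frames


def addr2line_command(addr2line, unstripped_lib, frames):
    """Build one addr2line invocation that resolves all collected addresses."""
    return [addr2line, "-Cfe", unstripped_lib] + [pc for _, pc in frames]


# Three frames copied from the crash log above.
log = """
05-22 14:47:12.482  1000  1000 F DEBUG   :       #00 pc 0000000000092a98  /apex/com.android.runtime/lib64/bionic/libc.so (abort+172) (BuildId: a3cde331295ff116d9c0d5e2198af1eb)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #01 pc 00000000006cb980  /data/app/~~jXHgdZCxiTVHk4JYGJxX3g==/com.kantvai.kantvplayer-5pI2V7Y-s3VlFUf5rYMR6Q==/lib/arm64/libggml-jni.so (ggml_abort+224) (BuildId: 8c779e0de56b48951362bfaeb01d01bf4031a6a3)
05-22 14:47:12.482  1000  1000 F DEBUG   :       #02 pc 00000000006e4be8  /data/app/~~jXHgdZCxiTVHk4JYGJxX3g==/com.kantvai.kantvplayer-5pI2V7Y-s3VlFUf5rYMR6Q==/lib/arm64/libggml-jni.so (BuildId: 8c779e0de56b48951362bfaeb01d01bf4031a6a3)
"""

frames = frames_for_library(log, "libggml-jni.so")
command = addr2line_command(
    "./android-ndk-r28/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-addr2line",
    "./android/kantvplayer/build/intermediates/cxx/Debug/3s1k6l44/obj/arm64-v8a/libggml-jni.so",
    frames,
)
print(" ".join(command))
```

The printed command can be pasted into a shell; with `-f`, llvm-addr2line prints a function name and a file:line pair for each address, in the order given.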

rmatif commented May 22, 2025

@zhouwg

It's a bit harder to debug this way and pinpoint exactly where it fails. Could you please compile sdcpp with the Hexagon backend using debug flags and run the binary through ADB? That would make debugging easier.

I can also take a look, even though I don't have a device with this NPU. I can use Firebase's device streaming feature to get one.

zhouwg commented May 22, 2025

> @zhouwg
>
> It's a bit harder to debug this way and pinpoint exactly where it fails.

I see, and I agree: a pure command-line program is easier to troubleshoot.

> Could you please compile sdcpp with the Hexagon backend using debug flags and run the binary through ADB? That would make debugging easier.

I'll try your approach later and report back.

> I can also take a look, even though I don't have a device with this NPU. I can use Firebase's device streaming feature to get one

Thanks for your time; I hope we can finally fix this problem.

zhouwg commented May 23, 2025

The following steps have been verified on my x86 Linux workstation.

Build stable-diffusion.cpp + ggml-hexagon (a dedicated backend for the Qualcomm Hexagon NPU) as a command-line program on Linux:

  • fetch source code
    https://github.com/zhouwg/stable-diffusion.cpp
  • setup dev env
    ./scripts/build-run-android.sh
  • build with debug mode
    ./scripts/build-run-android.sh build_debug
  • run sdcpp on a Snapdragon-based phone (8 Gen 3 or 8 Elite is recommended)
    ./scripts/build-run-android.sh run_sdcpp

Should I submit a PR to the stable-diffusion community, so that you and other AI experts can work on it?

That branch enables a self-contained build (to simplify the workflow and let you and other AI researchers/experts focus on high-value, hard-core AI R&D) and it bundles a large customized toolchain from Qualcomm, so I think submitting a PR to the stable-diffusion community might be inappropriate.

rmatif commented May 23, 2025

@zhouwg

stable-diffusion.cpp relies on the ggml submodule. If your backend hasn't been merged into the ggml master branch, it won't be possible to integrate it into stable-diffusion.cpp at this time.

I've just tested your script. You need to include ggml-hexagon.h in ggml_extend.hpp and use the flag -DSD_HEXAGON=ON instead of -DGGML_HEXAGON=ON to enable the backend. With these changes, the Hexagon backend appears to initialize, but it then fails and falls back to the CPU.

This line appears in the output:

[ggmlhexagon_check_valid_appcfg, 1960]: it seems there is wrong configuration in ggml-hexagon.cfg, will using the default ggml backend accordingly

I tested this on a Snapdragon 8 Gen 3 and compiled with the flags HTP_ARCH_VERSION=v75 and HTP_ARCH_VERSION_a=v75.
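
To check which settings ggml-hexagon actually loaded when it reports a wrong configuration, the config echo lines it prints (for example `[operator(), 1865]: section[general   ],[hexagon_backend          ] = [4]`) can be collected into a dictionary and compared against what was intended. This is a minimal sketch that assumes the line format stays as in this log; `parse_hexagon_cfg_log` is a hypothetical helper name, not part of ggml-hexagon.

```python
import re

# section[<name> padded],[<key> padded] = [<value>]
CFG_RE = re.compile(r"section\[(\w+)\s*\],\[(\w+)\s*\] = \[([^\]]*)\]")


def parse_hexagon_cfg_log(log_text):
    """Collect echoed config entries into {section: {key: value}} (values stay strings)."""
    cfg = {}
    for section, key, value in CFG_RE.findall(log_text):
        cfg.setdefault(section, {})[key] = value
    return cfg


# A few lines copied from the log in this comment.
sample = """
[operator(), 1865]: section[cdsp      ],[thread_counts            ] = [1]
[operator(), 1865]: section[qnn       ],[precision_mode           ] = [fp16]
[operator(), 1865]: section[general   ],[hwaccel_approach         ] = [2]
[operator(), 1865]: section[general   ],[hexagon_backend          ] = [4]
"""

cfg = parse_hexagon_cfg_log(sample)
print(cfg["general"])
```

In this log, `hexagon_backend` is 4, which the loader itself prints as `hexagon_backend=4(ggml)`, so that value may be worth checking first when the run falls back to the default ggml backend.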

Here is the full log:

[DEBUG] stable-diffusion.cpp:188  - enable GGML_HEXAGON
[ggmlhexagon_load_cfg, 1859]: load hexagon appcfg from /data/local/tmp/ggml-hexagon.cfg
[operator(), 1865]: section[cdsp      ],[thread_counts            ] = [1]
[operator(), 1865]: section[cdsp      ],[enable_all_q_mulmat      ] = [0]
[operator(), 1865]: section[cdsp      ],[enable_rpc_ion_mempool   ] = [1]
[operator(), 1865]: section[qnn       ],[precision_mode           ] = [fp16]
[operator(), 1865]: section[qnn       ],[enable_dlbc              ] = [1]
[operator(), 1865]: section[qnn       ],[vtcm_size_in_mb          ] = [8]
[operator(), 1865]: section[qnn       ],[hvx_threads              ] = [8]
[operator(), 1865]: section[qnn       ],[print_qnn_internal_log   ] = [0]
[operator(), 1865]: section[general   ],[profiler_counts          ] = [200]
[operator(), 1865]: section[general   ],[profiler_duration        ] = [5]
[operator(), 1865]: section[general   ],[enable_profiler          ] = [0]
[operator(), 1865]: section[general   ],[enable_perf              ] = [1]
[operator(), 1865]: section[general   ],[version                  ] = [1.08]
[operator(), 1865]: section[general   ],[dump_op_info             ] = [0]
[operator(), 1865]: section[general   ],[hwaccel_approach         ] = [2]
[operator(), 1865]: section[general   ],[hexagon_backend          ] = [4]
[operator(), 1865]: section[general   ],[ggmldsp_version          ] = [0.63]
[operator(), 1865]: section[general   ],[enable_pinned_memory     ] = [0]
[operator(), 1865]: section[general   ],[print_tensors_info       ] = [0]
[operator(), 1865]: section[general   ],[enable_q_mulmat          ] = [0]
[ggmlhexagon_load_cfg, 1893]: internal ggml_hexagon_version=1.08
[ggmlhexagon_load_cfg, 1894]: internal ggml_dsp_version=0.63
[ggmlhexagon_load_cfg, 1895]: external ggml_hexagon_version=1.08
[ggmlhexagon_load_cfg, 1896]: external ggml_dsp_version=0.63
[ggmlhexagon_load_cfg, 1899]: hwaccel_approach=2(HWACCEL_CDSP)
[ggmlhexagon_load_cfg, 1901]: hexagon_backend=4(ggml)
[ggmlhexagon_load_cfg, 1902]: runtime libpath=/data/local/tmp/
[ggmlhexagon_load_cfg, 1903]: enable_perf=1
[ggmlhexagon_load_cfg, 1904]: enable_profiler=0
[ggmlhexagon_check_valid_appcfg, 1929]: using default ggml backend
[ggmlhexagon_check_valid_appcfg, 1960]: it seems there is wrong configuration in ggml-hexagon.cfg, will using the default ggml backend accordingly
[DEBUG] stable-diffusion.cpp:197  - Using CPU backend
[INFO ] stable-diffusion.cpp:206  - loading model from '/data/local/tmp/realisticVisionV60B1_v51HyperVAE.safetensors'
[INFO ] model.cpp:912  - load /data/local/tmp/realisticVisionV60B1_v51HyperVAE.safetensors using safetensors format
[DEBUG] model.cpp:983  - init from '/data/local/tmp/realisticVisionV60B1_v51HyperVAE.safetensors'
[INFO ] stable-diffusion.cpp:253  - Version: SD 1.x 
[INFO ] stable-diffusion.cpp:286  - Weight type:                 f16
[INFO ] stable-diffusion.cpp:287  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:288  - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:289  - VAE weight type:             f16
[DEBUG] stable-diffusion.cpp:291  - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  -  trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1217 - clip params backend buffer size =  307.44 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1217 - unet params backend buffer size =  1640.25 MB(RAM) (686 tensors)
[DEBUG] ggml_extend.hpp:1217 - vae params backend buffer size =  94.47 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:428  - loading weights
[DEBUG] model.cpp:1731 - loading tensors from /data/local/tmp/realisticVisionV60B1_v51HyperVAE.safetensors
  |==================================================| 1130/1130 - 500.00it/s
[INFO ] stable-diffusion.cpp:527  - total params memory size = 2042.16MB (VRAM 0.00MB, RAM 2042.16MB): clip 307.44MB(RAM), unet 1640.25MB(RAM), vae 94.47MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:531  - loading model from '/data/local/tmp/realisticVisionV60B1_v51HyperVAE.safetensors' completed, taking 1.31s
[INFO ] stable-diffusion.cpp:565  - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:609  - finished loaded file
[DEBUG] stable-diffusion.cpp:1557 - txt2img 256x256
[DEBUG] stable-diffusion.cpp:1250 - prompt after extract and remove lora: "lovely cat"
[INFO ] stable-diffusion.cpp:699  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1255 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:357  - parse 'lovely cat' to [['lovely cat', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 1.40 MiB
[DEBUG] ggml_extend.hpp:1152 - clip compute buffer size for CPU: 1.40 MB
[DEBUG] conditioner.hpp:485  - computing condition graph completed, taking 449 ms
[INFO ] stable-diffusion.cpp:1388 - get_learned_condition completed, taking 449 ms
[INFO ] stable-diffusion.cpp:1411 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1448 - generating image: 1/1 - seed 42
[DEBUG] stable-diffusion.cpp:817  - Sample
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 49.57 MiB
[DEBUG] ggml_extend.hpp:1152 - unet compute buffer size for CPU: 49.57 MB
  |==================================================| 1/1 - 5.92s/it
[INFO ] stable-diffusion.cpp:1487 - sampling completed, taking 5.93s
[INFO ] stable-diffusion.cpp:1495 - generating 1 latent images completed, taking 6.00s
[INFO ] stable-diffusion.cpp:1498 - decoding 1 latents
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 416.00 MiB
[DEBUG] ggml_extend.hpp:1152 - vae compute buffer size for CPU: 416.00 MB
[DEBUG] stable-diffusion.cpp:1099 - computing vae [mode: DECODE] graph completed, taking 22.32s
[INFO ] stable-diffusion.cpp:1508 - latent 1 decoded, taking 22.32s
[INFO ] stable-diffusion.cpp:1512 - decode_first_stage completed, taking 22.32s
[INFO ] stable-diffusion.cpp:1637 - txt2img completed in 28.77s
save result PNG image to 'output.png'

@zhouwg
Author

zhouwg commented May 24, 2025

@zhouwg

stable-diffusion.cpp relies on the ggml submodule. If your backend hasn't been merged into the ggml master branch, it won't be possible to integrate it into stable-diffusion.cpp at this time.

that branch has removed the git submodule and imports the entire ggml source code from the ggml-hexagon project, to simplify the workflow for troubleshooting this issue (mixed inference in stable-diffusion). your patches have also been merged into that branch manually.

I've just tested your script. You need to include ggml-hexagon.h in ggml_extend.hpp and use the flag -DSD_HEXAGON=ON instead of -DGGML_HEXAGON=ON to enable the backend.

thanks for the reminder. I have already made these changes in a new commit, please check.

Doing this, it seems that the Hexagon backend gets initialized but then fails and falls back to the CPU.
This line appears in the output:

[ggmlhexagon_check_valid_appcfg, 1960]: it seems there is wrong configuration in ggml-hexagon.cfg, will using the default ggml backend accordingly

because hexagon_backend was hardcoded to 4 in scripts/ggml-hexagon.cfg.

for the hexagon-cdsp backend, hexagon_backend has to be changed to 3 in scripts/ggml-hexagon.cfg manually before running "./scripts/build-run-android.sh run_sdcpp". I have already changed it to 3 in the new commit to simplify the workflow.
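The difference between the two configurations can be sketched as a small compile-time check. This is a hypothetical model of the outcome seen in the two logs in this thread, not the real ggmlhexagon_check_valid_appcfg(). The numeric values come from the log output; the names HEXAGON_BACKEND_GGML, Effective, AppCfg and effective_backend are assumptions made for illustration.

```cpp
// Values as printed in the logs in this thread.
constexpr int HWACCEL_CDSP         = 2;  // hwaccel_approach=2(HWACCEL_CDSP)
constexpr int HEXAGON_BACKEND_CDSP = 3;  // hexagon_backend=3(HEXAGON_BACKEND_CDSP)
constexpr int HEXAGON_BACKEND_GGML = 4;  // hexagon_backend=4(ggml); constant name is hypothetical

enum class Effective { DefaultGgml, HexagonCdsp, NotCovered };

struct AppCfg {
    int hwaccel_approach;
    int hexagon_backend;
};

// Hypothetical model: maps a config to the backend the logs show being used.
// Combinations that do not appear in this thread are reported as NotCovered.
constexpr Effective effective_backend(AppCfg cfg) {
    return cfg.hexagon_backend == HEXAGON_BACKEND_GGML
               ? Effective::DefaultGgml
           : (cfg.hwaccel_approach == HWACCEL_CDSP &&
              cfg.hexagon_backend == HEXAGON_BACKEND_CDSP)
               ? Effective::HexagonCdsp
               : Effective::NotCovered;
}

// Old scripts/ggml-hexagon.cfg (hexagon_backend = 4): default ggml backend, i.e. CPU.
static_assert(effective_backend(AppCfg{2, 4}) == Effective::DefaultGgml,
              "hexagon_backend=4 falls back to the default ggml backend");
// New commit (hexagon_backend = 3): the cDSP backend is selected.
static_assert(effective_backend(AppCfg{2, 3}) == Effective::HexagonCdsp,
              "hexagon_backend=3 selects the cDSP backend");
```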

my run log with the new commit in that branch is as follows (you can see that the hexagon-cdsp backend initializes successfully, and that the same issue occurs in ggml-backend.cpp, which is copied 1:1 from your patches):

/data/local/tmp//libQnnCpu.so
/data/local/tmp//libQnnGpu.so
/data/local/tmp//libQnnHtp.so
QNN libs already exist on Android phone
./scripts/ggml-hexagon.cfg: 1 file pushed. 1.1 MB/s (3210 bytes in 0.003s)
./out/android/bin/sd: 1 file pushed. 18.6 MB/s (41526408 bytes in 2.125s)
./scripts/ggml-hexagon.cfg: 1 file pushed. 0.5 MB/s (3210 bytes in 0.006s)
[ggmlhexagon_load_cfg, 1859]: load hexagon appcfg from /data/local/tmp/ggml-hexagon.cfg
[operator(), 1865]: section[cdsp      ],[thread_counts            ] = [1]
[operator(), 1865]: section[cdsp      ],[enable_all_q_mulmat      ] = [0]
[operator(), 1865]: section[cdsp      ],[enable_rpc_ion_mempool   ] = [1]
[operator(), 1865]: section[qnn       ],[precision_mode           ] = [fp16]
[operator(), 1865]: section[qnn       ],[enable_dlbc              ] = [1]
[operator(), 1865]: section[qnn       ],[vtcm_size_in_mb          ] = [8]
[operator(), 1865]: section[qnn       ],[hvx_threads              ] = [8]
[operator(), 1865]: section[qnn       ],[print_qnn_internal_log   ] = [0]
[operator(), 1865]: section[general   ],[profiler_counts          ] = [200]
[operator(), 1865]: section[general   ],[profiler_duration        ] = [5]
[operator(), 1865]: section[general   ],[enable_profiler          ] = [0]
[operator(), 1865]: section[general   ],[enable_perf              ] = [1]
[operator(), 1865]: section[general   ],[version                  ] = [1.08]
[operator(), 1865]: section[general   ],[dump_op_info             ] = [0]
[operator(), 1865]: section[general   ],[hwaccel_approach         ] = [2]
[operator(), 1865]: section[general   ],[hexagon_backend          ] = [3]
[operator(), 1865]: section[general   ],[ggmldsp_version          ] = [0.63]
[operator(), 1865]: section[general   ],[enable_pinned_memory     ] = [0]
[operator(), 1865]: section[general   ],[print_tensors_info       ] = [0]
[operator(), 1865]: section[general   ],[enable_q_mulmat          ] = [0]
[ggmlhexagon_load_cfg, 1893]: internal ggml_hexagon_version=1.08
[ggmlhexagon_load_cfg, 1894]: internal ggml_dsp_version=0.63
[ggmlhexagon_load_cfg, 1895]: external ggml_hexagon_version=1.08
[ggmlhexagon_load_cfg, 1896]: external ggml_dsp_version=0.63
[ggmlhexagon_load_cfg, 1899]: hwaccel_approach=2(HWACCEL_CDSP)
[ggmlhexagon_load_cfg, 1901]: hexagon_backend=3(HEXAGON_BACKEND_CDSP)
[ggmlhexagon_load_cfg, 1902]: runtime libpath=/data/local/tmp/
[ggmlhexagon_load_cfg, 1903]: enable_perf=1
[ggmlhexagon_load_cfg, 1904]: enable_profiler=0
[ggmlhexagon_init_dsp, 5396]: using Hexagon domain 3(Hexagon-cDSP)
[ggmlhexagon_init_dsp, 5397]: unsignedpd_enabled 1
[ggmlhexagon_init_dsp, 5430]: succeed to open domain 3(Hexagon-cDSP)
[ggmlhexagon_init_dsp, 5432]: only support offload fp32 GGML_OP_ADD and fp32 GGML_OP_MUL_MAT to cDSP currently
[ggmlhexagon_probe_dspinfo, 5249]: dsp arch version 0x79
[ggmlhexagon_probe_dspinfo, 5256]: device info: Qualcomm SnapDragon 8 Elite(aka 8 Gen 4), QCOM_HTP_V79
[ggmlhexagon_probe_dspinfo, 5266]: vtcm_count 1
[ggmlhexagon_probe_dspinfo, 5267]: vtcm_page 8388608
[ggmlhexagon_probe_dspinfo, 5273]: hmx_depth 0
[ggmlhexagon_probe_dspinfo, 5274]: hmx_spatial 0
[ggmlhexagon_probe_dspinfo, 5278]: hvx_support_128b 6
[ggmlhexagon_probe_dspinfo, 5280]: unsigned pd supported 1
[ggmlhexagon_probe_dspinfo, 5281]: async fastrpc supported 0
[ggmlhexagon_set_rpc_latency, 4977]: set rpc qos 3, latency 100

[ggmlhexagon_init_rpcmempool, 5203]: capacity of rpc memory 4096 MiB
[INFO ] stable-diffusion.cpp:206  - loading model from '/sdcard/sd-v1-4.ckpt'
[INFO ] model.cpp:911  - load /sdcard/sd-v1-4.ckpt using checkpoint format
ZIP 0, name = archive/data.pkl, dir = archive/ 
[INFO ] stable-diffusion.cpp:253  - Version: SD 1.x 
[INFO ] stable-diffusion.cpp:286  - Weight type:                 f32
[INFO ] stable-diffusion.cpp:287  - Conditioner weight type:     f32
[INFO ] stable-diffusion.cpp:288  - Diffusion model weight type: f32
[INFO ] stable-diffusion.cpp:289  - VAE weight type:             f32
  |==================================================| 1131/1131 - 0.00it/s
[INFO ] stable-diffusion.cpp:527  - total params memory size = 2719.24MB (VRAM 2719.24MB, RAM 0.00MB): clip 469.44MB(VRAM), unet 2155.33MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:531  - loading model from '/sdcard/sd-v1-4.ckpt' completed, taking 16.64s
[INFO ] stable-diffusion.cpp:565  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:699  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1255 - apply_loras completed, taking 0.00s
/home/zhouwg/kantvai/stable-diffusion.cpp/ggml/src/ggml-backend.cpp:1172: void ggml_backend_sched_split_graph(ggml_backend_sched_t, struct ggml_cgraph *): assertion "src_backend_id != -1" failed
Aborted 
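For context on what this assertion guards: the message suggests that, when the scheduler splits the graph, every source tensor is expected to already have a backend assigned (an id other than -1). The sketch below is a toy model of that invariant. It is not ggml's scheduler and not a diagnosis of this crash; ToyBackend and assign_backends are hypothetical names. It relies only on the fact from the log above that the cDSP backend currently offloads fp32 GGML_OP_ADD and GGML_OP_MUL_MAT, so any other op needs a fallback backend.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a backend: a name plus the ops it can run.
// An empty set means "supports every op" (used here for the CPU fallback).
struct ToyBackend {
    std::string name;
    std::set<std::string> supported_ops;
};

// For each op, return the index of the first backend that supports it,
// or -1 if no backend in the list can run it.
std::vector<int> assign_backends(const std::vector<std::string> & ops,
                                 const std::vector<ToyBackend> & backends) {
    assert(!backends.empty());  // at least one backend is required
    std::vector<int> ids;
    ids.reserve(ops.size());
    for (const auto & op : ops) {
        int id = -1;
        for (std::size_t i = 0; i < backends.size(); ++i) {
            const ToyBackend & b = backends[i];
            if (b.supported_ops.empty() || b.supported_ops.count(op) > 0) {
                id = static_cast<int>(i);
                break;
            }
        }
        ids.push_back(id);
    }
    return ids;
}
```

In this toy model, with the backends {hexagon-cdsp, cpu} the ops MUL_MAT, SOFT_MAX, ADD map to ids 0, 1, 0. With hexagon-cdsp alone, SOFT_MAX stays at -1, which is the kind of unassigned state an assertion like "src_backend_id != -1" rejects.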
