mixed inference in stable-diffusion #671
Can you try commit dd3d6b0 with the patch ggml-org/ggml@8606b82? I managed to get OpenCL working by offloading some of the compute to the CPU. However, the performance isn't great, so I was wondering whether QNN is worthwhile.
Awesome progress! I'll try them later and provide feedback.
I merged your patch into this branch: https://github.com/kantv-ai/kantv/tree/stablediffusion-mixed-inference. The ggml-backend.cpp in that branch is a 1:1 copy of your patch ggml-org/ggml@8606b82, but the generated APK crashes at launch.
zhouwg:$ ./android-ndk-r28/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-addr2line -Cfe ./android/kantvplayer/build/intermediates/cxx/Debug/3s1k6l44/obj/arm64-v8a/libggml-jni.so 00000000006cb980

We can see that whisper.cpp already supports mixed inference, probably because a full-time AI expert (danbev) works on https://github.com/ggml-org/whisper.cpp. whisper.cpp + the ggml-hexagon backend works as expected on the master branch (although ggml-hexagon is currently slower than the default ggml backend).
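For reference, a hedged sketch of that symbolication workflow; the logcat filter and paths are illustrative, and the pc offset passed to llvm-addr2line comes from the crash backtrace:

```sh
# Dump the crash backtrace from logcat (illustrative filter):
adb logcat -d | grep -A 20 "backtrace:"
# A tombstone frame looks like:
#   #00 pc 00000000006cb980  /data/app/.../libggml-jni.so

# Symbolize the pc offset against the unstripped debug build of the .so:
$NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-addr2line \
    -Cfe obj/arm64-v8a/libggml-jni.so 00000000006cb980
```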
It's a bit harder to debug this way and pinpoint exactly where it fails. Could you please compile sdcpp with the Hexagon backend using debug flags and run the binary through ADB? That would make debugging easier. I can also take a look, even though I don't have a device with this NPU; I can use Firebase's device streaming feature to get one.
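A hedged sketch of that ADB workflow, assuming an arm64 debug build of the sd binary (paths are illustrative; -m/-p/-W/-H/-v are standard stable-diffusion.cpp CLI flags):

```sh
# Push the debug binary and the model used in this thread to the device:
adb push build-android/bin/sd /data/local/tmp/
adb push realisticVisionV60B1_v51HyperVAE.safetensors /data/local/tmp/
# Run it directly so crashes and debug output land on stderr/logcat:
adb shell "cd /data/local/tmp && chmod +x sd && \
    ./sd -m realisticVisionV60B1_v51HyperVAE.safetensors \
         -p 'lovely cat' -W 256 -H 256 -v"
```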
I see, and I agree with you: a pure command-line program is easier for troubleshooting.
I'll try your approach later and provide feedback.
Thanks for your time; I hope we can finally fix this problem.
The following steps have been verified on my x86 Linux workstation: build stable-diffusion.cpp + ggml-hexagon (a dedicated backend for the Qualcomm Hexagon NPU) in command-line mode on Linux.
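The exact commands are not captured in this thread; here is a hedged reconstruction, where everything except the run_sdcpp target (referenced later in this thread) is an assumption:

```sh
# Hypothetical reconstruction of the verified workflow:
git clone https://github.com/kantv-ai/kantv.git
cd kantv
git checkout stablediffusion-mixed-inference
# Build sdcpp + the ggml-hexagon backend and run it from the command line:
./scripts/build-run-android.sh run_sdcpp
```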
Should I submit a PR to the stable-diffusion.cpp community so you and other AI experts can work on it there? A self-contained build was enabled in that branch (to simplify the workflow and let you and other AI researchers/experts focus on high-value, hard-core AI R&D), but it contains a big customized toolchain from Qualcomm, so I think submitting a PR to the stable-diffusion.cpp community might be inappropriate.
I've just tested your script. You need to include […]. This line appears in the output: […]
I tested this on a Snapdragon 8 Gen 3 and compiled with the flags […]. Here is the full log:
[DEBUG] stable-diffusion.cpp:188 - enable GGML_HEXAGON
[ggmlhexagon_load_cfg, 1859]: load hexagon appcfg from /data/local/tmp/ggml-hexagon.cfg
[operator(), 1865]: section[cdsp ],[thread_counts ] = [1]
[operator(), 1865]: section[cdsp ],[enable_all_q_mulmat ] = [0]
[operator(), 1865]: section[cdsp ],[enable_rpc_ion_mempool ] = [1]
[operator(), 1865]: section[qnn ],[precision_mode ] = [fp16]
[operator(), 1865]: section[qnn ],[enable_dlbc ] = [1]
[operator(), 1865]: section[qnn ],[vtcm_size_in_mb ] = [8]
[operator(), 1865]: section[qnn ],[hvx_threads ] = [8]
[operator(), 1865]: section[qnn ],[print_qnn_internal_log ] = [0]
[operator(), 1865]: section[general ],[profiler_counts ] = [200]
[operator(), 1865]: section[general ],[profiler_duration ] = [5]
[operator(), 1865]: section[general ],[enable_profiler ] = [0]
[operator(), 1865]: section[general ],[enable_perf ] = [1]
[operator(), 1865]: section[general ],[version ] = [1.08]
[operator(), 1865]: section[general ],[dump_op_info ] = [0]
[operator(), 1865]: section[general ],[hwaccel_approach ] = [2]
[operator(), 1865]: section[general ],[hexagon_backend ] = [4]
[operator(), 1865]: section[general ],[ggmldsp_version ] = [0.63]
[operator(), 1865]: section[general ],[enable_pinned_memory ] = [0]
[operator(), 1865]: section[general ],[print_tensors_info ] = [0]
[operator(), 1865]: section[general ],[enable_q_mulmat ] = [0]
[ggmlhexagon_load_cfg, 1893]: internal ggml_hexagon_version=1.08
[ggmlhexagon_load_cfg, 1894]: internal ggml_dsp_version=0.63
[ggmlhexagon_load_cfg, 1895]: external ggml_hexagon_version=1.08
[ggmlhexagon_load_cfg, 1896]: external ggml_dsp_version=0.63
[ggmlhexagon_load_cfg, 1899]: hwaccel_approach=2(HWACCEL_CDSP)
[ggmlhexagon_load_cfg, 1901]: hexagon_backend=4(ggml)
[ggmlhexagon_load_cfg, 1902]: runtime libpath=/data/local/tmp/
[ggmlhexagon_load_cfg, 1903]: enable_perf=1
[ggmlhexagon_load_cfg, 1904]: enable_profiler=0
[ggmlhexagon_check_valid_appcfg, 1929]: using default ggml backend
[ggmlhexagon_check_valid_appcfg, 1960]: it seems there is wrong configuration in ggml-hexagon.cfg, will using the default ggml backend accordingly
[DEBUG] stable-diffusion.cpp:197 - Using CPU backend
[INFO ] stable-diffusion.cpp:206 - loading model from '/data/local/tmp/realisticVisionV60B1_v51HyperVAE.safetensors'
[INFO ] model.cpp:912 - load /data/local/tmp/realisticVisionV60B1_v51HyperVAE.safetensors using safetensors format
[DEBUG] model.cpp:983 - init from '/data/local/tmp/realisticVisionV60B1_v51HyperVAE.safetensors'
[INFO ] stable-diffusion.cpp:253 - Version: SD 1.x
[INFO ] stable-diffusion.cpp:286 - Weight type: f16
[INFO ] stable-diffusion.cpp:287 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:288 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:289 - VAE weight type: f16
[DEBUG] stable-diffusion.cpp:291 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1217 - clip params backend buffer size = 307.44 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1217 - unet params backend buffer size = 1640.25 MB(RAM) (686 tensors)
[DEBUG] ggml_extend.hpp:1217 - vae params backend buffer size = 94.47 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:428 - loading weights
[DEBUG] model.cpp:1731 - loading tensors from /data/local/tmp/realisticVisionV60B1_v51HyperVAE.safetensors
|==================================================| 1130/1130 - 500.00it/s
[INFO ] stable-diffusion.cpp:527 - total params memory size = 2042.16MB (VRAM 0.00MB, RAM 2042.16MB): clip 307.44MB(RAM), unet 1640.25MB(RAM), vae 94.47MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:531 - loading model from '/data/local/tmp/realisticVisionV60B1_v51HyperVAE.safetensors' completed, taking 1.31s
[INFO ] stable-diffusion.cpp:565 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:609 - finished loaded file
[DEBUG] stable-diffusion.cpp:1557 - txt2img 256x256
[DEBUG] stable-diffusion.cpp:1250 - prompt after extract and remove lora: "lovely cat"
[INFO ] stable-diffusion.cpp:699 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1255 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:357 - parse 'lovely cat' to [['lovely cat', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 1.40 MiB
[DEBUG] ggml_extend.hpp:1152 - clip compute buffer size for CPU: 1.40 MB
[DEBUG] conditioner.hpp:485 - computing condition graph completed, taking 449 ms
[INFO ] stable-diffusion.cpp:1388 - get_learned_condition completed, taking 449 ms
[INFO ] stable-diffusion.cpp:1411 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1448 - generating image: 1/1 - seed 42
[DEBUG] stable-diffusion.cpp:817 - Sample
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 49.57 MiB
[DEBUG] ggml_extend.hpp:1152 - unet compute buffer size for CPU: 49.57 MB
|==================================================| 1/1 - 5.92s/it
[INFO ] stable-diffusion.cpp:1487 - sampling completed, taking 5.93s
[INFO ] stable-diffusion.cpp:1495 - generating 1 latent images completed, taking 6.00s
[INFO ] stable-diffusion.cpp:1498 - decoding 1 latents
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 416.00 MiB
[DEBUG] ggml_extend.hpp:1152 - vae compute buffer size for CPU: 416.00 MB
[DEBUG] stable-diffusion.cpp:1099 - computing vae [mode: DECODE] graph completed, taking 22.32s
[INFO ] stable-diffusion.cpp:1508 - latent 1 decoded, taking 22.32s
[INFO ] stable-diffusion.cpp:1512 - decode_first_stage completed, taking 22.32s
[INFO ] stable-diffusion.cpp:1637 - txt2img completed in 28.77s
save result PNG image to 'output.png'
That branch removed the git submodule and imported the entire ggml source tree from the ggml-hexagon project, to simplify the workflow for troubleshooting this issue (mixed inference in stable-diffusion). At the same time, your patches were merged into that branch manually.
Thanks for the reminder. I've already made the change in a new commit; please check.
Because hexagon_backend was hardcoded to 4 in scripts/ggml-hexagon.cfg, we need to manually change it to 3 (the hexagon-cdsp backend) before running "./scripts/build-run-android.sh run_sdcpp"; I've already changed it to 3 in the new commit to simplify the workflow. In my run with the new commit on that branch, the hexagon-cdsp backend initializes successfully, and the same issue appears in ggml-backend.cpp, which is a 1:1 copy of your patches.
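For clarity, a sketch of that edit in scripts/ggml-hexagon.cfg; the meaning of the values (4 = default ggml backend, 3 = hexagon-cdsp) is inferred from the log above:

```
[general]
# 4 selects the default ggml (CPU) backend, per "hexagon_backend=4(ggml)"
# in the log above; 3 selects the hexagon-cdsp backend
hexagon_backend = 3
```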
It seems stable-diffusion.cpp doesn't currently support mixed inference:
https://github.com/leejet/stable-diffusion.cpp/blob/master/ggml_extend.hpp#L1230-L1262
The backend-scheduler feature from llama.cpp should be added to stable-diffusion.cpp to support mixed inference.
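A hedged sketch of what that could look like, based on how llama.cpp uses ggml's backend scheduler; the function names here are placeholders (not stable-diffusion.cpp code), and the exact ggml_backend_sched_new() signature varies across ggml revisions:

```cpp
#include "ggml.h"
#include "ggml-backend.h"

// Register the Hexagon backend first and the CPU backend last; the
// scheduler automatically assigns ops the Hexagon backend reports as
// unsupported to the CPU, which is exactly the mixed-inference fallback
// discussed in this thread.
static ggml_backend_sched_t make_mixed_sched(ggml_backend_t hexagon,
                                             ggml_backend_t cpu) {
    ggml_backend_t backends[2] = { hexagon, cpu };
    return ggml_backend_sched_new(backends, /*bufts=*/NULL,
                                  /*n_backends=*/2,
                                  GGML_DEFAULT_GRAPH_SIZE,
                                  /*parallel=*/false);
}

static void compute_mixed(ggml_backend_sched_t sched,
                          struct ggml_cgraph * graph) {
    ggml_backend_sched_reset(sched);              // drop previous splits
    ggml_backend_sched_alloc_graph(sched, graph); // split + allocate per backend
    ggml_backend_sched_graph_compute(sched, graph);
}
```

Judging by the "reallocating CPU buffer" lines in the log above, the current ggml_extend.hpp path allocates a single compute buffer on one backend via ggml_gallocr, which is the part the scheduler would replace.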