
Commit d04248f

Commit message: update docs
1 parent 46eeff5 commit d04248f

File tree

9 files changed: +69 / -5 lines


README.md

Lines changed: 5 additions & 4 deletions
@@ -12,11 +12,12 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
 - Super lightweight and without external dependencies
 - SD1.x, SD2.x, SDXL and SD3 support
 - !!!The VAE in SDXL encounters NaN issues under FP16, but unfortunately, the ggml_conv_2d only operates under FP16. Hence, a parameter is needed to specify the VAE that has fixed the FP16 NaN issue. You can find it here: [SDXL VAE FP16 Fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors).
+- [Flux-dev/Flux-schnell Support](./docs/flux.md)
 - [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) and [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo) support
 - [PhotoMaker](https://github.com/TencentARC/PhotoMaker) support.
 - 16-bit, 32-bit float support
-- 4-bit, 5-bit and 8-bit integer quantization support
+- 2-bit, 3-bit, 4-bit, 5-bit and 8-bit integer quantization support
 - Accelerated memory-efficient CPU inference
 - Only requires ~2.3GB when using txt2img with fp16 precision to generate a 512x512 image, enabling Flash Attention just requires ~1.8GB.
 - AVX, AVX2 and AVX512 support for x86 architectures
@@ -57,7 +58,6 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
 - The current implementation of ggml_conv_2d is slow and has high memory usage
 - [ ] Continuing to reduce memory usage (quantizing the weights of ggml_conv_2d)
 - [ ] Implement Inpainting support
-- [ ] k-quants support

 ## Usage

@@ -171,7 +171,7 @@ arguments:
   --normalize-input normalize PHOTOMAKER input id images
   --upscale-model [ESRGAN_PATH] path to esrgan model. Upscale images after generate, just RealESRGAN_x4plus_anime_6B supported by now.
   --upscale-repeats Run the ESRGAN upscaler this many times (default 1)
-  --type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)
+  --type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_k, q3_k, q4_k)
   If not specified, the default is the type of the weight file.
   --lora-model-dir [DIR] lora model directory
   -i, --init-img [IMAGE] path to the input image, required by img2img
@@ -198,7 +198,7 @@ arguments:
   --vae-tiling process vae in tiles to reduce memory usage
   --control-net-cpu keep controlnet in cpu (for low vram)
   --canny apply canny preprocessor (edge detection)
-  --color colors the logging tags according to level
+  --color Colors the logging tags according to level
   -v, --verbose print extra info
 ```

@@ -209,6 +209,7 @@ arguments:
 # ./bin/sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat"
 # ./bin/sd -m ../models/sd_xl_base_1.0.safetensors --vae ../models/sdxl_vae-fp16-fix.safetensors -H 1024 -W 1024 -p "a lovely cat" -v
 # ./bin/sd -m ../models/sd3_medium_incl_clips_t5xxlfp16.safetensors -H 1024 -W 1024 -p 'a lovely cat holding a sign says \"Stable Diffusion CPP\"' --cfg-scale 4.5 --sampling-method euler -v
+# ./bin/sd --diffusion-model ../models/flux1-dev-q3_k.gguf --vae ../models/ae.sft --clip_l ../models/clip_l.safetensors --t5xxl ../models/t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v
 ```

 Using formats of different precisions will yield results of varying quality.

assets/flux/flux1-dev-q2_k.png (binary, 416 KB)

assets/flux/flux1-dev-q3_k.png (binary, 490 KB)

assets/flux/flux1-dev-q4_0.png (binary, 464 KB)

(unnamed image file) (binary, 566 KB)

assets/flux/flux1-dev-q8_0.png (binary, 475 KB)

assets/flux/flux1-schnell-q8_0.png (binary, 481 KB)

docs/flux.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
# How to Use

You can run Flux with stable-diffusion.cpp on a GPU with 6GB or even 4GB of VRAM, without needing to offload to RAM.

## Download weights

- Download flux-dev from https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/flux1-dev.safetensors
- Download flux-schnell from https://huggingface.co/black-forest-labs/FLUX.1-schnell/blob/main/flux1-schnell.safetensors
- Download vae from https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/ae.safetensors
- Download clip_l from https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/clip_l.safetensors
- Download t5xxl from https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/t5xxl_fp16.safetensors

## Convert flux weights

Running Flux in fp16 leads to overflow, and ggml's bf16 support is not yet mature, so the weights need to be converted (and optionally quantized) to gguf format first, which also reduces VRAM usage. For example:

```
.\bin\Release\sd.exe -M convert -m ..\..\ComfyUI\models\unet\flux1-dev.sft -o ..\models\flux1-dev-q8_0.gguf -v --type q8_0
```
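As a quick sanity check on the conversion step, every GGUF file begins with the 4-byte magic `GGUF`. A minimal sketch (the helper name is ours, not part of the project):

```python
def looks_like_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    return magic == b"GGUF"
```

If the converted file fails this check, the conversion most likely did not complete.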
## Run

- `--cfg-scale` is recommended to be set to 1.

### Flux-dev

For example:

```
.\bin\Release\sd.exe --diffusion-model ..\models\flux1-dev-q8_0.gguf --vae ..\models\ae.sft --clip_l ..\models\clip_l.safetensors --t5xxl ..\models\t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v
```

Using formats of different precisions will yield results of varying quality.

| Type | q8_0 | q4_0 | q3_k | q2_k |
| ---- | ---- | ---- | ---- | ---- |
| **Memory** | 12068.09 MB | 6394.53 MB | 4888.16 MB | 3735.73 MB |
| **Result** | ![](../assets/flux/flux1-dev-q8_0.png) | ![](../assets/flux/flux1-dev-q4_0.png) | ![](../assets/flux/flux1-dev-q3_k.png) | ![](../assets/flux/flux1-dev-q2_k.png) |
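The Memory row above tracks the effective bits per weight of each ggml block format once the per-block scale metadata is counted: roughly 8.5 for q8_0, 4.5 for q4_0, about 3.44 for q3_k, and about 2.63 for q2_k. A rough sketch of the resulting weight size, assuming Flux-dev has about 11.9B parameters (our estimate, for illustration only):

```python
def approx_weight_mib(n_params: float, bits_per_weight: float) -> float:
    """Approximate quantized weight size in MiB: params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / (1024 ** 2)

# Effective bits per weight for the ggml block formats (approximate).
for name, bpw in [("q8_0", 8.5), ("q4_0", 4.5), ("q3_k", 3.4375), ("q2_k", 2.625)]:
    print(f"{name}: ~{approx_weight_mib(11.9e9, bpw):.0f} MiB")
```

The printed estimates land within a few MB of the Memory row, which is why dropping from q8_0 to q2_k roughly cuts the footprint to a quarter.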
### Flux-schnell

```
.\bin\Release\sd.exe --diffusion-model ..\models\flux1-schnell-q8_0.gguf --vae ..\models\ae.sft --clip_l ..\models\clip_l.safetensors --t5xxl ..\models\t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v --steps 4
```

| q8_0 |
| ---- |
| ![](../assets/flux/flux1-schnell-q8_0.png) |
## Run with LoRA

Flux LoRA training libraries use a variety of tensor naming schemes, so not every LoRA naming format is supported. LoRAs whose naming follows the ComfyUI convention are recommended.

### Flux-dev q8_0 with LoRA

- LoRA model from https://huggingface.co/XLabs-AI/flux-lora-collection/tree/main (use the Comfy-converted version!)

```
.\bin\Release\sd.exe --diffusion-model ..\models\flux1-dev-q8_0.gguf --vae ..\models\ae.sft --clip_l ..\models\clip_l.safetensors --t5xxl ..\models\t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'<lora:realism_lora_comfy_converted:1>" --cfg-scale 1.0 --sampling-method euler -v --lora-model-dir ../models
```
![output](../assets/flux/flux1-dev-q8_0%20with%20lora.png)
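The command above attaches the LoRA through an inline `<lora:name:strength>` tag in the prompt. A sketch of how such tags can be pulled out of a prompt, with the grammar inferred from this one example (the real parser may accept more forms):

```python
import re

# Matches inline tags of the form <lora:name:strength>, e.g. <lora:foo:0.8>.
LORA_TAG = re.compile(r"<lora:([^:>]+):([0-9.]+)>")

def split_lora_tags(prompt: str):
    """Return (prompt without tags, list of (lora_name, strength))."""
    tags = [(m.group(1), float(m.group(2))) for m in LORA_TAG.finditer(prompt)]
    clean = LORA_TAG.sub("", prompt).strip()
    return clean, tags
```

The name portion is then looked up under `--lora-model-dir`, which is why the LoRA file must sit in that directory.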

examples/cli/main.cpp

Lines changed: 1 addition & 1 deletion
@@ -179,7 +179,7 @@ void print_usage(int argc, const char* argv[]) {
     printf(" --normalize-input normalize PHOTOMAKER input id images\n");
     printf(" --upscale-model [ESRGAN_PATH] path to esrgan model. Upscale images after generate, just RealESRGAN_x4plus_anime_6B supported by now.\n");
     printf(" --upscale-repeats Run the ESRGAN upscaler this many times (default 1)\n");
-    printf(" --type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)\n");
+    printf(" --type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_k, q3_k, q4_k)\n");
     printf(" If not specified, the default is the type of the weight file.\n");
     printf(" --lora-model-dir [DIR] lora model directory\n");
     printf(" -i, --init-img [IMAGE] path to the input image, required by img2img\n");
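This one-line change widens the set of values `--type` advertises. A tiny sketch of the kind of validation the flag implies, with the list mirroring the usage string above (the helper is illustrative, not the project's actual C++ code):

```python
# Weight types listed in the updated usage string.
SUPPORTED_TYPES = {"f32", "f16", "q4_0", "q4_1", "q5_0", "q5_1",
                   "q8_0", "q2_k", "q3_k", "q4_k"}

def validate_type(arg: str) -> str:
    """Normalize and check a --type argument against the supported set."""
    t = arg.lower()
    if t not in SUPPORTED_TYPES:
        raise ValueError(f"unknown weight type: {arg}")
    return t
```

Note that only q2_k, q3_k and q4_k were added here; other k-quant variants are not mentioned in this commit.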
