
Commit 64d231f

leejet and Green-Sky authored
feat: add flux support (leejet#356)
* add flux support
* avoid build failures in non-CUDA environments
* fix schnell support
* add k quants support
* add support for applying lora to quantized tensors
* add inplace conversion support for f8_e4m3 (leejet#359), in the same way it is done for bf16: like how bf16 converts losslessly to fp32, f8_e4m3 converts losslessly to fp16
* add xlabs flux comfy converted lora support
* update docs

---------

Co-authored-by: Erik Scholz <Green-Sky@users.noreply.github.com>
1 parent 697d000 commit 64d231f
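The f8_e4m3 bullet is the subtle one: E4M3 has a 4-bit exponent (bias 7) and a 3-bit mantissa, and fp16's wider 5-bit exponent (bias 15) and 10-bit mantissa can represent every finite E4M3 value exactly, just as fp32 can represent every bf16 value exactly. Below is a minimal C++ sketch of why the conversion is lossless; it is an illustration, not the commit's actual code, and the "inplace" aspect presumably mirrors the existing bf16 path (writing the wider outputs backwards through a buffer sized for the 16-bit result, so unread 8-bit inputs are never overwritten).

```cpp
#include <cstdint>

// Convert one f8_e4m3 byte (1 sign, 4 exponent, 3 mantissa bits, bias 7,
// no infinities, S.1111.111 = NaN) to an IEEE fp16 bit pattern
// (1 sign, 5 exponent, 10 mantissa bits, bias 15).
uint16_t f8_e4m3_to_f16_bits(uint8_t f8) {
    uint16_t sign = (uint16_t)(f8 & 0x80) << 8;  // sign bit moves to bit 15
    int      exp  = (f8 >> 3) & 0x0F;
    int      mant = f8 & 0x07;

    if (exp == 0x0F && mant == 0x07) {
        return sign | 0x7E00;  // E4M3FN NaN -> fp16 NaN
    }
    if (exp == 0) {
        if (mant == 0) {
            return sign;  // +/- zero
        }
        // E4M3 subnormal: value = mant * 2^-9. fp16's exponent range
        // reaches much lower, so normalize it into an fp16 normal number.
        exp = 1;
        while (!(mant & 0x08)) {
            mant <<= 1;
            exp--;
        }
        mant &= 0x07;  // drop the now-implicit leading 1
    }
    // Rebias the exponent from 7 to 15 and widen the mantissa from 3 to
    // 10 bits. No rounding is ever needed, which is what makes the
    // conversion lossless.
    return sign | (uint16_t)((exp + 8) << 10) | (uint16_t)(mant << 7);
}
```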

25 files changed: +1886 −172 lines

README.md

Lines changed: 5 additions & 4 deletions
@@ -12,11 +12,12 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
 - Super lightweight and without external dependencies
 - SD1.x, SD2.x, SDXL and SD3 support
   - !!!The VAE in SDXL encounters NaN issues under FP16, but unfortunately, the ggml_conv_2d only operates under FP16. Hence, a parameter is needed to specify the VAE that has fixed the FP16 NaN issue. You can find it here: [SDXL VAE FP16 Fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors).
+- [Flux-dev/Flux-schnell Support](./docs/flux.md)
 - [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) and [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo) support
 - [PhotoMaker](https://github.com/TencentARC/PhotoMaker) support.
 - 16-bit, 32-bit float support
-- 4-bit, 5-bit and 8-bit integer quantization support
+- 2-bit, 3-bit, 4-bit, 5-bit and 8-bit integer quantization support
 - Accelerated memory-efficient CPU inference
   - Only requires ~2.3GB when using txt2img with fp16 precision to generate a 512x512 image, enabling Flash Attention just requires ~1.8GB.
 - AVX, AVX2 and AVX512 support for x86 architectures
@@ -57,7 +58,6 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
   - The current implementation of ggml_conv_2d is slow and has high memory usage
 - [ ] Continuing to reduce memory usage (quantizing the weights of ggml_conv_2d)
 - [ ] Implement Inpainting support
-- [ ] k-quants support

 ## Usage

@@ -202,7 +202,7 @@ arguments:
   --normalize-input                  normalize PHOTOMAKER input id images
   --upscale-model [ESRGAN_PATH]      path to esrgan model. Upscale images after generate, just RealESRGAN_x4plus_anime_6B supported by now.
   --upscale-repeats                  Run the ESRGAN upscaler this many times (default 1)
-  --type [TYPE]                      weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)
+  --type [TYPE]                      weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_k, q3_k, q4_k)
                                      If not specified, the default is the type of the weight file.
   --lora-model-dir [DIR]             lora model directory
   -i, --init-img [IMAGE]             path to the input image, required by img2img
@@ -229,7 +229,7 @@ arguments:
   --vae-tiling                       process vae in tiles to reduce memory usage
   --control-net-cpu                  keep controlnet in cpu (for low vram)
   --canny                            apply canny preprocessor (edge detection)
-  --color                            colors the logging tags according to level
+  --color                            Colors the logging tags according to level
   -v, --verbose                      print extra info
 ```

@@ -240,6 +240,7 @@ arguments:
 # ./bin/sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat"
 # ./bin/sd -m ../models/sd_xl_base_1.0.safetensors --vae ../models/sdxl_vae-fp16-fix.safetensors -H 1024 -W 1024 -p "a lovely cat" -v
 # ./bin/sd -m ../models/sd3_medium_incl_clips_t5xxlfp16.safetensors -H 1024 -W 1024 -p 'a lovely cat holding a sign says \"Stable Diffusion CPP\"' --cfg-scale 4.5 --sampling-method euler -v
+# ./bin/sd --diffusion-model ../models/flux1-dev-q3_k.gguf --vae ../models/ae.sft --clip_l ../models/clip_l.safetensors --t5xxl ../models/t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v
 ```

 Using formats of different precisions will yield results of varying quality.

assets/flux/flux1-dev-q2_k.png (416 KB)

assets/flux/flux1-dev-q3_k.png (490 KB)

assets/flux/flux1-dev-q4_0.png (464 KB)

(unnamed image asset, 566 KB)

assets/flux/flux1-dev-q8_0.png (475 KB)

assets/flux/flux1-schnell-q8_0.png (481 KB)

common.hpp

Lines changed: 1 addition & 1 deletion
@@ -367,7 +367,7 @@ class SpatialTransformer : public GGMLBlock {
     int64_t n_head;
     int64_t d_head;
     int64_t depth       = 1;    // 1
-    int64_t context_dim = 768;  // hidden_size, 1024 for VERSION_2_x
+    int64_t context_dim = 768;  // hidden_size, 1024 for VERSION_SD2

 public:
     SpatialTransformer(int64_t in_channels,
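This one-line change is part of a commit-wide rename of the SDVersion constants (VERSION_1_x to VERSION_SD1, VERSION_2_x to VERSION_SD2, VERSION_XL to VERSION_SDXL), visible in bulk in control.hpp below, presumably to make room for the new flux variants. As a hedged sketch, the renamed enum plausibly looks like this after the commit; only the names that actually appear in these diffs are certain, the rest are assumptions inferred from the commit title and the project's existing SD3 support:

```cpp
// Plausible post-rename version enum. Only the first four names appear in
// the diffs on this page; the remaining entries are assumptions based on
// the commit title ("add flux support") and the README's SD3 support.
enum SDVersion {
    VERSION_SD1,           // was VERSION_1_x
    VERSION_SD2,           // was VERSION_2_x
    VERSION_SDXL,          // was VERSION_XL
    VERSION_SVD,           // unchanged
    VERSION_SD3,           // assumption
    VERSION_FLUX_DEV,      // assumption
    VERSION_FLUX_SCHNELL,  // assumption
};
```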

conditioner.hpp

Lines changed: 241 additions & 15 deletions
Large diffs are not rendered by default.

control.hpp

Lines changed: 9 additions & 9 deletions
@@ -14,7 +14,7 @@
 */
 class ControlNetBlock : public GGMLBlock {
 protected:
-    SDVersion version = VERSION_1_x;
+    SDVersion version = VERSION_SD1;
     // network hparams
     int in_channels  = 4;
     int out_channels = 4;
@@ -26,19 +26,19 @@ class ControlNetBlock : public GGMLBlock {
     int time_embed_dim    = 1280;  // model_channels*4
     int num_heads         = 8;
     int num_head_channels = -1;    // channels // num_heads
-    int context_dim       = 768;   // 1024 for VERSION_2_x, 2048 for VERSION_XL
+    int context_dim       = 768;   // 1024 for VERSION_SD2, 2048 for VERSION_SDXL

 public:
     int model_channels  = 320;
-    int adm_in_channels = 2816;  // only for VERSION_XL
+    int adm_in_channels = 2816;  // only for VERSION_SDXL

-    ControlNetBlock(SDVersion version = VERSION_1_x)
+    ControlNetBlock(SDVersion version = VERSION_SD1)
         : version(version) {
-        if (version == VERSION_2_x) {
+        if (version == VERSION_SD2) {
             context_dim       = 1024;
             num_head_channels = 64;
             num_heads         = -1;
-        } else if (version == VERSION_XL) {
+        } else if (version == VERSION_SDXL) {
             context_dim           = 2048;
             attention_resolutions = {4, 2};
             channel_mult          = {1, 2, 4};
@@ -58,7 +58,7 @@ class ControlNetBlock : public GGMLBlock {
         // time_embed_1 is nn.SiLU()
         blocks["time_embed.2"] = std::shared_ptr<GGMLBlock>(new Linear(time_embed_dim, time_embed_dim));

-        if (version == VERSION_XL || version == VERSION_SVD) {
+        if (version == VERSION_SDXL || version == VERSION_SVD) {
             blocks["label_emb.0.0"] = std::shared_ptr<GGMLBlock>(new Linear(adm_in_channels, time_embed_dim));
             // label_emb_1 is nn.SiLU()
             blocks["label_emb.0.2"] = std::shared_ptr<GGMLBlock>(new Linear(time_embed_dim, time_embed_dim));
@@ -307,7 +307,7 @@ class ControlNetBlock : public GGMLBlock {
 };

 struct ControlNet : public GGMLRunner {
-    SDVersion version = VERSION_1_x;
+    SDVersion version = VERSION_SD1;
     ControlNetBlock control_net;

     ggml_backend_buffer_t control_buffer = NULL;  // keep control output tensors in backend memory
@@ -318,7 +318,7 @@ struct ControlNet : public GGMLRunner {

     ControlNet(ggml_backend_t backend,
                ggml_type wtype,
-               SDVersion version = VERSION_1_x)
+               SDVersion version = VERSION_SD1)
         : GGMLRunner(backend, wtype), control_net(version) {
         control_net.init(params_ctx, wtype);
     }
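Closing the loop on the commit message bullet "add support for applying lora to quantized tensors": a LoRA delta cannot be added directly into quantized blocks, so the natural flow is dequantize, merge in f32, requantize. The sketch below illustrates that flow with hypothetical converter callbacks standing in for ggml's per-type dequantize/quantize routines; it is a conceptual outline, not the code from this commit.

```cpp
#include <functional>
#include <vector>

// Hypothetical per-type converters; ggml provides equivalents per ggml_type.
using dequant_fn = std::function<void(const void* src, float* dst, int n)>;
using quant_fn   = std::function<void(const float* src, void* dst, int n)>;

// Merge a precomputed LoRA update into a quantized weight tensor, in place.
void apply_lora_to_quantized(void* weight, int n,
                             const float* delta,  // scale * (up * down), same shape as weight
                             dequant_fn dequantize,
                             quant_fn quantize) {
    std::vector<float> w(n);
    dequantize(weight, w.data(), n);  // quantized blocks -> f32
    for (int i = 0; i < n; i++) {
        w[i] += delta[i];             // apply the LoRA delta at full precision
    }
    quantize(w.data(), weight, n);    // f32 -> quantized blocks (requantize)
}
```

The trade-off in any scheme like this is one extra quantization round-trip per patched tensor, which can perturb even the weights the LoRA did not touch; doing the merge once at load time keeps that a one-time cost.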
