Commit a469688

Authored by bssrdf and leejet
feat: add TencentARC PhotoMaker support (leejet#179)
* first efforts at implementing photomaker; lots more to do
* added PhotoMakerIDEncoder model in SD
* fixed some bugs; now photomaker model weights can be loaded into their tensor buffers
* added input id image loading
* added preprocessing of input id images
* finished get_num_tensors
* fixed a bug in remove_duplicates
* add a get_learned_condition_with_trigger function to do photomaker stuff
* add a convert_token_to_id function for photomaker to extract trigger word's token id
* making progress; need to implement tokenizer decoder
* making more progress; finishing vision model forward
* debugging vision_model outputs
* corrected clip vision model output
* continue making progress in id fusion process
* finished stacked id embedding; to be tested
* remove garbage file
* debugging graph compute
* more progress; now alloc buffer failed
* fixed wtype issue; input images can only be 1 because of an issue with the transformer when batch size > 1 (to be investigated)
* added delayed subject conditioning; now photomaker runs and generates images
* fixed stat_merge_step
* added photomaker lora model (to be tested)
* reworked pmid lora
* finished applying pmid lora; to be tested
* finalized pmid lora
* add a few print tensor; tweak in sample again
* small tweak; still not getting ID faces
* fixed a bug in FuseBlock forward; also removed the diag_mask op for the vision transformer; getting better results
* disable pmid lora apply for now; 1 input image seems to work; > 1 not working
* turn pmid lora apply back on
* fixed a decode bug
* fixed a bug in ggml's conv_2d; now > 1 input images work
* add style_ratio as a cli param; reworked encode with trigger for attention weights
* merge commit fixing lora free param buffer error
* change default style ratio to 10%
* added an option to offload the vae decoder to CPU for mem-limited gpus
* removing the image normalization step seems to make ID fidelity much higher
* revert default style ratio back to 20%
* added an option for normalizing input ID images; cleaned up debugging code
* more clean up
* fixed bugs; now failed with cuda error; likely out-of-mem on GPU
* free pmid model params when required
* photomaker working properly now after merging and adapting to the GGMLBlock API
* remove tensor renaming; fixing names in the photomaker model file
* updated README.md to include instructions and notes for running PhotoMaker
* a bit of clean up
* remove -DGGML_CUDA_FORCE_MMQ; more clean up and README update
* add input image requirement in README
* bring back freeing pmid lora params buffer; simply pooled output of CLIPvision
* remove MultiheadAttention2; customized MultiheadAttention
* added a WIN32 get_files_from_dir; turn off PhotoMaker if receiving no input images
* update docs
* fix ci error
* make stable-diffusion.h a pure c header file

  This reverts commit 27887b6.

* fix ci error
* format code
* reuse get_learned_condition
* reuse pad_tokens
* reuse CLIPVisionModel
* reuse LoraModel
* add --clip-on-cpu
* fix lora name conversion for SDXL

---------

Co-authored-by: bssrdf <bssrdf@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
1 parent 6198017 commit a469688

28 files changed: +3915 −166 lines
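One item in the commit message above, "make stable-diffusion.h a pure c header file", refers to the usual pattern of keeping a public header consumable from both C and C++. The sketch below only illustrates that pattern; the type and function names are placeholders, not the actual stable-diffusion.h API.

```c
/* Minimal sketch of a C-compatible public header, assuming the usual
 * include-guard + extern "C" pattern. sd_ctx_t, sd_ctx_placeholder_new and
 * sd_ctx_placeholder_free are illustrative names, not the library's real API. */
#ifndef STABLE_DIFFUSION_H
#define STABLE_DIFFUSION_H

#ifdef __cplusplus
extern "C" {
#endif

/* Opaque handle: no C++ classes or templates leak into the header. */
typedef struct sd_ctx_t sd_ctx_t;

/* Plain C signatures only: no references, no default arguments. */
sd_ctx_t* sd_ctx_placeholder_new(const char* model_path);
void      sd_ctx_placeholder_free(sd_ctx_t* ctx);

#ifdef __cplusplus
} /* extern "C" */
#endif

#endif /* STABLE_DIFFUSION_H */
```

With this layout, a C caller can include the header directly, while C++ translation units see the declarations wrapped in `extern "C"` and link against the same symbols.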

README.md (40 additions, 1 deletion)
````diff
@@ -14,6 +14,7 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
 - !!!The VAE in SDXL encounters NaN issues under FP16, but unfortunately, the ggml_conv_2d only operates under FP16. Hence, a parameter is needed to specify the VAE that has fixed the FP16 NaN issue. You can find it here: [SDXL VAE FP16 Fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors).
 
 - [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) and [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo) support
+- [PhotoMaker](https://github.com/TencentARC/PhotoMaker) support.
 - 16-bit, 32-bit float support
 - 4-bit, 5-bit and 8-bit integer quantization support
 - Accelerated memory-efficient CPU inference
@@ -151,7 +152,7 @@ cmake --build . --config Release
 ### Run
 
 ```
-usage: ./build/bin/sd [arguments]
+usage: ./bin/sd [arguments]
 
 arguments:
   -h, --help                         show this help message and exit
@@ -163,6 +164,9 @@ arguments:
   --taesd [TAESD_PATH]               path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
   --control-net [CONTROL_PATH]       path to control net model
   --embd-dir [EMBEDDING_PATH]        path to embeddings.
+  --stacked-id-embd-dir [DIR]        path to PHOTOMAKER stacked id embeddings.
+  --input-id-images-dir [DIR]        path to PHOTOMAKER input id images dir.
+  --normalize-input                  normalize PHOTOMAKER input id images
   --upscale-model [ESRGAN_PATH]      path to esrgan model. Upscale images after generate, just RealESRGAN_x4plus_anime_6B supported by now.
   --upscale-repeats                  Run the ESRGAN upscaler this many times (default 1)
   --type [TYPE]                      weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)
@@ -175,6 +179,7 @@ arguments:
   -n, --negative-prompt PROMPT       the negative prompt (default: "")
   --cfg-scale SCALE                  unconditional guidance scale: (default: 7.0)
   --strength STRENGTH                strength for noising/unnoising (default: 0.75)
+  --style-ratio STYLE-RATIO          strength for keeping input identity (default: 20%)
   --control-strength STRENGTH        strength to apply Control Net (default: 0.9)
                                      1.0 corresponds to full destruction of information in init image
   -H, --height H                     image height, in pixel space (default: 512)
````
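The `--style-ratio` argument added above pairs with the "added delayed subject conditioning" item in the commit message: under that reading, the stacked-ID conditioning is only merged in after the first `style-ratio` percent of the sampling steps, so a lower ratio lets the identity take effect earlier and follows the input ID more faithfully, matching the note in the PhotoMaker section below. The sketch is an illustration of that scheduling idea only; `start_merge_step` is a placeholder name, not the project's actual code.

```c
#include <stdio.h>

/* Hypothetical helper illustrating "delayed subject conditioning":
 * the stacked-ID conditioning would only be merged in after the first
 * style_ratio percent of the sampling steps. Placeholder logic. */
static int start_merge_step(float style_ratio_percent, int sample_steps) {
    return (int)(style_ratio_percent / 100.0f * (float)sample_steps);
}

int main(void) {
    /* With the README's suggested range (10-20), the ID embedding would
     * take over within the first few steps of a 30-step run. */
    printf("style-ratio 20, 30 steps -> ID merged from step %d\n", start_merge_step(20.0f, 30));
    printf("style-ratio 10, 30 steps -> ID merged from step %d\n", start_merge_step(10.0f, 30));
    return 0;
}
```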
````diff
@@ -299,6 +304,39 @@ You can use ESRGAN to upscale the generated images. At the moment, only the [Rea
 sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat" --upscale-model ../models/RealESRGAN_x4plus_anime_6B.pth
 ```
 
+#### Using PhotoMaker to personalize image generation
+
+You can use [PhotoMaker](https://github.com/TencentARC/PhotoMaker) to personalize generated images with your own ID.
+
+**NOTE**, currently PhotoMaker **ONLY** works with **SDXL** (any SDXL model files will work).
+
+Download PhotoMaker model file (in safetensor format) [here](https://huggingface.co/bssrdf/PhotoMaker). The official release of the model file (in .bin format) does not work with ```stablediffusion.cpp```.
+
+- Specify the PhotoMaker model path using the `--stacked-id-embd-dir PATH` parameter.
+- Specify the input images path using the `--input-id-images-dir PATH` parameter.
+- input images **must** have the same width and height for preprocessing (to be improved)
+
+In prompt, make sure you have a class word followed by the trigger word ```"img"``` (hard-coded for now). The class word could be one of ```"man, woman, girl, boy"```. If input ID images contain asian faces, add ```Asian``` before the class
+word.
+
+Another PhotoMaker specific parameter:
+
+- ```--style-ratio (0-100)%```: default is 20 and 10-20 typically gets good results. Lower ratio means more faithfully following input ID (not necessarily better quality).
+
+Other parameters recommended for running Photomaker:
+
+- ```--cfg-scale 5.0```
+- ```-H 1024```
+- ```-W 1024```
+
+If on low memory GPUs (<= 8GB), recommend running with ```--vae-on-cpu``` option to get artifact free images.
+
+Example:
+
+```bash
+bin/sd -m ../models/sdxlUnstableDiffusers_v11.safetensors --vae ../models/sdxl_vae.safetensors --stacked-id-embd-dir ../models/photomaker-v1.safetensors --input-id-images-dir ../assets/examples/scarletthead_woman -p "a girl img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed" -n "realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text" --cfg-scale 5.0 --sampling-method euler -H 1024 -W 1024 --style-ratio 10 --vae-on-cpu -o output.png
+```
+
 ### Docker
 
 #### Building using Docker
````
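The PhotoMaker section above requires a class word ("man, woman, girl, boy") followed by the trigger word "img", and the commit message mentions a convert_token_to_id helper for extracting the trigger word's token id. The sketch below shows the lookup idea only; the function name, the toy token ids, and the assumption that the identity is fused at the class-word position right before the trigger are illustrative, not the project's implementation.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical scan over an already tokenized prompt: return the index of the
 * trigger token ("img") so a caller could drop it and fuse the stacked ID
 * embedding at the class-word token just before it. Placeholder logic. */
static ptrdiff_t find_trigger_token(const int* tokens, size_t n_tokens, int trigger_id) {
    for (size_t i = 0; i < n_tokens; i++) {
        if (tokens[i] == trigger_id) {
            return (ptrdiff_t)i;   /* class word expected at index i - 1 */
        }
    }
    return -1;                     /* no trigger word: PhotoMaker conditioning skipped */
}

int main(void) {
    /* Toy token ids: pretend 7 is the class word ("girl") and 42 is "img". */
    const int prompt_tokens[] = {1, 7, 42, 5, 9};
    size_t n = sizeof(prompt_tokens) / sizeof(prompt_tokens[0]);
    ptrdiff_t pos = find_trigger_token(prompt_tokens, n, 42);
    printf("trigger at %td, identity fused at %td\n", pos, pos - 1);
    return 0;
}
```

For the README's example prompt "a girl img, ...", the trigger immediately follows the class word "girl", so under this reading the embedding at the preceding position is the one that carries the identity.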
````diff
@@ -345,3 +383,4 @@ Thank you to all the people who have already contributed to stable-diffusion.cpp
 - [k-diffusion](https://github.com/crowsonkb/k-diffusion)
 - [latent-consistency-model](https://github.com/luosiallen/latent-consistency-model)
 - [generative-models](https://github.com/Stability-AI/generative-models/)
+- [PhotoMaker](https://github.com/TencentARC/PhotoMaker)
````