feat: add TencentARC PhotoMaker support (leejet#179)
* first efforts at implementing photomaker; lots more to do
* added PhotoMakerIDEncoder model in SD
* fixed some bugs; now photomaker model weights can be loaded into their tensor buffers
* added input id image loading
* added preprocessing of input id images
* finished get_num_tensors
* fixed a bug in remove_duplicates
* add a get_learned_condition_with_trigger function to do photomaker stuff
* add a convert_token_to_id function for photomaker to extract trigger word's token id
* making progress; need to implement tokenizer decoder
* making more progress; finishing vision model forward
* debugging vision_model outputs
* corrected clip vision model output
* continue making progress in id fusion process
* finished stacked id embedding; to be tested
* remove garbage file
* debugging graph compute
* more progress; now alloc buffer failed
* fixed wtype issue; input images can only be 1 because issue with transformer when batch size > 1 (to be investigated)
* added delayed subject conditioning; now photomaker runs and generates images
* fixed stat_merge_step
* added photomaker lora model (to be tested)
* reworked pmid lora
* finished applying pmid lora; to be tested
* finalized pmid lora
* add a few print tensor; tweak in sample again
* small tweak; still not getting ID faces
* fixed a bug in FuseBlock forward; also remove diag_mask op in for vision transformer; getting better results
* disable pmid lora apply for now; 1 input image seems working; > 1 not working
* turn pmid lora apply back on
* fixed a decode bug
* fixed a bug in ggml's conv_2d, and now > 1 input images working
* add style_ratio as a cli param; reworked encode with trigger for attention weights
* merge commit fixing lora free param buffer error
* change default style ratio to 10%
* added an option to offload vae decoder to CPU for mem-limited gpus
* removing the image normalization step seems to make ID fidelity much higher
* revert default style ratio back to 20%
* added an option for normalizing input ID images; cleaned up debugging code
* more clean up
* fixed bugs; now failed with cuda error; likely out-of-mem on GPU
* free pmid model params when required
* photomaker working properly now after merging and adapting to GGMLBlock API
* remove tensor renaming; fixing names in the photomaker model file
* updated README.md to include instructions and notes for running PhotoMaker
* a bit clean up
* remove -DGGML_CUDA_FORCE_MMQ; more clean up and README update
* add input image requirement in README
* bring back freeing pmid lora params buffer; simply pooled output of CLIPvision
* remove MultiheadAttention2; customized MultiheadAttention
* added a WIN32 get_files_from_dir; turn off PhotoMaker if receiving no input images
* update docs
* fix ci error
* make stable-diffusion.h a pure c header file
This reverts commit 27887b6.
* fix ci error
* format code
* reuse get_learned_condition
* reuse pad_tokens
* reuse CLIPVisionModel
* reuse LoraModel
* add --clip-on-cpu
* fix lora name conversion for SDXL
---------
Co-authored-by: bssrdf <bssrdf@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
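The pipeline the log describes, encoding each input ID image with the CLIP vision model, stacking the per-image embeddings, and fusing them into the prompt embedding at the trigger token's position, can be sketched in miniature. This is a toy illustration, not the real API: the names, shapes, and the elementwise-mean fusion are assumptions (the actual model uses a learned FuseBlock).

```python
def merge_id_embeddings(prompt_embeds, trigger_idx, id_embeds):
    """Splice fused ID-image embeddings into the prompt embedding at the
    trigger token's position.

    prompt_embeds: one embedding vector (list of floats) per prompt token.
    id_embeds:     one embedding vector per input ID image.
    Toy fusion = elementwise mean; the real model uses a learned FuseBlock.
    """
    dim = len(id_embeds[0])
    n = len(id_embeds)
    fused = [sum(vec[i] for vec in id_embeds) / n for i in range(dim)]
    # Replace the trigger token's embedding with the fused ID embedding.
    return prompt_embeds[:trigger_idx] + [fused] + prompt_embeds[trigger_idx + 1:]
```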
@@ -14,6 +14,7 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
- !!!The VAE in SDXL encounters NaN issues under FP16, but unfortunately, the ggml_conv_2d only operates under FP16. Hence, a parameter is needed to specify the VAE that has fixed the FP16 NaN issue. You can find it here: [SDXL VAE FP16 Fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors).
- [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) and [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo) support
--strength STRENGTH                strength for noising/unnoising (default: 0.75)
--style-ratio STYLE-RATIO          strength for keeping input identity (default: 20%)
--control-strength STRENGTH        strength to apply Control Net (default: 0.9)
                                   1.0 corresponds to full destruction of information in init image
-H, --height H                     image height, in pixel space (default: 512)
@@ -299,6 +304,39 @@ You can use ESRGAN to upscale the generated images. At the moment, only the [Rea

```bash
sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat" --upscale-model ../models/RealESRGAN_x4plus_anime_6B.pth
```
+
#### Using PhotoMaker to personalize image generation

You can use [PhotoMaker](https://github.com/TencentARC/PhotoMaker) to personalize generated images with your own ID.

**NOTE**: currently PhotoMaker **only** works with **SDXL** (any SDXL model file will work).

Download the PhotoMaker model file (in safetensors format) [here](https://huggingface.co/bssrdf/PhotoMaker). The official release of the model file (in .bin format) does not work with `stable-diffusion.cpp`.

- Specify the PhotoMaker model path using the `--stacked-id-embd-dir PATH` parameter.
- Specify the input images path using the `--input-id-images-dir PATH` parameter.
  - input images **must** all have the same width and height for preprocessing (to be improved)
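Since all input ID images currently have to share one size, a quick pre-check can save a failed run. The helper below is hypothetical (not part of stable-diffusion.cpp) and, to stay dependency-free, reads dimensions from PNG headers only; for JPEGs, verify or resize with your usual image tool instead.

```python
import struct
from pathlib import Path

def png_size(path):
    """Width/height from a PNG's IHDR chunk (big-endian u32s at bytes 16-24)."""
    data = Path(path).read_bytes()
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError(f"{path} is not a PNG file")
    return struct.unpack(">II", data[16:24])

def check_id_images(dir_path):
    """Return the shared (width, height) of all PNGs in dir_path;
    raise if the sizes differ (mirrors the same-size requirement above)."""
    sizes = {png_size(p) for p in sorted(Path(dir_path).glob("*.png"))}
    if len(sizes) > 1:
        raise ValueError(f"input ID images differ in size: {sorted(sizes)}")
    return sizes.pop() if sizes else None
```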
In the prompt, make sure you have a class word followed by the trigger word `"img"` (hard-coded for now). The class word can be one of `"man, woman, girl, boy"`. If the input ID images contain Asian faces, add `Asian` before the class word.
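A minimal way to sanity-check that a prompt follows this shape (plain whitespace matching only; per the commit log, the real implementation tokenizes with CLIP and looks up the trigger's token id, so this helper is an illustration, not the actual code):

```python
CLASS_WORDS = {"man", "woman", "girl", "boy"}
TRIGGER = "img"

def find_trigger(prompt):
    """Index of the trigger word if it directly follows a class word, else -1.
    Simplified word-level check; the real code works on CLIP token ids."""
    tokens = prompt.lower().replace(",", " ").split()
    for i in range(len(tokens) - 1):
        if tokens[i] in CLASS_WORDS and tokens[i + 1] == TRIGGER:
            return i + 1
    return -1
```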
Another PhotoMaker-specific parameter:

- `--style-ratio (0-100)%`: the default is 20, and 10-20 typically gives good results. A lower ratio means the output follows the input ID more faithfully (not necessarily with better quality).
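The commit log above mentions "delayed subject conditioning". One plausible reading (an assumption on our part, not a statement of the actual implementation) is that the style ratio selects the fraction of sampling steps run before the ID-conditioned embedding takes over, which would explain why a lower ratio follows the input ID more closely:

```python
def id_start_step(total_steps, style_ratio_percent):
    """Sampling step at which ID conditioning would kick in, assuming
    style_ratio is the fraction of steps given to the style-only prompt.
    Hypothetical model of the behavior, not the actual code."""
    if not 0 <= style_ratio_percent <= 100:
        raise ValueError("style ratio must be in [0, 100]")
    return round(total_steps * style_ratio_percent / 100)
```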
Other parameters recommended for running PhotoMaker:

- `--cfg-scale 5.0`
- `-H 1024`
- `-W 1024`

On low-memory GPUs (<= 8GB), running with the `--vae-on-cpu` option is recommended to get artifact-free images.
Example:

```bash
bin/sd -m ../models/sdxlUnstableDiffusers_v11.safetensors --vae ../models/sdxl_vae.safetensors --stacked-id-embd-dir ../models/photomaker-v1.safetensors --input-id-images-dir ../assets/examples/scarletthead_woman -p "a girl img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed" -n "realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text" --cfg-scale 5.0 --sampling-method euler -H 1024 -W 1024 --style-ratio 10 --vae-on-cpu -o output.png
```

### Docker

#### Building using Docker
@@ -345,3 +383,4 @@ Thank you to all the people who have already contributed to stable-diffusion.cpp