Users can input one or a few face photos, along with a text prompt, to receive a customized photo or painting within seconds (no training required!). Additionally, this model can be used with any SDXL-based base model and combined with other LoRA modules.

The checkpoint mainly contains two parts, corresponding to two keys in the loaded state dict:

  • id_encoder includes a fine-tuned OpenCLIP-ViT-H-14 and a few fuse layers.
  • lora_weights are applied to all attention layers in the UNet, with the rank set to 64.
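
As a rough illustration of the two-key layout described above, the sketch below splits a loaded state dict into its two parts. The helper name split_checkpoint and the dummy tensor names are hypothetical; only the two top-level keys come from the description. In practice the dict would typically come from something like torch.load(path, map_location="cpu"), which is left out here so the example runs without PyTorch.

```python
from typing import Any, Mapping, Tuple

# The two top-level keys named in the description above.
EXPECTED_KEYS = ("id_encoder", "lora_weights")


def split_checkpoint(
    state_dict: Mapping[str, Any],
) -> Tuple[Mapping[str, Any], Mapping[str, Any]]:
    """Return (id_encoder, lora_weights) from a loaded state dict.

    Hypothetical helper: raises KeyError if either expected key is absent.
    """
    missing = [key for key in EXPECTED_KEYS if key not in state_dict]
    if missing:
        raise KeyError(f"checkpoint is missing expected keys: {missing}")
    return state_dict["id_encoder"], state_dict["lora_weights"]


# Stand-in for a real checkpoint; the inner names are made up for illustration.
dummy_checkpoint = {
    "id_encoder": {"fuse_layer.weight": [0.0]},
    "lora_weights": {"attn.lora_down.weight": [0.0]},
}

id_encoder, lora_weights = split_checkpoint(dummy_checkpoint)
print(sorted(dummy_checkpoint))
```

Checking for both keys up front makes it easier to spot a mismatched or truncated file before either part is handed to the rest of a pipeline.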

For more information you can check the GitHub repository.

Usage Tips:

  • Upload more photos of the person to be customized to improve ID fidelity. If the input contains Asian face(s), consider adding 'Asian' before the class word in the prompt.
  • If the generated face looks too realistic when stylizing, set the Style strength to 30-50. The larger the number, the stronger the stylization but the lower the ID fidelity. You can also try other base models or LoRAs with good stylization effects.
  • Reduce the number of generated images and sampling steps for faster generation. Keep in mind, however, that fewer sampling steps may compromise ID fidelity.
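
Two of the tips above can be sketched in code. Both helpers are hypothetical and are not part of any released API. add_descriptor places a word such as 'Asian' directly before the class word in a prompt. start_merge_step shows one possible reading of Style strength: it assumes (this is not stated above) that the value is a percentage of sampling steps that run before ID information is merged in, so a larger value leaves fewer ID-conditioned steps.

```python
def add_descriptor(prompt: str, class_word: str, descriptor: str) -> str:
    """Insert `descriptor` right before the first occurrence of `class_word`.

    Hypothetical helper; matches whole whitespace-separated words only.
    """
    words = prompt.split()
    if class_word not in words:
        raise ValueError(f"class word {class_word!r} not found in prompt")
    index = words.index(class_word)
    # Leave the prompt alone if the descriptor is already in place.
    if index > 0 and words[index - 1].lower() == descriptor.lower():
        return prompt
    return " ".join(words[:index] + [descriptor] + words[index:])


def start_merge_step(style_strength: int, num_steps: int) -> int:
    """Map a Style strength percentage to a step index.

    Assumption: ID information is merged in from this step onward, so a
    higher strength means fewer steps are conditioned on the input face.
    """
    if not 0 <= style_strength <= 100:
        raise ValueError("style_strength must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return style_strength * num_steps // 100


prompt = add_descriptor("sci-fi portrait, man wearing a spacesuit", "man", "Asian")
print(prompt)
print(start_merge_step(30, 50), start_merge_step(50, 50))
```

Under this assumed mapping, the suggested 30-50 range with 50 sampling steps corresponds to starting the merge somewhere between step 15 and step 25, which is consistent with the trade-off between stylization and ID fidelity described in the tips.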

You May Also Like: