
Method

We propose SaRA, a novel fine-tuning method for pre-trained diffusion models that trains only the parameters with relatively small absolute values.


Fig.1 Comparison between our SaRA (d) and previous parameter-efficient fine-tuning methods: (a) additive fine-tuning, (b) reparameterized fine-tuning, and (c) selective fine-tuning.

Implementation

SaRA can be adopted by modifying a single line of code: simply replace the original optimizer with the corresponding SaRA optimizer.
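To illustrate the idea behind such an optimizer swap, here is a minimal sketch of an AdamW variant that freezes large-magnitude weights and trains only the small-magnitude ones. The class name, `threshold` argument, and restore-based freezing are illustrative assumptions for exposition, not SaRA's actual API or algorithm (which additionally uses progressive sparse low-rank adaptation):

```python
import torch


class SparseLowMagnitudeAdamW(torch.optim.AdamW):
    """Illustrative sketch (hypothetical API, not SaRA's): an AdamW that
    only trains parameters whose *initial* absolute value is below a
    threshold, keeping all other entries frozen at their pre-trained values."""

    def __init__(self, params, threshold=1e-3, **kwargs):
        super().__init__(params, **kwargs)
        # Fixed boolean mask of "ineffective" (small-magnitude) entries,
        # computed once from the pre-trained weights.
        self.masks = {}
        for group in self.param_groups:
            for p in group["params"]:
                self.masks[p] = p.detach().abs() < threshold

    @torch.no_grad()
    def step(self, closure=None):
        # Snapshot all parameters, take a normal AdamW step, then restore
        # the frozen (large-magnitude) entries so only masked entries move.
        snapshots = {}
        for group in self.param_groups:
            for p in group["params"]:
                snapshots[p] = p.detach().clone()
        loss = super().step(closure)
        for p, old in snapshots.items():
            keep = ~self.masks[p]
            p.data[keep] = old[keep]
        return loss
```

Usage then mirrors the "single line" pattern described above: construct `SparseLowMagnitudeAdamW(model.parameters(), lr=...)` in place of the original `torch.optim.AdamW`.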


Improving the Backbone

SaRA can also improve the performance of a pre-trained model on the main task (the task it was originally trained on) by optimizing the initially ineffective parameters so that they become effective, thereby increasing the total number of effective parameters.
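One simple way to quantify this effect is to measure the fraction of parameters whose magnitude exceeds a cutoff before and after fine-tuning. The helper below is a sketch under the assumption that "effective" means above-threshold magnitude; the threshold value is illustrative, not the paper's:

```python
import torch


def effective_fraction(model: torch.nn.Module, threshold: float = 1e-3) -> float:
    """Fraction of a model's parameters with |w| > threshold.

    Assumes "effective" = above-threshold magnitude; the default
    threshold is an illustrative choice, not the paper's value.
    """
    total, effective = 0, 0
    for p in model.parameters():
        total += p.numel()
        effective += (p.detach().abs() > threshold).sum().item()
    return effective / max(total, 1)
```

Comparing this fraction before and after fine-tuning gives a direct check on whether initially ineffective parameters have been made effective.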


Fig.2 Quantitative comparisons among different PEFT methods for backbone fine-tuning on the ImageNet, FFHQ, and CelebA-HQ datasets. Our method achieves the best FID scores, indicating that it effectively improves the performance of pre-trained models on the main task.


Downstream Dataset Fine-tuning

In this experiment, we choose five widely used datasets from CIVITAI in five different styles for fine-tuning: Barbie Style, Cyberpunk Style, Elementfire Style, Expedition Style, and Hornify Style (from top to bottom). Our method learns the target domain style accurately while maintaining good alignment between the generated images and the text prompts.


Fig.3 Qualitative comparison on Stable Diffusion 1.5. Our method learns domain-specific knowledge well while generating images consistent with the given prompts.



Tab.1 Comparisons with different parameter-efficient fine-tuning methods, along with full-parameter fine-tuning, on Stable Diffusion 1.5, 2.0, and 3.0. Under most conditions, our model achieves the best FID and VLHI scores, indicating that it learns domain-specific knowledge successfully while preserving the prior information well. Bold and underline denote the best and second-best results, respectively.


Image Customization

Since DreamBooth requires fine-tuning the UNet network, SaRA can be employed to fine-tune the UNet and achieve image customization.


Fig.4 Qualitative comparisons among different PEFT methods on image customization by fine-tuning the UNet model in DreamBooth. Our model accurately captures the target feature while preventing overfitting, outperforming DreamBooth with other PEFT methods as well as Textual Inversion.


Controllable Video Generation

We further investigate the effectiveness of our method in fine-tuning video generation models (AnimateDiff) on datasets with different camera motions: ZoomIn, ZoomOut, PanLeft, and PanRight. SaRA preserves the model prior well while learning accurate camera motions.


BibTeX

@inproceedings{hu2024sara,
  title={SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation},
  author={Hu, Teng and Zhang, Jiangning and Yi, Ran and Huang, Hongrui and Wang, Yabiao and Ma, Lizhuang},
  booktitle={arXiv},
  year={2024}
}