[ICLR2025] A versatile image-to-image visual assistant, designed for image generation, manipulation, and translation based on free-form user instructions.
🧙 PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
This work presents PixWizard, a versatile image-to-image visual assistant designed for image generation, manipulation, and translation based on free-form user instructions. [📖 Paper]
[🤗 PixWizard-Data-500K]
Clone this repository and navigate to the PixWizard folder:
git clone https://github.com/AFeng-x/PixWizard.git
cd PixWizard
nvcc Check
Before installation, make sure you have a working nvcc:
# The command should work and report the same CUDA version as ours (12.1).
nvcc --version
On some older distributions (e.g., CentOS 7), you may also want to check that a sufficiently recent version of gcc is available:
# The command should work and show a version of at least 6.0.
# If not, consult distro-specific tutorials to obtain a newer version or build manually.
gcc --version
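The two checks above can be combined into one small script. This is a minimal sketch, not part of the repository: the function names version_ge, cuda_release, and check_toolchain are hypothetical, and it assumes bash, GNU sort (for sort -V), and the usual "release X.Y" line in nvcc's output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the CUDA and gcc toolchain before installing.
# Assumes bash and GNU coreutils (sort -V).

# Succeed if dotted version $1 is >= dotted version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Read `nvcc --version` output on stdin and print the release number, e.g. "12.1".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n1
}

# Warn if CUDA is not 12.1; fail if nvcc is missing or gcc is missing/older than 6.0.
check_toolchain() {
  local cuda gcc_ver
  if ! command -v nvcc >/dev/null 2>&1; then
    echo "nvcc not found on PATH" >&2
    return 1
  fi
  cuda=$(nvcc --version | cuda_release)
  if [ "$cuda" != "12.1" ]; then
    echo "warning: nvcc reports CUDA ${cuda:-unknown}, the setup above uses 12.1" >&2
  fi
  gcc_ver=$(gcc -dumpversion 2>/dev/null)
  if [ -z "$gcc_ver" ] || ! version_ge "$gcc_ver" "6.0"; then
    echo "gcc ${gcc_ver:-not found} is missing or older than 6.0" >&2
    return 1
  fi
  echo "nvcc: CUDA $cuda, gcc: $gcc_ver"
}
```

Save it under any name, e.g. check_env.sh (a hypothetical file name), then run: source check_env.sh && check_toolchain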
Install packages
# Create a new conda environment named 'PixWizard'
conda create -n PixWizard -y
# Activate the 'PixWizard' environment
conda activate PixWizard
# Install Python and PyTorch (CUDA 12.1 build)
conda install python=3.11 pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
# Install required packages from 'requirements.txt'
pip install -r requirements.txt
# Install Flash-Attention
pip install flash-attn --no-build-isolation
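After installation, a quick import check can catch a failed Flash-Attention build or a CPU-only PyTorch install. The sketch below is hypothetical helper code, not from the repository; it assumes python3 is the interpreter of the active PixWizard environment (set PYTHON to use a different one).

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the packages installed above import cleanly.
# PYTHON defaults to python3; override it to point at a specific interpreter.

# Print each module that fails to import; return 1 if any are missing.
check_modules() {
  local py="${PYTHON:-python3}" missing=0 mod
  for mod in "$@"; do
    if ! "$py" -c "import $mod" >/dev/null 2>&1; then
      echo "missing: $mod" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Print the PyTorch version, the CUDA version it was built for, and GPU visibility.
torch_summary() {
  "${PYTHON:-python3}" -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())'
}
```

Usage: check_modules torch torchvision torchaudio flash_attn && torch_summary. With the versions pinned above, the CUDA field should read 12.1, and the last field should be True when a GPU is visible.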
🚀 Inference
Run the following command from the repository root:
bash exps/inference_pixwizard.sh
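To pin inference to a particular GPU, one option is the standard CUDA_VISIBLE_DEVICES environment variable. The wrapper below is a hypothetical sketch; it assumes exps/inference_pixwizard.sh is launched from the repository root and does not set CUDA_VISIBLE_DEVICES itself.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the inference script on a single GPU.
# Assumes it is called from the PixWizard repository root and that the script
# does not override CUDA_VISIBLE_DEVICES.

# Usage: run_inference [gpu_id]   (gpu_id defaults to 0)
run_inference() {
  local script="exps/inference_pixwizard.sh"
  if [ ! -f "$script" ]; then
    echo "$script not found; run this from the repository root" >&2
    return 1
  fi
  CUDA_VISIBLE_DEVICES="${1:-0}" bash "$script"
}
```

For example, run_inference 1 runs the script with only GPU 1 visible.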
🔥 Training
Prepare data
First, use the provided annotation_example as a reference to prepare your own training dataset.
Second, following the format of s1.yaml and s2.yaml, add your prepared annotation JSON files to the training configuration.
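Before training, it can help to confirm that every annotation file parses. This sketch uses Python's built-in json.tool module; the helper name validate_json is hypothetical, and it assumes each annotation file holds a single JSON document (a JSON Lines file would need a per-line check instead).

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that each annotation file is valid JSON.
# Assumes one JSON document per file (JSON Lines files need a per-line check).

# Usage: validate_json FILE...   Prints each invalid file; fails if any is invalid.
validate_json() {
  local f bad=0
  for f in "$@"; do
    if ! python3 -m json.tool "$f" >/dev/null 2>&1; then
      echo "invalid JSON: $f" >&2
      bad=1
    fi
  done
  return "$bad"
}
```

For example, validate_json path/to/annotations/*.json, where the path is a placeholder for wherever your annotation files live.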
Run training
Place the downloaded weights for clip-vit-large-patch14-336 in the models/clip directory.
Update the model paths and data paths in the training script, then run it.
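One way to fetch the CLIP weights is from the Hugging Face Hub, where they are published as openai/clip-vit-large-patch14-336. The target subfolder used below is an assumption; match it to the path your training script expects. The hypothetical helper has_hf_checkpoint only checks that a directory looks like a downloaded checkpoint.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the CLIP weights step. One way to download them
# (requires the huggingface_hub CLI; the target folder name is an assumption):
#   huggingface-cli download openai/clip-vit-large-patch14-336 \
#     --local-dir models/clip/clip-vit-large-patch14-336

# Succeed if directory $1 has a config.json plus at least one weight file.
has_hf_checkpoint() {
  local dir="$1"
  [ -f "$dir/config.json" ] || return 1
  compgen -G "$dir/*.safetensors" >/dev/null || compgen -G "$dir/*.bin" >/dev/null
}
```

For example: has_hf_checkpoint models/clip/clip-vit-large-patch14-336 && echo "CLIP weights found"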
🖊️ Citation
If you find our project useful for your research and applications, please cite it using this BibTeX:
@article{lin2024pixwizard,
title={PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions},
author={Lin, Weifeng and Wei, Xinyu and Zhang, Renrui and Zhuo, Le and Zhao, Shitian and Huang, Siyuan and Xie, Junlin and Qiao, Yu and Gao, Peng and Li, Hongsheng},
journal={arXiv preprint arXiv:2409.15278},
year={2024}
}