InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions.
Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka
This codebase is adapted from the Grounded Segment Anything repository.
Pipeline
Our proposed framework has three components: a language processor, a segmenter, and an image editor (a sketch of the overall flow follows this list).

- Language processor: processes the user instruction with a large language model. The goal of this step is to parse the instruction and output prompts for the segmenter and captions for the image editor. We adopt ChatGPT and, optionally, BLIP2 for this step.
- Segmenter: consumes the segmentation prompt provided by the language processor. We employ the state-of-the-art segmentation framework Grounded Segment Anything to automatically generate a high-quality mask from this prompt.
- Image editor: uses the captions from the language processor and the mask from the segmenter to compute the edited image. We adopt Stable Diffusion and the mask-guided generation from DiffEdit for this purpose.
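The snippet below is a minimal sketch of how the three components fit together. The callables `language_processor`, `segmenter`, and `image_editor` are placeholders standing in for ChatGPT/BLIP2, Grounded Segment Anything, and Stable Diffusion with DiffEdit-style masking; they are not the actual API of this repository.

```python
def edit_image(image, instruction, language_processor, segmenter, image_editor):
    """Illustrative glue code for the InstructEdit pipeline (placeholder API)."""
    # 1) Language processor: parse the user instruction into a segmentation
    #    prompt plus source/target captions for the diffusion-based editor.
    seg_prompt, source_caption, target_caption = language_processor(instruction, image)

    # 2) Segmenter: generate a high-quality mask for the object referred to
    #    by the segmentation prompt.
    mask = segmenter(image, seg_prompt)

    # 3) Image editor: edit only the masked region, moving the image content
    #    from the source caption toward the target caption.
    return image_editor(image, mask, source_caption, target_caption)
```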
Set up environment
Please set the environment variables manually as follows if you want to build a local GPU environment:
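The variables below follow the setup described in the upstream Grounded Segment Anything repository; the CUDA path is a placeholder and should point to your local CUDA installation.

```bash
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/path/to/cuda-11.3/
```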
The following optional dependencies are necessary for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format. jupyter is also required to run the example notebooks.
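Assuming the same setup as the upstream Segment Anything repository, they can be installed with pip:

```bash
pip install opencv-python pycocotools matplotlib onnxruntime onnx jupyter
```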
After setting up the environment, please also specify your OpenAI API key in chatgpt.py.
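For reference, the key assignment in chatgpt.py will look roughly like the snippet below; the exact variable name in that file may differ, and reading the key from the OPENAI_API_KEY environment variable is just one option.

```python
import os
import openai

# Provide your own OpenAI API key; the environment-variable fallback is optional.
openai.api_key = os.environ.get("OPENAI_API_KEY", "sk-...")
```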
Playground
We provide a notebook (grounded_sam_instructedit_demo.ipynb), a Python script (grounded_sam_instructedit_demo.py), and a Gradio app (gradio_intructedit.py) for you to play around with.
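For example, the Gradio demo and the notebook can be started as follows; the scripts may require additional arguments (model configs and checkpoints), so check their argument parsers before running.

```bash
# Launch the interactive Gradio demo
python gradio_intructedit.py

# Open the example notebook
jupyter notebook grounded_sam_instructedit_demo.ipynb
```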
Citation
@misc{wang2023instructedit,
title={InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions},
author={Qian Wang and Biao Zhang and Michael Birsak and Peter Wonka},
year={2023},
eprint={2305.18047},
archivePrefix={arXiv},
primaryClass={cs.CV}
}