This repository contains the PyTorch implementation of the paper "Unleashing Text-to-Image Diffusion Models for Visual Perception" (ICCV 2023).
VPD (Visual Perception with Pre-trained Diffusion Models) is a framework that leverages the high-level and low-level knowledge of a pre-trained text-to-image diffusion model for downstream visual perception tasks.
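To make the idea concrete, here is a minimal, hypothetical sketch of the feature-extraction step. The repository itself builds on the original stable-diffusion codebase; this sketch instead uses the Hugging Face diffusers API, and the names `MODEL`, `grab`, and `extract_features` are illustrative, not part of this repo. The idea: encode the image with the frozen VAE, run the frozen UNet for a single step conditioned on a text prompt, and hook the UNet's up-blocks to harvest multi-scale features for a downstream head.

```python
# Hypothetical sketch of VPD-style feature extraction with diffusers
# (assumed model id and helper names; not this repository's actual code).
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

MODEL = "runwayml/stable-diffusion-v1-5"  # assumed checkpoint location

vae = AutoencoderKL.from_pretrained(MODEL, subfolder="vae").eval()
unet = UNet2DConditionModel.from_pretrained(MODEL, subfolder="unet").eval()
tokenizer = CLIPTokenizer.from_pretrained(MODEL, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(MODEL, subfolder="text_encoder").eval()

features = []  # filled by the hooks below

def grab(_module, _inputs, output):
    features.append(output)  # one multi-scale feature map per up-block

for block in unet.up_blocks:
    block.register_forward_hook(grab)

@torch.no_grad()
def extract_features(image, prompt="a photo of a street scene"):
    """image: (1, 3, H, W) in [-1, 1]; returns a list of 4 feature maps."""
    features.clear()
    latents = vae.encode(image).latent_dist.sample() * 0.18215
    tokens = tokenizer(prompt, padding="max_length",
                       max_length=tokenizer.model_max_length,
                       truncation=True, return_tensors="pt")
    text_emb = text_encoder(tokens.input_ids).last_hidden_state
    # One forward pass at a fixed small timestep: VPD queries the frozen
    # UNet for features instead of running the full reverse diffusion.
    unet(latents, 1, encoder_hidden_states=text_emb)
    return list(features)

feats = extract_features(torch.rand(1, 3, 512, 512) * 2 - 1)
print([tuple(f.shape) for f in feats])  # coarse-to-fine UNet decoder features
```

In the paper, the prompts are built from class names, which is how the text branch injects a semantic prior; the single-step, frozen-backbone design is what keeps the extraction cheap.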
Download the Stable Diffusion checkpoint (we use v1-5 by default) and put it in the `checkpoints` folder. Please also follow the instructions in the stable-diffusion repository to install the required packages.
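If you prefer fetching the weights programmatically, here is a minimal sketch with huggingface_hub; the repo id and filename below are assumptions, so substitute the checkpoint you actually use:

```python
# Hypothetical download helper (assumed repo id and filename):
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="runwayml/stable-diffusion-v1-5",  # assumed hosting location
    filename="v1-5-pruned-emaonly.ckpt",       # assumed checkpoint filename
    local_dir="checkpoints",
)
```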
Semantic Segmentation with VPD
Equipped with a lightweight Semantic FPN head and trained for 80K iterations on $512\times512$ crops, VPD achieves 54.6 mIoU on ADE20K.
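For illustration only, here is a toy FPN-style head over four multi-scale feature maps, a simplified stand-in for Semantic FPN; the channel widths are placeholders (the 150 classes match ADE20K), and none of these names come from this repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySemanticHead(nn.Module):
    """Simplified FPN-style fusion head (placeholder sizes, illustrative)."""
    def __init__(self, in_channels=(1280, 1280, 640, 320), dim=256, num_classes=150):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, dim, 1) for c in in_channels)
        self.classifier = nn.Conv2d(dim, num_classes, 1)

    def forward(self, feats):
        # Project each scale to a common width, upsample everything to the
        # finest resolution, sum, and predict per-pixel class logits.
        target = feats[-1].shape[-2:]
        fused = sum(F.interpolate(lat(f), size=target, mode="bilinear",
                                  align_corners=False)
                    for lat, f in zip(self.lateral, feats))
        return self.classifier(fused)

head = ToySemanticHead()
feats = [torch.randn(1, 1280, 16, 16), torch.randn(1, 1280, 32, 32),
         torch.randn(1, 640, 64, 64), torch.randn(1, 320, 64, 64)]
print(head(feats).shape)  # torch.Size([1, 150, 64, 64])
```

A real Semantic FPN additionally stacks conv-GN-ReLU blocks per scale and upsamples the logits to the input resolution; this toy version only shows the lateral-project, fuse, classify structure.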
If you find our work useful in your research, please consider citing:
@inproceedings{zhao2023unleashing,
  title={Unleashing Text-to-Image Diffusion Models for Visual Perception},
  author={Zhao, Wenliang and Rao, Yongming and Liu, Zuyan and Liu, Benlin and Zhou, Jie and Lu, Jiwen},
  booktitle={ICCV},
  year={2023}
}