InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

Zigang Geng^*, Binxin Yang^*, Tiankai Hang^*, Chen Li^*, Shuyang Gu^†,
Ting Zhang, Jianmin Bao, Zheng Zhang, Han Hu, Dong Chen, Baining Guo

Microsoft Research Asia
^*Indicates Equal Contribution
^†Indicates Corresponding Author

InstructDiffusion is a unifying and generic framework for aligning computer vision tasks with human instructions.

Abstract

We present InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions. Unlike existing approaches that integrate prior knowledge and pre-define the output space (e.g., categories and coordinates) for each vision task, we cast diverse vision tasks into a human-intuitive image-manipulating process whose output space is a flexible and interactive pixel space. Concretely, the model is based on the diffusion process and is trained to predict pixels according to user instructions, such as circling the left shoulder of the man in red or placing a blue mask on the left car. InstructDiffusion can handle a variety of vision tasks, including understanding tasks (segmentation and keypoint detection) and generative tasks (editing and restoration). It even demonstrates the ability to handle unseen tasks and outperforms previous methods on unseen datasets. This represents a significant step towards a generalist modeling interface for vision tasks and towards advancing artificial general intelligence in computer vision.

Keypoint Detection

(a) Mark the car logo with a blue circle.
(b) Put a blue circle on the nose of the white tiger and use the red color to draw a circle around the left shoulder of the white tiger.
(c) Create a yellow circle around the right eye of the whale.
(d) Use blue to encircle the right wrist of the person on the far left and draw a yellow circle over the left wrist of the person on the far right.
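
Because the output of these instructions is an image rather than a list of coordinates, a downstream consumer may still want a numeric keypoint. The following is a minimal sketch of one naive way to do that, assuming the edited image is available as rows of RGB tuples and that the drawn circle is the only region close to the requested color. It is not the authors' pipeline; the function name `keypoint_from_circle` and the tolerance `tol` are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each in 0..255
Image = List[List[Pixel]]      # image[y][x]

def keypoint_from_circle(edited: Image, color: Pixel,
                         tol: int = 40) -> Optional[Tuple[float, float]]:
    """Estimate a keypoint as the centroid of pixels close to `color`.

    Hypothetical helper: assumes the circle drawn by the model is the only
    region of the image within `tol` of the requested color.
    """
    xs, ys = [], []
    for y, row in enumerate(edited):
        for x, pixel in enumerate(row):
            # Largest per-channel difference between the pixel and the target color.
            if max(abs(a - b) for a, b in zip(pixel, color)) <= tol:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no pixel of that color, so no keypoint was marked
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Toy 5x5 gray image with a small blue ring centered on (x=2, y=2).
GRAY, BLUE = (128, 128, 128), (0, 0, 255)
image = [[GRAY] * 5 for _ in range(5)]
for x, y in [(2, 1), (1, 2), (3, 2), (2, 3)]:
    image[y][x] = BLUE

center = keypoint_from_circle(image, BLUE)
```

A centroid works for a closed ring because the ring's pixels are roughly symmetric about the marked point; it would be biased if the scene already contained large areas of the same color.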

Segmentation

(a) Mark the pixels of cat in the mirror to blue and leave the rest unchanged.
(b) Fill in the pixels of neutrophil with yellow, retaining the existing colors of the remaining pixels.
(c) Modify the pixels of Oriental Pearl Tower to red without affecting any other pixels.
(d) Paint the pixels of shadow in blue and maintain the current appearance of the other pixels.
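
The segmentation instructions above ask the model to paint the target pixels in a given color and leave everything else untouched, so a binary mask can in principle be read back from the output image. Below is a minimal sketch of that idea using simple color thresholding, assuming the image is given as rows of RGB tuples. This is an illustrative stand-in, not the method used by the authors; `mask_from_painted` and `tol` are hypothetical names.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each in 0..255
Image = List[List[Pixel]]      # image[y][x]

def color_distance(a: Pixel, b: Pixel) -> int:
    """Largest per-channel absolute difference between two RGB pixels."""
    return max(abs(x - y) for x, y in zip(a, b))

def mask_from_painted(edited: Image, paint: Pixel, tol: int = 40) -> List[List[bool]]:
    """Return True where the edited image is within `tol` of the paint color.

    Hypothetical helper: pixels that already had this color before editing
    would be false positives, so this is only a rough approximation.
    """
    return [[color_distance(pixel, paint) <= tol for pixel in row] for row in edited]

# Toy 2x3 output image: three pixels were painted (approximately) blue.
BLUE = (0, 0, 255)
edited = [
    [(120, 110, 100), (5, 10, 250), (0, 0, 255)],
    [(130, 120, 90), (10, 0, 240), (125, 115, 95)],
]
mask = mask_from_painted(edited, BLUE)
```

The tolerance absorbs small color deviations in the generated image; a tighter or looser value trades missed pixels against false positives.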

Low Level Tasks

Image Editing

BibTeX


      @article{Geng23instructdiff,
        author       = {Zigang Geng and
                        Binxin Yang and
                        Tiankai Hang and
                        Chen Li and
                        Shuyang Gu and
                        Ting Zhang and
                        Jianmin Bao and
                        Zheng Zhang and
                        Han Hu and
                        Dong Chen and
                        Baining Guo},
        title        = {InstructDiffusion: {A} Generalist Modeling Interface for Vision Tasks},
        journal      = {CoRR},
        volume       = {abs/2309.03895},
        year         = {2023},
        url          = {https://doi.org/10.48550/arXiv.2309.03895},
        doi          = {10.48550/arXiv.2309.03895},
      }

This page was built using the Academic Project Page Template, which was adopted from the Nerfies project page. You are free to borrow the source code of this website; we just ask that you link back to this page in the footer.
This website is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
