InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
* Equal Contribution

Personalized text-to-image generation: given a set of images of the same concept, the model generates new scenes containing that concept while following the input prompts.
Recent advances in personalized image generation allow a pre-trained text-to-image model to learn a new concept from a set of images. However, existing personalization approaches usually require test-time finetuning for each concept, which is time-consuming and difficult to scale. We propose InstantBooth, a novel approach built upon pre-trained text-to-image models that enables instant text-guided image personalization without test-time finetuning. We achieve this with two major components. First, we learn the general concept of the input images by converting them to a textual token with a learnable image encoder. Second, to keep the fine details of the identity, we learn rich visual feature representations by introducing a few adapter layers into the pre-trained model. We train these components only on text-image pairs, without using paired images of the same concept. Compared to test-time finetuning-based methods such as DreamBooth and Textual Inversion, our model generates competitive results on unseen concepts in terms of language-image alignment, image fidelity, and identity preservation, while being 100 times faster.
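For concreteness, the PyTorch-style sketch below illustrates the first component; every name in it (ConceptImageEncoder, inject_concept_token, the feature dimensions) is an assumption for illustration, not the released implementation. It shows one plausible way a learnable image encoder could pool the concept images into a single textual token embedding that replaces the $\hat{V}$ placeholder in the prompt embedding.

```python
import torch
import torch.nn as nn

class ConceptImageEncoder(nn.Module):
    # Hypothetical sketch: pools features of the concept images into one
    # compact embedding living in the text-encoder embedding space.
    def __init__(self, image_dim=1024, text_dim=768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(image_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, image_features):        # (batch, n_images, image_dim)
        pooled = image_features.mean(dim=1)    # average over the input concept images
        return self.proj(pooled)               # (batch, text_dim): the V-hat token

def inject_concept_token(prompt_embeddings, concept_embedding, vhat_index):
    # Replace the V-hat placeholder position in the frozen text-encoder output
    # with the learned concept embedding; all other word embeddings stay as-is.
    prompt_embeddings = prompt_embeddings.clone()
    prompt_embeddings[:, vhat_index] = concept_embedding
    return prompt_embeddings
```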
Model Structure
An overview of our approach. We first inject a unique identifier $\hat{V}$ into the original input prompt to obtain "Photo of a $\hat{V}$ person", where $\hat{V}$ represents the input concept. We then use the concept image encoder to convert the input images to a compact textual embedding, and a frozen text encoder to map the other words, forming the final prompt embeddings. We extract rich patch feature tokens from the input images with a patch encoder and inject them into the adapter layers for better identity preservation. The U-Net of the pre-trained diffusion model takes the prompt embeddings and the rich visual features as conditions to generate new images of the input concept. During training, only the image encoders and the adapter layers are trainable; all other parts are frozen. The model is optimized with only the reconstruction loss of the diffusion model. (We omit the object masks of the input images for simplicity.)
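The training recipe described above (frozen text encoder and U-Net, trainable image encoders and adapter layers, diffusion reconstruction loss) can be sketched roughly as follows. The function signature, the adapter_context argument, and the scheduler interface are assumptions for illustration, not the actual code.

```python
import torch
import torch.nn.functional as F

def training_step(unet, text_encoder, concept_encoder, patch_encoder,
                  images, concept_images, prompt_ids, vhat_index, scheduler):
    # Freeze the pre-trained parts; only the adapter layers inside the U-Net
    # (and the image encoders, created outside this function) stay trainable.
    text_encoder.requires_grad_(False)
    for name, p in unet.named_parameters():
        p.requires_grad_("adapter" in name)

    # Prompt embeddings from the frozen text encoder, with the V-hat slot
    # replaced by the compact concept embedding.
    prompt_emb = text_encoder(prompt_ids).clone()
    prompt_emb[:, vhat_index] = concept_encoder(concept_images)

    # Rich patch feature tokens that condition the adapter layers.
    patch_tokens = patch_encoder(concept_images)

    # Diffusion reconstruction loss: add noise at a random timestep and
    # ask the U-Net to predict that noise.
    noise = torch.randn_like(images)
    t = torch.randint(0, scheduler.num_timesteps, (images.size(0),),
                      device=images.device)
    noisy = scheduler.add_noise(images, noise, t)
    pred = unet(noisy, t, encoder_hidden_states=prompt_emb,
                adapter_context=patch_tokens)
    return F.mse_loss(pred, noise)
```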
Visual Comparison with Other Methods

Visual comparison of our method with Textual Inversion and DreamBooth.
More Visual Results
Paper
Jing Shi, Wei Xiong, Zhe Lin, Hyun Joon Jung
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
(arXiv)
Acknowledgements
We thank Qing Liu for dataset preparation and He Zhang for object mask computation.
The template of this webpage is borrowed from Richard Zhang.