GroundingBooth: Grounding Text-to-Image Customization
Abstract
Recent approaches in text-to-image customization have primarily focused on preserving the identity of the input subject, but often fail to control the spatial location and size of objects. We introduce GroundingBooth, which achieves zero-shot, instance-level spatial grounding on both foreground subjects and background objects in the text-to-image customization task. Our proposed grounding module and subject-grounded cross-attention layer enable the creation of personalized images with accurate layout alignment, identity preservation, and strong text-image coherence. In addition, our model seamlessly supports personalization with multiple subjects. Our model shows strong results in both layout-guided image synthesis and text-to-image customization tasks.
Multi-concept customization on DreamBench objects. Results show that our method achieves joint foreground-background control with text alignment and identity preservation of the foreground objects. Our model seamlessly supports the customization of multiple subjects. Even when the bounding boxes of the foreground objects overlap substantially with those of the background text entities, the model can distinguish subject-driven foreground generation from text-driven background generation.
Visual comparison with existing methods on DreamBench objects for the single-subject customization task. Previous non-grounding customization methods tend to generate objects that are large and centered in the image, which inflates the CLIP-I and DINO scores during evaluation. In real-world scenarios, however, users may want to flexibly control the size of the subject in the generated image, for example by generating a larger background described by richer textual information; in such cases, non-grounding customization methods cannot produce the desired result. The visual results demonstrate that our method achieves better identity preservation with accurate layout alignment.
Visual results of reference-guided image generation with complex layouts and text entities as conditions on the COCO validation set. Results show that even when given complex layouts and text entities as input, our model still generates high-quality scenes with precise layout alignment of all objects and regions and accurate identity preservation of the reference object, while preserving text alignment. Compared with previous layout-to-image generation methods, our model achieves competitive accuracy in grounding the visual concepts and a remarkable improvement in identity preservation.
Method
(a) grounded single-subject customization, and (b) joint grounded customization of multiple subjects and text entities. GroundingBooth simultaneously achieves prompt following, layout grounding for both subjects and background objects, and identity preservation of the subjects.
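For concreteness, the sketch below shows one hypothetical way the conditioning inputs for these two modes could be organized: a global prompt, reference subject images with their boxes, and background text entities with their boxes. The class name, field names, and example values are illustrative assumptions, not the actual GroundingBooth interface.

```python
# Hypothetical structure of the conditioning inputs for grounded customization.
# Field names and values are assumptions for illustration only.
from dataclasses import dataclass, field
from typing import List, Tuple

from PIL import Image

Box = Tuple[float, float, float, float]  # normalized (x_min, y_min, x_max, y_max)


@dataclass
class GroundedCondition:
    prompt: str                                                       # global text prompt
    subject_images: List[Image.Image] = field(default_factory=list)   # reference subjects
    subject_boxes: List[Box] = field(default_factory=list)            # one box per subject
    entity_phrases: List[str] = field(default_factory=list)           # background text entities
    entity_boxes: List[Box] = field(default_factory=list)             # one box per entity


# Example: one customized subject plus two text-grounded background entities.
condition = GroundedCondition(
    prompt="a corgi sitting on a bench in a park",
    subject_images=[Image.new("RGB", (512, 512))],  # placeholder for a reference photo
    subject_boxes=[(0.10, 0.45, 0.45, 0.90)],
    entity_phrases=["a wooden bench", "a tree"],
    entity_boxes=[(0.05, 0.55, 0.95, 0.95), (0.60, 0.05, 0.95, 0.70)],
)
```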
Grounding Module of our proposed framework. Our grounding module takes both prompt-layout pairs and reference object-layout pairs as input. For the foreground reference object, both the CLIP text token and the DINOv2 image class token are utilized.
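The exact fusion is not detailed here; below is a minimal PyTorch sketch of a GLIGEN-style grounding tokenizer, under the assumption that each box is Fourier-embedded and concatenated with its semantic token (the CLIP text token for background entities, the DINOv2 image class token for reference subjects) before an MLP projects it into a grounding token. The module names, feature dimensions, and placeholder tensors are assumptions for illustration.

```python
import math

import torch
import torch.nn as nn


def fourier_embed(boxes: torch.Tensor, num_freqs: int = 8) -> torch.Tensor:
    """Fourier-embed normalized box coordinates (B, N, 4) -> (B, N, 4 * 2 * num_freqs)."""
    freqs = 2.0 ** torch.arange(num_freqs, dtype=boxes.dtype, device=boxes.device)
    angles = boxes.unsqueeze(-1) * freqs * math.pi           # (B, N, 4, num_freqs)
    emb = torch.cat([angles.sin(), angles.cos()], dim=-1)    # (B, N, 4, 2 * num_freqs)
    return emb.flatten(start_dim=-2)


class GroundingTokenizer(nn.Module):
    """Fuse a semantic token (CLIP text or DINOv2 class token) with its bounding box."""

    def __init__(self, token_dim: int = 1024, num_freqs: int = 8, out_dim: int = 768):
        super().__init__()
        in_dim = token_dim + 4 * 2 * num_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.SiLU(), nn.Linear(out_dim, out_dim)
        )

    def forward(self, tokens: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, token_dim); boxes: (B, N, 4) in normalized xyxy format
        box_emb = fourier_embed(boxes)
        return self.mlp(torch.cat([tokens, box_emb], dim=-1))  # (B, N, out_dim)


# Usage with placeholder features standing in for CLIP text / DINOv2 class tokens.
tokenizer = GroundingTokenizer()
entity_tokens = torch.randn(1, 3, 1024)
entity_boxes = torch.rand(1, 3, 4)
grounding_tokens = tokenizer(entity_tokens, entity_boxes)     # (1, 3, 768)
```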
Pipeline of our proposed masked cross-attention. Q, K, and V are image query, key, and value respectively, and A is the affinity matrix.
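As a rough illustration of such masked cross-attention, the sketch below restricts each image query to the subject tokens whose bounding box covers its spatial location: the affinity matrix A is masked before the softmax. The function names, tensor shapes, and rasterization scheme are assumptions, not the exact formulation used in GroundingBooth.

```python
import torch


def box_to_mask(box, h, w):
    """Rasterize a normalized (x0, y0, x1, y1) box onto an h x w latent grid."""
    x0, y0, x1, y1 = box
    ys = (torch.arange(h, dtype=torch.float32) + 0.5) / h
    xs = (torch.arange(w, dtype=torch.float32) + 0.5) / w
    inside = (ys[:, None] >= y0) & (ys[:, None] < y1) \
        & (xs[None, :] >= x0) & (xs[None, :] < x1)
    return inside.flatten()                               # (h * w,) boolean


def masked_cross_attention(q, k, v, region_mask):
    """
    q:           image queries          (B, N_img, d)
    k, v:        subject keys / values  (B, N_sub, d)
    region_mask: boolean                (B, N_img, N_sub),
                 True where the image token lies inside that subject's box.
    """
    d = q.shape[-1]
    affinity = q @ k.transpose(-2, -1) / d ** 0.5         # affinity matrix A
    affinity = affinity.masked_fill(~region_mask, float("-inf"))
    attn = torch.softmax(affinity, dim=-1)
    attn = torch.nan_to_num(attn)                         # queries outside every box -> 0
    return attn @ v                                       # (B, N_img, d)


# Usage on a 16x16 latent grid with two reference subjects.
h = w = 16
q = torch.randn(1, h * w, 64)
k = torch.randn(1, 2, 64)
v = torch.randn(1, 2, 64)
mask = torch.stack([box_to_mask((0.1, 0.4, 0.5, 0.9), h, w),
                    box_to_mask((0.6, 0.1, 0.9, 0.7), h, w)], dim=-1).unsqueeze(0)
out = masked_cross_attention(q, k, v, mask)               # (1, 256, 64)
```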
More Results