
ROICtrl: Boosting Instance Control for Visual Generation

Yuchao Gu¹ Yipin Zhou² Yunfan Ye² Yixin Nie² Licheng Yu²
Pingchuan Ma³ Kevin Qinghong Lin¹ Mike Zheng Shou¹

¹Show Lab, National University of Singapore ²GenAI, Meta ³MIT

[Paper (arXiv)] [Code (GitHub)]

📖TL;DR: ROICtrl, built on ROI-Align and the newly proposed ROI-Unpool, can extend existing diffusion models and their add-ons (e.g., ControlNet, T2I-Adapter, IP-Adapter, ED-LoRA) to support controllable multi-instance generation.

Controlling Diffusion Models with Free-Form Instance Captions

Abstract

Natural language often struggles to accurately associate positional and attribute information with multiple instances, which limits current text-based visual generation models to simpler compositions featuring only a few dominant instances. To address this limitation, this work enhances diffusion models by introducing regional instance control, where each instance is governed by a bounding box paired with a free-form caption. Previous methods in this area typically rely on implicit position encoding or explicit attention masks to separate regions of interest (ROIs), resulting in either inaccurate coordinate injection or large computational overhead. Inspired by ROI-Align in object detection, we introduce a complementary operation called ROI-Unpool. Together, ROI-Align and ROI-Unpool enable explicit, efficient, and accurate ROI manipulation on high-resolution feature maps for visual generation. Building on ROI-Unpool, we propose ROICtrl, an adapter for pretrained diffusion models that enables precise regional instance control. ROICtrl is compatible with community-finetuned diffusion models, as well as with existing spatial-based add-ons (e.g., ControlNet, T2I-Adapter) and embedding-based add-ons (e.g., IP-Adapter, ED-LoRA), extending their applications to multi-instance generation. Experiments show that ROICtrl achieves superior performance in regional instance control while significantly reducing computational costs.
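To make the ROI-Align half of this pipeline concrete, here is a minimal NumPy sketch of standard ROI-Align as used in object detection. It is an illustration, not ROICtrl's code: it assumes a single (C, H, W) feature map, a box given as float `(x1, y1, x2, y2)` in feature-map pixel units with integer coordinates at pixel centers, and one bilinear sample per output bin (real implementations usually average several samples per bin). The function names `bilinear_sample` and `roi_align` are hypothetical.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample feat (C, H, W) at the continuous point (y, x).

    Integer coordinates are assumed to be pixel centers; points outside
    the map are clamped to the border.
    """
    _, h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feat[:, y0, x0]
            + (1 - dy) * dx * feat[:, y0, x1]
            + dy * (1 - dx) * feat[:, y1, x0]
            + dy * dx * feat[:, y1, x1])

def roi_align(feat, box, out_size=(4, 4)):
    """Resample the float box (x1, y1, x2, y2) into a fixed out_size grid.

    Each output bin takes one bilinear sample at its center. The box
    coordinates are never rounded, so no quantization error is introduced.
    """
    x1, y1, x2, y2 = box
    oh, ow = out_size
    bin_h, bin_w = (y2 - y1) / oh, (x2 - x1) / ow
    out = np.empty((feat.shape[0], oh, ow), dtype=np.float64)
    for i in range(oh):
        for j in range(ow):
            cy = y1 + (i + 0.5) * bin_h  # bin center, continuous coordinate
            cx = x1 + (j + 0.5) * bin_w
            out[:, i, j] = bilinear_sample(feat, cy, cx)
    return out
```

Whatever the resolution of `feat`, the result always has `out_size` spatial positions. This fixed size is what lets per-instance processing on the ROI grid stay cheap on large feature maps.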

Comparison of Various ROI Injections

(a) Limitations of ROI Injection with Embedding:
- Inaccurate spatial alignment due to implicit box embedding.
- Attribute binding issues caused by the use of self-attention for injecting instance captions.
(b) Limitations of ROI Injection with Attention Mask:
- Computation cost scales with the feature resolution, which is large in visual generation.
- Coordinate quantization errors make it difficult to accurately inject instance captions into the spatial feature map.
(c) ROI Injection with ROI-Align and ROI-Unpool (Ours):
- Computation cost is independent of the feature resolution.
- No quantization errors when injecting instance captions into the spatial feature map.
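The comparison above hinges on ROI-Unpool writing ROI features back to the full-resolution map without rounding the box. The page does not spell out the operation, so the NumPy sketch below is one plausible reading, not the paper's implementation. It assumes the ROI feature was produced by bin-center sampling over a float box `(x1, y1, x2, y2)`. Every canvas pixel inside the box is mapped back to a continuous position in the ROI grid (the inverse of the bin-center formula) and filled by bilinear interpolation. The names `_bilinear` and `roi_unpool` are hypothetical. Under this reading, attention between an instance caption and its ROI would run on the fixed `oh * ow` grid, which is presumably why its cost does not grow with H and W.

```python
import numpy as np

def _bilinear(feat, y, x):
    # Bilinear lookup in feat (C, H, W); integer coords are pixel centers,
    # out-of-range points are clamped to the border.
    _, h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feat[:, y0, x0]
            + (1 - dy) * dx * feat[:, y0, x1]
            + dy * (1 - dx) * feat[:, y1, x0]
            + dy * dx * feat[:, y1, x1])

def roi_unpool(roi_feat, box, map_size):
    """Hypothetical ROI-Unpool: paste roi_feat (C, oh, ow) onto an (H, W)
    canvas at the float box (x1, y1, x2, y2).

    Returns the canvas and a boolean mask of the pixels that were filled.
    """
    c, oh, ow = roi_feat.shape
    h, w = map_size
    x1, y1, x2, y2 = box
    bin_h, bin_w = (y2 - y1) / oh, (x2 - x1) / ow
    canvas = np.zeros((c, h, w), dtype=np.float64)
    mask = np.zeros((h, w), dtype=bool)
    # Visit only the pixel centers that fall inside the box.
    for y in range(max(0, int(np.ceil(y1))), min(h, int(np.floor(y2)) + 1)):
        for x in range(max(0, int(np.ceil(x1))), min(w, int(np.floor(x2)) + 1)):
            u = (y - y1) / bin_h - 0.5  # inverse of cy = y1 + (i + 0.5) * bin_h
            v = (x - x1) / bin_w - 0.5  # inverse of cx = x1 + (j + 0.5) * bin_w
            canvas[:, y, x] = _bilinear(roi_feat, u, v)
            mask[y, x] = True
    return canvas, mask
```

The returned mask marks where an instance's features landed, which a later fusion step can use to restrict the instance's influence to its own box.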

Method Overview: ROICtrl

In parallel with the pretrained global caption injection, we introduce an additional instance caption injection. The global attention output and instance attention output are then fused using learnable blending.
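The page says only that the global and instance attention outputs are fused "using learnable blending", without giving the rule. The NumPy sketch below shows one hypothetical form, not ROICtrl's actual fusion. At each pixel, a softmax over learnable logits weights the global branch against every instance whose (unpooled) box covers that pixel. Pixels outside all boxes keep the global output. The function `blend` and the scalar parameters `global_logit` and `inst_logit` are assumptions for illustration; in training they would be learned rather than fixed.

```python
import numpy as np

def blend(global_out, inst_outs, inst_masks, global_logit=0.0, inst_logit=0.0):
    """Hypothetical learnable blending of global and instance attention outputs.

    global_out: (C, H, W) output of the pretrained global-caption attention.
    inst_outs:  list of (C, H, W) instance outputs pasted back onto the full map.
    inst_masks: list of (H, W) boolean masks marking each instance's box.
    global_logit, inst_logit: scalar logits (learnable in practice, fixed here).
    """
    h, w = global_out.shape[1:]
    # Softmax numerator and denominator, accumulated branch by branch.
    w_global = np.full((h, w), np.exp(global_logit))
    num = w_global[None] * global_out
    den = w_global.copy()
    for out, mask in zip(inst_outs, inst_masks):
        w_inst = np.exp(inst_logit) * mask  # zero weight outside the instance box
        num = num + w_inst[None] * out
        den = den + w_inst
    return num / den[None]
```

With both logits at zero, a pixel covered by one instance gets the plain average of the global and instance outputs. Raising `inst_logit` shifts those pixels toward the instance caption.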

Applications

1. Instance Control (or Layout Control)

2. Compatible with Community-Finetuned Models

3. Compatible with Spatial-Based Add-ons (e.g., T2I-Adapter, ControlNet)

4. Compatible with Embedding-Based Add-ons (e.g., IP-Adapter, ED-LoRA)

5. Continue Generation with Local Change

BibTeX

    @article{gu2024roictrl,
        title={ROICtrl: Boosting Instance Control for Visual Generation},
        author={Gu, Yuchao and Zhou, Yipin and Ye, Yunfan and Nie, Yixin and Yu, Licheng and Ma, Pingchuan and Lin, Kevin Qinghong and Shou, Mike Zheng},
        journal={arXiv preprint arXiv:2411.17949},
        year={2024}
    }

This page was adapted from this source code.
