arXiv:2312.03048 (cs)
[Submitted on 5 Dec 2023 (v1), last revised 31 Jul 2024 (this version, v3)]
Title: DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control
Authors: Yuru Jia, Lukas Hoyer, Shengyu Huang, Tianfu Wang, Luc Van Gool, Konrad Schindler, Anton Obukhov
Abstract: Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content, specialize to user data through few-shot fine-tuning, and condition their output on other modalities, such as semantic maps. However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation? We investigate this question in the context of autonomous driving, and answer it with a resounding "yes". We propose an efficient data generation pipeline termed DGInStyle. First, we examine the problem of specializing a pretrained LDM to semantically-controlled generation within a narrow domain. Second, we propose a Style Swap technique to endow the rich generative prior with the learned semantic control. Third, we design a Multi-resolution Latent Fusion technique to overcome the bias of LDMs towards dominant objects. Using DGInStyle, we generate a diverse dataset of street scenes, train a domain-agnostic semantic segmentation model on it, and evaluate the model on multiple popular autonomous driving datasets. Our approach consistently increases the performance of several domain generalization methods compared to the previous state-of-the-art methods. The source code and the generated dataset are available at this https URL.
| Comments: | ECCV 2024, camera ready |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2312.03048 [cs.CV] |
| (or arXiv:2312.03048v3 [cs.CV] for this version) | |
| DOI: | https://doi.org/10.48550/arXiv.2312.03048 (arXiv-issued DOI via DataCite) |
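The abstract's Multi-resolution Latent Fusion idea — combining a coarse global generation pass with high-resolution tiled passes so that small objects are not washed out by the model's bias towards dominant objects — can be illustrated with a toy sketch. All names, grid shapes, and the blending weight below are hypothetical; the paper's actual technique operates on diffusion latents, not raw pixel grids:

```python
# Toy illustration of multi-resolution fusion (hypothetical sketch, not the
# authors' implementation): a low-resolution "global" pass is upsampled and
# blended with a high-resolution "tile" pass that carries fine detail.

def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2D grid (list of lists of floats)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out

def fuse(global_up, tile, alpha=0.7):
    """Per-pixel blend: alpha weights high-resolution tile detail against
    the upsampled global pass (alpha is an illustrative choice)."""
    return [
        [alpha * t + (1 - alpha) * g for t, g in zip(trow, grow)]
        for trow, grow in zip(tile, global_up)
    ]

# A 2x2 "global" pass upsampled to 4x4, then fused with a 4x4 "tile" pass.
coarse = [[0.0, 1.0],
          [1.0, 0.0]]
global_up = upsample_nearest(coarse, 2)
tile = [[0.5] * 4 for _ in range(4)]
fused = fuse(global_up, tile)
```

The fused grid keeps the global layout from the coarse pass while the tile contribution, weighted by `alpha`, injects the fine-grained content that a single low-resolution pass would lose.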
Submission history
From: Yuru Jia
[v1] Tue, 5 Dec 2023 18:34:12 UTC (9,869 KB)
[v2] Mon, 8 Apr 2024 08:59:24 UTC (12,271 KB)
[v3] Wed, 31 Jul 2024 13:02:51 UTC (41,022 KB)
Full-text links:
Access Paper:
- View PDF
- HTML (experimental)
- TeX Source