ControlRoom3D 🤖
Room Generation using Semantic Proxy Rooms
published at CVPR 2024
Abstract
Manually creating 3D environments for AR/VR applications is a complex process requiring expert knowledge in 3D modeling software. Pioneering works facilitate this process by generating room meshes conditioned on textual style descriptions. Yet, many of these automatically generated 3D meshes do not adhere to typical room layouts, compromising their plausibility, e.g., by placing several beds in one bedroom. To address these challenges, we present ControlRoom3D, a novel method to generate high-quality room meshes. Central to our approach is a user-defined 3D semantic proxy room that outlines a rough room layout based on semantic bounding boxes and a textual description of the overall room style. Our key insight is that when rendered to 2D, this 3D representation provides valuable geometric and semantic information to control powerful 2D models to generate 3D consistent textures and geometry that aligns well with the proxy room. Supported by an extensive study including quantitative metrics and qualitative user evaluations, our method generates diverse and globally plausible 3D room meshes, thus empowering users to design 3D rooms effortlessly without specialized knowledge.
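As a rough illustration of what rendering the proxy room to 2D can provide, the sketch below assumes the proxy room is a style string plus a list of axis-aligned, semantically labeled boxes. The classes `ProxyBox` and `ProxyRoom` and the functions `ray_box` and `cast` are hypothetical, not the authors' code. Casting one camera ray with the standard slab method returns the label of the box the ray enters first, together with the near and far distances along the ray at which it enters and leaves that box; repeating this per pixel would yield a semantic map and per-pixel depth bounds.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ProxyBox:
    label: str        # semantic class, e.g. "bed" (hypothetical schema)
    lo: np.ndarray    # minimum corner (x, y, z)
    hi: np.ndarray    # maximum corner (x, y, z)


@dataclass
class ProxyRoom:
    style: str               # textual description of the overall room style
    boxes: list[ProxyBox]    # user-defined semantic bounding boxes


def ray_box(origin, direction, box):
    """Slab-method ray/AABB test; returns (t_near, t_far) or None on a miss."""
    # Replace zero components so the reciprocal stays finite.
    direction = np.where(direction == 0.0, 1e-12, direction)
    inv = 1.0 / direction
    t0 = (box.lo - origin) * inv
    t1 = (box.hi - origin) * inv
    t_near = np.max(np.minimum(t0, t1))   # last slab entry
    t_far = np.min(np.maximum(t0, t1))    # first slab exit
    if t_far < max(t_near, 0.0):          # miss, or box entirely behind the ray
        return None
    return max(t_near, 0.0), t_far        # clamp when the origin is inside the box


def cast(room, origin, direction):
    """Return (label, t_near, t_far) for the closest box hit by the ray, or None."""
    best = None
    for box in room.boxes:
        hit = ray_box(origin, direction, box)
        if hit is not None and (best is None or hit[0] < best[1]):
            best = (box.label, hit[0], hit[1])
    return best
```

For example, a ray from the origin along +x through a room containing a bed spanning x in [2, 4] reports `"bed"` with near and far distances 2 and 4.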
Animations
Geometry Alignment
Because of scale ambiguity, even state-of-the-art metric depth estimators such as ZoeDepth produce depth that deviates significantly from the proxy room. Our depth alignment module instead iteratively optimizes an alignment loss until the estimated geometry closely matches the proxy room.
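The page does not give the exact form of the alignment loss or which variables are optimized. A minimal sketch, assuming per-pixel near/far depth bounds rendered from the proxy room and a squared hinge penalty on depth outside those bounds, fits a single scale `s` and shift `b` to a relative depth prediction by plain gradient descent. The names `alignment_loss` and `align_scale_shift`, and the global scale/shift parameterization, are hypothetical choices for illustration.

```python
import numpy as np


def alignment_loss(depth, near, far):
    """Mean squared hinge penalty on depth values outside [near, far]."""
    below = np.maximum(near - depth, 0.0)   # closer than the proxy allows
    above = np.maximum(depth - far, 0.0)    # farther than the proxy allows
    return np.mean(below ** 2 + above ** 2)


def align_scale_shift(pred, near, far, steps=500, lr=0.1):
    """Fit depth = s * pred + b by gradient descent on alignment_loss."""
    s, b = 1.0, 0.0
    n = pred.size
    for _ in range(steps):
        d = s * pred + b
        # Per-pixel dL/dd: +2*excess beyond far, -2*shortfall below near.
        g = (2.0 * np.maximum(d - far, 0.0) - 2.0 * np.maximum(near - d, 0.0)) / n
        s -= lr * np.sum(g * pred)   # chain rule: dd/ds = pred
        b -= lr * np.sum(g)          # chain rule: dd/db = 1
    return s, b
```

With an autodiff framework the manual gradient could be dropped, and the same loss could drive a richer, per-pixel depth correction instead of one global scale and shift.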
[Animation: No Optimization vs. After Depth Alignment]
SAM Masks
We leverage the Segment Anything Model (SAM) to obtain pixel-precise instance masks for each object. For pixels located within the rendered bounding box but outside the SAM mask, we set the near depth value equal to the far depth value. Including SAM masks leads to sharper 3D object boundaries and thus a more seamless integration of objects into the 3D room mesh.
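Assuming the near and far depth bounds are stored as per-pixel arrays, and the rendered box footprint and the SAM instance mask as boolean arrays of the same shape, the masking rule above reduces to a single `np.where`. The helper name `mask_near_depth` is hypothetical. For the affected pixels the allowed depth interval [near, far] collapses to the far value, so they can no longer take depths inside the box.

```python
import numpy as np


def mask_near_depth(near, far, box_mask, sam_mask):
    """Inside the rendered box but outside the SAM mask, set near depth = far depth."""
    off_object = box_mask & ~sam_mask       # box footprint minus the object's pixels
    return np.where(off_object, far, near)  # near bound for every other pixel is unchanged
```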
Normal Loss
Although the depth alignment loss effectively aligns each frame with the 3D proxy room, it may occasionally distort object surfaces to fit them within their bounding boxes. To counter this, we introduce a normal preservation loss that retains the original shape of the objects.
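The page does not state how the normal preservation loss is defined. One plausible form, sketched here purely as an assumption, compares surface normals of the depth map before and after alignment with a 1 minus cosine penalty. Normals are obtained by unprojecting each depth map with a pinhole camera (the `focal` value and the centered principal point are hypothetical) and crossing finite-difference tangents; `depth_normals` and `normal_loss` are illustrative names.

```python
import numpy as np


def depth_normals(depth, focal=500.0):
    """Unit surface normals of a depth map, via pinhole unprojection."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(float)      # pixel rows (v) and columns (u)
    x = (u - (w - 1) / 2) * depth / focal         # principal point assumed at the center
    y = (v - (h - 1) / 2) * depth / focal
    pts = np.stack([x, y, depth], axis=-1)        # (h, w, 3) camera-space points
    d_dv, d_du = np.gradient(pts, axis=(0, 1))    # tangents along rows and columns
    n = np.cross(d_du, d_dv)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)


def normal_loss(aligned, original, focal=500.0):
    """Mean (1 - cosine similarity) between normals of two depth maps."""
    cos = np.sum(depth_normals(aligned, focal) * depth_normals(original, focal), axis=-1)
    return np.mean(1.0 - cos)
```

Unprojecting before differentiating makes the normals invariant to a uniform rescaling of depth, so this form of the loss does not penalize a global scale correction, while it still penalizes local bulges or dents introduced during alignment.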
BibTeX
@inproceedings{schult24controlroom3d,
author = {Schult, Jonas and Tsai, Sam and H\"ollein, Lukas and Wu, Bichen and Wang, Jialiang and Ma, Chih-Yao and Li, Kunpeng and Wang, Xiaofang and Wimbauer, Felix and He, Zijian and Zhang, Peizhao and Leibe, Bastian and Vajda, Peter and Hou, Ji},
title = {ControlRoom3D: Room Generation using Semantic Proxy Rooms},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2024},
}
