ControlRoom3D 🤖
Room Generation using Semantic Proxy Rooms

published at CVPR 2024

Jonas Schult¹,²,*, Sam Tsai¹, Lukas Höllein¹,³,*, Bichen Wu¹, Jialiang Wang¹, Chih-Yao Ma¹, Kunpeng Li¹, Xiaofang Wang¹, Felix Wimbauer¹,³,*, Zijian He¹, Peizhao Zhang¹, Bastian Leibe², Peter Vajda¹, Ji Hou¹

¹Meta GenAI, ²RWTH Aachen University, ³Technical University of Munich

*Work performed during internship at Meta GenAI.


ControlRoom3D creates diverse and plausible 3D room meshes aligning well with user-defined room layouts and textual descriptions of the room style.

Abstract

Manually creating 3D environments for AR/VR applications is a complex process requiring expert knowledge in 3D modeling software. Pioneering works facilitate this process by generating room meshes conditioned on textual style descriptions. Yet, many of these automatically generated 3D meshes do not adhere to typical room layouts, compromising their plausibility, e.g., by placing several beds in one bedroom. To address these challenges, we present ControlRoom3D, a novel method to generate high-quality room meshes. Central to our approach is a user-defined 3D semantic proxy room that outlines a rough room layout based on semantic bounding boxes and a textual description of the overall room style. Our key insight is that when rendered to 2D, this 3D representation provides valuable geometric and semantic information for controlling powerful 2D models to generate 3D-consistent textures and geometry that align well with the proxy room. Backed up by an extensive study including quantitative metrics and qualitative user evaluations, our method generates diverse and globally plausible 3D room meshes, thus empowering users to design 3D rooms effortlessly without specialized knowledge.

Animations

Geometry Alignment

Scale ambiguity leads to significant inaccuracies in state-of-the-art metric depth estimators such as ZoeDepth. In contrast, our proposed depth alignment module iteratively optimizes the alignment loss to achieve strong alignment with the proxy room.

Comparison panels: No Optimization | After Depth Alignment
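
As a rough illustration of what iteratively optimizing an alignment loss can look like, the sketch below fits a single scale and shift to an estimated depth map so that each pixel falls inside a near/far interval taken from the rendered proxy boxes. The page does not give the loss or the optimized parameters, so the squared hinge penalty, the global scale/shift parameterization, and the names `alignment_loss` and `align_depth` are all assumptions made for this example, not the authors' implementation.

```python
def alignment_loss(depth, near, far):
    """Mean squared hinge penalty for depths outside their [near, far] interval."""
    total = 0.0
    for d, n, f in zip(depth, near, far):
        total += max(n - d, 0.0) ** 2 + max(d - f, 0.0) ** 2
    return total / len(depth)


def align_depth(depth, near, far, steps=2000, lr=0.05):
    """Fit one global scale and shift by gradient descent on alignment_loss.

    Assumption: a global affine correction is enough for this toy case.
    """
    scale, shift = 1.0, 0.0
    count = len(depth)
    for _ in range(steps):
        grad_scale = grad_shift = 0.0
        for d, n, f in zip(depth, near, far):
            pred = scale * d + shift
            # Derivative of the squared hinge with respect to the prediction.
            dloss = 2.0 * (max(pred - f, 0.0) - max(n - pred, 0.0))
            grad_scale += dloss * d
            grad_shift += dloss
        scale -= lr * grad_scale / count
        shift -= lr * grad_shift / count
    return [scale * d + shift for d in depth]


# Toy example: the estimator returns depths at half the proxy room's scale.
estimated = [1.0, 1.5, 2.0, 2.5]
near = [1.9, 2.9, 3.9, 4.9]
far = [2.1, 3.1, 4.1, 5.1]
aligned = align_depth(estimated, near, far)
```

In this toy setup the loss drops from about 3.0 before optimization to nearly zero afterwards, which mirrors the qualitative difference between the two panels above.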

SAM Masks

We leverage SAM to obtain pixel-precise instance masks for each object. For pixels located within the rendered bounding box but outside the SAM mask, we assign the near depth value to the far depth. Including SAM masks leads to sharper 3D object boundaries, resulting in a more seamless integration into the 3D room mesh.
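
The masking rule can be sketched on a single row of pixels. The sentence above allows more than one reading; this example assumes that, for a pixel inside the rendered box but outside the instance mask, the near bound is replaced by the box's far depth, so the pixel is no longer pulled into the box. The function name `refine_near_bound` and all values are hypothetical, and the masks are plain booleans rather than actual SAM output.

```python
def refine_near_bound(box_near, box_far, in_box, in_mask):
    """Return per-pixel near bounds after applying an instance mask.

    Assumed reading: a pixel inside the rendered box but outside the mask
    gets its near bound replaced by the box's far depth at that pixel.
    """
    refined = []
    for near, far, boxed, masked in zip(box_near, box_far, in_box, in_mask):
        if boxed and not masked:
            refined.append(far)   # pixel is not part of the object
        else:
            refined.append(near)  # object pixel, or pixel outside any box
    return refined


# Four pixels of one image row; values are made up for illustration.
box_near = [2.0, 2.0, 2.0, 0.0]
box_far = [3.0, 3.0, 3.0, 0.0]
in_box = [True, True, True, False]
in_mask = [False, True, True, False]
refined_near = refine_near_bound(box_near, box_far, in_box, in_mask)
```

Only the first pixel changes here: it lies in the box projection but not on the object, so its near bound moves from 2.0 to 3.0.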

(Hover over image to see the effect.)

Normal Loss

Although the depth alignment loss effectively aligns the frame with the 3D proxy room, it may occasionally distort the surfaces of objects to fit them within their bounding boxes. To counter this, we introduce a normal preservation loss that retains the original shape of the objects.
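
One way to picture a normal preservation term is to compare surface normals derived from the depth map before and after alignment. The page does not give a formula, so the sketch below makes two loud assumptions: normals come from forward differences on an orthographic pixel grid (camera intrinsics ignored), and the penalty is one minus cosine similarity. The names `normals_from_depth` and `normal_preservation_loss` are hypothetical.

```python
import math


def normals_from_depth(depth):
    """Unit normals from forward differences of a 2D depth grid.

    Simplification: pixels are treated as an orthographic grid with unit
    spacing. The last row and column are dropped because they have no
    forward neighbor.
    """
    normals = []
    for y in range(len(depth) - 1):
        row = []
        for x in range(len(depth[0]) - 1):
            dz_dx = depth[y][x + 1] - depth[y][x]
            dz_dy = depth[y + 1][x] - depth[y][x]
            length = math.sqrt(dz_dx ** 2 + dz_dy ** 2 + 1.0)
            row.append((-dz_dx / length, -dz_dy / length, 1.0 / length))
        normals.append(row)
    return normals


def normal_preservation_loss(original, aligned):
    """Mean of (1 - cosine similarity) between the two normal maps."""
    total, count = 0.0, 0
    pairs = zip(normals_from_depth(original), normals_from_depth(aligned))
    for row_a, row_b in pairs:
        for a, b in zip(row_a, row_b):
            total += 1.0 - sum(p * q for p, q in zip(a, b))
            count += 1
    return total / count


# Toy example: a tilted plane, then the same plane squashed flat on one side.
plane = [[1.0, 1.5, 2.0], [1.0, 1.5, 2.0], [1.0, 1.5, 2.0]]
squashed = [[1.0, 1.5, 1.5], [1.0, 1.5, 1.5], [1.0, 1.5, 1.5]]
distortion = normal_preservation_loss(plane, squashed)
```

The loss is zero when the surface is unchanged and positive for the squashed plane, so adding it to an alignment objective would discourage the kind of flattening described above.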