If you want to run a different example, revise the corresponding parameters in the script.
We use SAM and Depth-FM to obtain the estimated mask and depth. The background area of the depth map should be cropped out.
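As a concrete illustration, the snippet below masks the background out of an estimated depth map using a SAM mask. It is only a sketch: the file names, formats, and exact preprocessing used in this repository are assumptions.

```python
# Sketch only: file names/formats are placeholders, not the repository's actual pipeline.
import numpy as np
from PIL import Image

# SAM mask (non-zero = foreground object) and Depth-FM depth estimate.
mask = np.array(Image.open("example_mask.png").convert("L")) > 0
depth = np.load("example_depth.npy").astype(np.float32)

# Crop out the background: keep depth only where the mask marks foreground.
depth_fg = np.where(mask, depth, 0.0)
np.save("example_depth_masked.npy", depth_fg)
```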
Before running the inference script, revise the dataset path (data-params-root_dir) in the configs/inference_cobj.yaml and configs/inference_c3dfs.yaml files.
Note that we also provide the models used in C_Obj; if you only want to use the renderings for benchmarking, change the path to the renderings folder.
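For reference, the relevant field typically looks like the excerpt below; the key nesting is inferred from the data-params-root_dir hint and may differ from the actual config, so check the file itself.

```yaml
# Hypothetical excerpt of configs/inference_cobj.yaml (nesting inferred; verify against the real file).
data:
  params:
    root_dir: /path/to/C_Obj   # your local dataset root, or the renderings folder for benchmarking only
```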
Training
Download the image-conditioned Stable Diffusion checkpoint released by Lambda Labs:
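One possible way to fetch it programmatically is sketched below. The repo id and filename are assumptions based on the checkpoint that Zero-1-to-3 downloads (sd-image-conditioned-v2.ckpt from the lambdalabs Hugging Face repo); verify them against the original release before training.

```python
# Assumed repo id and filename (based on the checkpoint Zero-1-to-3 uses); verify before use.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="lambdalabs/stable-diffusion-image-conditioned",
    filename="sd-image-conditioned-v2.ckpt",
)
print("checkpoint saved to", ckpt_path)
```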
Revise the dataset path (data-params-root_dir) in the configs/3d_mix.yaml file before running the training script.
Note that this training script is configured for an 8-GPU system, each GPU with 80 GB of VRAM. If you have smaller GPUs, consider using a smaller per-GPU batch size together with gradient accumulation to keep a similar effective batch size, as in the sketch below.
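The trade-off can be checked with a quick calculation; the numbers below are illustrative, not the repository's defaults.

```python
# Effective batch size = per-GPU batch * number of GPUs * gradient accumulation steps.
# Numbers are hypothetical, not the repository's defaults.
def effective_batch(per_gpu_batch: int, num_gpus: int, accumulate_grad_batches: int) -> int:
    return per_gpu_batch * num_gpus * accumulate_grad_batches

# 8 x 80 GB GPUs with no accumulation vs. 4 smaller GPUs with 4 accumulation steps:
assert effective_batch(24, 8, 1) == effective_batch(12, 4, 4) == 192
```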
Acknowledgement
This repository is based on Zero-1-to-3. We would like to thank the authors for publicly releasing their code.
Citation
@article{lu2024movis,
    title={MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes},
    author={Lu, Ruijie and Chen, Yixin and Ni, Junfeng and Jia, Baoxiong and Liu, Yu and Wan, Diwen and Zeng, Gang and Huang, Siyuan},
    journal={arXiv preprint arXiv:2412.11457},
    year={2024}
}
About
Official implementation of the CVPR 2025 paper "MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes"