If you want to run a different example, revise the corresponding parameters in the script.
We use SAM and Depth-FM to obtain the estimated mask and depth. The background area of the depth map should be cropped out.
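As a concrete illustration, the snippet below masks the background out of an estimated depth map using a SAM mask. It is only a sketch: the file names, formats, and exact preprocessing used in this repository are assumptions.

```python
# Sketch only: file names/formats are placeholders, not the repository's actual pipeline.
import numpy as np
from PIL import Image

# SAM mask (non-zero = foreground object) and Depth-FM depth estimate.
mask = np.array(Image.open("example_mask.png").convert("L")) > 0
depth = np.load("example_depth.npy").astype(np.float32)

# Crop out the background: keep depth only where the mask marks foreground.
depth_fg = np.where(mask, depth, 0.0)
np.save("example_depth_masked.npy", depth_fg)
```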
Before running the inference script, revise the dataset path (data-params-root_dir) in the configs/inference_cobj.yaml and configs/inference_c3dfs.yaml files.
Note that we also provide the models used in C_Obj; if you only want to use the renderings for benchmarking, change the path to the renderings folder.
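For reference, the relevant field typically looks like the excerpt below; the key nesting is inferred from the data-params-root_dir hint and may differ from the actual config, so check the file itself.

```yaml
# Hypothetical excerpt of configs/inference_cobj.yaml (nesting inferred; verify against the real file).
data:
  params:
    root_dir: /path/to/C_Obj   # your local dataset root, or the renderings folder for benchmarking only
```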
Training
Download the image-conditioned Stable Diffusion checkpoint released by Lambda Labs:
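One possible way to fetch it programmatically is sketched below. The repo id and filename are assumptions based on the checkpoint that Zero-1-to-3 downloads (sd-image-conditioned-v2.ckpt from the lambdalabs Hugging Face repo); verify them against the original release before training.

```python
# Assumed repo id and filename (based on the checkpoint Zero-1-to-3 uses); verify before use.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="lambdalabs/stable-diffusion-image-conditioned",
    filename="sd-image-conditioned-v2.ckpt",
)
print("checkpoint saved to", ckpt_path)
```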
Revise the dataset path (data-params-root_dir) in the configs/3d_mix.yaml file before running the training script.
Note that this training script is configured for an 8-GPU system, each GPU with 80 GB of VRAM. If you have smaller GPUs, consider using a smaller per-GPU batch size together with gradient accumulation to keep a similar effective batch size, as in the sketch below.
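The trade-off can be checked with a quick calculation; the numbers below are illustrative, not the repository's defaults.

```python
# Effective batch size = per-GPU batch * number of GPUs * gradient accumulation steps.
# Numbers are hypothetical, not the repository's defaults.
def effective_batch(per_gpu_batch: int, num_gpus: int, accumulate_grad_batches: int) -> int:
    return per_gpu_batch * num_gpus * accumulate_grad_batches

# 8 x 80 GB GPUs with no accumulation vs. 4 smaller GPUs with 4 accumulation steps:
assert effective_batch(24, 8, 1) == effective_batch(12, 4, 4) == 192
```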
Acknowledgement
This repository is based on Zero-1-to-3. We would like to thank the authors for publicly releasing their code.
Citation
@article{lu2024movis,
    title={MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes},
    author={Lu, Ruijie and Chen, Yixin and Ni, Junfeng and Jia, Baoxiong and Liu, Yu and Wan, Diwen and Zeng, Gang and Huang, Siyuan},
    journal={arXiv preprint arXiv:2412.11457},
    year={2024}
}
About
Official implementation of the CVPR 2025 paper "MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes"