Official Implementation for Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation
- [07/22/2025] We released the training code and the scripts to generate training data from TRELLIS.
- [07/07/2025] We released our pipeline code, pre-trained model, and the GSO test set.
- [ ] Release our real-world dataset
- [x] Release the training code and the scripts to generate training data from TRELLIS
- [x] Release the pipeline code and pre-trained model
Following the steps in this section, you can run our whole pipeline to reconstruct the test videos simulated from the GSO dataset.
1. Install the dependencies (we have tested the scripts on Ubuntu 22.04 with NVIDIA H100 and NVIDIA RTX 4090 GPUs).

```bash
conda create -n vid2sim python=3.10
conda activate vid2sim
bash setup.sh
```

Note: If you need a different torch or CUDA toolkit version (e.g., to correctly build the other libraries), please also install compatible versions of kaolin and torch-cluster.
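As a quick sanity check after installation, you can confirm that the key libraries import together and agree on the CUDA build (a minimal sketch; the printed versions depend on your setup):

```python
# Minimal environment check: these imports must all succeed, and torch
# should report a CUDA build compatible with the kaolin/torch-cluster wheels.
import torch
import kaolin
import torch_cluster  # noqa: F401  (imported only to verify the build)

print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
print("kaolin:", kaolin.__version__)
print("GPU available:", torch.cuda.is_available())
```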
2. Download the test dataset and checkpoints (including the pre-trained models and the LBS template network), unzip them, and put them into `dataset` and `checkpoints`. The folder structure should be:

```
Vid2Sim
|-- dataset
|   └-- GSO
|       |-- backpack
|       |-- ...
|       └-- turtle
└-- checkpoints
    |-- ckpt_lbs_template.pth
    |-- ckpt_lgm.safetensors
    └-- ckpt_phys_predictor.pth
```
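Before running the pipeline, you can verify the layout with a small check like the following (a sketch; run it from the repository root):

```python
# Verify the expected dataset/checkpoint layout from the repository root.
from pathlib import Path

expected = [
    "dataset/GSO",
    "checkpoints/ckpt_lbs_template.pth",
    "checkpoints/ckpt_lgm.safetensors",
    "checkpoints/ckpt_phys_predictor.pth",
]
for rel in expected:
    status = "ok" if Path(rel).exists() else "MISSING"
    print(f"{rel}: {status}")
```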
3. Run the script to reconstruct the toy bus case.

```bash
python run_pipeline.py --data_name bus
```

This script runs our two-stage pipeline to reconstruct appearance, geometry, and physics from the input videos. It follows a single default config file `config/gso.yaml` that works for all cases in this test set; you can modify the config as needed when using your own data. The frames will be generated at `outputs/bus` (the left video is the ground truth and the right video is the reconstruction).
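To reconstruct every case in the test set, the same entry point can be looped over the GSO folders (a minimal sketch built on the documented `--data_name` flag; the loop itself is not part of the repo):

```python
# Run the full pipeline on every object folder under dataset/GSO.
import subprocess
from pathlib import Path

for case in sorted(p for p in Path("dataset/GSO").iterdir() if p.is_dir()):
    print(f"Reconstructing {case.name} ...")
    subprocess.run(["python", "run_pipeline.py", "--data_name", case.name], check=True)
```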
Our feed-forward predictor was trained on 50k simulated animations using high-quality objects from the Objaverse dataset.
Due to policy restrictions, we are not able to release the original Objaverse object IDs we used for our model. Nevertheless, TRELLIS provides a filtered list of high-quality Objaverse objects, which can be used as a good substitute. Here we provide a step-by-step tutorial for creating the training dataset using TRELLIS data.
1. Download the Objaverse Sketchfab dataset (see here for more details about the TRELLIS dataset).

```bash
python dataset_toolkits/build_metadata.py ObjaverseXL --source sketchfab --output_dir dataset/objaverse
python dataset_toolkits/download.py ObjaverseXL --output_dir dataset/objaverse
```
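After the download finishes, you can take a quick look at what was fetched (a sketch assuming the toolkit writes a `metadata.csv` into the output directory, as the upstream TRELLIS toolkit does):

```python
# Inspect the downloaded object list (assumes dataset/objaverse/metadata.csv
# exists, as produced by the TRELLIS-style dataset toolkit).
import pandas as pd

meta = pd.read_csv("dataset/objaverse/metadata.csv")
print(f"{len(meta)} objects in the metadata list")
print(meta.head())
```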
2. Process the data list and simulate animations with random physical parameters.

```bash
python dataset_toolkits/process_objaverse_dataset.py --task process
python dataset_toolkits/process_objaverse_dataset.py --task simulate --start_idx 0 --end_idx 1  # Only simulate 1 object as an example
```
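Simulating the whole dataset in one process can take a long time; a simple driver can split the work into index ranges using the `--start_idx`/`--end_idx` flags shown above (a sketch; the total object count below is a placeholder):

```python
# Simulate objects in chunks so a crash only loses one chunk of work.
import subprocess

TOTAL_OBJECTS = 1000  # placeholder: set this to your actual dataset size
CHUNK = 100

for start in range(0, TOTAL_OBJECTS, CHUNK):
    end = min(start + CHUNK, TOTAL_OBJECTS)
    subprocess.run([
        "python", "dataset_toolkits/process_objaverse_dataset.py",
        "--task", "simulate",
        "--start_idx", str(start),
        "--end_idx", str(end),
    ], check=True)
```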
3. Render the animations (if a large rendering run fails partway through, you can write a wrapper script that invokes this script multiple times over smaller index ranges).

```bash
python dataset_toolkits/render_objaverse_dataset.py --start_idx 0 --end_idx 1  # Only render 1 object as an example
```

After generation, the Objaverse dataset structure should look like:

```
Vid2Sim
|-- dataset
    |-- objaverse
        |-- ...
        |-- outputs
            |-- ...
            |-- 0a81d18db3c947fbbdc8d60edd1ef323
                |-- meshes
                |-- models
                |-- renderings
                |-- gt_phys_params.yaml
                └-- mesh.glb
```
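Each simulated object stores its sampled physical parameters in `gt_phys_params.yaml`; you can inspect them like this (a sketch; the exact output path and field names depend on your run):

```python
# Print the ground-truth physical parameters sampled for one object.
# The path below follows the structure above; adjust it to your run.
import yaml

path = "dataset/objaverse/outputs/0a81d18db3c947fbbdc8d60edd1ef323/gt_phys_params.yaml"
with open(path) as f:
    params = yaml.safe_load(f)
print(params)
```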
4. Train the feed-forward predictor with the rendered animations (we also recommend using 224x224 resolution for training, since it is much faster to train and still performs well).

```bash
python train_preprocessing.py
python train_predictor.py
```
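If you want to follow the 224x224 recommendation by resizing the rendered frames themselves, a helper like the one below works (a hypothetical preprocessing step; the training resolution may instead be a config option, in which case this is unnecessary):

```python
# Optional helper: write 224x224 copies of the rendered frames to a
# separate directory, leaving the originals untouched.
from pathlib import Path
from PIL import Image

src_root = Path("dataset/objaverse/outputs")      # assumed renderings location
dst_root = Path("dataset/objaverse/outputs_224")  # hypothetical destination

for img_path in src_root.rglob("renderings/*.png"):
    dst_path = dst_root / img_path.relative_to(src_root)
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    Image.open(img_path).resize((224, 224), Image.LANCZOS).save(dst_path)
```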
We acknowledge the following repositories, from which we borrow code:
Simplicits: https://github.com/NVIDIAGameWorks/kaolin/tree/master/examples/tutorial/physics
LGM: https://github.com/3DTopia/LGM
3DGS: https://github.com/graphdeco-inria/gaussian-splatting
If you find this repository useful in your project, please consider citing our work :)
```bibtex
@inproceedings{chen2025vid2sim,
  title={Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation},
  author={Chen, Chuhao and Dou, Zhiyang and Wang, Chen and Huang, Yiming and Chen, Anjun and Feng, Qiao and Gu, Jiatao and Liu, Lingjie},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={26545--26555},
  year={2025}
}
```
