We release 50 demonstrations for each of three tasks in the Large Behavior Model (LBM) simulation: StoreCerealBoxUnderShelf, PutSpatulaOnTableFromUtensilCrock, and PlaceAppleFromBowlIntoBin. Each demonstration includes RGB-D observations and robot actions from 16 camera poses sampled from the upper hemisphere above the workstation. All data can be downloaded here.
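For intuition, here is a minimal sketch of how camera positions can be sampled uniformly from an upper hemisphere above a workspace. The radius, center, and sampling scheme are illustrative assumptions, not the exact procedure used to generate the dataset:

```python
import numpy as np

def sample_upper_hemisphere_poses(n_views=16, radius=1.0,
                                  center=(0.0, 0.0, 0.0), seed=0):
    """Sample camera positions on an upper hemisphere (illustrative only;
    the dataset's actual sampling scheme may differ)."""
    rng = np.random.default_rng(seed)
    # Uniform sampling over the hemisphere: azimuth uniform in [0, 2*pi),
    # cosine of the polar angle uniform in [0, 1) keeps cameras above the plane.
    azimuth = rng.uniform(0.0, 2.0 * np.pi, n_views)
    cos_polar = rng.uniform(0.0, 1.0, n_views)
    sin_polar = np.sqrt(1.0 - cos_polar**2)
    positions = np.stack([
        radius * sin_polar * np.cos(azimuth),
        radius * sin_polar * np.sin(azimuth),
        radius * cos_polar,
    ], axis=1) + np.asarray(center)
    return positions  # (n_views, 3); orient each camera to look at `center`

print(sample_upper_hemisphere_poses().shape)  # (16, 3)
```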
Checkpoints for pre-trained Stable Video Diffusion (SVD) and VAE can be found here.
In addition, we release VAE encoders for pointmaps and RGB images fine-tuned on our simulation dataset here; these output better latent representations for the specific robotic tasks they are trained on.
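As a minimal usage sketch, assuming the released VAE checkpoints follow the Hugging Face diffusers `AutoencoderKL` format (the checkpoint path and input shapes below are placeholders):

```python
import torch
from diffusers import AutoencoderKL

# Placeholder path; point it at the downloaded fine-tuned RGB VAE checkpoint.
vae = AutoencoderKL.from_pretrained("path/to/finetuned_rgb_vae")
vae.eval()

# Dummy RGB batch in [-1, 1] with shape (B, 3, H, W); replace with real observations.
images = torch.rand(1, 3, 256, 256) * 2.0 - 1.0

with torch.no_grad():
    latents = vae.encode(images).latent_dist.sample()  # task-specific latents
    recon = vae.decode(latents).sample                 # sanity-check reconstruction

print(latents.shape, recon.shape)
```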
Checkpoints for 4D video generation models can be found here.
Training was tested on 4 NVIDIA A6000 GPUs (48 GB of memory each) with batch size 1 and takes about 2 days to finish.
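As a rough sketch, a 4-GPU run of this shape would typically be launched with torchrun; the entry-point script and flag below are hypothetical placeholders, not this repository's actual interface:

```bash
# `train.py` and `--batch_size` are hypothetical placeholders for this sketch.
(video_policy)$ torchrun --nproc_per_node=4 train.py --batch_size 1
```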
## Run inference example

```bash
(video_policy)$ python notebooks/eval.py
```
## Citation
If you find this codebase useful, please consider citing our work:
```bibtex
@article{liu2025geometry,
  title={Geometry-aware 4D Video Generation for Robot Manipulation},
  author={Liu, Zeyi and Li, Shuang and Cousineau, Eric and Feng, Siyuan and Burchfiel, Benjamin and Song, Shuran},
  journal={arXiv preprint arXiv:2507.01099},
  year={2025}
}
```