Justin Yu*, Letian Fu*, Huang Huang, Karim El-Refai, Rares Ambrus, Richard Cheng, Muhammad Zubair Irshad, Ken Goldberg
UC Berkeley, Toyota Research Institute
*Equal contribution
Abstract
Real Robot Rollouts
We train and evaluate two modern robot visuomotor policies (π0-FAST and Diffusion Policy) on either rendered data generated by Real2Render2Real alone or human-teleoperated data alone, across 5 manipulation tasks.
Performance Scaling
A comparative analysis of imitation-learning policies trained on R2R2R-generated data versus human teleoperation data, spanning 1,050 physical robot experiments, suggests that while real data is higher quality and more efficient per demonstration, R2R2R can scale trajectory diversity far beyond human teleoperation throughput, achieving competitive final performance with less collection effort.
Scan, Track, Render
The distinction we make between simulation and rendering is often a point of confusion:
When we refer to simulation, we mean the use of a physics engine to computationally model dynamic interactions. In contrast, rendering refers to generating visual data from a graphics engine.
Why No Dynamics Simulation?
In early experiments, we explored physics engines for real-to-sim-to-real data generation but found that with imperfect or unrefined real-to-sim assets, simulated dynamics often diverged from real-world behavior—especially in gripper-object interactions, where issues like interpenetration and unrealistic collisions were common. Still, we wanted to pursue scalable, high-quality data generation through computation. To that end, we use IsaacLab while disregarding its collision computation features, relying on it solely for photorealistic rendering. Object motion is grounded by distilling object-centric dynamics from real-world demonstration videos and object visual appearance is distilled from high-fidelity 3D reconstructions.
This paper is not a critique of physics engines or their role in robot manipulation, but rather a positive result: computational data generation can scale effectively even without yet simulating dynamics!
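To make the rendering-only design concrete, below is a minimal sketch of kinematic replay: object poses tracked from a real demonstration video and robot joint targets are set directly each frame and an image is rendered, with no physics stepping or collision resolution. The renderer interface and Frame container here are hypothetical stand-ins, not the IsaacLab API or the paper's code.

from dataclasses import dataclass
import numpy as np

@dataclass
class Frame:
    """One timestep of a rendered rollout (hypothetical container)."""
    rgb: np.ndarray
    robot_joint_positions: np.ndarray

def kinematic_replay(renderer, object_poses, robot_joint_trajectory):
    """Replay object and robot motion frame by frame, rendering only.

    object_poses come from tracking the real demonstration video and
    robot_joint_trajectory from planning/IK, so each frame is simply
    "set state, then render"; no dynamics are simulated.
    `renderer` is a hypothetical interface with set_object_pose(T),
    set_robot_joints(q), and render_rgb() methods.
    """
    frames = []
    for T_world_object, q in zip(object_poses, robot_joint_trajectory):
        renderer.set_object_pose(T_world_object)  # kinematic state update only
        renderer.set_robot_joints(q)
        frames.append(Frame(rgb=renderer.render_rgb(), robot_joint_positions=q))
    return frames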
Rendering More Embodiments
Part trajectories from a single demonstration can be retargeted across different robot embodiments.
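As an illustration, one simple way to retarget a tracked part trajectory is to compose it with a grasp frame on the part and an embodiment-specific flange offset, then hand the resulting end-effector targets to each robot's own inverse kinematics. This is a minimal sketch with hypothetical frame names, not the exact retargeting procedure from the paper.

import numpy as np

def retarget_part_trajectory(world_from_part, part_from_grasp, grasp_from_flange):
    """Map an object-part pose trajectory to end-effector targets for a new robot.

    world_from_part:   list of (4, 4) world-from-part transforms over time
    part_from_grasp:   (4, 4) transform locating the grasp frame on the part
    grasp_from_flange: (4, 4) embodiment-specific grasp-to-flange offset
    Returns world-from-flange targets to pass to that robot's inverse kinematics.
    """
    return [T_wp @ part_from_grasp @ grasp_from_flange for T_wp in world_from_part]

# Toy usage: a single part pose with identity grasp and flange offsets.
T = np.eye(4)
T[:3, 3] = [0.40, 0.00, 0.20]
ee_targets = retarget_part_trajectory([T], np.eye(4), np.eye(4))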
Domain Randomization
We randomize initial object poses, lighting, and camera poses to generate diverse synthetic rollouts for each object-task combination.
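A minimal sketch of what such scene randomization can look like is below; the specific parameter ranges and the returned dictionary keys are illustrative assumptions, not the values used in R2R2R.

import numpy as np
from scipy.spatial.transform import Rotation as R

rng = np.random.default_rng(seed=0)

def sample_scene_randomization():
    """Sample one randomized scene configuration (illustrative ranges only)."""
    # Initial object pose: position on the tabletop plus a random yaw.
    obj_xy = rng.uniform(low=[-0.10, -0.10], high=[0.10, 0.10])
    obj_quat = R.from_euler("z", rng.uniform(-np.pi, np.pi)).as_quat()  # xyzw

    # Camera pose: jitter the position and look-at point around nominal values.
    cam_position = np.array([0.60, 0.00, 0.50]) + rng.normal(scale=0.02, size=3)
    cam_lookat = np.array([0.00, 0.00, 0.10]) + rng.normal(scale=0.01, size=3)

    # Lighting: random intensity (arbitrary renderer units).
    light_intensity = rng.uniform(500.0, 1500.0)

    return {
        "object_position": np.array([obj_xy[0], obj_xy[1], 0.0]),
        "object_quat_xyzw": obj_quat,
        "camera_position": cam_position,
        "camera_lookat": cam_lookat,
        "light_intensity": light_intensity,
    }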
Trajectory Interpolation
From a single demonstration, R2R2R generates a distribution of plausible trajectories by interpolating 6-DoF part motion.
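The building block behind this can be sketched with standard 6-DoF interpolation: spherical linear interpolation (slerp) for orientation and linear interpolation for translation, which lets a single tracked part trajectory be resampled or warped toward new start and goal poses. The code below is a generic sketch of that building block, not the paper's exact trajectory-generation scheme.

import numpy as np
from scipy.spatial.transform import Rotation as R, Slerp

def interpolate_part_motion(times, positions, quats_xyzw, query_times):
    """Resample a 6-DoF part trajectory at new timestamps.

    times:        (N,) increasing timestamps of the tracked trajectory
    positions:    (N, 3) part positions
    quats_xyzw:   (N, 4) part orientations as xyzw quaternions
    query_times:  (M,) timestamps at which to sample the interpolated motion
    """
    slerp = Slerp(times, R.from_quat(quats_xyzw))            # orientation: slerp
    quat_interp = slerp(query_times).as_quat()
    pos_interp = np.stack(                                    # translation: linear
        [np.interp(query_times, times, positions[:, i]) for i in range(3)], axis=-1
    )
    return pos_interp, quat_interp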
Full Project Video
BibTeX
@misc{yu2025real2render2realscalingrobotdata,
title={Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware},
author={Justin Yu and Letian Fu and Huang Huang and Karim El-Refai and Rares Andrei Ambrus and Richard Cheng and Muhammad Zubair Irshad and Ken Goldberg},
year={2025},
eprint={2505.09601},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2505.09601},
}