TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation
Video
Abstract
The robotics field is evolving towards data-driven, end-to-end learning, inspired by multimodal large models. However, reliance on expensive real-world data limits progress. Simulators offer cost-effective alternatives, but the gap between simulation and reality hinders effective policy transfer.
This paper introduces TwinAligner, a novel Real2Sim2Real system that addresses both the visual and the dynamic gap. The visual alignment module achieves pixel-level alignment through SDF reconstruction and editable 3DGS rendering, while the dynamic alignment module ensures dynamic consistency by identifying rigid-body physics from robot-object interactions.
TwinAligner improves robot learning by providing scalable data collection and establishing a trustworthy iterative cycle, accelerating algorithm development. Quantitative evaluations highlight TwinAligner's strong capabilities in visual and dynamic real-to-sim alignment. This system enables policies trained in simulation to achieve strong zero-shot generalization to the real world. The high consistency between real-world and simulated policy performance underscores TwinAligner's potential to advance scalable robot learning.
Physics-aware Sim2Real Learning
Tab. III. Success rates of Sim2Real policy learning. "-" means that the method does not support the corresponding task. Our method (1) covers all four types of tasks and (2) surpasses both baselines on all tasks, which is consistent with our high visual-dynamic Real2Sim consistency.
Baseline Comparison
Pushing Milk Box
Stacking Biscuit Boxes
Pick-and-Place
Method
Fig. 2. Overview of TwinAligner. (1) Mesh-GS Digital Twin jointly reconstructs the detailed visual appearance and geometry. (2) Visual-Dynamic Real2Sim Alignment aligns the real-world dynamics interaction by jointly estimating the camera viewpoint, robot controller, and object rigid physics. (3) Sim2Real Policy Learning is performed on the aligned simulation environment with imitation learning policies, which achieves a reduced Sim2Real gap in both regular and physics-aware robot manipulation scenarios.
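The dynamic alignment step above fits rigid physics parameters so that simulated rollouts reproduce real robot-object interaction trajectories. The paper's actual identification procedure is not reproduced here; as a purely illustrative sketch of the underlying idea, the following hypothetical 1D example identifies a Coulomb friction coefficient by minimizing the trajectory discrepancy between simulation and observation over a candidate grid:

```python
import numpy as np

def rollout(mu, v0=1.0, dt=0.01, steps=100, g=9.81):
    """Simulate a 1D object sliding with initial speed v0 under
    Coulomb friction coefficient mu; returns the position trajectory."""
    x, v = 0.0, v0
    xs = []
    for _ in range(steps):
        a = -mu * g if v > 0 else 0.0   # friction decelerates the object
        v = max(v + a * dt, 0.0)        # the object cannot slide backwards
        x += v * dt
        xs.append(x)
    return np.array(xs)

def identify_friction(real_traj, candidates):
    """Pick the friction coefficient whose simulated rollout best
    matches the observed trajectory (least squares over positions)."""
    errors = [np.sum((rollout(mu) - real_traj) ** 2) for mu in candidates]
    return candidates[int(np.argmin(errors))]

# Stand-in for a real observation, generated with ground-truth mu = 0.3.
real = rollout(0.3)
mu_hat = identify_friction(real, np.linspace(0.05, 0.6, 12))
```

In practice the same principle extends to jointly estimating camera viewpoint, controller gains, and object physics with gradient-based or sampling-based optimizers rather than a grid search.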
Visual Real2Sim
Fig. 4. Comparison of geometry reconstruction quality. Our method reconstructs watertight and detailed meshes, while the baseline results contain inaccurate depths, glitches, and holes.
Tab. II. Comparison of object rendering quality, measured with PSNR. Higher is better. Our method achieves the highest PSNR on average, thanks to training the 3DGS on top of the mesh reconstruction.
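For reference, PSNR compares a rendered image against the ground-truth photograph via the mean squared error; a minimal definition (assuming images are arrays with pixel values in [0, 1]) is:

```python
import numpy as np

def psnr(img_a, img_b, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images (higher is better)."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For example, a uniform error of 0.1 per pixel gives an MSE of 0.01 and hence a PSNR of 20 dB.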
Dynamic Real2Sim
The effectiveness of our visual-dynamic Real2Sim alignment. For the robot-object interaction trajectories, we compare real-world camera observations with the physics simulation and 3DGS rendering results from TwinAligner. Our method closes the visual-dynamic gap at the pixel level.
Tab. IV. Comparison of dynamic Real2Sim between our method and PIN-WM. Dynamics learned by TwinAligner achieve better alignment than those inferred by PIN-WM.
BibTeX
@article{fan2025twinaligner,
  author        = {Hongwei Fan and Hang Dai and Jiyao Zhang and Jinzhou Li and Qiyang Yan and Yujie Zhao and Mingju Gao and Jinghang Wu and Hao Tang and Hao Dong},
  title         = {TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation},
  year          = {2025},
  eprint        = {2512.19390},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2512.19390},
}
Join Us
We are recruiting innovative undergraduate and graduate students to work on robotics and world models. If you are interested, please email: hwfan25[at]stu.pku.edu.cn or hao.dong[at]pku.edu.cn