TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation
Video
Abstract
The robotics field is evolving towards data-driven, end-to-end learning, inspired by multimodal large models. However, reliance on expensive real-world data limits progress. Simulators offer cost-effective alternatives, but the gap between simulation and reality hinders effective policy transfer.
This paper introduces TwinAligner, a novel Real2Sim2Real system that addresses both the visual and the dynamic gap. The visual alignment module achieves pixel-level alignment through SDF reconstruction and editable 3DGS rendering, while the dynamic alignment module ensures dynamic consistency by identifying rigid-body physics from robot-object interactions.
TwinAligner improves robot learning by providing scalable data collection and establishing a trustworthy iterative cycle, accelerating algorithm development. Quantitative evaluations highlight TwinAligner's strong capabilities in visual and dynamic real-to-sim alignment. This system enables policies trained in simulation to achieve strong zero-shot generalization to the real world. The high consistency between real-world and simulated policy performance underscores TwinAligner's potential to advance scalable robot learning.
Physics-aware Sim2Real Learning
Tab. III. Success rates of Sim2Real policy learning. "-" means that the method does not support the corresponding task. Our method (1) covers all four types of tasks and (2) surpasses both baselines on all tasks, which is consistent with our high visual-dynamic Real2Sim consistency.
Baseline Comparison
Pushing Milk Box
Stacking Biscuit Boxes
Pick-and-Place
Method
Fig. 2. Overview of TwinAligner. (1) Mesh-GS Digital Twin jointly reconstructs the detailed visual appearance and geometry. (2) Visual-Dynamic Real2Sim Alignment aligns the real-world dynamics interaction by jointly estimating the camera viewpoint, robot controller, and object rigid physics. (3) Sim2Real Policy Learning is performed on the aligned simulation environment with imitation learning policies, which achieves a reduced Sim2Real gap in both regular and physics-aware robot manipulation scenarios.
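The dynamic alignment step above fits rigid physics parameters so that simulated rollouts reproduce real robot-object interaction trajectories. The paper's actual identification procedure is not reproduced here; as a purely illustrative sketch of the underlying idea, the following hypothetical 1D example identifies a Coulomb friction coefficient by minimizing the trajectory discrepancy between simulation and observation over a candidate grid:

```python
import numpy as np

def rollout(mu, v0=1.0, dt=0.01, steps=100, g=9.81):
    """Simulate a 1D object sliding with initial speed v0 under
    Coulomb friction coefficient mu; returns the position trajectory."""
    x, v = 0.0, v0
    xs = []
    for _ in range(steps):
        a = -mu * g if v > 0 else 0.0   # friction decelerates the object
        v = max(v + a * dt, 0.0)        # the object cannot slide backwards
        x += v * dt
        xs.append(x)
    return np.array(xs)

def identify_friction(real_traj, candidates):
    """Pick the friction coefficient whose simulated rollout best
    matches the observed trajectory (least squares over positions)."""
    errors = [np.sum((rollout(mu) - real_traj) ** 2) for mu in candidates]
    return candidates[int(np.argmin(errors))]

# Stand-in for a real observation, generated with ground-truth mu = 0.3.
real = rollout(0.3)
mu_hat = identify_friction(real, np.linspace(0.05, 0.6, 12))
```

In practice the same principle extends to jointly estimating camera viewpoint, controller gains, and object physics with gradient-based or sampling-based optimizers rather than a grid search.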
Visual Real2Sim
Fig. 4. Comparison of geometry reconstruction quality. Our method reconstructs watertight and detailed meshes, while the baseline results contain inaccurate depths, glitches, and holes.
Tab. II. Comparison of object rendering quality, measured with PSNR. Higher is better. Our method achieves the highest PSNR on average, thanks to training the 3DGS on top of the mesh reconstruction.
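For reference, PSNR compares a rendered image against the ground-truth photograph via the mean squared error; a minimal definition (assuming images are arrays with pixel values in [0, 1]) is:

```python
import numpy as np

def psnr(img_a, img_b, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images (higher is better)."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For example, a uniform error of 0.1 per pixel gives an MSE of 0.01 and hence a PSNR of 20 dB.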
Dynamic Real2Sim
The effectiveness of our visual-dynamic Real2Sim alignment. For the robot-object interaction trajectories, we compare real-world camera observations with the physics simulation and 3DGS rendering results from TwinAligner. Our method closes the visual-dynamic gap at the pixel level.
Tab. IV. Comparison of dynamic Real2Sim between our method and PIN-WM. Dynamics learned by TwinAligner achieve better alignment than those inferred by PIN-WM.
BibTeX
@article{fan2025twinaligner,
  author        = {Hongwei Fan and Hang Dai and Jiyao Zhang and Jinzhou Li and Qiyang Yan and Yujie Zhao and Mingju Gao and Jinghang Wu and Hao Tang and Hao Dong},
  title         = {TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation},
  year          = {2025},
  eprint        = {2512.19390},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2512.19390},
}
Join Us
We are recruiting innovative undergraduate and graduate students to work on robotics and world models. If you are interested, please email: hwfan25[at]stu.pku.edu.cn or hao.dong[at]pku.edu.cn