RE3SIM: Generating High-Fidelity Simulation Data via
3D-Photorealistic Real-to-Sim for Robotic Manipulation
Highlights:
- High-fidelity geometry and vision: small sim-to-real gaps in both geometry and appearance.
- Highly efficient data collection: scene reconstruction in ~2.5 minutes and simulation data at 100 episodes per 10 minutes.
- Zero-shot sim-to-real transfer: even limited simulation data yields high real-world success rates.
Key Observation:
- Scaling law: Increasing the scale of simulation data improves the success rate until it converges at a high level.
- Mixing Sim-Real: Co-training with real-world data integrates the characteristics of both datasets.
Weinan Zhang, Jiangmiao Pang†
Shanghai Jiao Tong University · Shanghai AI Lab · The University of Hong Kong
^Project Lead †Corresponding author
➤ Real-to-Sim-to-Real for Diverse Robotic Manipulation Tasks
Note: Four tasks with individual policies are used to validate the effectiveness of RE3SIM.
Visual Comparison: Low Vision Gap
Note: We manually aligned the objects with those in the simulation, but noticeable pixel-level discrepancies remain. The background alignment also has some pixel-level deviations. These factors collectively lead to the relatively low PSNR and SSIM values of all methods, especially in the texture-rich scene.
Note: 3DGS outperforms Polycam in both PSNR and SSIM. Its PSNR is comparable to OpenMVS, but its SSIM is notably higher. OpenMVS's reconstruction contains cracks, causing an obvious sim-to-real gap. The qualitative and quantitative results demonstrate that RE3SIM produces high-quality, well-aligned reconstructions, making zero-shot sim-to-real transfer possible.
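The PSNR numbers above can be reproduced with a few lines of NumPy. A minimal sketch of the metric, assuming aligned 8-bit images (this is a generic PSNR definition, not code from the RE3SIM release):

```python
import numpy as np

def psnr(real, rendered, data_range=255.0):
    """Peak signal-to-noise ratio between two aligned images.

    Higher PSNR means a smaller pixel-level sim-to-real gap.
    """
    diff = np.asarray(real, dtype=np.float64) - np.asarray(rendered, dtype=np.float64)
    mse = np.mean(diff ** 2)  # mean squared error over all pixels
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

SSIM additionally compares local structure (luminance, contrast, correlation), which is why it penalizes OpenMVS's cracks more heavily than PSNR does.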
Zero-Shot Sim-to-Real
Note: RE3SIM can generate high-quality simulation data for training generalizable robotic policies via zero-shot sim-to-real transfer. Below are videos of the real-world experiments for the tasks: pick and drop a bottle into the basket, place a vegetable on the board, stack blocks, and clear objects on the table. All videos are played at normal speed.
Pick and drop a bottle into the basket
Place a vegetable on the board
Stack blocks
Clear objects on the table
Real-to-Sim-to-Real Efficiency
Note: Human effort in reconstruction. The table reports estimated reconstruction times at the table level, plus the human effort required to reconstruct an object with ARCode.
| Input Types | Video | Images | ARCode |
|---|---|---|---|
| Human Effort (s) | 51.5 | 84.5 | 60.5 |
Note: Time cost for simulation data collection. The table shows the time needed to collect 100 episodes of simulation data for each task on a machine equipped with 8 RTX 4090 GPUs.
| Tasks | Time Cost (minutes) |
|---|---|
| Pick and drop a bottle into the basket | 12.35 |
| Place a vegetable on the board | 13.78 |
| Stack blocks | 6.45 |
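The per-GPU throughput implied by the table is easy to check. A small sketch, assuming the 8-GPU machine from the note and 100 episodes per task (the helper name is ours, for illustration only):

```python
def episodes_per_gpu_minute(minutes, episodes=100, gpus=8):
    """Episodes collected per GPU-minute for one task."""
    return episodes / (minutes * gpus)

# Time costs (minutes per 100 episodes) from the table above.
rates = {task: episodes_per_gpu_minute(m)
         for task, m in {"bottle": 12.35, "vegetable": 13.78, "blocks": 6.45}.items()}
```

As expected, the shorter stack-blocks episodes give the highest collection rate.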
Large-Scale Sim-to-Real
Note: To push the limit of utilizing synthetic data for real-world manipulation, we choose the clear objects on the table task and evaluate the generalizability of a policy trained on a large-scale simulation dataset.
Note: Doubling the data size often results in a large improvement in success rate until convergence.
Note: A large dataset enables the policy to exhibit some robustness to variations in objects or lighting.
➤ Comparison over Simulated and Real Data
Note: Real-world and simulation data often exhibit variations in both distribution and quality, because of differences in scene initialization methods and trajectory preferences between human operators and the rule-based policy.
Object Location
Note: Despite efforts to randomize object positions, data distributions differ slightly due to the challenge of achieving true randomness in real-world settings.
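Randomizing object placement in simulation is straightforward by comparison. A minimal sketch of uniform tabletop pose sampling; the workspace ranges here are hypothetical, not RE3SIM's actual bounds:

```python
import math
import random

def sample_object_pose(rng,
                       x_range=(0.3, 0.6),     # forward reach, meters (illustrative)
                       y_range=(-0.2, 0.2),    # lateral offset, meters (illustrative)
                       yaw_range=(-math.pi, math.pi)):
    """Uniformly sample a planar pose (x, y, yaw) for one object on the table."""
    return (rng.uniform(*x_range),
            rng.uniform(*y_range),
            rng.uniform(*yaw_range))

rng = random.Random(0)  # seeded for reproducible scene initialization
poses = [sample_object_pose(rng) for _ in range(5)]
```

In the real world, a human resetting the scene cannot draw from a truly uniform distribution, which is one source of the distribution shift noted above.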
Data Quality
- In simulation, the motion planner tends to take the shortest path, producing shorter trajectories with larger angular variations.
- Longer trajectories may include more pauses, which reduce action continuity and can hurt model training; this is observed more often in real-world data.
➤ Co-training and Fine-tuning
Note: Left: Kernel Density Estimate (KDE) of the Euclidean distance traveled by the robotic arm's end effector between adjacent time steps. Right: The number of time steps taken by the robotic arm from the start of movement to the first closure of the gripper. "Sim" and "Real" indicate models trained on simulated and real data, respectively; "Co-train" refers to models trained on a mix of both, and "Fine-tune" to models pre-trained on simulated data and fine-tuned on real data.
Note: The distributions of simulation and real data are generally similar. Data generated by our method can be combined with real data through pretraining or co-training, introducing new features without destabilizing the training process.
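The per-step distance statistic behind the KDE plot is simple to compute from logged end-effector positions. A minimal sketch, assuming a `(T, 3)` array of xyz positions (generic NumPy, not the RE3SIM codebase):

```python
import numpy as np

def step_distances(ee_positions):
    """Euclidean distance traveled by the end effector between adjacent time steps.

    ee_positions: array-like of shape (T, 3) with xyz positions per time step.
    Returns an array of shape (T - 1,).
    """
    pos = np.asarray(ee_positions, dtype=np.float64)
    return np.linalg.norm(np.diff(pos, axis=0), axis=1)
```

A KDE over these distances (e.g. a Gaussian kernel density estimate) then gives the per-policy curves compared in the figure.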
➤ More Details
Framework
RE3SIM couples 3D reconstruction with a physics-based simulator, keeping the geometric and visual sim-to-real gaps small enough to enable large-scale simulation data generation for learning manipulation skills. We first reconstruct the background and the objects of the scene separately, then align them with the robot in the real world. High-quality simulation data can then be collected in the reconstructed simulator and used to train a policy that transfers to the real world.
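The pipeline above can be summarized as a three-step loop. A hedged skeleton for orientation only; every function and type here is a placeholder and does not mirror the RE3SIM API:

```python
from dataclasses import dataclass

@dataclass
class Scene:
    background: str   # reconstructed background asset (e.g. 3DGS splats)
    objects: list     # separately reconstructed object assets

def build_scene(capture, object_scans):
    """Steps 1-2: reconstruct background and objects separately, then align
    them with the real-world robot frame (placeholder implementation)."""
    return Scene(background=f"recon({capture})", objects=list(object_scans))

def collect_episodes(scene, n):
    """Step 3: roll out a rule-based expert in the reconstructed simulator
    to log demonstration episodes (placeholder implementation)."""
    return [f"episode-{i}" for i in range(n)]

scene = build_scene("table_capture.mp4", ["bottle", "basket"])
episodes = collect_episodes(scene, 3)
```

The resulting episodes are then used to train a policy that is deployed zero-shot on the real robot.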
More Visual Results in Simulation
Rendering results of place a vegetable on the board task.
Rendering results of stack blocks task.
Rendering results of clear objects on the table task.