This repo contains the sub-modules of the Real2Code pipeline.
Dataset: All modules use the same synthetic dataset of RGBD images, part-level meshes, and code snippets describing the joint structure of each object. We have released this dataset here, and provide processing and rendering utility scripts in data_utils/ if you want to generate your own data.
Part-level 2D Segmentation and 3D Shape Completion: Using the same set of objects, we fine-tune a 2D SAM model for part-level segmentation and train a PointNet-based model for 3D shape completion. More details on each sub-module are documented in the READMEs under part segmentation and shape completion.
LLM Fine-tuning: We fine-tune a CodeLlama model on the code representations of our articulated objects. See this fork for the LLM fine-tuning script.
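For a rough sense of what a code representation of an articulated object can look like, here is a hypothetical sketch of a cabinet with one revolute door. The field names and structure below are illustrative assumptions only; the exact format used by Real2Code is defined by the released dataset and the fine-tuning fork.

```python
# Hypothetical illustration of a code-style joint description (not the
# exact Real2Code format): part geometries plus a revolute joint.
bodies = [
    dict(name="base", half_extents=[0.3, 0.2, 0.4], position=[0.0, 0.0, 0.4]),
    dict(name="door", half_extents=[0.3, 0.01, 0.4], position=[0.0, 0.21, 0.4]),
]
joints = [
    dict(
        type="revolute",            # hinge between the base and the door
        parent="base",
        child="door",
        axis=[0.0, 0.0, 1.0],       # rotation axis
        origin=[-0.3, 0.21, 0.4],   # hinge location on the cabinet edge
        limits=[0.0, 1.57],         # joint range in radians
    ),
]
```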
Real-World Evaluation: See real_obj/. We use DUSt3R to reconstruct 3D geometry from multi-view, pose-free RGB images; the DUSt3R-generated 3D pointmaps are included in the real-world dataset below.
We have released the real-object data used for evaluating Real2Code. These are objects found in common lab and household settings around the Stanford campus. Raw data was captured with a LiDAR-equipped iPhone camera and the 3dScanner App.
Structure: each object folder is organized as follows:
ls obj_id/
- raw/
- sam/
  - a list of (id.jpg, id_mask.png, id_scene.npz) triples
Each id corresponds to one 512x512 RGB image selected from the raw data, e.g. 00000.jpg; id_mask.png is the foreground object mask obtained by prompting the SAM model with query points randomly sampled in the image margin area; id_scene.npz contains the globally-aligned 3D point cloud from DUSt3R.
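As a minimal sketch of how one might load a single (id.jpg, id_mask.png, id_scene.npz) triple with standard tooling: the paths below are placeholders, and the key names stored inside id_scene.npz are not documented here, so the snippet simply lists whatever arrays the archive contains.

```python
from pathlib import Path

import numpy as np
from PIL import Image

# Hypothetical example paths; replace obj_id and the frame id with real values.
obj_dir = Path("obj_id/sam")
frame_id = "00000"

# 512x512 RGB image and its foreground object mask.
rgb = np.asarray(Image.open(obj_dir / f"{frame_id}.jpg"))             # (512, 512, 3)
mask = np.asarray(Image.open(obj_dir / f"{frame_id}_mask.png")) > 0   # boolean mask

# Globally-aligned 3D point cloud / pointmap from DUSt3R.
# The archive's key names are an assumption here, so we just inspect them.
scene = np.load(obj_dir / f"{frame_id}_scene.npz")
for key in scene.files:
    print(key, scene[key].shape)
```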