ScanReason is the first comprehensive and hierarchical 3D reasoning grounding benchmark. We define five types of questions depending on which type of reasoning is required. Spatial reasoning and function reasoning require a fundamental understanding of the 3D physical world, focusing on inter-object spatial relationships in a 3D scene and on the objects themselves, respectively. Logistic reasoning, emotional reasoning, and safety reasoning are high-level reasoning skills built upon these two fundamental abilities to address user-centric, real-world applications.
Model Overview
🔥 News
[2023-10-10] We release a preliminary version of the ScanReason validation benchmark. Download here. The corresponding 3D bounding box annotations can be obtained via the object IDs from EmbodiedScan.
[2023-10-01] We release the training and inference code of ReGround3D.
Follow the EmbodiedScan Data Preparation Doc to download the raw RGB-D scan datasets and set VIDEO_FOLDER in train_ds.sh to the raw data path.
Download the text annotations from Google Drive, set JSON_FOLDER in train_ds.sh to the annotation path, and set INFO_FILE to the path of the info file included in the annotations; an example of all three settings is sketched below.
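As a concrete illustration, the relevant variables in train_ds.sh might be set as follows. All paths here are placeholders for your local setup, and the info-file name is whatever ships with the downloaded annotations:

```bash
# Sketch of the data paths expected by scripts/train_ds.sh (placeholder paths).
VIDEO_FOLDER=/path/to/embodiedscan_raw_data                # raw RGB-D scans prepared via EmbodiedScan
JSON_FOLDER=/path/to/scanreason_annotations                # text annotations downloaded from Google Drive
INFO_FILE=/path/to/scanreason_annotations/info_file.pkl    # info file included in the annotations
```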
3. Training ReGround3D
We provide a SLURM training script for 4 A100 GPUs:
./scripts/train_ds.sh
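If the script does not already request resources itself, a SLURM launch along the following lines may be needed; the partition and job names are placeholders, and the exact flags depend on your cluster:

```bash
# Hypothetical SLURM submission requesting 4 A100 GPUs on a single node.
srun --job-name=reground3d \
     --partition=your_partition \
     --nodes=1 \
     --ntasks-per-node=1 \
     --gres=gpu:4 \
     bash ./scripts/train_ds.sh
```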
4. Evaluating ReGround3D
After training, run
./scripts/convert_zero_to_fp32.sh
to convert the DeepSpeed ZeRO weights into a pytorch_model.bin file, and then run
./scripts/merge_lora_weights.sh
to merge the LoRA weights and obtain the final checkpoint under ReGround3D-7B.
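For reference, the conversion step typically wraps the zero_to_fp32.py helper that DeepSpeed generates next to its ZeRO checkpoints; a manual sketch is shown below, where the checkpoint directory is a placeholder and the exact CLI depends on your DeepSpeed version (older releases write a single weight file, newer ones expect an output directory). The LoRA merge step generally amounts to loading the base model plus the adapter with PEFT and calling merge_and_unload(); use the provided merge_lora_weights.sh for the exact arguments.

```bash
# DeepSpeed places a zero_to_fp32.py helper inside its ZeRO checkpoint directory.
# The path below is a placeholder; point it at your actual training output.
cd /path/to/training_output/checkpoint_dir
python zero_to_fp32.py . pytorch_model.bin   # consolidate ZeRO shards into fp32 weights
```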