This repository provides code for the paper "PUGS: Zero-shot Physical Understanding with Gaussian Splatting".
- 🎉 Our paper has been accepted by ICRA 2025 🎉
Demo video: pugs.mp4
- Some qualitative results: qualitative_result.mp4
We recommend using conda to install the dependencies.
```bash
conda env create -f environment.yml
conda activate pugs
```

Following NeRF2Physics, we also use the ABO-500 dataset for testing. You can download the dataset here. The data should be organized as follows:
```
data
└── abo_500
    ├── scenes
    │   ├── scene0000
    │   │   ├── images
    │   │   │   ├── image0000.jpg
    │   │   │   └── ...
    │   │   └── transforms.json
    │   └── ...
    ├── filtered_product_weights.json
    └── splits.json
```

If you want to use your own data, organize it in the same way, or in another format that can be parsed by scene/dataset_readers.py.
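For reference, here is a minimal sketch of reading a NeRF-style transforms.json, assuming the common Blender/NeRF convention (a camera_angle_x field plus per-frame poses). The field names are assumptions; scene/dataset_readers.py remains the authoritative parser.

```python
# Minimal sketch: parse a NeRF-style transforms.json.
# Assumes the common camera_angle_x / frames / transform_matrix layout;
# check scene/dataset_readers.py for the parser actually used.
import json
import numpy as np

def load_transforms(path):
    with open(path) as f:
        meta = json.load(f)
    fov_x = meta["camera_angle_x"]  # horizontal field of view in radians
    frames = []
    for frame in meta["frames"]:
        frames.append({
            "file_path": frame["file_path"],             # relative image path
            "c2w": np.array(frame["transform_matrix"]),  # 4x4 camera-to-world pose
        })
    return fov_x, frames

fov_x, frames = load_transforms("data/abo_500/scenes/scene0000/transforms.json")
print(f"{len(frames)} frames, fov_x = {fov_x:.3f} rad")
```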
In our reconstruction pipeline, we use SAM to obtain object regions. By default, we use the public ViT-H SAM model. You can download the model from here and put it under the ./submodules/segment-anything/sam_ckpt/ directory.
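As a quick sanity check that the checkpoint is set up correctly, the snippet below generates region masks with the standard segment-anything API; the checkpoint filename is that of the public ViT-H release, and the pipeline's own masking logic lives in the reconstruction scripts.

```python
# Sanity-check sketch: generate SAM region masks for one image, assuming
# the public ViT-H checkpoint (sam_vit_h_4b8939.pth) downloaded above.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](
    checkpoint="./submodules/segment-anything/sam_ckpt/sam_vit_h_4b8939.pth"
).to("cuda")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.imread("data/abo_500/scenes/scene0000/images/image0000.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # SAM expects RGB uint8
masks = mask_generator.generate(image)  # each mask: {"segmentation", "area", ...}
print(f"SAM proposed {len(masks)} regions")
```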
Our method uses a VLM to predict the physical properties of the object. During inference, we use the OpenAI API to obtain the physical properties, so you need an OpenAI API key. You can get the key from OpenAI; then set a variable named OPENAI_API_KEY in the my_api_key.py file:
echo "OPENAI_API_KEY = '<yourkey>'" >> ./my_api_key.pyOur pipeline is shown below, each step is a separate python script. Related arguments can be found in the settings.py.
Our pipeline is shown below; each step is a separate Python script, and the related arguments can be found in settings.py.

First, we use 3DGS to reconstruct the object from multi-view images. During training, we use a Geometry-Aware Regularization Loss and a Region-Aware Feature Contrastive Loss to improve the quality of the reconstruction.
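For intuition, here is an illustrative sketch of a region-aware contrastive term that pulls per-Gaussian features from the same SAM region together and pushes different regions apart. This is a generic SupCon-style formulation under our own assumptions, not the exact loss from the paper.

```python
# Illustrative sketch (not the paper's exact loss): a supervised-contrastive
# objective over per-Gaussian features, using SAM region ids as labels.
import torch
import torch.nn.functional as F

def region_contrastive_loss(features, region_ids, temperature=0.1):
    """features: (N, D) per-Gaussian features; region_ids: (N,) SAM region labels."""
    f = F.normalize(features, dim=-1)              # unit-norm features
    sim = (f @ f.t()) / temperature                # (N, N) scaled cosine similarity
    eye = torch.eye(len(f), dtype=torch.bool, device=f.device)
    pos = (region_ids[:, None] == region_ids[None, :]) & ~eye  # same-region pairs
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")),
                                     dim=1, keepdim=True)
    valid = pos.any(dim=1)                         # anchors that have positives
    loss = -(log_prob * pos).sum(dim=1)[valid] / pos.sum(dim=1)[valid]
    return loss.mean()
```

The reconstruction itself is launched with: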
```bash
python gs_reconstruction.py
```

Next, we use a VLM to predict the physical properties of the object. You can specify the property name with --property_name and the inference type with --proposal_type. To query a VLM directly, set the inference type to gpt4o or gpt4v.

```bash
python material_proposal.py --property_name <property-name> --mats_save_name info --proposal_type <gpt4o|gpt4v>
```

We also provide text-reasoning-based inference; set the inference type to text-reasoning to use it. This mode is a two-stage inference: first, a VLM generates a caption of the object; then, an LLM predicts the physical properties from that caption. You therefore need to specify the name of the saved caption with --caption_load_name.

```bash
python material_proposal.py --property_name <property-name> --caption_load_name info --mats_save_name info --proposal_type text-reasoning
```
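The two stages could look roughly like the sketch below, again using OpenAI models for both stages; the helper names and prompts are hypothetical, and material_proposal.py is the reference implementation.

```python
# Hypothetical two-stage sketch: VLM captioning, then LLM property reasoning.
# Prompts, helper names, and models are placeholders.
import base64
from openai import OpenAI
from my_api_key import OPENAI_API_KEY

client = OpenAI(api_key=OPENAI_API_KEY)

def caption_object(image_path):
    """Stage 1: a VLM describes the object from one image."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "Briefly describe this object and its likely materials."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content

def propose_property(caption, property_name):
    """Stage 2: an LLM predicts a value range for the property from the caption."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content":
            f"Object description: {caption}\n"
            f"For each candidate material, give a plausible range of {property_name}."}],
    )
    return resp.choices[0].message.content
```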
The next step extracts CLIP features for the source points; these features are later used for property propagation.

```bash
python clip_feature_fusion.py
```
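For a sense of what these features are, the snippet below encodes a single region crop with the openai/CLIP package (the crop path is a placeholder); the actual multi-view fusion onto source points is implemented in clip_feature_fusion.py.

```python
# Illustrative sketch: a CLIP feature for one region crop (placeholder path).
# The multi-view fusion onto source points happens in clip_feature_fusion.py.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

crop = Image.open("region_crop.jpg")  # a crop around one SAM region
with torch.no_grad():
    feat = model.encode_image(preprocess(crop).unsqueeze(0).to(device))
feat = feat / feat.norm(dim=-1, keepdim=True)  # unit-normalized (1, 512) feature
```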
Then we can predict the physical properties of the object. The following command sets the prediction mode to grid, which produces a dense prediction of the specified physical property:

```bash
python predict_property.py --mats_load_name info --property_name <property-name> --prediction_mode grid
```
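Conceptually, propagation can be viewed as kernel regression in CLIP feature space: each query point receives a similarity-weighted average of the source values. A minimal sketch, assuming unit-normalized features (predict_property.py is the reference implementation):

```python
# Conceptual sketch: similarity-weighted propagation of property values.
# Assumes unit-normalized CLIP features; see predict_property.py for details.
import torch

def propagate(query_feats, source_feats, source_values, temperature=0.1):
    """query_feats: (Q, D); source_feats: (S, D); source_values: (S,)."""
    sim = query_feats @ source_feats.t()          # (Q, S) cosine similarities
    w = torch.softmax(sim / temperature, dim=-1)  # soft nearest-source weights
    return w @ source_values                      # (Q,) propagated values
```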
If you want to predict object-level physical properties, set the prediction mode to integral, and specify the volume-estimation method with --volume_method.

```bash
python predict_property.py --mats_load_name info --property_name <property-name> --prediction_mode integral --volume_method gaussian --preds_save_name mass
```
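As a rough mental model (our assumption, not the script's exact computation), object-level mass amounts to integrating predicted density over the estimated volume:

```python
# Rough mental model (assumption): mass as density integrated over volume.
# The actual Gaussian-based volume estimation is in predict_property.py.
import torch

def integrate_mass(densities, volumes):
    """densities: (N,) predicted kg/m^3 per point; volumes: (N,) m^3 per point."""
    return (densities * volumes).sum()  # total mass in kg
```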
We also provide some other utilities for evaluation and visualization.

This script evaluates the predictions. You can specify the paths to the predictions and the ground truth with --preds_json_path and --gts_json_path.
```bash
python evaluation.py --preds_json_path <path-to-predictions> --gts_json_path <path-to-ground-truth>
```
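If you only need a quick number, something like the sketch below works, assuming both JSON files map scene names to scalar values; evaluation.py defines the metrics actually reported.

```python
# Quick-check sketch, assuming both JSONs map scene names to scalar values.
# evaluation.py defines the metrics actually reported.
import json
import numpy as np

def mean_relative_error(preds_path, gts_path):
    with open(preds_path) as f:
        preds = json.load(f)
    with open(gts_path) as f:
        gts = json.load(f)
    errs = [abs(preds[k] - gts[k]) / abs(gts[k]) for k in gts if k in preds]
    return float(np.mean(errs))
```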
For the reconstruction results, you can use the following command to visualize them:

```bash
python visualization.py --scene_name <scene-name> --property_name <property-name> --value_low <value-low> --value_high <value-high>
```

For video rendering, you can use the following command to render a 360-degree video of the reconstructed object:
```bash
python video_render.py -m <path-to-model> --render_path --export_traj
```

Some parts of the code are borrowed from NeRF2Physics, SegAnyGaussians, PGSR and 2DGS. We thank the authors for their great work.
