KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation
Project Page | Video | arXiv
To set up the environment, please follow these steps:
- Create the conda environment:

  ```bash
  conda create -n kuda python=3.8
  conda activate kuda
  pip install -r requirements.txt
  ```
- Download the checkpoints for GroundingSAM:

  ```bash
  cd perception/models
  mkdir checkpoints
  cd checkpoints
  wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
  wget https://huggingface.co/lkeab/hq-sam/resolve/main/sam_hq_vit_h.pth
  ```
- Download the checkpoints for SpaTracker:

  ```bash
  cd ../../../dynamics/tracker
  mkdir checkpoints
  cd checkpoints
  pip install gdown
  gdown 18YlG_rgrHcJ7lIYQWfRz_K669z6FdmUX
  ```

  Alternatively, you can manually download the checkpoints from Google Drive.
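Optionally, you can run a quick sanity check from the repository root to confirm the downloaded weights are in place. This is a minimal sketch and not part of the repository; the two GroundingSAM paths follow the commands above, while the SpaTracker checkpoint filename depends on the Google Drive download:

```python
# Optional sanity check: confirm the downloaded checkpoints exist (run from the repo root).
from pathlib import Path

expected = [
    "perception/models/checkpoints/groundingdino_swint_ogc.pth",
    "perception/models/checkpoints/sam_hq_vit_h.pth",
    # Add the SpaTracker checkpoint here; its filename depends on the Google Drive download.
]

for path in expected:
    status = "ok" if Path(path).is_file() else "MISSING"
    print(f"{status:8s} {path}")
```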
To quickly test the visual prompting functionality without setting up a robot, please follow these steps:
- Replace the `api_key` in `demo.py` with your OpenAI API key.
- Run the demo:

  ```bash
  python demo.py
  ```
Please modify the `img` and `instruction` variables in `demo.py` to experiment with different tasks. You can see examples in `results`.
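For reference, here is a minimal sketch of how these variables might look in `demo.py`. The variable names `api_key`, `img`, and `instruction` come from the steps above; the values below (the image path and the task description) are purely illustrative:

```python
# demo.py -- illustrative values only, adjust to your own setup
api_key = "sk-..."  # your OpenAI API key
img = "images/cubes.png"  # input observation; check whether demo.py expects a path or a loaded image
instruction = "collect the cubes to the right side of the table"  # open-vocabulary task instruction
```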
To execute tasks in the real world, please follow these steps:
- Dynamics Models:

  Please download the dynamics model checkpoints from this link, then update the corresponding paths in `dynamics/dyn_utils.py` to ensure the checkpoints are properly loaded and accessible (an illustrative sketch of such a path update follows this list).
- Calibration:

  We use the xArm6 robot and a ChArUco calibration board. Please run the following commands for calibration:

  ```bash
  cd xarm-calibrate
  python calibrate.py
  python camera_to_base_transforms.py
  ```

  Please replace the camera serial number in `xarm-calibrate/real_world/real_env.py` and the robot IP in `xarm-calibrate/real_world/xarm6.py`. To verify the calibration results, please run:

  ```bash
  python verify_stationary_cameras.py
  ```
- Robot Execution:

  Please complete the following steps for real-world execution:

  - We employ different end-effectors to manipulate various objects: a cylinder stick for the T shape and ropes, and a board pusher for cubes and granular pieces. You can download them for 3D printing from here. Please update the robot setup in `config/real_config.yaml` -> `planner` and in `envs/real_env.py`, and ensure the top-down and side cameras have clear views.
  - Please adjust hyperparameters such as `radius` in `planner/planner.py` and `box_threshold` in `perception/models/grounding_dino_wrapper.py` for different objects.
  - Please replace the `api_key` in `launch.py` with your OpenAI API key.

  Launch the execution:

  ```bash
  python launch.py
  ```

  You should see the execution results in `logs/low_level` and the dynamics predictions in `logs/{material}-planning-{time.time()}`.
We thank the authors of the following projects for making their code open source:
This repository is released under the MIT license.
```bibtex
@misc{liu2025kudakeypointsunifydynamics,
      title={KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation},
      author={Zixian Liu and Mingtong Zhang and Yunzhu Li},
      year={2025},
      eprint={2503.10546},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2503.10546},
}
```
