KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation
Project Page | Video | arXiv
To set up the environment, please follow these steps:
- Create the conda environment:

  ```bash
  conda create -n kuda python=3.8
  conda activate kuda
  pip install -r requirements.txt
  ```
- Download the checkpoints for GroundingSAM:

  ```bash
  cd perception/models
  mkdir checkpoints
  cd checkpoints
  wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
  wget https://huggingface.co/lkeab/hq-sam/resolve/main/sam_hq_vit_h.pth
  ```
- Download the checkpoints for SpaTracker:

  ```bash
  cd ../../../dynamics/tracker
  mkdir checkpoints
  cd checkpoints
  pip install gdown
  gdown 18YlG_rgrHcJ7lIYQWfRz_K669z6FdmUX
  ```

  Alternatively, you can manually download the checkpoints from Google Drive.
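Optionally, you can run a quick sanity check from the repository root to confirm the downloaded weights are in place. This is a minimal sketch and not part of the repository; the two GroundingSAM paths follow the commands above, while the SpaTracker checkpoint filename depends on the Google Drive download:

```python
# Optional sanity check: confirm the downloaded checkpoints exist (run from the repo root).
from pathlib import Path

expected = [
    "perception/models/checkpoints/groundingdino_swint_ogc.pth",
    "perception/models/checkpoints/sam_hq_vit_h.pth",
    # Add the SpaTracker checkpoint here; its filename depends on the Google Drive download.
]

for path in expected:
    status = "ok" if Path(path).is_file() else "MISSING"
    print(f"{status:8s} {path}")
```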
To quickly test the visual prompting functionality without setting up a robot, please follow these steps:
- Replace the `api_key` in `demo.py` with your OpenAI API key.
- Run the demo:

  ```bash
  python demo.py
  ```
Please modify the `img` and `instruction` variables in `demo.py` to experiment with different tasks. You can see examples in `results`.
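For reference, here is a minimal sketch of how these variables might look in `demo.py`. The variable names `api_key`, `img`, and `instruction` come from the steps above; the values below (the image path and the task description) are purely illustrative:

```python
# demo.py -- illustrative values only, adjust to your own setup
api_key = "sk-..."  # your OpenAI API key
img = "images/cubes.png"  # input observation; check whether demo.py expects a path or a loaded image
instruction = "collect the cubes to the right side of the table"  # open-vocabulary task instruction
```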
To execute tasks in the real world, please follow these steps:
- Dynamics Models:

  Please download the dynamics model checkpoints from this link, then update the corresponding paths in `dynamics/dyn_utils.py` to ensure the checkpoints are properly loaded and accessible (an illustrative sketch of such a path update follows this list).
- Calibration:

  We use the xArm6 robot and a ChArUco calibration board. Please run the following commands for calibration:

  ```bash
  cd xarm-calibrate
  python calibrate.py
  python camera_to_base_transforms.py
  ```

  Please replace the camera serial number in `xarm-calibrate/real_world/real_env.py` and the robot IP in `xarm-calibrate/real_world/xarm6.py`. To verify the calibration results, please run:

  ```bash
  python verify_stationary_cameras.py
  ```
- Robot Execution:

  Please complete the following steps for real-world execution:

  - We employ different end-effectors to manipulate various objects: a cylinder stick for the T shape and ropes, and a board pusher for cubes and granular pieces. You can download them for 3D printing from here. Please update the robot setup in `config/real_config.yaml` -> `planner` and in `envs/real_env.py`, and ensure the top-down and side cameras have clear views.
  - Please adjust hyperparameters such as `radius` in `planner/planner.py` and `box_threshold` in `perception/models/grounding_dino_wrapper.py` for different objects.
  - Please replace the `api_key` in `launch.py` with your OpenAI API key.

  Launch the execution:

  ```bash
  python launch.py
  ```

  You should see the execution results in `logs/low_level` and the dynamics predictions in `logs/{material}-planning-{time.time()}`.
We thank the authors of the following projects for making their code open source:
This repository is released under the MIT license.
```bibtex
@misc{liu2025kudakeypointsunifydynamics,
      title={KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation},
      author={Zixian Liu and Mingtong Zhang and Yunzhu Li},
      year={2025},
      eprint={2503.10546},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2503.10546},
}
```
