We run inference using a policy server and a hardware client. The instructions for running the policy server can be found in examples/umi/README.md, and we provide the UMI hardware client code in this repository.
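For illustration, here is a minimal sketch of the client side of this setup, assuming the websocket client interface documented in openpi (`openpi_client.websocket_client_policy`); the observation keys, host, port, and loop structure are placeholders rather than the exact ones used by the UMI hardware client.

```python
# Minimal sketch of a hardware-client control loop, assuming openpi's websocket
# client interface; observation keys, port, and loop details are placeholders.
import numpy as np
from openpi_client import websocket_client_policy

def main() -> None:
    # Connect to the policy server started as described in examples/umi/README.md.
    client = websocket_client_policy.WebsocketClientPolicy(host="localhost", port=8000)

    for _ in range(100):  # control loop (placeholder length)
        observation = {
            "observation/image": np.zeros((224, 224, 3), dtype=np.uint8),  # camera frame
            "observation/state": np.zeros(7, dtype=np.float32),            # proprioception
            "prompt": "make a cocktail",
        }
        result = client.infer(observation)  # blocking call to the policy server
        action_chunk = result["actions"]
        # ... send action_chunk to the robot controller here ...

if __name__ == "__main__":
    main()
```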
📷 Data
We provide access to the following datasets:
Robot Datasets: Datasets for the cocktail and open-world visual grounding tasks.
Vision-Language Datasets: Datasets containing synthetic images and annotated reasoning for the open-world visual grounding task.
All datasets are hosted on Hugging Face. You can find them here.
We provide code for converting the UMI data format to the LeRobot data format here.
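For orientation, the sketch below shows one way to iterate over episodes in a UMI-style zarr replay buffer before rewriting them as LeRobot episodes. The `data`/`meta/episode_ends` layout and the per-key slicing are assumptions about the UMI format, not the logic of the converter linked above.

```python
# Rough sketch, not the actual converter: assumes a UMI-style zarr replay buffer
# with a `data` group of per-step arrays and `meta/episode_ends` boundaries.
import zarr

def iter_umi_episodes(zarr_path: str):
    """Yield one dict of per-step arrays for each episode in the buffer."""
    root = zarr.open(zarr_path, mode="r")
    data, meta = root["data"], root["meta"]
    episode_ends = meta["episode_ends"][:]  # cumulative end index of each episode
    start = 0
    for end in episode_ends:
        yield {key: data[key][start:end] for key in data.keys()}
        start = int(end)

# Each yielded episode can then be written out frame by frame using the
# LeRobot dataset tooling referenced above.
```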
Synthetic Image Augmentation
To make the synthetic images more closely resemble real robot observations, we randomly apply several augmentations, including random fisheye distortion and compositing a robot gripper into the image with adaptive brightness adjustment. The implementation is available in scripts/augment_vl_data/augment.py.
Here we show an example. From left to right: the original image, the image with fisheye distortion, the image with a composited robot gripper (with adaptive brightness adjustment), and the image with both augmentations applied.
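For intuition, here is a minimal sketch of the two augmentations under simple assumptions: a radial remap standing in for fisheye distortion, and mean-brightness matching before alpha-compositing a gripper overlay. Parameter ranges and helper names are illustrative; the actual implementation is in scripts/augment_vl_data/augment.py.

```python
# Illustrative sketch only; see scripts/augment_vl_data/augment.py for the real
# implementation. Parameter choices and helper names here are assumptions.
import cv2
import numpy as np

def random_fisheye(img: np.ndarray, strength: float = 0.3) -> np.ndarray:
    """Approximate a fisheye look with a radial (barrel) remap."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    xn, yn = (xs - w / 2) / (w / 2), (ys - h / 2) / (h / 2)  # normalized coordinates
    factor = 1.0 + strength * (xn ** 2 + yn ** 2)  # pull source samples outward with radius
    map_x = xn * factor * (w / 2) + w / 2
    map_y = yn * factor * (h / 2) + h / 2
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR, borderMode=cv2.BORDER_REFLECT)

def composite_gripper(img: np.ndarray, gripper_rgba: np.ndarray) -> np.ndarray:
    """Alpha-composite a gripper overlay (same size as img) with brightness matching."""
    rgb = gripper_rgba[..., :3].astype(np.float32)
    alpha = gripper_rgba[..., 3:4].astype(np.float32) / 255.0
    # Adaptive brightness: scale the overlay so its mean matches the background's.
    scale = img.mean() / max(rgb[alpha[..., 0] > 0].mean(), 1e-6)
    rgb = np.clip(rgb * scale, 0, 255)
    out = img.astype(np.float32) * (1 - alpha) + rgb * alpha
    return out.astype(np.uint8)
```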
🙏 Acknowledgements
We express our sincere gratitude to the developers of openpi for open-sourcing their code.
Official implementation of "OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning"