InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation
InternVLA-A1 is an end-to-end vision–language–action (VLA) framework unifying understanding, generation, and action for robotic manipulation. It leverages predictive imagination of task evolution to guide execution, enabling enhanced manipulation in highly dynamic environments.
🔥 Highlights
Novel Model Architecture: A Mixture-of-Transformers architecture for unified understanding, generation, and action (see the sketch after this list).
Hybrid Synthetic-Real Data Corpus: InternData-A1, a hybrid synthetic-real manipulation dataset integrating 5 heterogeneous robots, 15 skills, and 200+ scenes, with an emphasis on multi-robot collaboration in dynamic scenarios.
Impressive Real-World Performance: InternVLA-A1 demonstrates strong effectiveness and generalization in highly dynamic scenarios, including dynamic grasping from conveyor belts and multi-robot collaboration.
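To make the architecture bullet concrete, below is a minimal sketch of a single Mixture-of-Transformers block, assuming the common recipe of shared global self-attention with modality-specific feed-forward experts (one each for understanding, generation, and action tokens). All class names, dimensions, and the token-routing scheme are illustrative assumptions, not the released InternVLA-A1 implementation.

```python
import torch
import torch.nn as nn

class MoTBlock(nn.Module):
    """Hypothetical Mixture-of-Transformers block: tokens from all
    modalities attend to each other through one shared attention layer,
    then each token is processed by its modality-specific FFN expert."""

    def __init__(self, dim=512, heads=8, num_modalities=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # One FFN expert per token stream (assumed mapping):
        # 0 = understanding (vision-language), 1 = generation (predicted
        # future observations), 2 = action.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_modalities)
        )

    def forward(self, x, modality_ids):
        # x: (B, T, dim) joint token sequence; modality_ids: (T,) expert
        # index for each token position.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)  # global attention across modalities
        x = x + attn_out
        h = self.norm2(x)
        out = torch.zeros_like(h)
        for m, expert in enumerate(self.experts):
            mask = modality_ids == m      # route tokens to their expert
            out[:, mask] = expert(h[:, mask])
        return x + out

# Usage: 6 tokens split across the three assumed streams.
block = MoTBlock()
tokens = torch.randn(2, 6, 512)
ids = torch.tensor([0, 0, 1, 1, 2, 2])
y = block(tokens, ids)  # (2, 6, 512)
```

In this kind of design the shared attention lets predicted future observations ("imagination" tokens) directly condition the action tokens, while per-modality experts keep the understanding, generation, and action capacities from interfering with each other.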
🏆 Unified Understanding-Generation-Action Family
F1-VLA (F1 is the predecessor of InternVLA-A1): Paper | Code | Model
The model handles packages of varying shapes on conveyor belts, tracking and predicting their trajectories in real time to achieve high-speed, stable grasping, while adaptively flipping packages and reading shipping information from delivery labels.
Multi-robot collaboration on long-horizon tasks in dynamic environments
(Video: multi-robot-long-horizon.mp4)
The model swiftly identifies, locates, and grasps fast-moving ingredients according to task demands, demonstrating its adaptability in complex environments.