Intern Robotics
Building inclusive infrastructure for Embodied AI, from Shanghai AI Lab.
🏆 HoST [Best Systems Paper Finalist at RSS 2025]: Learning Humanoid Standing-up Control across Diverse Postures
HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit
Manipulation
Datasets:
InternData-A1: A hybrid synthetic-real manipulation dataset integrating 5 heterogeneous robots, 15 skills, and 200+ scenes, with an emphasis on multi-robot collaboration in dynamic scenarios (see the loading sketch after this list).
InternData-M1: A large-scale synthetic dataset for generalizable pick-and-place over 80K objects, with open-ended instructions covering object recognition, spatial and commonsense reasoning, and long-horizon tasks.
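For a first look at these datasets, here is a minimal loading sketch. It assumes the data is published on the Hugging Face Hub under a repo id like `InternRobotics/InternData-A1` (an assumption for illustration) in a format the `datasets` library can stream; the real repo ids and schemas are on each dataset card.

```python
# Hedged sketch: stream a few samples without downloading the full dataset.
# The repo id and field names are illustrative assumptions; see the dataset card.
from datasets import load_dataset

ds = load_dataset(
    "InternRobotics/InternData-A1",  # hypothetical repo id
    split="train",
    streaming=True,
)

for sample in ds.take(3):
    # Manipulation records typically carry observations, actions, and task
    # labels, but the exact keys depend on the published schema.
    print(sorted(sample.keys()))
```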
Models and Research:
InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
F1-VLA: Visual foresight generation for planning-based control
VLAC: A generalist vision-language-action-critic model for robotic real-world reinforcement learning
Seer: Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
GenManip: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
Navigation
Datasets:
InternData-N1: A high-quality navigation dataset with the most diverse scene coverage and extensive randomization across embodiments and viewpoints, comprising 3k+ scenes and 830k VLN samples (an illustrative episode layout follows below).
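To make the sample counts above concrete, the sketch below shows one plausible layout for a single VLN episode: a natural-language instruction paired with a trajectory through a scene. The field names are assumptions for illustration, not the actual InternData-N1 schema.

```python
# Illustrative VLN episode record (assumed fields, not the actual schema).
from dataclasses import dataclass, field

@dataclass
class VLNEpisode:
    episode_id: str
    scene_id: str                # which of the 3k+ scenes
    instruction: str             # natural-language route description
    path: list[tuple[float, float, float]] = field(default_factory=list)  # agent waypoints
    embodiment: str = "unknown"  # randomized robot body / camera viewpoint

episode = VLNEpisode(
    episode_id="ep-000001",
    scene_id="scene-0042",
    instruction="Exit the kitchen, turn left, and stop next to the sofa.",
    path=[(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (1.2, 0.0, 2.5)],
)
print(episode.instruction)
```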
Models and Research:
InternVLA-N1: An Open Dual-System Vision-Language Navigation Foundation Model with Learned Latent Plans
NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
VLN-PE: A Holistic Study of Physical and Visual Disparities in Vision-and-Language Navigation
AIGC for Embodied AI
Datasets:
OmniWorld: A large-scale, multi-domain, multi-modal dataset that enables significant performance improvements in 4D reconstruction and video generation (an illustrative sample layout follows below).
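As a rough mental model of a multi-modal 4D sample (space plus time), the sketch below pairs per-frame RGB, depth, camera poses, and a caption. The modalities and shapes are assumptions for illustration; the OmniWorld release documents the actual ones.

```python
# Illustrative 4D sample: per-frame RGB, depth, and camera pose plus a caption.
# Modalities and shapes are assumptions; consult the dataset page for specifics.
import numpy as np

num_frames, height, width = 16, 480, 640
sample = {
    "rgb":   np.zeros((num_frames, height, width, 3), dtype=np.uint8),  # video frames
    "depth": np.zeros((num_frames, height, width), dtype=np.float32),   # depth in meters
    "pose":  np.tile(np.eye(4, dtype=np.float32), (num_frames, 1, 1)),  # camera-to-world 4x4
    "text":  "a robot arm rearranges objects on a table",               # scene caption
}
print({k: getattr(v, "shape", type(v).__name__) for k, v in sample.items()})
```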
Models and Research:
MeshCoder: Generating Structured Blender Code for 3D Objects from Point Clouds
Infinite-Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation