SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing
Autonomous Suturing Demonstration (4× Speed)
Complete end-to-end autonomous suturing demonstration on the dVRK platform, showcasing the full pipeline from needle pickup through tissue penetration to secure knot tying.
Abstract
Robotic suturing is a prototypical long-horizon dexterous manipulation task, requiring coordinated needle grasping, precise tissue penetration, and secure knot tying.
Despite numerous efforts toward end-to-end autonomy, a fully autonomous suturing pipeline has yet to be demonstrated on physical hardware. We introduce SutureBot: an autonomous suturing benchmark on the da Vinci Research Kit (dVRK), spanning needle pickup, tissue insertion, and knot tying. To ensure repeatability, we release a high-fidelity dataset comprising 1,890 suturing demonstrations.
Furthermore, we propose a goal-conditioned framework that explicitly optimizes insertion-point precision, improving targeting accuracy by 59%-74% over a task-only baseline. To establish this task as a benchmark for dexterous imitation learning, we evaluate state-of-the-art vision-language-action (VLA) models, including π0, GR00T N1, OpenVLA-OFT, and multitask ACT, each augmented with a high-level task-prediction policy. Autonomous suturing is a key milestone toward achieving robotic autonomy in surgery. These contributions support reproducible evaluation and development of precision-focused, long-horizon dexterous manipulation policies necessary for end-to-end suturing.
System Architecture
Overview of SutureBot's precision-conditioned control framework for long-horizon, dexterous surgical tasks. Image observations are processed by a high-level language policy, which selects the current subtask and generates the associated language condition. The user specifies target needle insertion and exit points via a graphical interface, from which the goal condition is generated. These inputs (the language condition, the goal condition, and real-time kinematic data) are then processed by the low-level policy to produce precise, continuous control commands for the robot.
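The loop below is a minimal sketch of this two-level architecture; the HighLevelPolicy and LowLevelPolicy stubs and all names are illustrative placeholders for the actual models, not the paper's implementation.

import numpy as np
from dataclasses import dataclass

@dataclass
class Conditions:
    """Inputs to the low-level policy (field names are illustrative)."""
    language: str            # subtask instruction from the high-level policy
    goal: np.ndarray         # goal condition rendered from user-clicked points
    kinematics: np.ndarray   # real-time joint/gripper state of both arms

class HighLevelPolicy:
    """Stub: maps image observations to the current subtask instruction."""
    SUBTASKS = ["pick up the needle", "throw the needle", "tie the knot"]
    def predict(self, images: dict) -> str:
        return self.SUBTASKS[0]   # placeholder: a real model infers the phase

class LowLevelPolicy:
    """Stub: maps observations plus conditions to continuous commands."""
    def act(self, images: dict, cond: Conditions) -> np.ndarray:
        return np.zeros(14)       # placeholder action, e.g. 7 DoF per arm

def control_step(images, goal, kinematics, high, low):
    language = high.predict(images)                  # select current subtask
    cond = Conditions(language, goal, kinematics)    # assemble conditions
    return low.act(images, cond)                     # continuous robot command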
Methodology
Suturing Task Decomposition
We decompose the complete suturing procedure into three sequential subtasks: (1) needle pickup, (2) needle throw (tissue penetration), and (3) knot tying. This decomposition enables focused data collection, policy training, and systematic evaluation of each critical phase.
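As a concrete illustration, the executor below steps through the three subtasks in order; the is_complete transition signal is a hypothetical stand-in for the high-level policy's phase prediction, and env / policy are assumed interfaces, not the released code.

from enum import Enum, auto

class Subtask(Enum):
    NEEDLE_PICKUP = auto()
    NEEDLE_THROW = auto()    # tissue penetration
    KNOT_TYING = auto()

# Fixed order of the three suturing phases.
SEQUENCE = [Subtask.NEEDLE_PICKUP, Subtask.NEEDLE_THROW, Subtask.KNOT_TYING]

def run_stitch(env, policy, is_complete, max_steps=2000):
    """Execute one full stitch by advancing through the subtasks in order."""
    idx = 0
    obs = env.reset()
    for _ in range(max_steps):
        subtask = SEQUENCE[idx]
        obs = env.step(policy(obs, subtask))   # subtask-conditioned action
        if is_complete(obs, subtask):          # hypothetical transition signal
            idx += 1
            if idx == len(SEQUENCE):
                return True                    # full stitch finished
    return False                               # step budget exhausted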
Precision-Conditioned Control
Our framework exposes explicit control over insertion-point precision by conditioning the low-level policy on a goal, yielding a 59%-74% improvement in targeting accuracy over task-only baselines. We evaluate three goal-condition representations: a point label, a binary mask, and a distance map.
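The sketch below shows one plausible way to render these three representations from user-specified insertion and exit pixels; the disk radius and normalization are illustrative choices, not the paper's exact parameters.

import numpy as np

def render_goal(points, h, w, radius=6):
    """Render the three goal-condition representations for (row, col)
    target pixels: point label, binary mask, and distance map."""
    ys, xs = np.mgrid[0:h, 0:w]
    # Distance from every pixel to its nearest target point.
    dists = np.min([np.hypot(ys - r, xs - c) for r, c in points], axis=0)
    point_label = np.asarray(points, dtype=np.float32)        # raw coordinates
    binary_mask = (dists <= radius).astype(np.float32)        # disks at targets
    distance_map = (dists / dists.max()).astype(np.float32)   # normalized field
    return point_label, binary_mask, distance_map

# Example: insertion and exit points on a 480x640 endoscope frame.
label, mask, dmap = render_goal([(200, 300), (220, 340)], 480, 640)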
Experimental Setup
Hardware Platform
Our experiments are conducted on the Si version of the da Vinci Research Kit (dVRK), a dual-arm robotic manipulation platform. The setup includes a Soft Tissue Suture Pad as the task surface, wrist-mounted cameras for close-up manipulation views, an endoscope for global scene observation, and specialized robot grippers. Data collection focuses on wound one, with wounds two through six reserved for generalization testing.
Dataset & Precision Evaluation
We release a comprehensive dataset of 1,890 high-fidelity suturing demonstrations collected across multiple sessions. The dataset includes multi-modal observations from the wrist-mounted cameras and the endoscope, along with precise action sequences for each suturing subtask. Precision evaluation uses UV-marked insertion points to quantify targeting accuracy.
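Given the UV markings, targeting accuracy reduces to a distance between the commanded and achieved insertion points; this is a minimal sketch assuming both are available as pixel coordinates with a known millimeters-per-pixel scale (the paper's exact metric may differ).

import numpy as np

def insertion_error_mm(target_px, achieved_px, mm_per_px):
    """Euclidean targeting error in millimeters between the user-specified
    insertion point and the UV-detected puncture location (both in pixels)."""
    target = np.asarray(target_px, dtype=np.float64)
    achieved = np.asarray(achieved_px, dtype=np.float64)
    return float(np.linalg.norm(achieved - target) * mm_per_px)

# Example: a ~12 px offset at 0.1 mm/px gives a ~1.2 mm targeting error.
err = insertion_error_mm((200, 300), (208, 309), mm_per_px=0.1)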
Results
Precision Targeting Performance
Our precision-conditioned control framework delivers significant improvements in insertion-point targeting accuracy, achieving 59%-74% better precision than task-only baselines. Among the three representations, the point-label goal condition achieves the highest precision with both ACT and π0.
Vision-Language-Action Model Benchmark
We establish SutureBot as a comprehensive benchmark for dexterous manipulation by evaluating state-of-the-art vision-language-action (VLA) models, including π0, GR00T N1, OpenVLA-OFT, and multitask ACT, each augmented with our high-level task-prediction policy. The results demonstrate the framework's versatility across policy architectures and provide reference baselines for autonomous surgical robotics.
Citation
@inproceedings{haworth2025suturebot,
author = {Haworth, Jesse and Chen, Juo-Tung and Nelson, Nigel and Kim, Ji Woong and Moghani, Masoud and Finn, Chelsea and Krieger, Axel},
title = {SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing},
booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
year = {2025},
}