Interpret human demonstration videos and generate robot action plans using a pipeline of keyframe selection, visual perception, and vision-language model reasoning.
Method
Module 1: Keyframe Selection
We use the MediaPipe API to detect hand keypoints and compute the hand's speed in each frame. The speed curve is then interpolated to be continuous, and its valleys (local minima) are selected as keyframes.
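As a concrete illustration, below is a minimal Python sketch of this step under stated assumptions: a single visible hand, the wrist landmark as a proxy for hand position, and cubic interpolation for the speed curve. The function name select_keyframes and the dense-grid resolution are our own illustrative choices, not the project's exact implementation.

```python
import cv2
import numpy as np
import mediapipe as mp
from scipy.interpolate import interp1d
from scipy.signal import find_peaks

def select_keyframes(video_path):
    """Return frame indices at valleys of the hand-speed curve (sketch)."""
    hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)
    cap = cv2.VideoCapture(video_path)
    wrist_xy, frame_ids = [], []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            # Landmark 0 is the wrist; we use it as a proxy for hand position.
            lm = result.multi_hand_landmarks[0].landmark[0]
            wrist_xy.append((lm.x, lm.y))
            frame_ids.append(idx)
        idx += 1
    cap.release()
    hands.close()

    pts = np.asarray(wrist_xy)
    t = np.asarray(frame_ids, dtype=float)
    # Per-frame hand speed: displacement between consecutive detections.
    speed = np.linalg.norm(np.diff(pts, axis=0), axis=1) / np.diff(t)
    t_mid = (t[1:] + t[:-1]) / 2.0

    # Interpolate onto a dense grid so the speed curve is continuous
    # and its valleys are well defined (cubic interp needs >= 4 samples).
    dense_t = np.arange(t_mid[0], t_mid[-1], 0.25)
    dense_speed = interp1d(t_mid, speed, kind="cubic")(dense_t)

    # Valleys of the speed curve are peaks of its negation.
    valleys, _ = find_peaks(-dense_speed)
    return [int(round(dense_t[v])) for v in valleys]
```

Selecting speed valleys reflects the intuition that a hand briefly slows or pauses at moments of contact, such as grasping or releasing an object, which is where action boundaries tend to fall.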
Data and demos
We collected a dataset of human demonstration videos in three diverse categories: vegetable organization, garment organization, and wooden block stacking.
Below are the data and corresponding results.
Human demonstration
Here is a demonstration video for vegetable organization. The video illustrates how a human arranges the vegetable toys into specific containers one by one.
Robot execution
In this video, the robot executes the vegetable organization task in the same order as demonstrated by the human.
BibTeX
Coming Soon
Acknowledgements
This work was supported in part by NSF grants 2238968, 2322242, and 2024882, and by the NYU IT High Performance Computing resources, services, and staff expertise.