This paper presents a UNIfied HSI framework, UniHSI, which supports unified control of diverse interactions through language commands. This framework is built upon the definition of interaction as Chain of Contacts (CoC): steps of human joint-object part pairs, which is inspired by the strong correlation between interaction types and human-object contact regions.
Based on this definition, UniHSI comprises a Large Language Model (LLM) Planner that translates language prompts into task plans in the form of CoC, and a Unified Controller that turns CoC into uniform task execution. To facilitate training and evaluation, we collect a new dataset named ScenePlan that encompasses thousands of task plans generated by LLMs based on diverse scenarios. Comprehensive experiments demonstrate the effectiveness of our framework in versatile task execution and its generalizability to real scanned scenes.
[2024-01] UniHSI was accepted as an ICLR 2024 spotlight. Thanks for the recognition!
[2023-09] We released the UniHSI paper. Please check the 👉 webpage 👈 and view our demos! 🎇
🔍 Overview
The whole pipeline consists of two major components: the LLM Planner and the Unified Controller. The LLM Planner takes language commands and background scenario information as inputs and outputs a multi-step plan in the form of a Chain of Contacts. The Unified Controller then executes the task plan step by step and outputs interaction movements.
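The two-stage flow above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code: `ContactPair`, `toy_planner`, and `execute` are hypothetical names, the field layout is an assumption based only on the "human joint-object part pairs" description, and the planner is a hard-coded stand-in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ContactPair:
    """One human joint paired with one object part (assumed layout)."""
    joint: str  # human joint, e.g. "pelvis"
    obj: str    # object in the scene, e.g. "chair"
    part: str   # part of that object, e.g. "seat"


# A step is a group of contact pairs that should hold at the same time;
# a Chain of Contacts is an ordered list of such steps.
Step = Tuple[ContactPair, ...]
ChainOfContacts = List[Step]


def toy_planner(prompt: str) -> ChainOfContacts:
    """Hypothetical stand-in for the LLM Planner: returns a fixed plan."""
    return [
        (ContactPair("pelvis", "chair", "seat"),),
        (ContactPair("pelvis", "chair", "seat"),
         ContactPair("torso", "chair", "back")),
    ]


def execute(plan: ChainOfContacts, run_step: Callable[[Step], bool]) -> int:
    """Run steps in order, stop at the first failure, return steps completed."""
    completed = 0
    for step in plan:
        if not run_step(step):
            break
        completed += 1
    return completed


plan = toy_planner("Sit on the chair, then lean back.")
completed = execute(plan, lambda step: True)  # a controller that always succeeds
print(f"completed {completed} of {len(plan)} steps")
```

Keeping the plan as plain data separates the two components cleanly: any planner that emits the same structure can drive the same controller loop.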
Installation
Download Isaac Gym from the website, then follow the installation instructions.
Once Isaac Gym is installed, install the external dependencies for this repo:
We select and process motion clips from SAMP and CIRCLE.
Training
We adopt step-by-step training. Run the scripts below in order, from simple to hard:
sh train_partnet_simple.sh
sh train_partnet_mid.sh
sh train_partnet_hard.sh
Demo
sh demo_scannet.sh
Evaluation
sh test_partnet_simple.sh
sh test_partnet_mid.sh
sh test_partnet_hard.sh
sh test_scannet_simple.sh
sh test_scannet_mid.sh
sh test_scannet_hard.sh
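| Source | Success Rate (%) Simple | Success Rate (%) Mid | Success Rate (%) Hard | Contact Error Simple | Contact Error Mid | Contact Error Hard | Success Steps Simple | Success Steps Mid | Success Steps Hard |
|---|---|---|---|---|---|---|---|---|---|
| PartNet | 85.5 | 67.9 | 40.5 | 0.035 | 0.037 | 0.040 | 2.13 | 4.11 | 4.84 |
| ScanNet | 73.2 | 43.1 | 22.3 | 0.061 | 0.072 | 0.062 | 2.21 | 3.47 | 4.78 |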
The results will be saved in the "output" folder.
Expect roughly 10% variance across runs due to randomness in sampling.
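Given this run-to-run variance, one option is to repeat an evaluation several times and report the mean and spread. The sketch below is not part of the repository; `summarize` is a hypothetical helper, and the numbers in `runs` are placeholders rather than measured results.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize(rates: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-run success rates."""
    return mean(rates), stdev(rates)


# Placeholder values for illustration only, NOT measured results.
runs = [80.0, 85.0, 90.0]
avg, spread = summarize(runs)
print(f"success rate: {avg:.1f} +/- {spread:.1f}")
```

Note that `stdev` needs at least two runs; with a single run, only the raw number can be reported.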
🔗 Citation
If you find our work helpful, please cite:
@inproceedings{
xiao2024unified,
title={Unified Human-Scene Interaction via Prompted Chain-of-Contacts},
author={Zeqi Xiao and Tai Wang and Jingbo Wang and Jinkun Cao and Wenwei Zhang and Bo Dai and Dahua Lin and Jiangmiao Pang},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=1vCnDyQkjg}
}