While previous approaches to 3D human motion generation have achieved notable success, they often rely on extensive training and are limited to specific tasks. To address these challenges, we introduce Motion-Agent, an efficient conversational framework designed for general human motion generation, editing, and understanding. Motion-Agent employs an open-source pre-trained language model to develop a generative agent, MotionLLM, that bridges the gap between motion and text. This is accomplished by encoding and quantizing motions into discrete tokens that align with the language model's vocabulary. With only 1-3% of the model's parameters fine-tuned using adapters, MotionLLM delivers performance on par with diffusion models and other transformer-based methods trained from scratch. By integrating MotionLLM with GPT-4 without additional training, Motion-Agent is able to generate highly complex motion sequences through multi-turn conversations, a capability that previous models have struggled to achieve. Motion-Agent supports a wide range of motion-language tasks, offering versatile capabilities for generating and customizing human motion through interactive conversational exchanges.
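For intuition, here is a minimal sketch of the two ideas above: discrete motion tokens produced by a motion tokenizer are registered as extra entries in the backbone's vocabulary, and only lightweight adapters are trained. This is not the released implementation; the model id, codebook size, token format, and LoRA hyperparameters are illustrative assumptions.

```python
# Illustrative sketch only: not the released MotionLLM code.
# Assumptions: Gemma 2 2B backbone, a 512-entry motion codebook, LoRA adapters.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-2-2b"  # gated on Hugging Face; request access first
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A VQ-VAE-style motion tokenizer (trained separately) maps a motion clip to
# discrete codebook indices; each index is exposed as a special text token.
codebook_size = 512
motion_tokens = [f"<motion_{i}>" for i in range(codebook_size)]
tokenizer.add_tokens(motion_tokens, special_tokens=True)
model.resize_token_embeddings(len(tokenizer))  # new motion-token embeddings also need training

# Fine-tune only low-rank adapters, i.e. a small fraction of the parameters.
lora_cfg = LoraConfig(r=16, lora_alpha=32,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()

# A quantized clip can now be written as text and mixed with natural language.
clip = "".join(f"<motion_{i}>" for i in [17, 402, 88, 88, 251])
print(tokenizer.tokenize(clip))
```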
- [2025/05/15] The training script is released.
- [2025/02/19] Demo and evaluation code are available.
- [2025/02/06] Motion-Agent is accepted to ICLR 2025.
- [2024/10/08] Motion-Agent paper is available.
- [2024/05/28] Original version MotionLLM paper is available.
If you find our work useful, please cite us. The BibTeX is as follows.
@article{wu2024motion,
  title={Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs},
  author={Wu, Qi and Zhao, Yubo and Wang, Yifan and Liu, Xinhang and Tai, Yu-Wing and Tang, Chi-Keung},
  journal={arXiv preprint arXiv:2405.17013},
  year={2024}
}
conda create -n motionagent python=3.10
conda activate motionagent
pip install -r requirements.txt
Download the Motion-Agent checkpoints:
bash prepare/download_ckpt.sh
Download the evaluation models and GloVe word embeddings used for evaluation:
bash prepare/download_glove.sh
bash prepare/download_extractor.sh
We use Google Gemma 2 2B as MotionLLM's backbone. Please request access to the model on Hugging Face and log in with huggingface-cli login.
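If you prefer to authenticate from Python instead of the CLI, the huggingface_hub helper below does the same thing; the model id assumes the google/gemma-2-2b checkpoint and the token value is your own.

```python
# Optional: authenticate to Hugging Face from Python instead of the CLI.
from huggingface_hub import login
from transformers import AutoTokenizer

login(token="hf_...")  # paste your personal access token here

# Quick check that the gated Gemma 2 backbone is accessible.
AutoTokenizer.from_pretrained("google/gemma-2-2b")
```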
We provide an interactive demo for Motion-Agent that runs in your terminal. You will need to set up your own Azure OpenAI API key and endpoint; a configuration sketch follows the examples below. To start the demo:
python demo.py
Here are some examples of what you can ask Motion-Agent:
- Motion Generation
Generate a motion of a person running forward and then doing a backflip.
- Motion Reasoning
Why is the person doing this? ./assets/motion_example.npy
Note: For motion reasoning, make sure your motion file is in the HumanML3D .npy format and exists at the specified path.
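The demo expects Azure OpenAI credentials; the exact configuration mechanism is up to you, but a typical client setup with the openai Python SDK looks like the sketch below. The environment variable names, API version, and deployment name are assumptions, so check demo.py for what it actually reads.

```python
# Sketch of a typical Azure OpenAI client setup (names and values are assumptions;
# see demo.py for the configuration it actually expects).
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4",  # your Azure deployment name
    messages=[{"role": "user", "content": "Plan a short dance sequence."}],
)
print(response.choices[0].message.content)
```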
To obtain the full HumanML3D dataset, please follow the instructions in the HumanML3D repository.
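Before passing a motion file to the demo or evaluation scripts, you can sanity-check that it uses the expected HumanML3D feature layout; the snippet below assumes the standard 263-dimensional HumanML3D representation and uses the bundled example file as the path.

```python
# Quick sanity check for a HumanML3D-format motion file.
import numpy as np

motion = np.load("./assets/motion_example.npy")
print(motion.shape)  # expected: (num_frames, 263) for HumanML3D features
assert motion.ndim == 2 and motion.shape[1] == 263, "not in HumanML3D feature format"
```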
python eval_mllm.py
To train your own motion tokenizer, you can refer to T2M-GPT.
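eval_mllm.py computes the metrics with the evaluation models downloaded above. Purely for reference, a bare-bones FID over extracted motion features could be sketched as follows; the feature arrays and dimensions are placeholders, not the script's actual pipeline.

```python
# Bare-bones FID between real and generated motion features (placeholder sketch;
# the released evaluation is performed inside eval_mllm.py with the downloaded extractor).
import numpy as np
from scipy import linalg

def fid(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    mu_r, mu_g = real_feats.mean(0), gen_feats.mean(0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))

# Example with random features (replace with extractor outputs).
rng = np.random.default_rng(0)
print(fid(rng.normal(size=(1000, 512)), rng.normal(size=(1000, 512))))
```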
Motion generation (text-to-motion, t2m) and motion captioning (motion-to-text, m2t) are trained separately; a sketch of how the two tasks can be framed is shown below.
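As a rough illustration of how the two tasks differ, each training example pairs a prompt with a target sequence over the shared text-plus-motion vocabulary. The templates below are hypothetical; the real formatting lives in train_mllm.py.

```python
# Hypothetical illustration of the two training tasks; the actual prompt
# templates are defined in train_mllm.py and may differ.
def make_example(caption: str, motion_tokens: str, task: str) -> dict:
    if task == "t2m":   # text-to-motion: caption in, motion tokens out
        return {"prompt": f"Generate a motion for: {caption}", "target": motion_tokens}
    if task == "m2t":   # motion-to-text: motion tokens in, caption out
        return {"prompt": f"Describe this motion: {motion_tokens}", "target": caption}
    raise ValueError(f"unknown task: {task}")

print(make_example("a person runs forward and then does a backflip",
                   "<motion_17><motion_402><motion_88>", "t2m"))
```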
You can train MotionLLM by running the following commands:
python train_mllm.py --training_task t2m
python train_mllm.py --training_task m2t

We would like to thank the following open-source projects for their contributions to our code: T2M-GPT, NExT-GPT, text-to-motion.
