If you find our code or paper helpful, please consider starring our repository and citing:
@inproceedings{javed2025intermask,
title={InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling},
author={Muhammad Gohar Javed and Chuan Guo and Li Cheng and Xingyu Li},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=ZAyuwJYN8N}
}
Details
conda env create -f environment.yml
conda activate intermask
The code was tested on Python 3.7.7 and PyTorch 1.13.1
The following script will download pre-trained model checkpoints for both the VQ-VAE and Inter-M Transformer on both the InterHuman and Inter-X datasets.
python prepare/download_models.py
These evaluation models are for the InterHuman dataset only (obtained from the InterGen github repo). Evaluation models for the Inter-X dataset can be downloaded along with the dataset, as explained later.
bash prepare/download_evaluator.sh
The download scripts use the gdown package. If you face problems, run the following command and then retry the download. The solution is from this github issue.
rm -f ~/.cache/gdown/cookies.json
You need to download the SMPL-X models and then place them under ./data/body_models/smplx:
├── SMPLX_FEMALE.npz
├── SMPLX_FEMALE.pkl
├── SMPLX_MALE.npz
├── SMPLX_MALE.pkl
├── SMPLX_NEUTRAL.npz
├── SMPLX_NEUTRAL.pkl
└── SMPLX_NEUTRAL_2020.npz
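To double-check the placement, here is a minimal sketch that verifies the files listed above are present under ./data/body_models/smplx (the script name is hypothetical and not part of this repo):

```python
# check_smplx.py -- minimal sanity check for the SMPL-X body model files (sketch)
from pathlib import Path

SMPLX_DIR = Path("./data/body_models/smplx")
EXPECTED = [
    "SMPLX_FEMALE.npz", "SMPLX_FEMALE.pkl",
    "SMPLX_MALE.npz", "SMPLX_MALE.pkl",
    "SMPLX_NEUTRAL.npz", "SMPLX_NEUTRAL.pkl",
    "SMPLX_NEUTRAL_2020.npz",
]

missing = [name for name in EXPECTED if not (SMPLX_DIR / name).is_file()]
if missing:
    print("Missing SMPL-X files:", ", ".join(missing))
else:
    print("All SMPL-X body model files found in", SMPLX_DIR)
```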
Follow the instructions on the InterGen github repo to download the InterHuman dataset, place it in the ./data/InterHuman/ folder, and unzip the motions_processed.zip archive such that the directory structure looks like:
./data
└── InterHuman
    ├── annots
    ├── LICENSE.md
    ├── motions
    ├── motions_processed
    └── split
Follow the instructions on the Inter-X github repo to download the Inter-X dataset, place it in the ./data/Inter-X_Dataset folder, and unzip the processed/texts_processed.tar.gz archive such that the directory structure looks like:
./data
└── Inter-X_Dataset
    ├── annots
    ├── misc
    ├── processed
    │   ├── glove
    │   ├── motions
    │   ├── texts_processed
    │   └── inter-x.h5
    ├── splits
    ├── text2motion
    └── LICENSE.md
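If you prefer to unpack the texts archive programmatically, here is a minimal sketch; it assumes the archive sits at ./data/Inter-X_Dataset/processed/texts_processed.tar.gz and is extracted in place:

```python
# extract_interx_texts.py -- unpack processed/texts_processed.tar.gz in place (sketch)
import tarfile
from pathlib import Path

archive = Path("./data/Inter-X_Dataset/processed/texts_processed.tar.gz")

# Extract next to the archive so the result matches the layout shown above.
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(path=archive.parent)

print("Extracted into", archive.parent)
```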
For Inter-X Evaluation Models
Once you have downloaded the dataset, move the contents of the ./data/Inter-X_Dataset/text2motion/checkpoints/ folder to the ./checkpoints/hhi/ folder.
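A minimal sketch of that move (paths as stated above; a plain mv command works just as well):

```python
# move_interx_evaluators.py -- move Inter-X evaluation checkpoints into place (sketch)
import shutil
from pathlib import Path

src = Path("./data/Inter-X_Dataset/text2motion/checkpoints")
dst = Path("./checkpoints/hhi")
dst.mkdir(parents=True, exist_ok=True)

# Move every item inside text2motion/checkpoints/ into checkpoints/hhi/.
for item in src.iterdir():
    shutil.move(str(item), str(dst / item.name))
print("Moved evaluation checkpoints to", dst)
```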
Details
python infer.py --gpu_id 0 --dataset_name interhuman --name trans_default
The inference script reads text prompts from the file ./prompts.txt, with one text prompt per line. By default, the script generates motions 3 seconds in length. In our work, motion is at 30 fps.
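For reference, a minimal sketch that writes a ./prompts.txt in the expected one-prompt-per-line format (the prompt texts themselves are only illustrative):

```python
# write_prompts.py -- create ./prompts.txt with one text prompt per line (sketch)
prompts = [
    "two people greet each other with a handshake.",        # illustrative prompt
    "one person pushes the other, who stumbles backward.",  # illustrative prompt
]

with open("./prompts.txt", "w") as f:
    f.write("\n".join(prompts) + "\n")
```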
The output files are stored under the folder ./checkpoints/<dataset_name>/<name>/animation_infer/, which in this case would be ./checkpoints/interhuman/trans_default/animation_infer/. The output files are organized as follows:
- keypoint_npy: generated motions with shape (nframe, 22, 9) for each interacting individual.
- keypoint_mp4: stick-figure animations in mp4 format with two viewpoints.
We also apply a naive foot IK to the generated motions; see the files with the prefix ik_. It works well in some cases but fails in others.
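To inspect a generated motion, here is a minimal sketch that loads one of the .npy outputs; the filename is hypothetical, and the (nframe, 22, 9) shape per individual is as described above:

```python
# inspect_generated.py -- load a generated motion and check its shape (sketch)
import numpy as np

# Hypothetical file under ./checkpoints/interhuman/trans_default/animation_infer/keypoint_npy/
motion = np.load("./checkpoints/interhuman/trans_default/animation_infer/keypoint_npy/000_p1.npy")

nframe, njoints, ndims = motion.shape  # expected (nframe, 22, 9) per interacting individual
print(f"{nframe} frames (~{nframe / 30:.1f} s at 30 fps), {njoints} joints, {ndims} features per joint")
```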
Details
Note: You have to train the VQ-VAE BEFORE training the Inter-M Transformer. They CANNOT be trained simultaneously.
python train_vq.py --gpu_id 0 --dataset_name interhuman --name vq_test
python train_vq.py --gpu_id 0 --dataset_name interx --batch_size 128 --name vq_test
python train_transformer.py --gpu_id 0 --dataset_name interhuman --name trans_test --vq_name vq_test
python train_transformer.py --gpu_id 0 --dataset_name interx --batch_size 128 --name trans_test
Selected arguments:
- --gpu_id: GPU id.
- --dataset_name: interaction dataset, interhuman for InterHuman and interx for Inter-X.
- --name: name your experiment. This will create a saving directory at ./checkpoints/<dataset_name>/<name>.
- --vq_name: when training the Inter-M Transformer, you need to specify the name of the previously trained VQ-VAE model for tokenization.
- --batch_size: for InterHuman, we use 256 for VQ-VAE training and 52 for the Inter-M Transformer. For Inter-X, we use 128 for both.
- --do_eval: perform evaluations during training. Note: make sure you have downloaded the evaluation models.
- --max_epoch: number of total epochs to run: 50 for the VQ-VAE and 500 for the Inter-M Transformer.

All the trained model checkpoints, logs and intermediate evaluations will be saved at ./checkpoints/<dataset_name>/<name>.
Details
InterHuman:
python eval.py --gpu_id 0 --use_trans False --dataset_name interhuman --name vq_default
Inter-X:
python eval.py --gpu_id 0 --use_trans False --dataset_name interx --name vq_default
InterHuman:
python eval.py --gpu_id 0 --dataset_name interhuman --name trans_default
Inter-X:
python eval.py --gpu_id 0 --dataset_name interx --name trans_default
Selected arguments:
- --gpu_id: GPU id.
- --use_trans: whether to use the transformer. Default: True. Set to False to perform inference with only the VQ-VAE.
- --dataset_name: interaction dataset, interhuman for InterHuman and interx for Inter-X.
- --name: name of your trained model experiment.
- --which_epoch: checkpoint name of the model: [all, best_fid, best_top1, latest, finest].
- --save_vis: whether to save visualization results. Default: True.
- --time_steps: number of iterations for transformer inference. Default: 20.
- --cond_scales: scale of classifier-free guidance. Default: 2.
- --topkr: percentile of low-score tokens to ignore during inference. Default: 0.9.
The final evaluation results will be saved in ./checkpoints/<dataset_name>/<name>/eval/evaluation_<which_epoch>_ts<time_steps>_cs<cond_scales>_topkr<topkr>.log
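As a sketch, the log filename follows directly from those arguments (the values below are only illustrative, using the defaults listed above):

```python
# eval_log_path.py -- assemble the evaluation log path from the arguments (sketch)
dataset_name, name = "interhuman", "trans_default"          # illustrative experiment
which_epoch, time_steps, cond_scales, topkr = "best_fid", 20, 2, 0.9

log_path = (
    f"./checkpoints/{dataset_name}/{name}/eval/"
    f"evaluation_{which_epoch}_ts{time_steps}_cs{cond_scales}_topkr{topkr}.log"
)
print(log_path)
```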
Components in this code are derived from the following open-source efforts: