This is the official implementation of our ICLR 2025 paper "SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training". We look forward to presenting our work in Singapore 🇸🇬 in April 2025!
[Paper] [OpenReview] [Code] [Project Page]
Nie Lin*, Takehiko Ohkawa*, Yifei Huang, Mingfang Zhang, Minjie Cai, Ming Li, Ryosuke Furuta, Yoichi Sato (*equal contribution). "SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training", ICLR 2025.
[Mar 13th, 2025]: Pre-trained weights released. [Download]
[Mar 9th, 2025]: Project page released. [Project Page]
[Jan 30th, 2025]: Code released. [Code]
[Jan 23rd, 2025]: Repository opened.
We present a framework for pre-training 3D hand pose estimation from in-the-wild hand images that share similar hand characteristics, dubbed SiMHand. Pre-training with large-scale images achieves promising results in various tasks, but prior methods for 3D hand pose pre-training have not fully utilized the potential of diverse hand images accessible from in-the-wild videos. To facilitate scalable pre-training, we first prepare an extensive pool of hand images from in-the-wild videos and design our pre-training method with contrastive learning. Specifically, we collect over 2.0M hand images from recent human-centric videos, such as 100DOH and Ego4D. To extract discriminative information from these images, we focus on the similarity of hands: pairs of non-identical samples with similar hand poses. We then propose a novel contrastive learning method that embeds similar hand pairs closer in the feature space. Our method not only learns from similar samples but also adaptively weights the contrastive learning loss based on inter-sample distance, leading to additional performance gains. Our experiments demonstrate that our method outperforms conventional contrastive learning approaches that produce positive pairs solely from a single image with data augmentation. We achieve significant improvements over the state-of-the-art method (PeCLR) on various datasets, with gains of 15% on FreiHAND, 10% on DexYCB, and 4% on AssemblyHands.
Install required packages:
git clone https://github.com/ut-vision/simhand.git
cd simhand
conda env create -f environment.yml
conda activate simhand
python -c "import torch; print(torch.__version__)"  # Make sure it works!
For the pre-training of SiMHand, we use 2.0M in-the-wild similar hands from Ego4D-v1 and 100DOH. Please follow the instructions here to prepare your pre-training datasets!
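For intuition, mining pairs of non-identical samples with similar hand poses can be sketched as a nearest-neighbor search over normalized hand keypoints. This is an illustration only, not the repository's mining pipeline: it assumes 21-point 2D hand joints are available for every image (the hypothetical joints array below).

import numpy as np

def normalize_joints(joints):
    """Make poses comparable: center on the wrist (joint 0) and scale to unit size.

    joints: (N, 21, 2) array of 2D hand keypoints, one row per image.
    """
    centered = joints - joints[:, :1, :]                   # wrist-relative coordinates
    scale = np.linalg.norm(centered, axis=(1, 2), keepdims=True) + 1e-8
    return centered / scale

def mine_similar_pairs(joints):
    """Return, for each sample, the index of the most similar other sample.

    Similarity is measured by MPJPE (mean per-joint position error)
    between normalized poses; lower means more similar.
    """
    j = normalize_joints(joints)                           # (N, 21, 2)
    diff = j[:, None] - j[None, :]                         # (N, N, 21, 2) pairwise differences
    mpjpe = np.linalg.norm(diff, axis=-1).mean(axis=-1)    # (N, N) pairwise MPJPE
    np.fill_diagonal(mpjpe, np.inf)                        # exclude self-pairs
    return mpjpe.argmin(axis=1)                            # nearest non-identical neighbor

At the 2.0M-image scale, such an exhaustive N-by-N comparison is infeasible; in practice the search would need chunking (e.g., per video) or approximate nearest-neighbor indexing.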
export BASE_PATH='<path_to_repo>'
export COMET_API_KEY=''
export COMET_PROJECT=''
export COMET_WORKSPACE=''
export PYTHONPATH="$BASE_PATH"
export DATA_PATH="<path_to_hand100m>"
export SAVED_MODELS_BASE_PATH="$BASE_PATH/data/models/simhand"
export SAVED_META_INFO_PATH="$BASE_PATH/data/models"
For the pre-training of SiMHand, please run the command below. We did not search for data augmentation strategies for SiMHand; we inherit the augmentation settings of PeCLR and SimCLR from the original PeCLR paper.
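For intuition only, the augmentation flags used in the commands below correspond roughly to standard torchvision transforms. The parameter values here are hypothetical, and note that PeCLR additionally reverses geometric transformations in feature space, which a plain image-space pipeline like this does not capture:

from torchvision import transforms

# Hypothetical parameter values; the actual settings are inherited from the PeCLR paper.
augment = transforms.Compose([
    transforms.Resize((256, 256)),                     # --resize
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.4, hue=0.1),   # --color_jitter
    transforms.RandomRotation(degrees=90),             # --rotate
    transforms.RandomResizedCrop(224),                 # --random_crop / --crop
    transforms.ToTensor(),
])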
# Flag notes:
# --gpus: 8-GPU pre-training
# --color_jitter / --random_crop / --rotate / --crop: data augmentations
# -resnet_size: ResNet size, 50 or 152
# -sources: pre-training data source, ego4d or 100doh
# --datasets_scale: pre-training data size
# --weight_type linear: parameter-free adaptive weighting strategy
# --diff_type mpjpe: inter-sample distance calculation
# --pos_neg: apply the weight to the positive and/or negative terms of the contrastive loss
python src/experiments/main.py \
--experiment_type handclr_w \
--gpus 0,1,2,3,4,5,6,7 \
--color_jitter \
--random_crop \
--rotate \
--crop \
-resnet_size 50 \
--resize \
-sources ego4d \
--datasets_scale 1m \
-epochs 100 \
-batch_size 8192 \
-accumulate_grad_batches 1 \
-save_top_k 100 \
-save_period 1 \
-num_workers 24 \
--weight_type linear \
--joints_type augmented \
--diff_type mpjpe \
--pos_neg pos_neg
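The --weight_type, --diff_type, and --pos_neg flags configure the adaptive weighting. The exact formulation is given in the paper; the following is a minimal sketch of the idea under assumed tensor shapes, with a linear mapping from pose distance to weight:

import torch
import torch.nn.functional as F

def weighted_contrastive_loss(z1, z2, pose_dist, temperature=0.07):
    """InfoNCE-style loss with distance-adaptive weights (sketch only).

    z1, z2:    (B, D) embeddings of the two samples in each similar-hand pair.
    pose_dist: (B, B) pose distances, e.g. MPJPE (--diff_type mpjpe);
               pose_dist[i, j] compares the pose of z1[i] with that of z2[j],
               so the diagonal holds the within-pair distances.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature            # (B, B) pairwise similarities

    # --weight_type linear: map distances linearly into [0, 1].
    d = pose_dist / (pose_dist.max() + 1e-8)
    w_pos = 1.0 - torch.diagonal(d)               # closer pair -> larger positive weight
    w_neg = d                                     # one plausible choice for negatives

    # --pos_neg pos_neg: apply weights to both the positive and negative terms.
    pos = w_pos * torch.diagonal(logits)
    neg = torch.logsumexp(logits + torch.log(w_neg + 1e-8), dim=1)
    return (neg - pos).mean()

With --pos_neg, the weight can instead be applied to the positive term only or the negative term only.

We also prepare PeCLR and SimCLR pre-training with our adaptive weighting strategy: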
For PeCLR with our adaptive weighting strategy, please run:
python src/experiments/main.py \
--experiment_type peclr_w \
--gpus 0,1,2,3,4,5,6,7 \
--color_jitter \
--random_crop \
--rotate \
--crop \
-resnet_size 50 \
--resize \
-sources ego4d \
--datasets_scale 1m \
-epochs 100 \
-batch_size 8192 \
-accumulate_grad_batches 1 \
-save_top_k 100 \
-save_period 1 \
-num_workers 24 \
--weight_type linear \
--joints_type augmented \
--diff_type mpjpe \
--pos_neg pos_neg
The flags are the same as for SiMHand above. For SimCLR with our adaptive weighting strategy, please run:
python src/experiments/main_pretrain.py \
--experiment_type simclr_w \
--gpus 0,1,2,3,4,5,6,7 \
--color_jitter \
--crop \
-resnet_size 50 \
-sources ego4d \
--datasets_scale 1m \
--resize \
-epochs 100 \
-batch_size 1024 \
-accumulate_grad_batches 1 \
-save_top_k 100 \
-save_period 1 \
-num_workers 4 \
--weight_type linear \
--joints_type augmented \
--diff_type mpjpe \
--pos_neg pos_neg
We provide the baseline model used to validate the effects of our pre-training: minimal-hand. We thank the original author @CalciferZh, and @MengHao666 for the PyTorch replication of minimal-hand. Since the @MengHao666 implementation does not support several recent datasets (FreiHAND, DexYCB, AssemblyHands), please use the newest minimal-hand model we provide instead. You can find the fine-tuning model here.
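To initialize fine-tuning from the released pre-training weights, it usually suffices to load the backbone parameters from the checkpoint. A minimal sketch follows; the checkpoint filename, checkpoint layout, and the "encoder." key prefix are assumptions for illustration, not a documented interface of this repository:

import torch
from torchvision.models import resnet50

# Hypothetical checkpoint name and key prefix; inspect the released
# checkpoint to confirm its actual layout before adapting this.
ckpt = torch.load("simhand_resnet50.ckpt", map_location="cpu")
state = ckpt.get("state_dict", ckpt)

backbone = resnet50()
backbone_state = {k.removeprefix("encoder."): v
                  for k, v in state.items() if k.startswith("encoder.")}
missing, unexpected = backbone.load_state_dict(backbone_state, strict=False)
print(f"backbone loaded: {len(missing)} missing, {len(unexpected)} unexpected keys")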
In this SiMHand project, I am grateful to all my collaborators, especially to Take Ohkawa for his high standards and to Yoichi Sato for his patient guidance. I would also like to thank The University of Tokyo Scholarship and JST ACT-X for supporting my research. Thank you!
If you find our paper/code useful, please consider citing our paper:
@inproceedings{
lin2025simhand,
title={{SiMHand}: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training},
author={Nie Lin and Takehiko Ohkawa and Yifei Huang and Mingfang Zhang and Minjie Cai and Ming Li and Ryosuke Furuta and Yoichi Sato},
booktitle={The Thirteenth International Conference on Learning Representations (ICLR)},
year={2025},
url={https://openreview.net/forum?id=96jZFqM5E0}
}
