Kaiyan Zhang¹, Xinghui Li, Jingyi Lu¹, Kai Han¹
¹Visual AI Lab, The University of Hong Kong
We provide a paper list for all the semantic correspondence estimation methods discussed in the paper.
We have also created a repository, Awesome-Semantic-Correspondence, to collect papers on semantic correspondence estimation, given the growing body of literature in the field. PRs are welcome!
The environment can be easily installed through conda and pip. After downloading the code, run the following commands:

```
conda create -n sc_baseline python=3.10
conda activate sc_baseline
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install xformers -c xformers
pip install yacs pandas scipy einops matplotlib triton timm diffusers accelerate transformers datasets tensorboard pykeops scikit-learn
```
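A quick sanity check can save debugging time later. This is a minimal sketch (not part of the repository): it only verifies that the core packages import and that the CUDA build installed above is visible to PyTorch.

```python
# Sanity check: core packages import and PyTorch sees a CUDA device
# (matching the pytorch-cuda=11.6 install above).
import torch
import xformers
import diffusers
import timm

print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
```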
Download the datasets you need under the `asset` folder.

PF-PASCAL:
- Download the PF-PASCAL dataset from the link.
- Rename the outermost directory from `PF-dataset-PASCAL` to `pf-pascal`.
- Download the lists for image pairs from the link.
- Place the lists for image pairs under the `pf-pascal` directory.
PF-Willow:
- Download the PF-Willow dataset from the link.
- Rename the outermost directory from `PF-dataset` to `pf-willow` (both renames are scripted in the sketch below).
- Download the lists for image pairs from the link.
- Place the lists for image pairs under the `pf-willow` directory.
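If you prefer to script the renaming steps above, here is a minimal sketch. It assumes both archives have already been extracted under `asset/`; it is illustrative, not a script shipped with the repository.

```python
# Rename the extracted dataset roots to the names the code expects.
# Assumes the archives were extracted directly under asset/.
from pathlib import Path

asset = Path("asset")
renames = {
    "PF-dataset-PASCAL": "pf-pascal",  # PF-PASCAL
    "PF-dataset": "pf-willow",         # PF-Willow
}
for src, dst in renames.items():
    if (asset / src).is_dir() and not (asset / dst).exists():
        (asset / src).rename(asset / dst)
        print(f"renamed asset/{src} -> asset/{dst}")
```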
SPair-71k:
Download the SPair-71k dataset from the link. After extraction, no further action is required.
AP-10k:
Follow the instructions of GeoAware-SC to prepare the AP-10k dataset.
The directory structure should be:

```
asset
├── ap-10k
│   ├── annotations
│   ├── ImageAnnotation
│   ├── JPEGImages
│   └── PairAnnotation
├── pf-pascal
│   └── PF-dataset-PASCAL
│       ├── test_pairs.csv
│       ├── trn_pairs.csv
│       └── val_pairs.csv
├── pf-willow
│   └── PF-dataset
│       └── test_pairs.csv
└── SPair-71k
    ├── devkit
    ├── ImageAnnotation
    ├── JPEGImages
    ├── Layout
    ├── PairAnnotation
    ├── Segmentation
    └── Visualization
```
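A small script can verify the layout before training. This is a minimal sketch that only checks that the top-level folders from the tree above exist:

```python
# Verify the expected dataset layout under asset/ (see the tree above).
from pathlib import Path

expected = {
    "ap-10k":    ["annotations", "ImageAnnotation", "JPEGImages", "PairAnnotation"],
    "pf-pascal": ["PF-dataset-PASCAL"],
    "pf-willow": ["PF-dataset"],
    "SPair-71k": ["devkit", "ImageAnnotation", "JPEGImages", "Layout",
                  "PairAnnotation", "Segmentation", "Visualization"],
}
asset = Path("asset")
for dataset, subdirs in expected.items():
    for sub in subdirs:
        path = asset / dataset / sub
        print(f"{'ok' if path.is_dir() else 'MISSING':7s} {path}")
```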
The configuration file for training and testing can be accessed at `config/base.py`. For example, to train the model, run:

```
sh train.sh
```
Some important parameters include (a sketch of such a config follows the list):

- `dataset`: dataset name; choose from 'spair', 'ap10k', 'pfwillow' or 'pfpascal'.
- `method`: set to 'dino' to use DINOv2 as the backbone.
- `pre_extract`: pre-extract image features to speed up validation.
- `train_sample` and `val_sample`: only used for the AP-10k dataset.
- `save_thre`: threshold for saving the model within an epoch.
- `eval_interval`: iteration interval for validation.
- `ckpt_dir`: directory to save the model, training log and evaluation log.
- `resume_dir`: directory to resume training from. If starting from scratch, set to 'None'.
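Since the dependencies include yacs, `config/base.py` is presumably a yacs `CfgNode`. The attribute names below mirror the parameter list above, but the structure is an illustrative sketch, not the repository's actual file:

```python
# Illustrative yacs-style config exposing the parameters above.
# The real config/base.py may be organized differently.
from yacs.config import CfgNode as CN

_C = CN()
_C.dataset = "spair"         # 'spair', 'ap10k', 'pfwillow' or 'pfpascal'
_C.method = "dino"           # 'dino' uses DINOv2 as the backbone
_C.pre_extract = True        # pre-extract features to speed up validation
_C.train_sample = 0          # AP-10k only
_C.val_sample = 0            # AP-10k only
_C.save_thre = 0.0           # threshold for saving the model within an epoch
_C.eval_interval = 1000      # iteration interval for validation
_C.ckpt_dir = "checkpoints"  # where model, train log and eval log are saved
_C.resume_dir = "None"       # directory to resume from; 'None' = from scratch

def get_cfg():
    """Return a fresh copy so callers can mutate it safely."""
    return _C.clone()

# Typical override pattern, e.g. from command-line key/value pairs:
cfg = get_cfg()
cfg.merge_from_list(["dataset", "ap10k", "eval_interval", "500"])
```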
To evaluate a trained model, run:

```
python test.py --dataset ap10k --method dino --resolution 840 --batch_size 4 --ckpt_dir $directory_of_the_model$
```
We provide pretrained weights to reproduce the results in the paper; you can download them here.
| Method | SPair-71k | Weights | AP-10k | Weights |
|---|---|---|---|---|
| Ours (DINOv2) | 85.1% | Google Drive | 87.4% | Google Drive |
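Before plugging a downloaded checkpoint into `test.py`, you can inspect it with a minimal sketch like the one below. The filename is hypothetical, and it assumes a standard `torch.save()`-style file; the actual keys depend on how the repository saves models.

```python
# Inspect a downloaded checkpoint (hypothetical filename).
# Assumes a standard torch.save()-style file; actual keys may differ.
import torch

ckpt = torch.load("checkpoints/dino_spair.pth", map_location="cpu")
if isinstance(ckpt, dict):
    for key in list(ckpt)[:10]:  # print the first few top-level keys
        print(key)
```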
```
@article{zhang2025semantic,
  title={Semantic Correspondence: Unified Benchmarking and a Strong Baseline},
  author={Kaiyan Zhang and Xinghui Li and Jingyi Lu and Kai Han},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2025}
}
```