Official PyTorch implementation of our ACM MM 2022 paper "Dual Contrastive Learning for Spatio-temporal Representation".
Contrastive learning in the video domain suffers from severe background bias: when naively pulling two augmented views of a video closer, the model tends to learn the shared static background as a shortcut and fails to capture the motion information. To tackle this challenge, this paper presents a novel dual contrastive formulation. Concretely, we decouple the input RGB video sequence into two complementary modes: static scene and dynamic motion. The original RGB features are then pulled closer to the static features and the aligned dynamic features, respectively. In this way, the static scene and the dynamic motion are simultaneously encoded into a compact RGB representation. We further decouple the feature space via activation maps to distill static- and dynamic-related features. We term our method Dual Contrastive Learning for spatio-temporal Representation (DCLR).
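To make the formulation concrete, below is a minimal PyTorch sketch of the dual objective, assuming MoCo-style negative queues. Everything in it is illustrative rather than the repository's actual code: the function names are ours, and the view construction (repeating a single frame for the static view, temporal differencing for the dynamic view) is one plausible way to realize the decoupling, not necessarily the paper's exact recipe.

```python
# Illustrative sketch of the dual contrastive objective; NOT the official code.
import torch
import torch.nn.functional as F

def static_view(clip):
    # clip: (N, C, T, H, W). Repeating one frame keeps the scene, removes motion.
    return clip[:, :, :1].expand_as(clip)

def dynamic_view(clip):
    # Temporal differences suppress the static background and keep the motion
    # (the result has T-1 frames).
    return clip[:, :, 1:] - clip[:, :, :-1]

def info_nce(q, k, queue, temperature=0.07):
    # MoCo-style InfoNCE: one positive key per query, negatives from a queue
    # (the queue entries are assumed to be L2-normalized already).
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(-1)   # (N, 1)
    l_neg = torch.einsum("nc,kc->nk", q, queue)            # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)

def dual_contrastive_loss(q_rgb, k_static, k_dynamic, queue_s, queue_d):
    # Pull the RGB embedding toward the static view AND the dynamic view, so
    # scene and motion cues are both encoded in one compact representation.
    return info_nce(q_rgb, k_static, queue_s) + info_nce(q_rgb, k_dynamic, queue_d)
```

Summing the two InfoNCE terms keeps the single RGB embedding close to both the scene cue and the motion cue, which is what counteracts the static-background shortcut.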
Requirements:

- pytorch >= 1.8.1
- tensorboard
- cv2 (opencv-python)
- kornia
Data preparation:

- Download the Kinetics-400 dataset from the official website.
- Download the UCF101 dataset from the official website.
By default, we train the R(2+1)D backbone on UCF101 on a single node with 8 GPUs for 200 epochs. Here `-a` selects the backbone architecture, `-cs` the spatial crop size, `-fpc` the number of frames per clip, `-b` the total batch size, and `-j` the number of data-loading workers.

```bash
python main_view.py \
--log_dir base_moco_r2d_view_ucf \
--ckp_dir base_moco_r2d_view_ucf \
--dataset ucf101 \
-a r2plus1d_18 \
--lr 0.01 \
-cs 112 \
-fpc 16 \
-b 40 \
-j 16 \
--epochs 201 \
--schedule 120 160 \
--aug_plus \
--mlp \
--dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
$path/dataset/ucf-101
```

By default, we fine-tune the R(2+1)D backbone on UCF101 on a single node with 4 GPUs, initializing from the pre-training checkpoint passed via `--pretrained`.

```bash
python -W ignore main_lincls.py \
--log_dir log_finetune_18_ucf \
--ckp_dir log_finetune_18_ucf \
-a r2plus1d_18 \
--num_class 101 \
--lr 0.1 \
--lr_decay 0.1 \
--wd 0.0001 \
-fpc 16 \
-cpv 10 \
-b 64 \
-j 32 \
--finetune \
--pretrained $path_to_checkpoint_0199.pth.tar \
--epochs 10 \
--schedule 6 8 \
--dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
$path/dataset/ucf-101
```

Our code is based on the implementations of VideoMoCo and MoCo. We sincerely thank those authors for their great work.
If our code is helpful to your work, please consider citing:
```bibtex
@inproceedings{ding2022dual,
  title={Dual contrastive learning for spatio-temporal representation},
  author={Ding, Shuangrui and Qian, Rui and Xiong, Hongkai},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={5649--5658},
  year={2022}
}
```
