TASED-Net is a novel fully-convolutional network architecture for video saliency detection. The main idea is simple but effective: spatially decoding 3D video features while jointly aggregating all the temporal information. TASED-Net significantly outperforms previous state-of-the-art approaches on all three major large-scale video saliency datasets: DHF1K, Hollywood2, and UCFSports. We observe that our model is particularly good at attending to salient moving objects.
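To make the idea concrete, below is a minimal conceptual sketch of a spatiotemporal encoder-decoder that consumes a clip of frames, collapses the temporal dimension inside the network, and spatially decodes a single saliency map for the last frame. This is not the actual TASED-Net architecture: the layer sizes and the pooling-based temporal aggregation shown here are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class ToyTemporalAggregationNet(nn.Module):
    """Toy sketch of a temporally-aggregating spatial encoder-decoder (not TASED-Net itself)."""
    def __init__(self):
        super().__init__()
        # 3D encoder: jointly downsamples space and time
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(64, 128, kernel_size=3, stride=(2, 2, 2), padding=1),
            nn.ReLU(inplace=True),
        )
        # Temporal aggregation: collapse whatever temporal extent remains (illustrative choice)
        self.temporal_pool = nn.AdaptiveAvgPool3d((1, None, None))
        # Decoder: 3D transposed convs with temporal size 1, upsampling back to frame resolution
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(128, 64, kernel_size=(1, 4, 4), stride=(1, 2, 2), padding=(0, 1, 1)),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(64, 1, kernel_size=(1, 4, 4), stride=(1, 2, 2), padding=(0, 1, 1)),
        )

    def forward(self, clip):                 # clip: (B, 3, T, H, W)
        feat = self.encoder(clip)            # (B, 128, T', H/4, W/4)
        feat = self.temporal_pool(feat)      # (B, 128, 1, H/4, W/4)
        out = self.decoder(feat)             # (B, 1, 1, H, W)
        return torch.sigmoid(out.squeeze(2).squeeze(1))  # (B, H, W) saliency map

# Usage: a 32-frame RGB clip produces one saliency map for its last frame
clip = torch.randn(1, 3, 32, 224, 224)
sal = ToyTemporalAggregationNet()(clip)      # -> shape (1, 224, 224)
```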
Video saliency detection aims to model the gaze fixation patterns of humans viewing a dynamic scene. Because the predicted saliency map can be used to prioritize video information across space and time, this task has a number of applications, including video surveillance, video captioning, and video compression.
Examples
We compare our TASED-Net to ACLNet, the previous state-of-the-art method. As shown in the examples below, TASED-Net is better at attending to the salient information. Note also that TASED-Net has a much smaller network size (82 MB vs. 252 MB).
Code Usage
First, clone this repository and download this weight file.
Then, just run the code using
$ python run_example.py
This will generate frame-wise saliency maps.
You can also specify the input and output directories as command-line arguments. For example,
$ python run_example.py ./example ./output
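If you prefer to call the model from your own script rather than through run_example.py, the snippet below sketches roughly what the example script does at a high level. The import path, class name, weight-file name, and input clip size are assumptions for illustration; check the repository for the actual ones.

```python
# Hypothetical programmatic use; names below (model, TASED_v2, TASED_updated.pt) are assumed.
import torch
from model import TASED_v2  # hypothetical import path

net = TASED_v2()
state = torch.load('TASED_updated.pt', map_location='cpu')  # downloaded weight file (name assumed)
net.load_state_dict(state)  # the actual checkpoint's key layout may differ
net.eval()

with torch.no_grad():
    clip = torch.randn(1, 3, 32, 224, 384)  # (batch, channels, frames, H, W); input size assumed
    saliency = net(clip)                    # saliency map for the last frame of the clip
```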
Notes
The released model is a modified version of the original with improved performance; the updated results are reported above.
We recommend using PNG image files as input (although the examples in this repository are in JPEG format).
For the encoder of TASED-Net, we use the S3D network. We pretrained S3D on the Kinetics-400 dataset using PyTorch; it achieves 72.08% top-1 accuracy (top-5: 90.35%) on the validation set. We release our S3D weight file together with this project. If you find it useful, please consider citing our work.
For training, we recommend using ViP, a general-purpose video platform for PyTorch. Otherwise, you can simply use run_train.py. Before running the training code, make sure to download our S3D weight file (see the sketch below for one way it might be used).
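The snippet below is a hedged sketch of how the released S3D weights might be used to initialize the encoder before training, together with a KL-divergence-style saliency loss, which is a common objective for this task. The file name, the net.encoder attribute, and the loss choice are assumptions for illustration; see run_train.py for the actual procedure.

```python
import torch
import torch.nn.functional as F

def load_s3d_into_encoder(net, weight_path='S3D_kinetics400.pt'):
    # Copy only the S3D parameters whose names and shapes match the encoder (names assumed)
    s3d_state = torch.load(weight_path, map_location='cpu')
    encoder_state = net.encoder.state_dict()
    matched = {k: v for k, v in s3d_state.items()
               if k in encoder_state and v.shape == encoder_state[k].shape}
    encoder_state.update(matched)
    net.encoder.load_state_dict(encoder_state)
    return net

def kld_loss(pred, gt, eps=1e-8):
    # Normalize both maps into probability distributions over pixels, then take KL(gt || pred)
    pred = pred / (pred.sum(dim=(-2, -1), keepdim=True) + eps)
    gt = gt / (gt.sum(dim=(-2, -1), keepdim=True) + eps)
    return (gt * torch.log(gt / (pred + eps) + eps)).sum(dim=(-2, -1)).mean()
```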
Citation
@inproceedings{min2019tased,
title={TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection},
author={Min, Kyle and Corso, Jason J},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={2394--2403},
year={2019}
}