Overview: Why is it challenging?


Method Overview



Our Video Super-Resolution (VSR) model is built upon the asymmetric U-Net architecture of the image GigaGAN upsampler. To enforce temporal consistency, we first inflate the image upsampler into a video upsampler by adding temporal attention layers to the decoder blocks. We further enhance consistency by incorporating features from a flow-guided propagation module. To suppress aliasing artifacts, we use anti-aliasing blocks in the downsampling layers of the encoder. Lastly, we shuttle the high-frequency features directly to the decoder layers via skip connections to compensate for the loss of detail in the BlurPool process.
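The anti-aliasing and high-frequency shuttle steps can be illustrated with a toy example. The following is a minimal 1D sketch in plain Python, not the model's implementation: the actual model operates on learned 2D feature maps, and the function names (`blur`, `split_frequencies`, `blur_pool`), the binomial kernel `[1, 2, 1] / 4`, and the edge replication are all assumptions made for illustration. It shows that blurring before subsampling keeps information that naive subsampling aliases away, and that the residual removed by the blur can be kept aside and passed along a skip connection.

```python
def blur(signal):
    """Low-pass filter with the kernel [1, 2, 1] / 4 (edge samples replicated)."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append((left + 2 * signal[i] + right) / 4)
    return out


def split_frequencies(signal):
    """Split a signal into a low-frequency part and the high-frequency residual."""
    low = blur(signal)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def blur_pool(signal, stride=2):
    """Anti-aliased downsampling: blur first, then subsample."""
    return blur(signal)[::stride]


# A rapidly alternating signal, i.e. the worst case for naive subsampling.
features = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

naive = features[::2]                        # aliased: every 1.0 is dropped
pooled = blur_pool(features)                 # anti-aliased encoder path
low, high = split_frequencies(features)      # `high` would travel the skip connection

print(naive)   # [0.0, 0.0, 0.0, 0.0]
print(pooled)  # [0.25, 0.5, 0.5, 0.5]
```

In this sketch the low- and high-frequency parts sum back to the original signal, which is why forwarding the residual to the decoder can compensate for the detail that the blur removes.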

Ablation study

The strong hallucination capability of the image GigaGAN results in temporally flickering artifacts, especially aliasing caused by artifacts in the low-resolution (LR) input.



We progressively add components to the base model to handle these artifacts.

[Video comparison. Panels: Image GigaGAN (base model), GT, Input, Comparison with GT]

Comparison with previous methods

Compared to previous models, our model provides detail-rich results with comparable temporal consistency.


[Video comparison. Panels: Input, EDVR, Ours, Comparison, GT]

Results on generic videos (128×128→512×512)

Our model handles generic videos across different categories.

BibTeX

@article{xu2024videogigagan,
      title={VideoGigaGAN: Towards Detail-rich Video Super-Resolution}, 
      author={Yiran Xu and Taesung Park and Richard Zhang and Yang Zhou and Eli Shechtman and Feng Liu and Jia-Bin Huang and Difan Liu},
      year={2024},
      eprint={2404.12388},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
  }