Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training), initially described in the arXiv paper, is a simple yet effective one-stage pre-training paradigm. It integrates existing pre-training methods (supervised, weakly-supervised, and self-supervised pre-training) under a unified mutual information perspective and retains all of their desired properties within a single pre-training stage. Notably, we successfully pre-train a 1B-parameter model (InternImage-H) with M3I Pre-training and achieve new records of 65.4 mAP on COCO detection test-dev, 62.5 mAP on LVIS detection minival, and 62.9 mIoU on ADE20K.
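To make the unified perspective concrete, the snippet below is a minimal, illustrative sketch (not the repository's actual loss): it maximizes a lower bound on the mutual information between an input representation and a target representation via an InfoNCE-style objective. The function name, tensor shapes, and temperature are hypothetical; the target representation could come from another augmented view, a text caption, or a label embedding, depending on which pre-training setting is being unified.

import torch
import torch.nn.functional as F

def info_nce_lower_bound(z_input: torch.Tensor,
                         z_target: torch.Tensor,
                         temperature: float = 0.1) -> torch.Tensor:
    """Illustrative InfoNCE loss; minimizing it maximizes a lower bound on
    the mutual information I(z_input; z_target). Shapes: (B, D) each."""
    z_input = F.normalize(z_input, dim=-1)
    z_target = F.normalize(z_target, dim=-1)
    logits = z_input @ z_target.t() / temperature       # (B, B) similarity matrix
    labels = torch.arange(z_input.size(0), device=z_input.device)
    return F.cross_entropy(logits, labels)              # positive pairs on the diagonal

if __name__ == "__main__":
    # Toy usage with random features standing in for encoder outputs.
    B, D = 8, 256
    z_img, z_tgt = torch.randn(B, D), torch.randn(B, D)
    print(float(info_nce_lower_bound(z_img, z_tgt)))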
Citation
If this work is helpful for your research, please consider citing the following BibTeX entry.
@InProceedings{Su_2023_CVPR,
    author    = {Su, Weijie and Zhu, Xizhou and Tao, Chenxin and Lu, Lewei and Li, Bin and Huang, Gao and Qiao, Yu and Wang, Xiaogang and Zhou, Jie and Dai, Jifeng},
    title     = {Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {15888-15899}
}