This work presents SuperClass, a super-simple classification method for vision-language pre-training. Unlike contrastive approaches, it does not require a text encoder to be trained on image-text data. Instead, it uses the tokenized raw text directly as supervised classification labels, with no additional text filtering or selection.
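The core idea can be sketched as follows: tokenize the caption, treat the resulting token IDs as positive classes in a multi-label classification problem, and train the vision encoder with an ordinary classification loss. This is an illustrative sketch only, not the repository's actual implementation; the function names and the normalized softmax cross-entropy formulation here are assumptions for demonstration.

```python
import numpy as np

def tokens_to_multihot(token_ids, vocab_size):
    """Turn a caption's subword token IDs into a multi-hot label vector.
    Each distinct token in the raw caption becomes a positive class,
    so no text encoder is needed at training time (illustrative sketch)."""
    y = np.zeros(vocab_size, dtype=np.float32)
    y[list(set(token_ids))] = 1.0
    return y

def classification_loss(logits, target):
    """Softmax cross-entropy against the normalized multi-hot target,
    one common formulation for multi-label classification (assumed here,
    not necessarily the exact loss used by SuperClass)."""
    logits = logits - logits.max()                      # numerical stability
    log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    target = target / target.sum()                       # normalize positives
    return float(-np.sum(target * log_probs))

# Toy example: a 10-token vocabulary, caption tokenized to IDs [2, 5, 5, 7].
vocab_size = 10
labels = tokens_to_multihot([2, 5, 5, 7], vocab_size)
logits = np.zeros(vocab_size)  # uniform predictions from an untrained model
loss = classification_loss(logits, labels)  # log(10) ≈ 2.303 for uniform logits
```

In practice the logits would come from a vision encoder's output head over the tokenizer vocabulary; the point is that the supervision is entirely defined by the caption's token IDs.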
News
2024-11-06: Paper & code are all released.
2024-10-02: SuperClass is accepted by NeurIPS 2024.
Usage
Preparation
```shell
git clone https://github.com/x-cls/superclass
cd superclass
pip install -r requirements.txt
```
Please note that the default precision during training is set to `amp_bfloat16`. If your GPU (e.g., V100) does not support bf16, please change it to `fp16` or `amp`.
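A quick way to check whether your GPU supports bf16 before training is shown below; the precision string values mirror the ones mentioned above, but verify the exact option names against the repository's training configs.

```python
import torch

# V100 (compute capability 7.0) lacks bf16; A100 (8.0) and newer support it.
# The short-circuit guard keeps this safe on CPU-only machines.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    precision = "amp_bfloat16"
else:
    precision = "fp16"  # or "amp" on GPUs without bf16 support
```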
Acknowledgements
We thank OpenCLIP and ViTamin for contributing such impressive code and models to the community.
LICENSE
The models & code of SuperClass are released under the Apache-2.0 license.
Citation
If you find this project useful, please consider citing:
```bibtex
@inproceedings{superclass_huang,
  title={Classification Done Right for Vision-Language Pre-Training},
  author={Huang, Zilong and Ye, Qinghao and Kang, Bingyi and Feng, Jiashi and Fan, Haoqi},
  booktitle={NeurIPS},
  year={2024}
}
```