This work presents SuperClass, a super-simple classification method for vision-language pre-training. Unlike contrastive approaches, it does not require a text encoder to be trained on image-text data. Instead, it uses the tokenized raw text directly as supervised classification labels, with no additional text filtering or selection.
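The core idea can be sketched as follows: tokenize the caption, treat the resulting token IDs as positive classes in a multi-label classification problem, and train the vision encoder with an ordinary classification loss. This is an illustrative sketch only, not the repository's actual implementation; the function names and the normalized softmax cross-entropy formulation here are assumptions for demonstration.

```python
import numpy as np

def tokens_to_multihot(token_ids, vocab_size):
    """Turn a caption's subword token IDs into a multi-hot label vector.
    Each distinct token in the raw caption becomes a positive class,
    so no text encoder is needed at training time (illustrative sketch)."""
    y = np.zeros(vocab_size, dtype=np.float32)
    y[list(set(token_ids))] = 1.0
    return y

def classification_loss(logits, target):
    """Softmax cross-entropy against the normalized multi-hot target,
    one common formulation for multi-label classification (assumed here,
    not necessarily the exact loss used by SuperClass)."""
    logits = logits - logits.max()                      # numerical stability
    log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    target = target / target.sum()                       # normalize positives
    return float(-np.sum(target * log_probs))

# Toy example: a 10-token vocabulary, caption tokenized to IDs [2, 5, 5, 7].
vocab_size = 10
labels = tokens_to_multihot([2, 5, 5, 7], vocab_size)
logits = np.zeros(vocab_size)  # uniform predictions from an untrained model
loss = classification_loss(logits, labels)  # log(10) ≈ 2.303 for uniform logits
```

In practice the logits would come from a vision encoder's output head over the tokenizer vocabulary; the point is that the supervision is entirely defined by the caption's token IDs.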
News
2024-11-06: Paper & code are all released.
2024-10-02: SuperClass is accepted by NeurIPS 2024.
Usage
Preparation
```shell
git clone https://github.com/x-cls/superclass
cd superclass
pip install -r requirements.txt
```
Please note that the default precision during training is set to `amp_bfloat16`. If your GPU (e.g., V100) does not support bf16, please change it to `fp16` or `amp`.
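A quick way to check whether your GPU supports bf16 before training is shown below; the precision string values mirror the ones mentioned above, but verify the exact option names against the repository's training configs.

```python
import torch

# V100 (compute capability 7.0) lacks bf16; A100 (8.0) and newer support it.
# The short-circuit guard keeps this safe on CPU-only machines.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    precision = "amp_bfloat16"
else:
    precision = "fp16"  # or "amp" on GPUs without bf16 support
```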
Acknowledgements
We thank OpenCLIP and ViTamin for contributing such impressive code and models to the community.
LICENSE
The models & code of SuperClass are released under the Apache-2.0 license.
Citation
If you find this project useful, please consider citing:
```bibtex
@inproceedings{superclass_huang,
  title={Classification Done Right for Vision-Language Pre-Training},
  author={Huang, Zilong and Ye, Qinghao and Kang, Bingyi and Feng, Jiashi and Fan, Haoqi},
  booktitle={NeurIPS},
  year={2024}
}
```