This is the official repository for the paper "LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Image Compression". The full paper is available on arXiv.
Please feel free to contact Murai (octachoron(at)suou.waseda.jp) or Sun Heming, or post an issue if you have any questions.
Demo Inference on Google Colab
We provide demo inference code for Google Colaboratory.
You can try the inference without any environment setup. Just click the 'Open in Colab' button above.
Requirements
Python 3.10 and several other packages are required. Please refer to the How to Use section below.
Our experiments and verification were conducted on Linux (Ubuntu 22.04) in a Docker container with CUDA 12.1 and torch 2.1.
How to Use
First, clone this repository.
git clone https://github.com/tokkiwa/TextImageCoding/
cd TextImageCoding
Download the DiffBIR weights and our pre-trained weights to the weights folder and the lic-weights/cheng folder, respectively.
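The expected layout can be prepared as follows (a sketch; the actual weight filenames depend on the downloaded checkpoints and are not listed here):

```shell
# Create the folders the inference scripts expect,
# relative to the repository root.
mkdir -p weights lic-weights/cheng

# After downloading, the tree should look roughly like:
#   weights/            <- DiffBIR weights
#   lic-weights/cheng/  <- pre-trained LIC weights
```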
Install the requirements (using a virtual environment is recommended).
pip install -r requirements.txt
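If you use a virtual environment, a minimal setup (assuming Python 3.10 is available as `python3`) looks like:

```shell
# Create an isolated virtual environment in .venv
python3 -m venv .venv

# Activate it and install the pinned dependencies:
#   source .venv/bin/activate
#   pip install -r requirements.txt
```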
Caption Generation and Compression
Code for caption generation and compression can be found in llavanextCaption_Compression.ipynb.
Inference
We provide text captions for the Kodak image dataset. Please download the Kodak dataset, place it in the ImageTextCoding/kodak folder, and run
bash run.sh
with the necessary options specified.
For other datasets, please generate and compress the captions by running llavanextCaption_Compression.ipynb, place the output CSV in the df folder, and specify the dataset in run.sh.
Training
Our training code is based on CompressAI. Please run lic/train.sh, specifying the models, datasets, and parameters.
Acknowledgement
Our code is based on MISC, CompressAI, GPTZip, and DiffBIR. We thank the authors for releasing their excellent work.
About
Official Repository for VCIP'24 paper "LMM-driven Image-Text Coding for Ultra Low-bitrate Learned Image Compression"