Download the EvalMi-50K database:

```bash
huggingface-cli download IntMeGroup/EvalMi-50K --repo-type dataset --local-dir ./EvalMi-50K
```
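If you prefer scripting the download, the `huggingface_hub` Python API is equivalent to the CLI call above:

```python
# Equivalent of the CLI command above, via the huggingface_hub Python API.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="IntMeGroup/EvalMi-50K",
    repo_type="dataset",
    local_dir="./EvalMi-50K",
)
```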
Clone this repository:

```bash
git clone https://github.com/IntMeGroup/LMM4LMM.git
```
Create a conda virtual environment and activate it:

```bash
conda create -n LMM4LMM python=3.9 -y
conda activate LMM4LMM
```
Install dependencies using requirements.txt:

```bash
pip install -r requirements.txt
```
Install flash-attn==2.3.6:

```bash
pip install flash-attn==2.3.6 --no-build-isolation
```

Alternatively, you can compile it from source:

```bash
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.3.6
python setup.py install
```
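Either way, a quick sanity check confirms the wheel built against your PyTorch/CUDA stack:

```python
# The import should succeed without undefined-symbol errors if the build
# matches your installed PyTorch and CUDA versions.
import flash_attn
print(flash_attn.__version__)  # expect "2.3.6"
```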
Alternatively, if you are on CUDA 12, you can use the packed conda environment from Hugging Face:

```bash
huggingface-cli download IntMeGroup/env LMM4LMM.tar.gz --repo-type dataset --local-dir /home/user/anaconda3/envs
mkdir -p /home/user/anaconda3/envs/LMM4LMM
tar -xzf /home/user/anaconda3/envs/LMM4LMM.tar.gz -C /home/user/anaconda3/envs/LMM4LMM
```

If the archive was built with conda-pack, you may also need to run `/home/user/anaconda3/envs/LMM4LMM/bin/conda-unpack` once after extraction to fix the hard-coded prefixes.
Preparation

Download the training annotation data (the `data/` folder of the EvalMi-50K dataset repo):

```bash
huggingface-cli download IntMeGroup/EvalMi-50K --repo-type dataset --include "data/*" --local-dir .
```
For stage 1 training (text-based quality levels):

```bash
sh shell/train_stage1.sh
```

For stage 2 training (fine-tuning the vision encoder and LLM with LoRA; see the sketch below):

```bash
sh shell/train_stage2.sh
```

For question-answering (QA) training:

```bash
sh shell/train_qa.sh
```
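The actual stage 2 hyperparameters live in `shell/train_stage2.sh` and the repo's training code. Purely as an illustration of how LoRA adapters are typically attached, here is a minimal PEFT sketch; the base model `facebook/opt-125m` and the target modules are stand-ins, not what this repo uses:

```python
# Minimal LoRA sketch with PEFT, for illustration only -- the real model,
# target modules, and hyperparameters are configured by shell/train_stage2.sh.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Stand-in base model; LMM4LMM fine-tunes its own vision encoder + LLM.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

lora_cfg = LoraConfig(
    r=16,                                 # low-rank dimension
    lora_alpha=32,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are trainable
```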
Download the pretrained weights:

```bash
huggingface-cli download IntMeGroup/LMM4LMM-Perception --local-dir ./weights/stage2/stage2_mos1
huggingface-cli download IntMeGroup/LMM4LMM-Correspondence --local-dir ./weights/stage2/stage2_mos2
huggingface-cli download IntMeGroup/LMM4LMM-QA --local-dir ./weights/qa
```
For perception and correspondence score evaluation (Scores):

```bash
sh shell/eval_scores.sh
```
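Quality-assessment evaluations conventionally report SRCC and PLCC between predicted scores and human mean opinion scores (MOS); whether `eval_scores.sh` prints exactly these metrics is an assumption, but the computation itself looks like this:

```python
# Standard rank (SRCC) and linear (PLCC) correlation between predicted
# scores and ground-truth MOS, on toy data.
import numpy as np
from scipy.stats import spearmanr, pearsonr

pred = np.array([3.1, 4.2, 2.0, 4.8])  # model-predicted scores (toy data)
mos = np.array([3.0, 4.5, 1.8, 4.9])   # ground-truth MOS (toy data)

srcc, _ = spearmanr(pred, mos)
plcc, _ = pearsonr(pred, mos)
print(f"SRCC={srcc:.4f}  PLCC={plcc:.4f}")
```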
For question-answering evaluation (QA):

```bash
sh shell/eval_qa.sh
```
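For the QA task, evaluation reduces to answer accuracy against the ground truth; a minimal sketch on toy data (the real script's output format may differ):

```python
# Toy accuracy computation for yes/no QA pairs.
preds = ["yes", "no", "yes"]
truths = ["yes", "yes", "yes"]
acc = sum(p == t for p, t in zip(preds, truths)) / len(truths)
print(f"accuracy = {acc:.2%}")
```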
Download the pretrained weights:

```bash
huggingface-cli download IntMeGroup/LMM4LMM-Perception --local-dir ./weights/stage2/stage2_mos1
huggingface-cli download IntMeGroup/LMM4LMM-Correspondence --local-dir ./weights/stage2/stage2_mos2
```
Configuration file paths: before running the inference scripts, update the paths in the data/infer_mos1.json and data/infer_mos2.json configuration files. The three keys are listed below, followed by a hypothetical example:

- `root`: path to the root directory where the image data is stored.
- `annotation_infer`: path to the file listing the image paths for inference.
- `img_prompt`: path to the file containing the image prompts for inference.
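The exact schema of these JSON files is defined by the repo; assuming a flat layout with the three keys above, a hypothetical example (every path is a placeholder to adapt to your setup):

```python
# Hypothetical data/infer_mos1.json layout -- key names mirror the list
# above; all paths below are placeholders, not the repo's real file names.
import json

config = {
    "root": "/path/to/EvalMi-50K/images",            # image root directory
    "annotation_infer": "/path/to/infer_list.json",  # image paths for inference
    "img_prompt": "/path/to/img_prompts.json",       # image prompts for inference
}
with open("data/infer_mos1.json", "w") as f:
    json.dump(config, f, indent=2)
```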
For Perception Score Inference:

```bash
sh shell/infer_perception.sh
```

For T2I Correspondence Score Inference:

```bash
sh shell/infer_correspondence.sh
```
- ✅ Release the training code
- ✅ Release the evaluation code
- ✅ Release the inference code
- ✅ Release the EvalMi-50K Database
If you have any inquiries, please don't hesitate to reach out via email at wangjiarui@sjtu.edu.cn.
If you find our work useful, please cite our paper as:
```bibtex
@misc{wang2025lmm4lmmbenchmarkingevaluatinglargemultimodal,
      title={LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs},
      author={Jiarui Wang and Huiyu Duan and Yu Zhao and Juntong Wang and Guangtao Zhai and Xiongkuo Min},
      year={2025},
      eprint={2504.08358},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.08358},
}
```