Jinqi Xiao<sup>1,2</sup> · Qing Yan<sup>1</sup> · Liming Jiang<sup>1</sup> · Zichuan Liu<sup>1</sup> · Hao Kang<sup>1</sup> · Shen Sang<sup>1</sup> · Tiancheng Zhi<sup>1</sup> · Jing Liu<sup>1</sup> · Cheng Yang<sup>1</sup> · Xin Lu<sup>1</sup> · Bo Yuan<sup>2</sup>

<sup>1</sup>ByteDance Inc. <sup>2</sup>Rutgers University
InstructMoLE (Instruction-Guided Mixture of Low-rank Experts) addresses task interference in multi-conditional image generation by aligning expert selection with global user intent. Unlike standard per-token routing, which can introduce semantic and spatial artifacts, InstructMoLE uses a unified routing strategy that keeps expert choices consistent across the entire image, enabling it to handle diverse conditional generation tasks, including single image editing, multi-subject generation, and spatial alignment.
InstructMoLE solves task interference in multi-conditional image generation through two key innovations (see the sketch after this list):

- Instruction-Guided Routing (IGR): Replaces standard per-token routing with a single global signal derived from the user's instruction. This enforces a consistent expert choice across the entire image, preventing the semantic and spatial artifacts that arise from inconsistent routing decisions.
- Output-Space Orthogonality Loss: A regularizer that forces experts to be functionally distinct. By penalizing redundant expert outputs, it directly prevents expert collapse and ensures effective specialization across conditional generation tasks.
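Conceptually, the two components compose inside each adapted layer. The following is a minimal PyTorch sketch written from the description above, not from the repository code: the class name, tensor shapes, the pooled-instruction interface, and the exact form of the orthogonality penalty are all illustrative assumptions (expert count, top-k, rank, and alpha follow the defaults listed in the training configuration below).

```python
# Illustrative sketch only -- not the repository's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstructionGuidedMoLE(nn.Module):
    """Mixture of low-rank (LoRA-style) experts with instruction-guided routing."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 4,
                 rank: int = 32, alpha: int = 32):
        super().__init__()
        self.top_k = top_k
        self.scale = alpha / rank
        # Each expert is a LoRA pair: a down-projection and an up-projection.
        self.down = nn.Parameter(torch.randn(num_experts, dim, rank) * 0.02)
        self.up = nn.Parameter(torch.randn(num_experts, rank, dim) * 0.02)
        # The router sees only a pooled instruction embedding (assumed dim-sized).
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor, instruction_emb: torch.Tensor):
        # x: (batch, tokens, dim); instruction_emb: (batch, dim)

        # --- Instruction-Guided Routing: one global decision per sample ---
        weights = F.softmax(self.router(instruction_emb), dim=-1)  # (B, E)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)        # (B, K)
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)

        # Every image token in a sample shares the same experts and weights,
        # unlike per-token routing, where each token may pick different experts.
        outs = []
        for b in range(x.size(0)):
            delta = torch.zeros_like(x[b])
            for w, e in zip(topk_w[b], topk_idx[b]):
                delta = delta + w * (x[b] @ self.down[e] @ self.up[e]) * self.scale
            outs.append(delta)
        delta = torch.stack(outs)  # (B, T, dim); added to the frozen base output

        # --- Output-space orthogonality: penalize redundant expert outputs ---
        # Evaluate every expert on the same input and push the normalized,
        # flattened outputs toward mutual orthogonality (zero cosine similarity).
        all_outs = torch.einsum("btd,edr,erk->ebtk", x, self.down, self.up)
        flat = F.normalize(all_outs.flatten(1), dim=-1)   # (E, B*T*dim)
        gram = flat @ flat.t()                            # pairwise cosine sims
        off_diag = ~torch.eye(gram.size(0), dtype=torch.bool, device=gram.device)
        ortho_loss = gram[off_diag].pow(2).mean()
        return delta, ortho_loss
```

During training, `ortho_loss` would be scaled by a weight such as `orthogonal_reg_alpha` and added to the main diffusion objective. The essential point is that the routing decision depends only on the instruction embedding, so it cannot vary across image tokens.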
```bash
conda create -n instruct_mole python=3.11
conda activate instruct_mole
bash install_env.sh
```

The installation script installs all required dependencies, including PyTorch, Diffusers, Transformers, and the other packages needed for training and evaluation.
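Once the script finishes, a quick sanity check confirms the core packages are importable (the version printout is just for convenience):

```python
# Verify the main dependencies installed by install_env.sh are importable.
import torch, diffusers, transformers
print(torch.__version__, diffusers.__version__, transformers.__version__)
```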
For training InstructMoLE, we support multiple open-source datasets covering different conditional generation scenarios:
- Single Image Editing: OmniEdit
- Multi-Subjects: MUSAR-Gen
- Subject and Spatial Alignment: SubjectSpatial200K
- Spatial Alignment: COCO 2017
Prepare each dataset in the expected format and place it in the corresponding directory; an illustrative (hypothetical) sample record is sketched below. You can also use your own additional data for model training.
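For illustration only, a multi-conditional training sample might bundle a task type, an instruction, condition image(s), and a target image. The record below is hypothetical; consult the repository's data-loading code for the actual schema:

```python
# Hypothetical sample record -- field names and values are illustrative,
# not the repository's actual dataset format.
sample = {
    "task": "spatial_alignment",      # or "editing", "multi_subject", ...
    "instruction": "Generate a street scene that matches the given depth map.",
    "condition_images": ["depth/000123.png"],   # pose / depth / canny maps, etc.
    "target_image": "images/000123.jpg",
}
```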
To train InstructMoLE, use the provided training script:
```bash
bash run.sh
```

The training script uses `accelerate launch` for distributed training. You can customize training parameters by modifying `train_config.json`, which includes:
MoE Configuration:

- `num_experts`: Number of experts in the mixture (default: 8)
- `num_experts_per_tok`: Number of experts activated per token (default: 4)
- `rank`: Low-rank decomposition rank for the experts (default: 32)
- `alpha`: Scaling factor for expert outputs (default: 32)
- `type_aux_loss_alpha`: Weight of the type-based auxiliary loss (default: 0.1)
- `token_aux_loss_alpha`: Weight of the token-based auxiliary loss (default: 0.01)
- `orthogonal_reg_alpha`: Weight of the orthogonality regularization (default: 0.01)
- `use_type_embedding`: Whether to use instruction-guided routing (default: true)
LoRA Configuration:

- `r`: LoRA rank (default: 256)
- `lora_alpha`: LoRA alpha scaling factor (default: 256)
- `target_modules`: List of modules to apply LoRA to
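For reference, the listed defaults can be collected into a configuration like the one below. This is a sketch assembled purely from the parameter lists above: the real `train_config.json` likely contains additional fields (data paths, optimizer settings, and so on), and the `target_modules` entries are illustrative guesses, not the repository's actual choices.

```python
# Write an illustrative train_config.json using the defaults listed above.
import json

config = {
    # MoE configuration
    "num_experts": 8,
    "num_experts_per_tok": 4,
    "rank": 32,
    "alpha": 32,
    "type_aux_loss_alpha": 0.1,
    "token_aux_loss_alpha": 0.01,
    "orthogonal_reg_alpha": 0.01,
    "use_type_embedding": True,
    # LoRA configuration
    "r": 256,
    "lora_alpha": 256,
    "target_modules": ["to_q", "to_k", "to_v", "to_out.0"],  # illustrative guess
}

with open("train_config.json", "w") as f:
    json.dump(config, f, indent=2)
```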
For more details on training, refer to `train_kontext.py` and `train_config.json`.
InstructMoLE supports evaluation on multiple benchmarks:
- XVerseBench: Multi-subject conditional generation benchmark
- OmniContext: Image editing benchmark
- Spatial Alignment: Pose, depth, and canny edge evaluation
Evaluation scripts are provided in the eval/ directory. Please refer to the respective evaluation scripts for detailed usage instructions.
If you find InstructMoLE useful for your research and applications, please cite it using this BibTeX:
```bibtex
@misc{xiao2025instructmoleinstructionguidedmixturelowrank,
  title={InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation},
  author={Jinqi Xiao and Qing Yan and Liming Jiang and Zichuan Liu and Hao Kang and Shen Sang and Tiancheng Zhi and Jing Liu and Cheng Yang and Xin Lu and Bo Yuan},
  year={2025},
  eprint={2512.21788},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.21788},
}
```