1 Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University
2 Shanghai Collaborative Innovation Center on Intelligent Visual Computing
3 Minimax
We introduce ControlThinker, a novel framework that bridges the semantic gap in controllable image generation through enhanced visual reasoning. ControlThinker follows a "comprehend-then-generate" paradigm: a Multimodal Large Language Model (MLLM), enhanced via supervised and reinforcement fine-tuning, extracts latent semantics from control images and produces enriched prompts. These enriched prompts significantly improve the visual quality and semantic coherence of generated images without modifying the image generator itself. Extensive experiments across various control types confirm ControlThinker's effectiveness.
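Since the code has not yet been released, the following is only a minimal, hypothetical sketch of the "comprehend-then-generate" flow described above. The function names (`reason_about_control_image`, `generate_image`, `controlthinker_pipeline`) are placeholders invented for illustration, not the actual API:

```python
# Hypothetical sketch of the "comprehend-then-generate" paradigm.
# Both helper functions are stand-ins: the real system uses a fine-tuned
# MLLM for reasoning and an unmodified controllable image generator.

def reason_about_control_image(control_image: str, user_prompt: str) -> str:
    """Stand-in for the MLLM: infer latent semantics from the control
    image and expand the sparse user prompt with them."""
    inferred_semantics = f"scene details inferred from {control_image}"
    return f"{user_prompt}, {inferred_semantics}"

def generate_image(prompt: str, control_image: str) -> dict:
    """Stand-in for a frozen, unmodified image generator conditioned on
    both the (enriched) text prompt and the control image."""
    return {"prompt": prompt, "control": control_image}

def controlthinker_pipeline(control_image: str, user_prompt: str) -> dict:
    # Step 1 (comprehend): enrich the prompt via visual reasoning.
    enriched_prompt = reason_about_control_image(control_image, user_prompt)
    # Step 2 (generate): feed the enriched prompt to the untouched generator.
    return generate_image(enriched_prompt, control_image)

result = controlthinker_pipeline("edge_map.png", "a cat")
print(result["prompt"])
```

The key design point is that only the prompt is changed between the two stages; the generator stays frozen, which is why the method applies across different control types and backbone generators.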
📢 News
June 2, 2025: We have released the ControlThinker paper.
May 30, 2025: The code and models are coming soon.
📝 TODO
Release checkpoints and evaluation code
Release the training code along with an easy-to-follow tutorial
Release the ControlThinker paper
🧁 Results
Visualization of images generated by ControlThinker and other baselines.
💥 Quick Start
1️⃣ Code and tutorial will be available soon.
✍️ Citation
@article{han2025controlthinker,
title={ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning},
author={Han, Feng and Jiao, Yang and Chen, Shaoxiang and Xu, Junhao and Chen, Jingjing and Jiang, Yu-Gang},
journal={arXiv preprint arXiv:2506.03596},
year={2025}
}
📃 License
ControlThinker is licensed under the Apache License 2.0.