Jiazhe Wei1,*, Ken Li1,*, Tianyu Lao2, Haofan Wang2, Liang Wang1,3, Caifeng Shan1, Chenyang Si1,†
1PRLab, Nanjing University, 2LibLib.ai, 3Institute of Automation, Chinese Academy of Sciences
*Equal Contribution, †Corresponding Author
📄 Paper | 🌐 Project Page
- [2025-12-04] Our paper is now available on arXiv!
PosterCopilot is a framework that uses Large Multimodal Models (LMMs) to advance layout reasoning and controllable editing for professional graphic design.
- 🎯 Geometrically Accurate Layouts: Achieves precise spatial positioning through a progressive three-stage training strategy that moves beyond simple regression to distribution-based learning.
- 🎨 Aesthetic Reasoning: Instills human-like design principles and aesthetics through reinforcement learning from aesthetic feedback.
- ✂️ Layer-level Control: Enables precise, fine-grained editing of individual layers while maintaining global visual consistency.
- 🔄 Multi-round Iterative Editing: Supports professional iterative design workflows with multiple refinement rounds on specific elements.
- 🎭 Versatile Applications: Handles complete layout generation, synthesis of missing assets, theme switching, and canvas reframing.
- Perturbed Supervised Fine-Tuning (PSFT): Reformulates coordinate regression into distribution-based learning for continuous spatial reasoning.
- Reinforcement Learning for Visual-Reality Alignment (RL-VRA): Introduces geometric reward signals to ensure visual-reality alignment and spatial accuracy.
- Reinforcement Learning from Aesthetic Feedback (RLAF): Employs learned aesthetic rewards to generate coherent and diverse compositions.
The accompanying dataset is one of the largest-scale, most thematically diverse, and highest-quality multi-layer poster datasets.
- 160K posters with 2.6M layers (1.2M text + 1.4M image/decorative elements)
- Spans 40+ distinct domains from commercial promotions to public announcements
- Novel OCR-based pipeline addresses over-segmentation challenges in multi-layer datasets
We are committed to making outstanding contributions to both academia and the graphic design industry with PosterCopilot. Our open-source plan includes:
- Project page and documentation
- Demo video
- Data Pipeline
- Test Dataset
- Training Code
- Model Weights
If you find PosterCopilot useful for your research, please consider citing:
@misc{wei2025postercopilot,
title={PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design},
author={Jiazhe Wei and Ken Li and Tianyu Lao and Haofan Wang and Liang Wang and Caifeng Shan and Chenyang Si},
year={2025},
eprint={2512.04082},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
For questions and collaborations, please contact:
- Jiazhe Wei: jzw6545@gmail.com
- Ken Li: kiyotakali075@gmail.com
- Chenyang Si: chenyangsi@smail.nju.edu.cn
We thank all contributors and the research community for their valuable feedback and support.
© 2025 PosterCopilot project. Released under the Apache 2.0 License.
