Summary
- We present LongTextAR, the first model specifically designed for long-text image generation, addressing a significant gap in existing text-to-image methods, which typically handle only short sentences.
- We identify weak tokenization as a critical barrier to effective text rendering in existing multimodal autoregressive models such as Chameleon.
- LongTextAR offers customizable text rendering with control over font attributes while generalizing to natural image generation through co-training. Our experiments demonstrate its potential for applications such as document generation and PowerPoint editing.
Word Accuracy Comparison
Overview
Control over font attributes
Applications
BibTeX
@article{wang2025beyond,
title={Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models},
author={Wang, Alex Jinpeng and Li, Linjie and Yang, Zhengyuan and Wang, Lijuan and Li, Min},
journal={arXiv preprint arXiv:2503.20198},
year={2025}
}