Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models

Ronghuan Wu¹, Wanchao Su², Jing Liao¹

¹City University of Hong Kong, ²Monash University

CVPR 2025


Abstract

Scalable Vector Graphics (SVG) has become the de facto standard for vector graphics in digital design, offering resolution independence and precise control over individual elements. Despite these advantages, creating high-quality SVG content remains challenging, as it demands technical expertise with professional editing software and a considerable time investment to craft complex shapes. Recent text-to-SVG generation methods aim to make vector graphics creation more accessible, but they still encounter limitations in shape regularity, generalization ability, and expressiveness. To address these challenges, we introduce Chat2SVG, a hybrid framework that combines the strengths of Large Language Models (LLMs) and image diffusion models for text-to-SVG generation. Our approach first uses an LLM to generate semantically meaningful SVG templates from basic geometric primitives. Guided by image diffusion models, a dual-stage optimization pipeline then refines paths in latent space and adjusts point coordinates to enhance geometric complexity. Extensive experiments show that Chat2SVG outperforms existing methods in visual fidelity, path regularity, and semantic alignment. Additionally, our system enables intuitive editing through natural language instructions, making professional vector graphics creation accessible to all users.

How does it work?

Given a text prompt, our system first leverages an LLM to generate an SVG template composed of basic geometric primitives. The rendered template is enhanced through SDEdit with ControlNet to add visual details while preserving the overall composition, yielding a target image. The SVG then undergoes a dual-stage optimization process to match the target image.
(1) Primitives are converted to latent embeddings through latent inversion and optimized along with their visual attributes (i.e., filling colors c_i, stroke properties s_i, and transformation matrices T_i).
(2) Point-level optimization is performed to refine the geometric details of SVG paths.
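To make the notion of a template more concrete, the following is a minimal sketch of how a set of geometric primitives with per-shape fill color (c_i), stroke properties (s_i), and transformation matrix (T_i) could be represented and serialized to SVG. It is not the authors' implementation: the `Primitive` class, its field names, the `render_template` helper, and the 512-unit canvas are all assumptions made for illustration, and the optimization stages themselves are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# 2x3 affine matrix (a, b, c, d, e, f), in the order used by SVG's matrix(a b c d e f).
Affine = Tuple[float, float, float, float, float, float]
IDENTITY: Affine = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


@dataclass
class Primitive:
    """One shape in a template. Hypothetical structure, not taken from the paper's code."""
    tag: str                        # e.g. "rect", "circle", "ellipse"
    geometry: Dict[str, float]      # tag-specific attributes, e.g. {"cx": 50, "cy": 50, "r": 20}
    fill: str = "#000000"           # fill color, c_i in the text
    stroke: str = "none"            # stroke color, part of s_i
    stroke_width: float = 0.0       # stroke width, part of s_i
    transform: Affine = IDENTITY    # transformation matrix, T_i

    def to_svg(self) -> str:
        geom = " ".join(f'{k}="{v}"' for k, v in self.geometry.items())
        matrix = " ".join(f"{v:g}" for v in self.transform)
        return (
            f'<{self.tag} {geom} fill="{self.fill}" stroke="{self.stroke}" '
            f'stroke-width="{self.stroke_width:g}" transform="matrix({matrix})"/>'
        )


def render_template(primitives: List[Primitive], size: int = 512) -> str:
    """Wrap the primitives in an <svg> root; the canvas size is an assumed default."""
    body = "\n".join("  " + p.to_svg() for p in primitives)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {size} {size}">\n'
        f"{body}\n</svg>"
    )


# A two-shape template; the second shape is translated 10 units down via its matrix.
template = [
    Primitive("circle", {"cx": 256, "cy": 256, "r": 120}, fill="#f5c542"),
    Primitive(
        "rect",
        {"x": 200, "y": 300, "width": 112, "height": 20},
        fill="#333333",
        transform=(1.0, 0.0, 0.0, 1.0, 0.0, 10.0),
    ),
]
svg_text = render_template(template)
print(svg_text)
```

Keeping color, stroke, and transform as separate fields per shape mirrors the description above, where these attributes are optimized alongside the shape itself rather than being baked into the path coordinates.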

Text-Guided SVG Generation

SVG examples generated by our Chat2SVG. We highlight some shapes to demonstrate semantic clarity and path quality.

Text-Guided SVG Editing

We perform two rounds of refinement on each SVG template and show the optimized output. This figure shows editing types including deletion, modification, and addition.
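The three editing types can be illustrated at the level of the SVG document tree. The sketch below applies one deletion, one modification, and one addition by hand using Python's standard `xml.etree.ElementTree` module. It only shows what each edit type means structurally; in Chat2SVG the edits are driven by natural language instructions, and that step is not reproduced here. The sample SVG, the element ids, and the `find_by_id` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", SVG_NS)

# Hypothetical two-shape template used only for this illustration.
source = (
    f'<svg xmlns="{SVG_NS}" viewBox="0 0 100 100">'
    '<circle id="sun" cx="50" cy="50" r="20" fill="#f5c542"/>'
    '<rect id="cloud" x="10" y="10" width="30" height="12" fill="#ffffff"/>'
    "</svg>"
)
root = ET.fromstring(source)


def find_by_id(tree_root, element_id):
    """Return the first element whose id attribute matches, or None."""
    for element in tree_root.iter():
        if element.get("id") == element_id:
            return element
    return None


# Deletion: remove the cloud (it is a direct child of the root, so root.remove works).
root.remove(find_by_id(root, "cloud"))

# Modification: change the sun's fill color.
find_by_id(root, "sun").set("fill", "#e4572e")

# Addition: append a new ellipse; later elements are drawn on top of earlier ones.
ET.SubElement(
    root,
    f"{{{SVG_NS}}}ellipse",
    {"id": "hill", "cx": "50", "cy": "95", "rx": "60", "ry": "15", "fill": "#4caf50"},
)

edited = ET.tostring(root, encoding="unicode")
print(edited)
```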
