UniVideo
Unified Understanding, Generation, and Editing for Videos
Pengfei Wan2  Kun Gai2  Wenhu Chen† 1 
*Work done during an internship at Kling Team, Kuaishou Technology.  †Corresponding authors 
In‑Context Generation
Instruction: "A man dressed in a vibrant Hawaiian shirt with a colorful floral pattern, sits on a beach lounge chair. On his shoulder, a Pikachu with a small detective hat perches. The man holds an ice cream cone, taking a bite."
Instruction: "A man wearing a black T-shirt rides a majestic tiger across a sunlit plain. He holds a giant RTX 4090 graphics card in one hand, maintaining perfect balance as the tiger moves gracefully."
Instruction: "Wu kong, clad in ornate golden armor adorned with intricate red and black patterns, strides confidently through the aisles of a brightly lit modern supermarket."
Instruction: "A futuristic stainless-steel Tesla Cybertruck glides smoothly across the surface of a calm ocean under bright sunlight"
Instruction: "A highly detailed Lego tank, constructed from interlocking bricks, rolls steadily through a dense, sun-dappled forest."
Visual Prompt Understanding
In‑Context Editing