DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion
Wenqiang Sun *, Shuo Chen *, Fangfu Liu*, Zilong Chen, Yueqi Duan, Jun Zhu†, Jun Zhang†, Yikai Wang†
* Equal Contribution
† Corresponding author
ICCV 2025
TL;DR: Create 3D and 4D scenes from a single image with controllable video diffusion.
Video Demo
Any Camera Control Video Generation
Spatial-Temporal Fused Controllable Video Generation
Prompt: Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.
Prompt: A joyful golden retriever, its fur gently swaying in the breeze, stands in a sunlit park. The dog's eyes sparkle with excitement as it looks up, its mouth open in a wide, ecstatic smile, tongue lolling out in pure bliss. The background is a blur of people and greenery, emphasizing the dog's lively presence. The sunlight filters through the leaves, casting dappled shadows on the ground, adding to the vibrant, carefree atmosphere of the scene. The retriever seems to be in mid-motion, perhaps having just finished a playful chase or eagerly anticipating the next adventure.
Prompt: A man with tousled dark hair stands in a dramatic landscape, his eyes blazing with fury as he surveys the chaotic scene around him. Clad in a rugged leather jacket, he turns slightly, revealing a determined posture amid a backdrop of crumbling mountains and a valley littered with abandoned structures and scattered flags. The sky is overcast, adding a somber tone to the atmosphere, accentuating his emotional intensity. The camera captures a medium shot, focusing on his tense expression and the desolation surrounding him. The visual style is cinematic with high contrast, enhancing the grim and powerful mood of the moment.
Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
Single-View 3D Generation (360-Degree Orbit)
Prompt: In the mesmerizing nightscape, a colossal whale glides gracefully through the star-studded sky, its vast, textured body illuminated by the soft, ethereal glow of the moon. The city below, a sprawling metropolis of towering skyscrapers, twinkles with countless lights, creating a captivating contrast between the urban jungle and the serene marine giant. The sky, painted in deep shades of blue and adorned with twinkling stars, adds a dreamlike quality to the scene. The whale, seemingly in motion, appears to be swimming through the clouds, its majestic form a surreal and awe-inspiring sight against the backdrop of the illuminated cityscape.
Prompt: The image depicts a breathtaking landscape bathed in the warm, golden hues of a setting sun. The sky is a dramatic canvas of swirling clouds, painted in shades of pink, orange, and purple, creating a mesmerizing backdrop. The lush green meadow stretches out, dotted with vibrant wildflowers swaying gently in the breeze. Towering trees, their leaves tinged with the soft glow of the sun, stand sentinel along the winding dirt path that meanders through the scene. The overall atmosphere is serene and idyllic, capturing the tranquil beauty of nature at its finest.
Prompt: In the heart of a grand, medieval hall, the scene is bathed in a warm, golden glow. A long wooden table stretches into the distance, adorned with flickering candles that cast a soft, inviting light. The air is filled with a sense of timelessness and reverence. A single, vibrant candle burns brightly in the foreground, its flame dancing gently atop a small, ornate holder. The background reveals a beautifully decorated Christmas tree, its twinkling lights adding a festive touch to the ancient stone walls and towering arched windows. The interplay of light and shadow creates an enchanting, almost magical atmosphere, evoking a sense of wonder and serenity.
Prompt: In the heart of a frozen landscape, a majestic lighthouse stands tall, its stone walls blanketed in a thick layer of snow. The lighthouse, adorned with icicles, emanates a warm, golden glow from its windows, contrasting beautifully with the ethereal green and purple hues of the Northern Lights dancing across the night sky. The icy waters surrounding the structure are dotted with jagged ice formations, creating a surreal and otherworldly scene. The lighthouse keeper's footsteps, lightly imprinted in the snow, lead up the steps to the welcoming wooden door, hinting at the warmth and safety within. The serene, almost magical atmosphere is palpable, as if time itself has slowed to admire this breathtaking winter wonderland.
Prompt: In the serene countryside, a majestic red barn stands proudly against the soft hues of the setting sun. The barn's weathered wooden exterior glows warmly, reflecting the golden light of the evening. Its large, white-trimmed doors, adorned with intricate cross patterns, hint at the rustic charm within. The surrounding grass sways gently in the breeze, casting subtle shadows on the dirt path leading up to the barn. The vast, open fields stretch out into the distance, meeting the horizon where the sky transitions from a soft pink to a tranquil blue. The scene exudes a sense of peaceful solitude, capturing the timeless beauty of rural life.
Prompt: In the lush, verdant meadow, a rabbit darts gracefully across the grass, its fur a blend of earthy browns and soft grays that seamlessly meld with the natural surroundings. The creature's large, alert ears perk up, capturing every sound in the tranquil environment. Its eyes, wide and watchful, scan the area with a keen sense of awareness. The rabbit's movements are swift and agile, its powerful hind legs propelling it forward with ease. As it hops, the grass sways gently beneath its paws, creating a serene symphony of nature. The background, a dense thicket of dark green foliage, provides a striking contrast to the rabbit's lighter tones, enhancing the beauty of this fleeting moment in the wild.
Prompt: In the heart of this stately room, a grand wooden desk commands attention, its intricate carvings and polished surface gleaming under the soft, golden light streaming through the large windows. The windows, draped in luxurious golden curtains, frame a serene view of lush greenery outside, adding a touch of nature to the opulent setting. Two elegant chairs, upholstered in rich fabric, flank the desk, inviting conversation or contemplation. The room is adorned with symbols of authority and tradition, including the American flag and the Presidential Seal, which stand proudly behind the desk. The circular rug beneath the desk features elaborate designs and motifs, echoing the room's grandeur. The overall atmosphere is one of power, dignity, and timeless elegance, evoking a sense of reverence and respect.
Prompt: The image depicts a serene, moonlit countryside scene. A winding cobblestone path leads through lush, green fields, flanked by rustic wooden houses with warm, glowing windows. The path meanders gently, disappearing into the distance where it meets a small, quaint village. The sky is painted with hues of deep blue and orange, as the full moon rises, casting a soft, golden glow over the landscape. Wispy clouds drift across the moon, adding a touch of mystery to the tranquil night. Trees stand tall on either side of the path, their branches silhouetted against the twilight sky. The overall atmosphere is peaceful and idyllic, evoking a sense of calm and timeless beauty.
Prompt: In a whimsical autumnal setting, a beautifully crafted cake takes center stage, resembling a giant pumpkin with a glossy, golden-brown surface. The cake is adorned with a vibrant green stem, adding a touch of realism. Surrounding the cake are smaller, equally detailed pumpkin decorations, their rich orange hues contrasting beautifully against the soft green background. Scattered around are a few autumn leaves, gently fallen, adding to the seasonal ambiance. The cake sits on a wooden platter, enhancing the rustic charm. The overall scene exudes warmth and festivity, inviting viewers to celebrate the harvest season.
Prompt: In a lush, sunlit meadow, a curious capybara stands on its hind legs, reaching out with its front paws to gently touch a small globe. The globe, with its detailed continents and oceans, is perched on a grassy mound. The capybara's fur glistens in the soft, golden light, and its large, expressive eyes seem to reflect a sense of wonder. The background is a blur of greenery, suggesting a peaceful, natural setting. The capybara's curious touch on the globe evokes a sense of exploration and discovery, as if it is contemplating the vast world beyond its immediate surroundings.
Sparse-View 3D Scene Generation
Two Input Views.
4D Scene Generation
Prompt: In a cozy, well-lit kitchen, a man in a black apron and blue cap is meticulously crafting a cocktail. He stands behind a white countertop, expertly pouring a rich, amber liquid from a shaker into a martini glass. The scene is filled with various bottles of alcohol, a juicer, and other bar tools, indicating a well-equipped home bar. The window behind him reveals a serene suburban view, adding a touch of calm to the focused atmosphere. His precise movements and the array of ingredients suggest a passion for mixology, creating a moment of artistry in an everyday setting.
Prompt: Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.
Pipeline
Pipeline of DimensionX. Our framework consists of three parts.
(a) ST-Director for controllable video generation. We introduce ST-Director to decouple the spatial and temporal parameters of video diffusion models by learning dimension-aware LoRAs on our collected dimension-variant datasets.
(b) 3D scene generation with S-Director. Given a single view, a high-quality 3D scene can be recovered from the video frames generated by S-Director.
(c) 4D scene generation with ST-Director. Given a single image, T-Director produces a temporal-variant video, from which a key frame is selected to generate a spatial-variant reference video. Guided by this reference video, S-Director generates a spatial-variant video for each frame, and these videos are combined into multi-view videos. After multi-loop refinement by T-Director, the resulting consistent multi-view videos are used to optimize the 4D scene.
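The decoupling in ST-Director rests on LoRA: each director is a low-rank update added to a frozen base weight, so switching directors means switching which update is active. The sketch below illustrates that idea on a single toy layer, assuming the standard LoRA form W + (alpha / r) * B A. The class name `DimensionLoRALinear` and the adapter names "spatial" and "temporal" are hypothetical; this is not the DimensionX implementation, where the adapters live inside a video diffusion backbone.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class DimensionLoRALinear:
    """A frozen linear map W plus one named low-rank adapter per dimension.

    Hypothetical toy layer: the "spatial" / "temporal" adapter names are
    illustrative, not identifiers from the DimensionX code.
    """

    def __init__(self, weight: Matrix, alpha: float = 1.0) -> None:
        self.weight = weight  # frozen base weight, d_out x d_in
        self.alpha = alpha    # LoRA scaling numerator
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}

    def add_adapter(self, name: str, a: Matrix, b: Matrix) -> None:
        # a: r x d_in (down-projection), b: d_out x r (up-projection)
        self.adapters[name] = (a, b)

    def forward(self, x: Vector, active: Optional[str] = None) -> Vector:
        y = matvec(self.weight, x)
        if active is None:
            return y  # base model only, no adapter applied
        a, b = self.adapters[active]
        scale = self.alpha / len(a)      # alpha / r, where r = rows of A
        delta = matvec(b, matvec(a, x))  # low-rank update B(Ax)
        return [yi + scale * di for yi, di in zip(y, delta)]


# Usage: identity base weight, rank-1 adapters that touch different outputs.
layer = DimensionLoRALinear([[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("spatial", a=[[1.0, 0.0]], b=[[0.5], [0.0]])
layer.add_adapter("temporal", a=[[0.0, 1.0]], b=[[0.0], [2.0]])
x = [2.0, 3.0]
print(layer.forward(x))              # [2.0, 3.0]
print(layer.forward(x, "spatial"))   # [3.0, 3.0]
print(layer.forward(x, "temporal"))  # [2.0, 9.0]
```

Because the base weight stays frozen, the two adapters can be trained on different dimension-variant data and selected independently at inference time.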
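Part (c) is largely a data-flow problem: T-Director fixes the time axis, S-Director fills in the view axis, and the resulting grid is regrouped into one video per viewpoint. The sketch below traces that flow under stated assumptions: the directors are passed in as plain callables, frames are strings rather than images, the key frame defaults to the first frame, and `refine` is a hypothetical placeholder for the T-Director refinement loop, whose exact mechanics are not described here. All function and parameter names are illustrative.

```python
from typing import Callable, List, Optional

Frame = str          # placeholder; a real system would use image tensors
Video = List[Frame]


def generate_multiview_videos(
    image: Frame,
    t_director: Callable[[Frame, int], Video],
    s_director: Callable[[Frame, int, Optional[Video]], Video],
    refine: Callable[[Video], Video],
    num_times: int,
    num_views: int,
    key_index: int = 0,
    num_loops: int = 1,
) -> List[Video]:
    """Return multiview[v][t]: one video over time for each viewpoint v."""
    # 1. T-Director: temporal-variant video from the single input image.
    temporal = t_director(image, num_times)
    # 2. S-Director on the key frame: spatial-variant reference video.
    reference = s_director(temporal[key_index], num_views, None)
    # 3. S-Director on every frame, guided by the reference video.
    per_time = [s_director(frame, num_views, reference) for frame in temporal]
    # 4. Regroup the (time, view) grid into one video per view.
    multiview = [[per_time[t][v] for t in range(num_times)]
                 for v in range(num_views)]
    # 5. Refinement loops (hypothetical stand-in for T-Director refinement).
    for _ in range(num_loops):
        multiview = [refine(video) for video in multiview]
    return multiview


# Usage with string stubs so the data flow is visible.
def t_stub(img: Frame, n: int) -> Video:
    return [f"{img}@t{t}" for t in range(n)]


def s_stub(frame: Frame, n: int, ref: Optional[Video]) -> Video:
    return [f"{frame}/v{v}" for v in range(n)]


views = generate_multiview_videos("img", t_stub, s_stub, lambda v: v,
                                  num_times=3, num_views=2)
print(views[1])  # ['img@t0/v1', 'img@t1/v1', 'img@t2/v1']
```

The regrouping in step 4 is what turns per-frame orbits into per-view videos, which is the form a 4D scene optimizer would consume.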
X Family
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
More X Family works coming soon...
Citation
@article{sun2024dimensionx,
title={DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion},
author={Sun, Wenqiang and Chen, Shuo and Liu, Fangfu and Chen, Zilong and Duan, Yueqi and Zhang, Jun and Wang, Yikai},
journal={arXiv preprint arXiv:2411.04928},
year={2024}
}