ActAnywhere
Subject-Aware Video Background Generation
Boxiao Pan1,2
Zhan Xu2
Chun-Hao Paul Huang2
Krishna Kumar Singh2
Yang Zhou2
Leonidas J. Guibas1
Jimei Yang3
1Stanford University 2Adobe Research 3Runway
NeurIPS 2024
Abstract
We study the novel problem of automatically generating a video background tailored to the motion of a foreground subject. This is an important problem for the movie industry and the visual effects community, and it traditionally requires tedious manual effort. To this end, we propose ActAnywhere, a video diffusion model that takes as input a sequence of foreground subject segmentations together with an image of a novel background, and generates a video of the subject interacting with this background. We train our model on a large-scale dataset of 2.4M videos of human-scene interactions. Through extensive evaluation, we show that our model produces videos with realistic foreground-background interaction while strictly following the guidance of the condition image. Our model generalizes to diverse scenarios, including non-human subjects, gaming and animation clips, and videos with multiple moving subjects. Both quantitative and qualitative comparisons demonstrate that our model significantly outperforms existing methods, which fail to accomplish the studied task.
Method
During training, we take a randomly sampled frame from the training video to condition the denoising process. At test time, the condition can be either a composited frame of the subject with a novel background, or a background-only image.
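The conditioning described above can be illustrated with a minimal NumPy sketch. This is not the released implementation: the function names (`sample_training_condition`, `foreground_sequence`, `pick_test_condition`), the array shapes, and the choice to represent the segmentation as mask-multiplied frames are all assumptions made for illustration only.

```python
import numpy as np


def sample_training_condition(video, rng):
    """Pick one uniformly random frame of a training video as the condition image.

    video: array of shape (T, H, W, 3). Hypothetical helper, for illustration.
    """
    idx = int(rng.integers(0, video.shape[0]))
    return idx, video[idx]


def foreground_sequence(video, masks):
    """Keep only the segmented subject in every frame (assumed representation).

    masks: boolean array of shape (T, H, W); background pixels are set to zero.
    """
    return video * masks[..., None]


def pick_test_condition(composited_frame=None, background_image=None):
    """At test time, use either a composited frame or a background-only image."""
    if (composited_frame is None) == (background_image is None):
        raise ValueError("provide exactly one of composited_frame or background_image")
    return composited_frame if composited_frame is not None else background_image


# Toy data standing in for a 16-frame training clip and its subject masks.
rng = np.random.default_rng(0)
video = rng.random((16, 64, 64, 3))
masks = rng.random((16, 64, 64)) > 0.5

idx, train_condition = sample_training_condition(video, rng)
fg = foreground_sequence(video, masks)

# Test time: a background-only image is one of the two accepted condition types.
background_only = rng.random((64, 64, 3))
test_condition = pick_test_condition(background_image=background_only)
```

In this sketch the same denoising model would receive `fg` (plus the masks) together with either `train_condition` or `test_condition`; how these are actually encoded and injected into the diffusion model is not shown here.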
Results
Click on the dropdowns to view different categories. Videos should play automatically and in a loop. We used Adobe Firefly to generate the composited frames shown here. Hover over a frame to see its text prompt, which was either produced by ChatGPT 4 or written manually.
Video background generation with composited frame conditioning
Each example shows four panels: Original video (not used as model input), Segmentation, Condition, and Output. The text prompts of the condition frames are:
| Example | Condition text prompt |
|---|---|
| 1 | Mallard wandering around a firepit. |
| 2 | A man folding bed sheets. |
| 3 | Purple tie-dye jogger runs in serene park, mist over lake. |
| 4 | A woman is water-skiing. |
| 5 | A woman riding a horse. |
| 6 | A dog plays beside an old man. |
Video background generation with background-only frame conditioning
Three examples are shown, each with four panels: Original video (not used as model input), Segmentation, Condition, and Output.
Diverse generated camera motion
Each example shows the Segmentation, the Condition frame, and outputs generated with different seeds.
| Condition text prompt | Outputs shown |
|---|---|
| Lost in thought, figure strolls through foggy cityscape in winter attire. | Seeds 1 to 4 |
| A woman riding a motorcycle in a city. | Seeds 1 to 4 |
| Infant in blue onesie explores a toy-filled nursery. | Seeds 1 to 4 |
| Child in blue jacket joyfully picks a pumpkin in autumn patch. | Seeds 1 to 3 |
| Traveler, backpack in tow, seeks secrets in desolate landscape's vastness. | Seeds 1 to 3 |
| Immersed gamer moves intensely in high-tech room, exploring virtual reality. | Seeds 1 to 3 |
Different backgrounds with the same foreground
Each example shows the Original video and the Segmentation, followed by paired Condition and Output panels for each background.
| Caption | Condition/Output pairs shown |
|---|---|
| Woman in red faces vast grey, reflecting an inner journey | 1 to 7 |
| Woman poised backstage, ready for defining theater spotlight moment. | 1 to 4 |
| Determined athlete runs through cool, overcast weather, undeterred in the morning. | 1 to 3 |
| A determined athlete trains in diverse landscapes for marathon endurance. | 1 to 2 |
| Woman confidently at outdoor, engaging at sunset. | 1 to 4 |
Diverse generated contents
Each example shows the Segmentation, the Condition frame, and outputs generated with different seeds.
| Condition text prompt | Outputs shown |
|---|---|
| Traveler, backpack in tow, seeks secrets in desolate landscape's vastness. | Seeds 1 to 4 |
| A child creating shimmering soap bubbles at a grassland. | Seeds 1 to 4 |
| Child in beach attire joyfully runs shore, bucket in hand, playing. | Seeds 1 to 2 |
Condition frame of a different subject
Each example shows four panels: Original video (not used as model input), Segmentation, Condition, and Output. The text prompts of the condition frames are:
| Example | Condition text prompt |
|---|---|
| 1 | A man is holding a balloon, and floating up by the balloon. |
| 2 | Cyclist pauses, admires scenic overlook with open road and tranquil landscape. |
Comparison with baselines
Here we show the video version of Fig. 4 in the paper.
A car drifting on a snowy mountain road
Reference panels: Original video | Segmentation | Condition
Methods compared: Ours | Gen1 [9] | Text2LIVE [3] | TokenFlow [12] | Control-A-Video [7] | AnimateDiff [13] | VideoCrafter1 [6]
A woman performing motorcycle stunts
Reference panels: Original video | Segmentation | Condition
Methods compared: Ours | Gen1 [9] | Text2LIVE [3] | TokenFlow [12] | Control-A-Video [7] | AnimateDiff [13] | VideoCrafter1 [6]