In-the-wild Cases
All cases shown below are test set results. Horizontal and vertical videos are generated by the same model weights. Use the maximize button in the bottom-right corner of each video to play it in full screen and observe more details.
Pexels Test Set Cases (Horizontal)
Pexels Test Set Cases (Vertical)
From left to right: input image, human pose condition, our result, ground-truth video.
TikTok Test Set Cases
The TikTok test set uses the last 40 videos. The visualizations below are from this set, using a model trained on HumanVid and the first 300 TikTok videos. By setting the camera parameters to be static, the model produces static backgrounds, which is consistent with Animate Anyone's setting.
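Concretely, "static camera parameters" just means feeding the model the same camera pose for every frame. The sketch below is only an illustration under assumed conventions (per-frame 4x4 world-to-camera extrinsics and 3x3 intrinsics); the function name and tensor layout are hypothetical, not the released interface.

```python
import numpy as np

def static_camera_trajectory(num_frames, K):
    """Repeat one fixed camera pose for every frame.

    Hypothetical format: per-frame 4x4 world-to-camera extrinsics and
    3x3 intrinsics. Identity rotation plus zero translation keeps the
    background fixed, matching Animate Anyone's static-camera setting.
    """
    extrinsic = np.eye(4, dtype=np.float32)                  # no rotation, no translation
    extrinsics = np.repeat(extrinsic[None], num_frames, 0)   # (N, 4, 4)
    intrinsics = np.repeat(K[None].astype(np.float32), num_frames, 0)  # (N, 3, 3)
    return extrinsics, intrinsics

# Example: a 24-frame static trajectory for a 512x768 portrait clip.
K = np.array([[512.0, 0.0, 256.0],
              [0.0, 512.0, 384.0],
              [0.0, 0.0, 1.0]])
cam_ext, cam_int = static_camera_trajectory(24, K)
```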
Cross-identity Cases
Our model requires a paired camera trajectory and 2D human pose sequence, making cross-identity inference more complex than Animate Anyone's static-camera setting. We show successful cases with simple backgrounds, including source videos (first 2 videos) and cross-identity results (remaining videos). Besides extracting data from existing videos, you can project 3D human poses into camera space and export custom camera trajectories using software like Blender, as sketched below.
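For reference, projecting 3D joints into a camera is the standard pinhole model. The snippet below assumes OpenCV-style world-to-camera extrinsics (R, t) and an intrinsic matrix K; the joint layout and function name are our own illustration.

```python
import numpy as np

def project_pose_to_image(joints_world, R, t, K):
    """Project 3D joints (J, 3) in world coordinates to 2D pixels (J, 2).

    Standard pinhole model: transform into camera space with the
    world-to-camera rotation R (3x3) and translation t (3,), then apply
    the intrinsics K (3x3) and divide by depth.
    """
    joints_cam = joints_world @ R.T + t   # (J, 3) camera-space points
    uvw = joints_cam @ K.T                # (J, 3) homogeneous pixel coords
    return uvw[:, :2] / uvw[:, 2:3]       # perspective divide -> (J, 2)
```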
Comparison with Previous Methods
Visualization of Synthetic Videos
TL;DR: We synthesize two types of human-centric videos according to the human assets: (1) SMPL-X poses with UV texture maps and simulated clothing, similar to BEDLAM, and (2) 3D anime character assets with rigged motions. The backgrounds come from HDRI images or 3D scenes. The motivation for using synthetic data is that it provides accurate human/camera poses and more diverse camera trajectories.
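As one illustration of what "more diverse camera trajectories" could mean, the hypothetical sketch below samples an orbiting camera around a subject at the origin; the sweep, radius, and OpenCV-style extrinsic convention are our assumptions, not the dataset's actual generation code.

```python
import numpy as np

def orbit_trajectory(num_frames=24, radius=3.0, height=1.5, sweep_deg=30.0):
    """Sample world-to-camera extrinsics (N, 4, 4) for a camera slowly
    orbiting a subject at the world origin (OpenCV convention: camera
    z-axis points at the subject, y-axis points down in the image)."""
    world_up = np.array([0.0, 1.0, 0.0])
    poses = []
    for angle in np.linspace(0.0, np.radians(sweep_deg), num_frames):
        cam_pos = np.array([radius * np.sin(angle), height, radius * np.cos(angle)])
        z = -cam_pos / np.linalg.norm(cam_pos)   # look-at direction (toward origin)
        x = np.cross(z, world_up)
        x /= np.linalg.norm(x)                   # camera right axis
        y = np.cross(z, x)                       # camera down axis (OpenCV image y)
        R = np.stack([x, y, z])                  # world-to-camera rotation (rows)
        E = np.eye(4)
        E[:3, :3] = R
        E[:3, 3] = -R @ cam_pos                  # translation: t = -R @ C
        poses.append(E)
    return np.stack(poses)
```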
To view real videos, visit Pexels.com and search for human-related terms, as we cannot redistribute their videos. The real part of our HumanVid dataset is filtered from Pexels videos.