Introducing Seaweed
Seaweed, short for "Seed-Video," is a research effort to build a foundational model for video generation. This webpage showcases diffusion transformers with approximately 7 billion (7B) parameters, trained with compute equivalent to 1,000 H100 GPUs. Seaweed learns world representations from massive amounts of multi-modal data, including video, images, and text, and can create videos of various resolutions, aspect ratios, and durations from text descriptions. In this article, we present its generated videos and highlight its hallmark capability as a foundational model: supporting a wide range of downstream applications.
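As a rough illustration of how such a model produces video, the sketch below shows a generic diffusion-transformer sampling loop: a transformer repeatedly refines a noisy video latent under text conditioning. This is not Seaweed's actual code or API; `dit`, `text_emb`, and `latent_shape` are hypothetical placeholders.

```python
import torch

def sample_video(dit, text_emb, latent_shape, num_steps=50):
    """Generic Euler-style denoising loop: `dit` is a placeholder for a
    diffusion transformer that predicts the denoising direction for a
    video latent, conditioned on the timestep and a text embedding."""
    x = torch.randn(latent_shape)          # start from pure noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.tensor(1.0 - i * dt)     # time runs from 1 (noise) to 0
        v = dit(x, t, text_emb)            # predicted denoising direction
        x = x - v * dt                     # one Euler integration step
    return x                               # decode to pixels with a video VAE
```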
Our model is highly adept at generating lifelike human characters that exhibit a diverse array of actions, gestures, and emotions.
Seaweed excels at generating a wide variety of landscapes. With intricate detail and dynamic composition, it can create visually stunning environments that enhance storytelling.
We demonstrate our model's generative capabilities through a short film. All of the footage is generated; the only manually added components are the background music and the ending titles.
Watch a generated short film
Generate videos from images
Our video generation model offers enhanced controls that allow users to precisely create the content they envision. By providing an image as the first frame, users can direct the model to generate the rest of the video with consistent motion and style. This grants users full control over the visual aesthetics, making it ideal for applications where accuracy and creative direction are crucial.
Our model can also condition on both the first and last frames, allowing it to generate interesting transition videos for greater creative control.
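One simple way to implement such frame conditioning (not necessarily Seaweed's) is inpainting-style replacement: pin the user-provided frames in the latent and denoise the remaining frames around them. Production systems typically feed condition frames to the model directly instead; `dit` and the mask interface here are hypothetical.

```python
import torch

def sample_with_keyframes(dit, text_emb, key_latents, key_mask, num_steps=50):
    """Inpainting-style sketch: `key_latents` holds encoded frames at the
    positions where `key_mask` is True (the first frame alone, or the first
    and last for a transition); all other frames are generated to match."""
    x = torch.randn_like(key_latents)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x[key_mask] = key_latents[key_mask]   # pin the user-provided frames
        t = torch.tensor(1.0 - i * dt)
        v = dit(x, t, text_emb)               # predicted denoising direction
        x = x - v * dt
    x[key_mask] = key_latents[key_mask]
    return x
```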
Generate videos from references
Our model can also be finetuned to generate videos based on reference images, offering flexible input options for users. Whether it's a human reference image, an object reference image, or a combination of multiple reference images, the model can synthesize them into dynamic video sequences.
Learn more about Phantom
Human-centric video generation
Through OmniHuman, Seaweed is adapted to generate content conditioned on audio inputs, enabling the creation of realistic human characters that match the voice in the audio. The model produces synchronized lip movements and body gestures aligned with the tone and timing of the audio, creating a seamless, lifelike performance.
Generate audio with video
Seaweed is also capable of generating audio and video together. The generated audio is synchronized with the video and reflects its action, scene, tone, rhythm, and style. The audio complements and elevates the visual storytelling, providing a seamless multimedia experience.
Long-shot generation
Seaweed natively generates a single shot lasting 20 seconds without any extension technique. With extension, it can generate videos up to a minute long.
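A hedged sketch of how such extension might be orchestrated, assuming a hypothetical `generate_clip(prompt, prefix_frames)` interface that can condition on a tail of previously generated frames and returns that overlap at the start of each new clip:

```python
def generate_long_video(generate_clip, prompt, fps=24,
                        total_seconds=60, overlap_seconds=1):
    """Sketch of shot extension: a first 20-second clip is generated
    natively, then each extension is conditioned on the tail of the
    footage so far. `generate_clip` is a hypothetical interface."""
    frames = generate_clip(prompt, prefix_frames=None)   # native 20 s shot
    while len(frames) < total_seconds * fps:
        tail = frames[-overlap_seconds * fps:]           # trailing frames as context
        ext = generate_clip(prompt, prefix_frames=tail)
        frames = frames + ext[overlap_seconds * fps:]    # drop the duplicated overlap
    return frames[:total_seconds * fps]
```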
Consistent storytelling
Seaweed is capable of generating consistent, multi-shot, long-form stories, maintaining continuity across scenes and shots. Users can provide both a global text description for the overarching narrative and fine-grained text descriptions for each individual shot.
Learn more about Long-Context-Tuning
Learn more about VideoAuteur
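As an illustration, the inputs for multi-shot generation might be organized as below; the structure and names are hypothetical, and the per-shot descriptions are the actual prompts from the storyboard that follows.

```python
# Hypothetical storyboard structure: one global description for the
# overarching narrative plus a fine-grained description per shot.
storyboard = {
    "global": "A boy and a girl explore a mysterious forest and discover "
              "an abandoned house hiding a glowing map.",
    "shots": [
        "Overview of the forest.",
        "Close-up shot of the trees.",
        "The boy follows the girl.",
        # ... one entry per shot, 27 in the example below
    ],
}
```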
Shot 1: Overview of the forest.
Shot 2: Close-up shot of the trees.
Shot 3: The boy follows the girl.
Shot 4: Cut to the front of the girl.
Shot 5: The boy follows.
Shot 6: Another overview of the forest.
Shot 7: The girl talks to the boy.
Shot 8: The boy becomes serious.
Shot 9: The girl becomes nervous.
Shot 10: The girl walks forward.
Shot 11: Drone view.
Shot 12: Wide angle ground view.
Shot 13: Camera dolly in.
Shot 14: A house appears in front.
Shot 15: Close-up shot of the house.
Shot 16: Close-up view of the characters.
Shot 17: The door. Dolly out.
Shot 18: The boy tries to open the door.
Shot 19: Inside the empty room.
Shot 20: The characters walk inside.
Shot 21: They look around in the house.
Shot 22: They walk into a new room.
Shot 23: An old bookshelf.
Shot 24: Close-up shot on the shelf.
Shot 25: The characters walk to a table.
Shot 26: A glowing ball floating on a map.
Shot 27: They look at each other.
High-resolution generation
Seaweed natively supports generating videos up to 1280x720 resolution. The result can also be further upsampled to 2K QHD (2560x1440) resolution. The super-resolution module can be separately applied to existing videos for upsampling and restoration.
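A minimal sketch of this two-stage idea, with `sr_module` as a hypothetical stand-in for SeedVR; the 2560x1440 target is 2x the native 1280x720 along each side.

```python
def upscale_video(sr_module, frames, chunk=16):
    """Sketch of the separate upsampling stage: a video super-resolution
    module is applied in short temporal chunks so it can stay temporally
    consistent, taking a native generation or any existing footage to 2x
    resolution per side."""
    out = []
    for i in range(0, len(frames), chunk):
        out.extend(sr_module(frames[i:i + chunk], scale=2))
    return out
```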
Learn more about SeedVR
Real-time generation
Seaweed can also generate videos in real-time at 1280x720 resolution and 24fps. This is particularly valuable for real-time and interactive applications, where immediate video generation is essential.
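To make the real-time constraint concrete: at 24 fps, each 1280x720 frame must be produced in roughly 41.7 ms. The sketch below illustrates that budget with hypothetical `generate_next_frame` and `display` callbacks; it is not Seaweed's serving code.

```python
import time

FPS = 24
FRAME_BUDGET = 1.0 / FPS      # about 41.7 ms to produce each frame

def stream(generate_next_frame, display, seconds=10):
    """Illustrates the real-time budget: to sustain 24 fps, each frame must
    be generated within FRAME_BUDGET; if generation is faster, the loop
    sleeps so playback stays on the clock."""
    deadline = time.monotonic()
    for _ in range(seconds * FPS):
        frame = generate_next_frame()     # must return within FRAME_BUDGET
        deadline += FRAME_BUDGET
        slack = deadline - time.monotonic()
        if slack > 0:
            time.sleep(slack)             # generation beat the deadline
        display(frame)
```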
Learn more about Seaweed-APT
World exploration
Seaweed can be utilized for modeling precise camera control through defined trajectories, providing not only enhanced creative direction but also an interactive way for users to explore the simulated world. With its real-time generation capability, Seaweed also serves as a foundational model for advanced research in world simulation.
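As an illustration, a defined camera trajectory can be as simple as one pose per output frame supplied as an extra condition; the field names and the `generate` call below are hypothetical, not Seaweed's interface.

```python
# One frame-indexed camera pose per output frame: a position plus a
# look-at target. A camera path is just a time series the model can be
# conditioned on alongside the text prompt.
trajectory = [
    {"position": (0.0, 1.6, -0.25 * i), "look_at": (0.0, 1.5, -30.0)}
    for i in range(120)                  # a 5-second dolly-in at 24 fps
]
# video = generate(prompt, camera=trajectory)   # hypothetical call
```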
Learn more about CameraCtrl-II
Enhanced Physically-Consistent Generation
Seaweed can also be post-trained on synthetic video rendered via computer-generated imagery (CGI), enabling it to enhance the physical consistency in video generation while preserving photorealism. Below, we showcase generated videos with superior 3D consistency and precise human pose integrity in complex actions, alongside the synthetic videos used for training.
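A sketch of the data side of such post-training: physically accurate CGI renders are mixed into otherwise real training data so the model picks up the physics without drifting toward a CGI look. The mixing scheme and the 0.3 ratio are illustrative assumptions, not reported numbers.

```python
import random

def posttrain_stream(real_videos, cgi_videos, cgi_ratio=0.3):
    """Yields training samples, drawing a fraction from CGI renders and
    the rest from real footage. The ratio is an illustrative assumption."""
    while True:
        pool = cgi_videos if random.random() < cgi_ratio else real_videos
        yield random.choice(pool)
```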
Learn more about SimDrop
Research
Alphabetical order
Research Model
- Ceyuan Yang
- Fei Xiao
- Feng Cheng
- Hao Chen
- Haoyuan Guo
- Meng Wei
- Peihao Zhu
- Qi Zhao
- Shanchuan Lin
- Yang Zhao
- Zhijie Lin
- Zhiwu Qing
Research Data
- Fangyuan Kong
- Feilong Zuo
- Jiangqiao Yan
- Liangke Gui
- Lu Qi
- Sen Wang
- Sheng Bi
- Siyu Zhang
- Tuyen Hoang
- Xuejiao Zeng
- Zhibei Ma
- Ziyan Yang
Research Lead
- Lu Jiang
- Jiashi Feng
- Zhenheng Yang
- Jianchao Yang
Infrastructure
Alphabetical order
* denotes individuals who held the role of point of contact
Feng Ling, Heng Zhang, Houmin Wei, Huafeng Kuang, Huixia Li*, Jerry Duncan, Jiashi Li*, Junda Zhang, Junru Zheng, Li Sun, Manlin Zhang, Renfei Sun, Rui Wang*, Shu Liu*, Xiaojie Li, Xin Xia, Xuefeng Xiao*, Xuyan Chi, Yanghua Peng, Yuxi Ren*, Zhongkai Zhao, Zuquan Song
Contributors
Alphabetical order
Bingchuan Li, Chao Liang, Deyao Zhu, Gaojie Lin, Gen Li, Hao He, Jianwen Jiang, Jianyi Wang, Jiaqi Yang, Jiawei Liu, Junfei Xiao, Lijie Liu, Lizhen Wang, Longhao Zhang, Qian He, Ruiqi Xia, Siyu Zhou, Tianshu Hu, Tianxiang Ma, Xiaobin Zhuang, Xiaohui Shen, Xinglong Wu, Yongming Zhu, Yuping Wang, Yuwei Guo, Yuxuan Luo, Yuxuan Wang, Zerong Zheng, Zhengkun Rong, Zhuo Chen, Zhuowei Chen
Acknowledgment
Special thanks to Seed leadership, Wenjia Zhu and Yonghui Wu, for their discussions and support.