Biography
I am a Staff Research Scientist at Google DeepMind where I work on generative models of video. My primary goal is to develop models that learn to understand and generate the intricate dynamics of the real world from massive amounts of video data. These world models should be capable of interpreting the dynamics within a given video, forecasting future events based on past observations, and recombining learned concepts into novel, simulated versions of reality. I received my PhD from the Computer Science & Engineering Department at the University of Michigan, Ann Arbor under the supervision of Professor Honglak Lee. During my PhD, I mainly focused on building models for future frame prediction using self-supervised and supervised approaches. I also contributed to building world models that were successfully applied in model-based reinforcement learning.
Fun Facts: I played for my national basketball team (I am originally from Ecuador). I was also the second-best scorer in the nation in a national championship I played in back in the day. I was part of a team that beat the media's projected champion during a championship in Quito (the guy who was the best scorer in the national championship played for the other team :P). Let's have a Curry-range 3-point shootout. Ok, I'll stop now ...
News
- 08/2025: Our general-purpose world model, Genie 3, was announced to the public.
- 05/2025: Our state-of-the-art audio and video generation model, Veo 3, was announced at Google I/O 2025, and made available through Gemini Ultra.
- 12/2024: Our state-of-the-art video generation model, Veo 2, was announced and made available to a select group of users.
- 05/2024: Our most capable video generation model to date, Veo, was announced at Google I/O 2024. Stay tuned for our official model release!
- 05/2023: Our text-to-video model, Phenaki, was highlighted at Google I/O 2023.
- 03/2023: Honored to be giving a talk on Generative Models at Khipu2023, taking place in Montevideo, Uruguay.
- 02/2023: Our paper "Phenaki: Variable Length Video Generation From Open Domain Textual Description" has been accepted to ICLR 2023.
Publications
2024
|
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
Jefferson Hernandez, Ruben Villegas, Vicente Ordoñez
In European Conference on Computer Vision (ECCV), 2024
|
|
Text Prompting for Multi-Concept Video Customization by Autoregressive Generation.
Divya Kothandaraman, Kihyuk Sohn, Ruben Villegas, Paul Voigtlaender, Dinesh Manocha, Mohammad Babaeizadeh
In AI4CC Workshop at CVPR, 2024
|
2023
|
StoryBench: A Multifaceted Benchmark for Continuous Story Visualization
Emanuele Bugliarello, Hernan Moraldo, Ruben Villegas, Mohammad Babaeizadeh, Mohammad Taghi Saffar, Han Zhang, Dumitru Erhan, Vittorio Ferrari, Pieter-Jan Kindermans, Paul Voigtlaender
In Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2023
|
|
Phenaki: Variable Length Video Generation From Open Domain Textual Description
Ruben Villegas, Mohammad Babaeizadeh, Pieter-Jan Kindermans, Hernan Moraldo, Han Zhang, Mohammad Taghi Saffar, Santiago Castro, Julius Kunze, Dumitru Erhan
In International Conference on Learning Representations (ICLR), 2023
|
2022
|
RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects (Best Paper Award, Runner-up)
Yunseok Jang, Ruben Villegas, Jimei Yang, Duygu Ceylan, Xin Sun, Honglak Lee
In CVPR Workshop in AI for Content Creation, 2022
|
2021
|
Task-Generic Hierarchical Human Motion Prior using VAEs
Jiaman Li, Ruben Villegas, Duygu Ceylan, Jimei Yang, Zhengfei Kuang, Hao Li, Yajie Zhao
In International Conference on 3D Vision (3DV), 2021
|
|
Contact-Aware Retargeting of Skinned Motion
Ruben Villegas, Duygu Ceylan, Aaron Hertzmann, Jimei Yang, Jun Saito
In International Conference on Computer Vision (ICCV), 2021
|
|
Stochastic Scene-Aware Motion Prediction
Mohamed Hassan, Duygu Ceylan, Ruben Villegas, Jun Saito, Jimei Yang, Yi Zhou, Michael Black
In International Conference on Computer Vision (ICCV), 2021
|
|
Single-image Full-body Human Relighting
Manuel Lagunas, Xin Sun, Jimei Yang, Ruben Villegas, Jianming Zhang, Zhixin Shu, Belen Masia, Diego Gutierrez
In Eurographics Symposium on Rendering (EGSR), 2021
|
2020
|
Contact and Human Dynamics from Monocular Video (Spotlight)
Davis Rempe, Leonidas J. Guibas, Aaron Hertzmann, Bryan Russell, Ruben Villegas, Jimei Yang
In European Conference on Computer Vision (ECCV), 2020
(Spotlight acceptance rate: 5%) |
2019
|
High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks
Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee
In Advances in Neural Information Processing Systems (NeurIPS), 2019
|
|
Unsupervised Learning of Object Structure and Dynamics from Videos
Matthias Minderer, Chen Sun, Ruben Villegas, Forrester Cole, Kevin Murphy, Honglak Lee
In Advances in Neural Information Processing Systems (NeurIPS), 2019
|
|
Learning Latent Dynamics for Planning from Pixels
Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson
In Proceedings of the 36th International Conference on Machine Learning (ICML), 2019
|
2018
|
MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics
Xinchen Yan, Akash Rastogi, Ruben Villegas, Kalyan Sunkavalli, Eli Shechtman, Sunil Hadap, Ersin Yumer, Honglak Lee
In European Conference on Computer Vision (ECCV), 2018
|
|
Hierarchical Long-term Video Prediction without Supervision
Nevan Wichers, Ruben Villegas, Dumitru Erhan, Honglak Lee
In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018
|
|
Neural Kinematic Networks for Unsupervised Motion Retargetting (Oral Presentation)
Ruben Villegas, Jimei Yang, Duygu Ceylan, Honglak Lee
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
(Oral acceptance rate: 2.1%) |
2017
|
Learning to Generate Long-term Future via Hierarchical Prediction
Ruben Villegas, Jimei Yang, Yuliang Zou, Sungryull Sohn, Xunyu Lin, Honglak Lee
In Proceedings of the 34th International Conference on Machine Learning (ICML), 2017
|
|
Decomposing Motion and Content for Natural Video Sequence Prediction
Ruben Villegas, Jimei Yang, Seunghoon Hong, Xunyu Lin, Honglak Lee
In International Conference on Learning Representations (ICLR), 2017
|
2015
|
Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction (Oral Presentation)
Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, Honglak Lee
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015
(Oral acceptance rate: 3.3%) |
2014
|
Who Do I Look Like? Determining Parent-Offspring Resemblance via Genetic Features
Afshin Dehghan, Enrique G. Ortiz, Ruben Villegas, Mubarak Shah
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014
|
Talks and Teaching
Invited Talks
- Tech Talk @T4V at CVPR2023, Vancouver, Canada (June 2023)
- Tech Talk @Khipu2023, Montevideo, Uruguay (March 2023)
- Tech Talk @IMAGINE École des Ponts ParisTech, Marne-la-Vallée, France (January 2023)
- Tech Talk @TaReCDa 2022, Piura, Peru (December 2022)
- Tech Talk @Adobe Research, San Jose, US (November 2022)
- Tech Talk @COMIA, Oaxaca, Oaxaca, Mexico (June 2022)
- Tech Talk @Google AI, Mountain View, CA, US (June 2021)
- Tech Talk @LXCV at CVPR, Seattle, US (June 2021)
- Tech Talk @KAIST, Daejeon, Korea (April 2021)
- Tech Talk @Waymo, Mountain View, US (March 2019)
- Tech Talk @Uber ATG, Toronto, Canada (February 2019)
- Tech Talk @Adobe Research, San Jose, US (February 2019)
Teaching
- Graduate Student Instructor for EECS 598 @University of Michigan: Deep Learning (Winter 2019)
- Guest Lecture for EECS 492 @University of Michigan: Introduction to Artificial Intelligence (Winter 2019)