I am a Member of Technical Staff at xAI working on Pretraining and Multimodal. Previously, I graduated from UC Berkeley with a Bachelor's degree in Computer Science, where I was advised by Prof. Jitendra Malik at BAIR.
My research and engineering interests are broadly in deep learning (self-supervision, reasoning, scaling) and its applications to computer vision and embodied systems.
We show how diffusion models benefit from scaling both training and test-time compute on perceptual tasks, and we unify tasks such as depth estimation, optical flow, and amodal segmentation under a single image-to-image translation framework.
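For intuition, here is a minimal sketch of the general idea, not the paper's code: `ToyDenoiser`, the conditioning-by-concatenation, and the update rule are all illustrative assumptions. It shows the two axes along which test-time compute can scale in a diffusion-based perception model: more denoising steps per sample, and more ensembled samples per input.

```python
# Illustrative sketch only: perception as image-to-image diffusion,
# with test-time compute scaled via denoising steps and ensembling.
import torch

class ToyDenoiser(torch.nn.Module):
    """Stand-in for a trained conditional diffusion model. Predicts the
    clean target map (e.g. depth) from a noisy map, the conditioning
    RGB image, and the timestep (ignored here for simplicity)."""
    def __init__(self, channels=4):
        super().__init__()
        self.net = torch.nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, noisy_map, rgb, t):
        # One common conditioning choice: channel-concatenate the RGB image.
        x = torch.cat([noisy_map, rgb], dim=1)
        return self.net(x)

@torch.no_grad()
def sample_map(model, rgb, num_steps=50):
    """Start from noise and iteratively denoise.
    More steps = more test-time compute per sample."""
    x = torch.randn(rgb.shape[0], 1, *rgb.shape[2:])
    for i in reversed(range(num_steps)):
        t = torch.full((rgb.shape[0],), i / num_steps)
        pred_clean = model(x, rgb, t)
        # Toy update rule: blend toward the prediction, re-adding
        # noise proportional to the remaining timesteps.
        x = pred_clean + (i / num_steps) * torch.randn_like(x)
    return x

@torch.no_grad()
def predict(model, rgb, num_steps=50, num_samples=4):
    """Second compute axis: draw several samples and average them."""
    samples = [sample_map(model, rgb, num_steps) for _ in range(num_samples)]
    return torch.stack(samples).mean(dim=0)

model = ToyDenoiser()
rgb = torch.rand(1, 3, 64, 64)
depth = predict(model, rgb, num_steps=50, num_samples=4)  # (1, 1, 64, 64)
```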
We trained LLaMA models of up to 1 billion parameters on 1 trillion visual tokens. The resulting models perform diverse tasks including image and video recognition, video tracking, action prediction, and robotics. We also study the scaling properties of this family of models.
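As a rough sketch of the recipe, not the actual training code (the tokenizer is stubbed with random ids, and `TinyDecoder` uses torch's generic transformer rather than a real LLaMA block): once images are mapped to discrete tokens, pretraining reduces to ordinary next-token prediction, exactly as in language modeling.

```python
# Illustrative sketch only: autoregressive pretraining on visual tokens.
import torch
import torch.nn.functional as F

VOCAB = 8192      # codebook size of the (assumed) visual tokenizer
SEQ_LEN = 256     # tokens per image, e.g. a 16x16 latent grid

def tokenize_images(batch_size):
    """Stub for a frozen VQ-style tokenizer: image -> code indices."""
    return torch.randint(0, VOCAB, (batch_size, SEQ_LEN))

class TinyDecoder(torch.nn.Module):
    """Decoder-only transformer with a causal mask (LLaMA-like in spirit)."""
    def __init__(self, dim=256, heads=4, layers=2):
        super().__init__()
        self.embed = torch.nn.Embedding(VOCAB, dim)
        layer = torch.nn.TransformerEncoderLayer(
            dim, heads, dim_feedforward=4 * dim, batch_first=True)
        self.blocks = torch.nn.TransformerEncoder(layer, layers)
        self.head = torch.nn.Linear(dim, VOCAB)

    def forward(self, tokens):
        # Causal mask so each position attends only to earlier tokens.
        mask = torch.nn.Transformer.generate_square_subsequent_mask(
            tokens.size(1))
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.head(h)

model = TinyDecoder()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# One training step: predict token t+1 from tokens <= t.
tokens = tokenize_images(batch_size=8)
logits = model(tokens[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
loss.backward()
opt.step()
```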