I obtained my B.S., summa cum laude, from Rensselaer Polytechnic Institute in 2020, with a double major in Computer Science and Mathematics. During my undergraduate years, I worked with Prof. Bülent Yener on discriminative and generative models for microstructure images and with Prof. Lirong Xia on preference learning from natural language.
I’ve had a great time at Google Research, Reality Labs, Research Asia, CV Science, Research, and have collaborated with many exceptional researchers.
I am seeking research-oriented opportunities in industry and would welcome the chance to connect if you think there’s a potential fit.
We introduce SpatialReasoner, a novel large vision-language model (LVLM) that addresses 3D spatial reasoning with explicit 3D representations shared between stages: 3D perception, computation, and reasoning. Explicit 3D representations provide a coherent interface that supports advanced 3D spatial reasoning and enable us to study the factual errors made by LVLMs.
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark
We systematically study the impact of 3D-informed data, architecture, and training setups and present SpatialLLM, an LMM with advanced 3D spatial reasoning abilities.
ImageNet3D: Towards General-Purpose Object-Level 3D Understanding
We propose 3D-DST, which generates synthetic data with 3D ground truth by incorporating 3D geometry control into diffusion models. With our diverse prompt generation, we effectively improve both in-distribution (ID) and out-of-distribution (OOD) performance for various 2D and 3D vision tasks.