Hi! I’m Jie-Ying (Jay) Lee. I am a Ph.D. student in Computer Science at National Yang Ming Chiao Tung University, advised by Prof. Yu-Lun Liu. I am also a Software Engineer on Google’s Pixel Camera Team, where I develop on-device camera algorithms.
I received my B.S. in Computer Science from National Yang Ming Chiao Tung University. During my undergraduate studies, I was an exchange student at ETH Zurich.
In Summer 2024, I interned with Google’s Pixel Camera Team, where I integrated the Segment Anything Model (SAM) for mobile devices, hosted by Yu-Lin Chang and Chung-Kai Hsieh. My industry experience also includes positions as an R&D Intern at Microsoft and a Backend Engineer Intern at Appier.
I am actively seeking research collaborations.
Outside of work and research, I enjoy badminton, dance, and photography.
Research Interests
3D Scene Understanding & Synthesis
Neural Radiance Fields & 3D Gaussian Splatting
Large-scale Urban Reconstruction from Satellite/Aerial Imagery
Dynamic & Specular Scene Modeling
Generative Models for Vision
Diffusion-based Image Restoration & Inpainting
3D Generation from Pre-trained Priors
Embodied AI & Robotics
Vision-Language Navigation
Unmanned Aerial Systems
3D Perception for Robot Decision-Making
On-Device Perception (Segmentation, Camera Algorithms)
News
Sep. 2025: Joined Google as a Software Engineer on the Pixel Camera Team!
Sep. 2025: Started my Ph.D. journey at NYCU with Prof. Yu-Lun Liu!
Creating large-scale, photorealistic 3D urban scenes traditionally requires expensive 3D scanning and manual annotation. We present Skyfall-GS, a novel framework that synthesizes city-block-scale environments by combining satellite imagery with diffusion models, enabling real-time exploration without costly 3D annotations.
We present LightsOut, a diffusion-based outpainting framework tailored to enhance single-image flare removal (SIFR) by reconstructing off-frame light sources. It leverages a multitask regression module and a LoRA fine-tuned diffusion model to produce realistic, physically consistent outpainting results.
This work presents See, Point, Fly (SPF), a training-free aerial vision-and-language navigation (AVLN) framework built atop vision-language models (VLMs) that casts action prediction as a 2D spatial grounding task.
The approach introduces (1) depth-aware unseen mask generation for accurate occlusion identification, (2) Adaptive Guided Depth Diffusion, a zero-shot method for precise initial point placement without additional training, and (3) SDEdit-based detail enhancement for multi-view coherence.
We present SpectroMotion, a novel approach that combines 3D Gaussian Splatting (3DGS) with physically based rendering (PBR) and deformation fields to reconstruct dynamic specular scenes. It is the only existing 3DGS method capable of synthesizing photorealistic real-world dynamic specular scenes.
This paper presents BoostMVSNeRFs, a novel approach that enhances the rendering quality of MVS-based NeRFs in large-scale scenes, and identifies limitations of MVS-based NeRF methods, such as restricted viewport coverage and artifacts caused by limited input views.