CS PhD Student @ Cornell University | Advised by Noah Snavely
I am interested in creating algorithms that can model the structure, appearance and dynamics of the world around us from casually captured images and videos. To this end, I typically work at the intersection of learning-based 3D representations, computer vision and graphics.
Prior to joining Cornell for my PhD, I graduated from the MS in Robotics program at Carnegie Mellon University, where I was advised by Shubham Tulsiani. There, I explored problems at the intersection of diffusion models and learning-based 3D representations. Please check out my resume for additional information about my research and work experience.
I am always excited to chat about research, especially about topics related to 3D computer vision and generative models. Please feel free to reach out!
News
May 2025
Joined Google DeepMind as a Student Researcher for Summer 2025.
We propose UpFusion, a system that can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images without corresponding pose information. Current sparse-view 3D inference methods typically rely on camera poses to geometrically aggregate information from input views, but are not robust in the wild when such information is unavailable or inaccurate. In contrast, UpFusion sidesteps this requirement by learning to implicitly leverage the available images as context in a conditional generative model for synthesizing novel views. We incorporate two complementary forms of conditioning into diffusion models for leveraging the input views: (a) inferring query-view aligned features using a scene-level transformer, and (b) intermediate attentional layers that can directly observe the input image tokens. We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images. We evaluate our approach on the Co3Dv2 and Google Scanned Objects datasets and demonstrate the benefits of our method over pose-reliant sparse-view methods as well as single-view methods that cannot leverage additional views. Finally, we show that our learned model can generalize beyond the training categories and even allows reconstruction from self-captured images of generic objects in the wild.
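To make the conditioning mechanism concrete, below is a minimal PyTorch-style sketch of the two pathways described above: (a) query-view aligned features produced by a scene-level transformer, and (b) attention layers in the denoiser that directly observe the unposed input-view tokens. All module names, dimensions, and the way the aligned features are injected are illustrative assumptions, not the released UpFusion code.

```python
# Hedged sketch of the two conditioning paths; names and shapes are assumptions.
import torch
import torch.nn as nn

class SceneLevelTransformer(nn.Module):
    """Fuses tokens from the unposed reference views and returns features
    aligned to a fixed query-view token grid (path a)."""
    def __init__(self, dim=256, heads=8, layers=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.query_tokens = nn.Parameter(torch.randn(1, 16 * 16, dim))  # query-view grid

    def forward(self, ref_tokens):                      # (B, N_ref * T, dim)
        ctx = self.encoder(ref_tokens)                  # jointly encode all reference views
        q = self.query_tokens.expand(ref_tokens.size(0), -1, -1)
        # attend query-grid tokens to the fused context -> query-view aligned features
        attn = torch.softmax(q @ ctx.transpose(1, 2) / ctx.size(-1) ** 0.5, dim=-1)
        return attn @ ctx                               # (B, 16*16, dim)

class ConditionedDenoiserBlock(nn.Module):
    """One diffusion-denoiser block: self-attention on the noisy target tokens,
    plus cross-attention that directly observes the input-image tokens (path b)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x, aligned_feats, ref_tokens):
        x = x + self.self_attn(x, x, x)[0]
        x = x + aligned_feats                                    # (a) inject aligned features
        x = x + self.cross_attn(x, ref_tokens, ref_tokens)[0]    # (b) attend to raw view tokens
        return x + self.mlp(x)

if __name__ == "__main__":
    B, dim = 2, 256
    ref_tokens = torch.randn(B, 3 * 64, dim)     # 3 unposed views, 64 tokens each
    noisy_target = torch.randn(B, 16 * 16, dim)  # noisy latent of the novel view
    aligned = SceneLevelTransformer(dim)(ref_tokens)
    out = ConditionedDenoiserBlock(dim)(noisy_target, aligned, ref_tokens)
    print(out.shape)
```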
Exploring Techniques to Improve Activity Recognition using Human Pose Skeletons
Bharath Raj N., Anand Subramanian, Kashyap Ravichandran, and Venkateswaran N.
2020 IEEE Winter Applications of Computer Vision Workshops (WACVW), 2020
Human pose skeletons provide an explainable representation of the orientation of a person. Neural network architectures such as OpenPose can estimate the 2D human pose skeletons of people present in an image with good accuracy. Naturally, the human pose is a very attractive choice as a representation for building systems aimed at human activity recognition. However, raw pose keypoint representations suffer from various problems, such as variance to the translation and scale of the input images, and keypoints are often missed by the pose estimation framework. These and other factors lead to poor generalization when networks are trained directly on these raw representations. This paper introduces various methods aimed at building a robust representation for training models on activity recognition tasks, such as handcrafted features extracted from poses that introduce scale and translation invariance. Additionally, train-time techniques such as keypoint dropout are explored to facilitate better learning. Finally, we conduct an ablation study comparing the performance of deep learning models trained on raw keypoint representations and on handcrafted features, while incorporating our train-time techniques, to quantify the effectiveness of the introduced methods over raw representations.
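As a rough illustration of the preprocessing described above, here is a small Python sketch, assuming simple centroid/scale normalization and random joint masking; the exact handcrafted features and dropout schedule in the paper differ, and the function names are hypothetical.

```python
# Illustrative sketch (not the paper's exact pipeline) of (1) making raw 2D
# keypoints translation/scale invariant and (2) train-time keypoint dropout.
import numpy as np

def normalize_pose(keypoints):
    """keypoints: (J, 2) array of (x, y) pixels; missing joints marked as NaN.
    Subtracting the centroid removes translation; dividing by the bounding-box
    diagonal removes scale."""
    out = keypoints.astype(float).copy()
    valid = ~np.isnan(out).any(axis=1)
    center = out[valid].mean(axis=0)
    span = np.linalg.norm(out[valid].max(axis=0) - out[valid].min(axis=0))
    out[valid] = (out[valid] - center) / max(span, 1e-6)
    out[~valid] = 0.0                      # zero-fill joints the pose estimator missed
    return out

def keypoint_dropout(keypoints, p=0.1, rng=None):
    """Train-time augmentation: randomly hide joints so the downstream model
    learns to cope with keypoints the pose estimator fails to detect."""
    rng = rng or np.random.default_rng()
    out = keypoints.astype(float).copy()
    drop = rng.random(len(out)) < p
    out[drop] = np.nan
    return out

if __name__ == "__main__":
    pose = np.array([[120, 40], [130, 80], [110, 80], [np.nan, np.nan]])
    feat = normalize_pose(keypoint_dropout(pose, p=0.2, rng=np.random.default_rng(0)))
    print(feat)
```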
Single Image Haze Removal using a Generative Adversarial Network
Bharath Raj N., and Venkateswaran N.
2020 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), 2020
Traditional methods to remove haze from images rely on estimating a transmission map. When dealing with single images, this becomes an ill-posed problem due to the lack of depth information. In this paper, we propose an end-to-end learning-based approach that uses a modified conditional Generative Adversarial Network to directly remove haze from an image. We employ the Tiramisu model in place of the classic U-Net model as the generator owing to its higher parameter efficiency and performance. Moreover, a patch-based discriminator was used to reduce artefacts in the output. To further improve the perceptual quality of the output, a hybrid weighted loss function was designed and used to train the model. Experiments on synthetic and real-world hazy images demonstrate that our model performs competitively with state-of-the-art methods.
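The training objective can be sketched as below. This is a hedged PyTorch illustration, assuming an adversarial term from a patch discriminator plus L1 and perceptual terms with example weights; it does not reproduce the paper's Tiramisu generator or exact loss weighting.

```python
# Sketch of a conditional GAN objective with a patch discriminator and a
# hybrid weighted loss; weights and perceptual term are assumptions.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Scores overlapping patches as real/fake rather than the whole image,
    which penalizes local artefacts in the dehazed output."""
    def __init__(self, in_ch=6):  # hazy input concatenated with the (de)hazed image
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),   # one logit per patch
        )

    def forward(self, hazy, image):
        return self.net(torch.cat([hazy, image], dim=1))

def generator_loss(disc, feat, hazy, fake, real, w_adv=1.0, w_l1=100.0, w_perc=10.0):
    """Hybrid weighted loss: adversarial + pixel-wise L1 + perceptual term."""
    logits = disc(hazy, fake)
    adv = nn.functional.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    l1 = nn.functional.l1_loss(fake, real)
    perc = nn.functional.l1_loss(feat(fake), feat(real))
    return w_adv * adv + w_l1 * l1 + w_perc * perc

if __name__ == "__main__":
    hazy, real = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    fake = torch.rand(1, 3, 64, 64)          # stand-in for the Tiramisu generator output
    feat = nn.Conv2d(3, 8, 3, padding=1)     # stand-in for a pretrained perceptual network
    print(generator_loss(PatchDiscriminator(), feat, hazy, fake, real).item())
```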
Selected Projects
Progressive Photon Mapping
Project Submission for Physics-based Rendering, 2023
In this project, I experimented with adding features to a custom version of the DIRT package so that a swimming pool could be rendered with realistic caustic effects. The current integrators in DIRT cannot render such scenes realistically due to the presence of paths of type L(S+)D(S+). To overcome this issue, I added support for photon mapping in DIRT and experimented with enhancements to the base photon mapping algorithm. I also added support for progressive photon mapping, an approach that reduces the artefacts typically observed with photon mapping. In addition to the above, I added support for the GGX BRDF and a directional light source.
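For reference, the progressive part of the algorithm boils down to a per-measurement-point update that shrinks the gather radius and rescales the accumulated flux after every photon pass (Hachisuka et al.). The Python sketch below illustrates that update with an assumed alpha of 0.7; it is not the project's DIRT/C++ code.

```python
# Sketch of the progressive photon mapping radius/flux update.
from dataclasses import dataclass
import math

@dataclass
class HitPoint:
    radius2: float      # current squared gather radius R_i^2
    n: float = 0.0      # accumulated photon count N_i
    tau: float = 0.0    # accumulated (unnormalized) flux

def progressive_update(hp: HitPoint, m_new: int, flux_new: float, alpha: float = 0.7):
    """m_new photons with total flux flux_new landed inside the radius this pass."""
    if m_new == 0:
        return
    ratio = (hp.n + alpha * m_new) / (hp.n + m_new)   # R_{i+1}^2 / R_i^2
    hp.radius2 *= ratio                               # shrink the gather radius
    hp.tau = (hp.tau + flux_new) * ratio              # rescale flux to the new radius
    hp.n += alpha * m_new

if __name__ == "__main__":
    hp = HitPoint(radius2=0.01)
    for _ in range(5):
        progressive_update(hp, m_new=50, flux_new=1.0)
    emitted = 5 * 100_000                             # photons emitted over all passes
    radiance = hp.tau / (math.pi * hp.radius2 * emitted)
    print(hp.radius2, radiance)
```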
Deploying Tiny YOLOv2 on Jetson Nano using DeepStream
Featured in Jetson Community Resources (Deep Learning section)
In this project, I experimented with deploying a Tiny YOLOv2 ONNX model on the NVIDIA Jetson Nano using the DeepStream SDK. To this end, I modified existing C++ code to enable it to parse the output of the Tiny YOLOv2 model.
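For context, the parsing step amounts to decoding the raw 13x13x125 output tensor into boxes. The Python sketch below mirrors that logic, assuming the standard VOC Tiny YOLOv2 anchors and class count; the actual project change was made in the DeepStream C++ bounding-box parser.

```python
# Illustrative decoder for a Tiny YOLOv2 output tensor (VOC configuration assumed).
import numpy as np

GRID, BOXES, CLASSES = 13, 5, 20
ANCHORS = [(1.08, 1.19), (3.42, 4.41), (6.63, 11.38), (9.42, 5.11), (16.62, 10.52)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(output, conf_thresh=0.3):
    """output: (BOXES*(5+CLASSES), GRID, GRID) raw tensor -> list of detections."""
    output = output.reshape(BOXES, 5 + CLASSES, GRID, GRID)
    detections = []
    for b, (aw, ah) in enumerate(ANCHORS):
        for cy in range(GRID):
            for cx in range(GRID):
                tx, ty, tw, th, to = output[b, :5, cy, cx]
                objectness = sigmoid(to)
                cls_scores = output[b, 5:, cy, cx]
                cls_probs = np.exp(cls_scores - cls_scores.max())
                cls_probs /= cls_probs.sum()                 # softmax over classes
                score = objectness * cls_probs.max()
                if score < conf_thresh:
                    continue
                # box center/size in grid units -> normalized [0, 1] coordinates
                x = (cx + sigmoid(tx)) / GRID
                y = (cy + sigmoid(ty)) / GRID
                w = aw * np.exp(tw) / GRID
                h = ah * np.exp(th) / GRID
                detections.append((int(cls_probs.argmax()), float(score), x, y, w, h))
    return detections

if __name__ == "__main__":
    fake_output = np.random.randn(BOXES * (5 + CLASSES), GRID, GRID).astype(np.float32)
    print(len(decode(fake_output)))
```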