
Bhavan Jasani

Applied Scientist · Computer Vision · Multimodal AI · Synthetic Data

I build practical machine learning systems at the intersection of computer vision, natural language processing, and reasoning.


Email: bjasani@alumni.cmu.edu
LinkedIn: /in/bhavan-jasani
Google Scholar: Scholar Profile
Location: San Francisco Bay Area

About

I’m an Applied Scientist who brings research to production, specializing in computer vision and multimodal machine learning. My work focuses on two areas:

Multi-modal learning — across images, text, video, audio, and structured data
Synthetic data generation & annotation — with humans and AI in the loop to overcome scarce or hard-to-label data

Areas of expertise: multi-modal machine learning, synthetic data generation, document intelligence (layout-aware transformers), visual grounding, chart reasoning & visual question answering, and scalable training/inference.

I’m increasingly motivated to apply my skills in AI to healthcare, genomics, and drug discovery — with the broader goal of contributing to research and products that have real clinical and societal impact.

Experience

Amazon AWS AI Labs

Applied Scientist (Computer Vision Research)

Sep 2019 – Present

Carnegie Mellon University, Robotics Institute

Research Assistant (Multi-modal Emotion Recognition)

Oct 2017 – Aug 2019

Nanyang Technological University, Singapore

Research Assistant (Hardware-efficient Computer Vision)

Jan 2016 – May 2017

Selected Publications

Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA

Conference on Computer Vision and Pattern Recognition (CVPR), 2024

B Jasani*, Z Li*, P Tang, S Ghadar

YORO: Lightweight End-to-End Visual Grounding

European Conference on Computer Vision (ECCV) Workshops, 2022

CH Ho, S Appalaraju, B Jasani, R Manmatha, N Vasconcelos

DocFormer: End-to-End Transformer for Document Understanding

International Conference on Computer Vision (ICCV), 2021

S Appalaraju, B Jasani, BU Kota, Y Xie, R Manmatha

End-to-End Visual Question Answering on Document Images

Amazon Machine Learning Conference (AMLC), 2021

B Jasani*, Y Xie*, R Manmatha

Exploiting Spatial Layout in Document Question Answering using Transformers

Amazon Machine Learning Conference (AMLC), 2021

Y Xie, B Pang, Y Zhang, B Jasani, V Mahadevan, R Manmatha

Are We Asking the Right Questions in MovieQA?

International Conference on Computer Vision (ICCV) Workshops, 2019

[Spotlight oral presentation]

B Jasani, R Girdhar, D Ramanan

Skeleton-based Zero Shot Action Recognition in Joint Pose-Language Semantic Space

arXiv, 2019

B Jasani, A Mazagonwalla

Automatic detection of human affective behavior in dyadic conversations

CMU RI Technical Report (Master's Thesis), 2019

B Jasani

Learning sampling policies for domain adaptation?

arXiv, 2019

Y Patel*, K Chitta*, B Jasani*

Data-path unrolling with logic folding for area-time-efficient FPGA-based FAST corner detector

Journal of Real-Time Image Processing, 2019

SK Lam, T Lim, M Wu, B Cao, B Jasani

Threshold-Guided Design and Optimization for Harris Corner Detector Architecture

IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), Journal, 2017

B Jasani, SK Lam, PK Meher, M Wu

Area-time efficient FAST corner detector using data-path transposition

IEEE Transactions on Circuits and Systems II: Express Briefs, Journal, 2017

SK Lam, T Lim, M Wu, B Cao, B Jasani

Accelerating Feature Detectors For Real-Time Vision-Based Applications

Bachelor's Thesis, 2016

B Jasani


Patents

Global Prompts with Linear Adapter Tuning for Regression-Free Model Update

US Patent 85,779,920 — filed 2023

B. Jasani, P. Tan, P. Zhu, R. Manmatha, V. Mahadevan, Y. Xie

Document Visual Question Answering with Multimodal Transformer Encoder–Decoder Models

US Patent 85,528,792 — filed 2022

B. Jasani, N. Sankaran, P. Zhu, R. Manmatha, Y. Xie

Academic Services

Program Committee
- International Conference on Document Analysis and Recognition (ICDAR), 2025
- Transferring and Adapting Source Knowledge in Computer Vision (TASK-CV) Workshop, ICCV 2019
Reviewer
- Conference on Computer Vision and Pattern Recognition (CVPR)
- International Conference on Computer Vision (ICCV)
- European Conference on Computer Vision (ECCV)
- Amazon Computer Vision Conference (ACVC)
- Amazon Research Awards
- Book chapters – “Data Augmentation with Python”, Packt Publishing, 2023


Fun Stuff

I enjoy partner dancing (Fusion and Salsa), have a deep passion for aviation, and am working toward my private pilot license. In my free time I also love attending meditation retreats and biking.
