I build retrieval systems that understand images, text, video, and sound, not just literal matches.
I'm a PhD researcher at Virginia Tech, working on vision-language models (VLMs), RAG, and ranking/reranking. My focus is multi-prompt (multi-vector) embeddings: many small, controllable "views" of meaning that make search richer, more interpretable, and less prone to collapse.
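A minimal sketch of the multi-prompt (multi-vector) idea: embed one input under several prompts ("views"), keep every vector, and score a query against the best-matching view instead of a single collapsed embedding. The prompt wording and the model name below are illustrative placeholders, not the exact setup from my research.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

VIEWS = ["literal meaning", "figurative meaning", "emotional tone", "background context"]
model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder text encoder

def embed_views(text: str) -> np.ndarray:
    # One focused embedding per view instead of a single averaged vector.
    prompted = [f"{view}: {text}" for view in VIEWS]
    return model.encode(prompted, normalize_embeddings=True)  # (n_views, dim)

def score(query: str, doc_views: np.ndarray) -> float:
    q = model.encode([query], normalize_embeddings=True)[0]
    # Max over views: the query only needs to match one "angle" of the document.
    return float(np.max(doc_views @ q))

doc = "He finally broke the ice at the meeting."
print(score("idiom about easing social tension", embed_views(doc)))
```

Keeping the per-view vectors also makes retrieval more interpretable: you can report which view produced the match.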
What I work on
Reasoning in vision-language models (VLMs).
Cross-modal retrieval across images, text, video, and audio.
Structured information extraction from multimodal data.
Knowledge representation for multimodal reasoning.
Exploring room acoustics (room impulse responses, RIRs) as spatial signals for learning geometry-aware representations.
Why it matters
Real-world queries are polysemous: idioms, metaphor, culture, and context often matter more than surface similarity. I design retrieval pipelines that surface the right connections, not only the nearest neighbor.
Projects (quick view)
Multi-Prompt Embedding for Retrieval
One input -> multiple focused embeddings that boost recall and reduce length bias and embedding collapse.
RAG + Reranker for Multimodal Search
Lightweight bi-encoder retrieval + VLM reader + cross-encoder reranker for better final ranking (see the sketch after this list).
Diversity-Aware VLM Retrieval
Retrieves multiple perspectives (literal/figurative/emotional/abstract/background) instead of forcing a single vector.
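A rough sketch of the retrieve-then-rerank pattern behind the RAG project: a lightweight bi-encoder narrows the corpus, then a cross-encoder re-scores the shortlist before it is handed to a VLM reader. Model names and the toy corpus are placeholder assumptions; the actual system also handles image and video inputs.

```python
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

retriever = SentenceTransformer("all-MiniLM-L6-v2")                # fast first stage
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")    # precise second stage

corpus = ["a dog catching a frisbee", "a stock market chart", "kids playing in the rain"]
doc_emb = retriever.encode(corpus, normalize_embeddings=True)

def search(query: str, k: int = 2) -> list[str]:
    q = retriever.encode([query], normalize_embeddings=True)[0]
    candidates = np.argsort(-(doc_emb @ q))[:k]                    # top-k by cosine similarity
    scores = reranker.predict([(query, corpus[i]) for i in candidates])
    order = np.argsort(-np.asarray(scores))
    return [corpus[candidates[i]] for i in order]                  # reranked shortlist -> reader

print(search("pets playing outdoors"))
```

The bi-encoder keeps the first stage cheap enough to scan the whole index, while the cross-encoder spends its compute only on the short candidate list.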
Tech I use (most often)
Languages
ML / Data
Systems / Tools
Open to collaborations
If you are working on diversity-aware retrieval, interpretable VLMs, or multimodal reasoning benchmarks, let's talk.