Damiano Marsili
Research
I'm interested in pushing the boundaries of the vision capabilities of multimodal LLMs. Most of my
research involves post-training vision-language models (VLMs) for visual reasoning in 3D and
tool use.
I'm best reachable via email at dmarsili at caltech dot edu.
News
[Dec. 2025] Released VALOR!
[Feb. 2025] VADAR was accepted to CVPR 2025!
[Sep. 2023] Started my Ph.D. at Caltech working with Georgia Gkioxari and Pietro Perona.
[Jun. 2023] Started an internship at Amazon Robotics working on 3D Spatial Reasoning.
[May 2023] Graduated with a double major in Computer Science and Mathematics.
[Aug. 2020] Started my undergraduate at Johns Hopkins.
Same or Not? Enhancing Visual Perception in Vision-Language Models
Damiano Marsili, Aditya Mehta, Ryan Lin, Georgia Gkioxari
In review, 2025
project page / arXiv / code / bibtex
No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers
Damiano Marsili, Georgia Gkioxari
arXiv preprint, 2025
project page / arXiv / code / bibtex
VADAR: Visual Agentic AI for Spatial Reasoning with a Dynamic API
Damiano Marsili*, Rohun Agrawal*, Yisong Yue, Georgia Gkioxari
CVPR, 2025
project page / arXiv / code / bibtex