Sivan Doveh
I am a student researcher at Google and a PhD student in Computer Science at the Weizmann Institute of Science, supervised by Prof. Shimon Ullman. I study how vision-language models function, exploring their core mechanisms, strengths, and limitations, mainly by developing new data and training approaches.
I earned my Master's degree in Electrical Engineering from Tel Aviv University and my Bachelor's degree in Electrical Engineering from Ben-Gurion University of the Negev (BGU). In parallel with my academic journey, I have also worked at Applied Materials and IBM Research.
I am actively looking for student collaborators in the area of multi-modal learning.
Contact: sivan.doveh [at] weizmann.ac.il
Selected Publications
-
Teaching VLMs to Localize Specific Objects from In-context Examples (IPLoc)
Sivan Doveh, Nimrod Shabtay, Wei Lin, Eli Schwartz, Hilde Kuehne, Raja Giryes, Rogerio Feris, Leonid Karlinsky, James Glass, Assaf Arbelle, Shimon Ullman, M. Jehanzeb Mirza
ICCV 2025
-
Towards Multimodal In-Context Learning for Vision-Language Models
Sivan Doveh, Shaked Perek, M. Jehanzeb Mirza, Amit Alfassy, Assaf Arbelle, Shimon Ullman, Leonid Karlinsky
ECCVW 2024
-
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models
Sivan Doveh, Assaf Arbelle, Sivan Harary, Roei Herzig, Donghyun Kim, Paola Cascante-Bonilla, Amit Alfassy, Rameswar Panda, Shimon Ullman, Leonid Karlinsky
NeurIPS 2023 Spotlight
-
Teaching Structured Vision-Language Concepts to Vision-Language Models
Sivan Doveh, Assaf Arbelle, Sivan Harary, Eli Schwartz, Roei Herzig, Raja Giryes, Rogerio Feris, Rameswar Panda, Shimon Ullman, Leonid Karlinsky
CVPR 2023
-
(Equal Contrib.) Detector-Free Weakly Supervised Grounding by Separation
*Assaf Arbelle, *Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky
ICCV 2021 Oral
[ Paper ]
-
LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content
Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh, Wei Lin, M. Jehanzeb Mirza, Leshem Choshen, Mikhail Yurochkin, Yuekai Sun, Hiromi Wakaki, Yuki Mitsufuji, Assaf Arbelle, Leonid Karlinsky, Raja Giryes
ICLR 2025
-
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
M. Jehanzeb Mirza, Mengjie Zhao, Zhuoyuan Mao, Sivan Doveh, Wei Lin, Paul Gavrikov, Michael Dorkenwald, Shiqi Yang, Saurav Jha, Hiromi Wakaki, Yuki Mitsufuji, Horst Possegger, Rogerio Feris, Leonid Karlinsky, James Glass
arXiv 2024
-
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
Irene Huang, Wei Lin, M. Jehanzeb Mirza, Jacob Hansen, Sivan Doveh, Victor Ion Butoi, Roei Herzig, Assaf Arbelle, Hilde Kuehne, Trevor Darrell, Chuang Gan, Aude Oliva, Rogerio Feris, Leonid Karlinsky
NeurIPS 2024
-
Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
M. Jehanzeb Mirza, Leonid Karlinsky, Wei Lin, Sivan Doveh, Jakub Micorek, Mateusz Kozinski, Hilde Kuehne, Horst Possegger
ECCV 2024
[ Paper | Project Page | Code ]
-
Going Beyond Nouns With Vision & Language Models Using Synthetic Data
Paola Cascante-Bonilla, Khaled Shehada, James Seale Smith, Sivan Doveh, Donghyun Kim, Rameswar Panda, Gul Varol, Aude Oliva, Vicente Ordonez, Rogerio Feris, Leonid Karlinsky
ICCV 2023
[ Paper ]