Israfel Salazar
Hi there!
I am an ELLIS PhD fellow at the University of Copenhagen, advised by Desmond Elliott. My current research focuses on vision-language understanding and representation. I have broad interests in machine learning, including motion and spatial reasoning, as well as robotics.
Previously, I completed the M.Sc. in Applied Mathematics (MVA) at ENS Paris-Saclay and the M.Sc. in Electrical Engineering at Université Paris-Saclay. I’ve worked with generative models for image restoration at DxO, Bayesian generative modeling at Inria, and multimodal representation learning at HuggingFace. I worked as a robotics engineer after studying mechanical engineering and applied physics at the University of Chile.
News
- [2025-11] Presented SPECS at EMNLP 2025! 🇨🇳
- [2025-08] SPECS accepted to EMNLP 2025. 🥳
- [2025-08] CaMMT accepted to Findings of EMNLP 2025. 🥳
Publications
Preprints
Long Story Short: Disentangling Compositionality and Long-Caption Understanding in VLMs
Israfel Salazar, Desmond Elliott, Yova Kementchedjhieva
Investigates the bidirectional relationship between compositional training and long-caption understanding in vision-language models, revealing that these capabilities can be jointly learned through training on dense, grounded descriptions.
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation
Israfel Salazar, Manuel Fernández Burda, Shayekh Bin Islam, Arshia Soltani Moakhar, Shivalika Singh, Fabian Farestam, Angelika Romanou, Danylo Boiko, Dipika Khullar, Mike Zhang, Dominik Krzemiński, Jekaterina Novikova, Luísa Shimabucoro, Joseph Marvin Imperial, Rishabh Maheshwary, Sharad Duwal, Alfonso Amayuelas, Swati Rajwal, Jebish Purbey, Ahmed Ruby, Nicholas Popovič, Marek Suppa, Azmine Toushik Wasi, Ram Mohan Rao Kadiyala, Olga Tsymboi, Maksim Kostritsya, Bardia Soltani Moakhar, Gabriel da Costa Merlin, Otávio Ferracioli Coletti, Maral Jabbari Shiviari, MohammadAmin Farahani Fard, Silvia Fernandez, María Grandury, Dmitry Abulkhanov, Drishti Sharma, Andre Guarnier De Mitri, Leticia Bossatto Marchezi, Setayesh Heydari, Johan Obando-Ceron, Nazar Kohut, Beyza Ermis, Desmond Elliott, Enzo Ferrante, Sara Hooker, Marzieh Fadaee
A comprehensive exam benchmark covering 18 languages and 14 subjects with 20,911 multiple-choice questions for massively multilingual vision-language model evaluation.
Conference Papers
SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation
Xiaofu Chen, Israfel Salazar, Yova Kementchedjhieva
EMNLP, 2025
A reference-free metric for evaluating long image captions that emphasizes specificity by rewarding correct details and penalizing incorrect ones.
CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation
Emilio Villa-Cueva, Sholpan Bolatzhanova, Diana Turmakhan, Kareem Elzeky, Henok Biadglign Ademtew, Alham Fikri Aji, Vladimir Araujo, Israel Abebe Azime, Jinheon Baek, Frederico Belcavello, Fermin Cristobal, Jan Christian Blaise Cruz, Mary Dabre, Raj Dabre, Toqeer Ehsan, Naome A Etori, Fauzan Farooqui, Jiahui Geng, Guido Ivetta, Thanmay Jayakumar, Soyeong Jeong, Zheng Wei Lim, Aishik Mandal, Sofia Martinelli, Mihail Minkov Mihaylov, Daniil Orel, Aniket Pramanick, Sukannya Purkayastha, Israfel Salazar, Haiyue Song, Tiago Timponi Torrent, Debela Desalegn Yadeta, Injy Hamed, Atnafu Lambebo Tonja, Thamar Solorio
Findings of EMNLP, 2025
A benchmark corpus of over 5,800 triples across 19 languages that investigates whether images can serve as cultural context in multimodal machine translation.