Raúl Vázquez
Postdoctoral researcher at the University of Helsinki in the Multimodality research group
Email: raul.vazquez@helsinki.fi
ABOUT ME
I am a postdoctoral researcher and teaching staff member at the University of Helsinki. I specialize in advancing natural language understanding and processing (NLU & NLP) through data-driven methodologies.
My current research focuses on uncertainty modelling, hallucination detection, and representation learning in neural language models.
I obtained my PhD in Computational Linguistics in 2023 at the University of Helsinki, where my research centered on representation learning using massively multilingual parallel corpora, and the development
of models for multilingual and multimodal machine translation. Prior to my PhD, I earned an MSc in Mathematical Modelling from Università degli Studi dell'Aquila in 2017 and a BSc in Applied Mathematics
from ITAM (Mexico's Autonomous Institute of Technology) in 2015.
Building on this foundation, my current research aims at building language technology that meets the challenges of real-world applications by traversing four key avenues:
- Multilingual hallucination detection
- Improving the reliability and decision-making of NLP systems
- Data contamination in knowledge distillation frameworks
- Methods for improving NLP for under-resourced scenarios and underrepresented communities
EDUCATION
PhD in Language Technology
University of Helsinki
2018-2022
Supervisor: Jörg Tiedemann
co-Supervisor: Mathias Creutz
Projects: FoTran and MeMAD
Focus Areas: Natural Language Processing, Neural Machine Translation, Mathematical Modelling
M. Sc. Mathematical Engineering
Università degli Studi dell'Aquila
2015-2017
Scholarship award from the EU Commission.
Thesis: The Vehicle Routing Problem - An Application and Proposed Heuristics
Adviser: Claudio Arbib
co-Adviser: Stefano Smriglio
Focus Areas: ODEs, PDEs, Optimization
M. Sc. Technomathematik
Universität Hamburg
2016-2017
Joint degree awarded for completing the MathMods Joint M.Sc.
Focus Areas: non-smooth Convex Optimization and Numerical Approximation methods for PDEs
B.Sc. in Applied Mathematics
Instituto Tecnológico Autónomo de México (ITAM)
2008-2015
Thesis: A Bayesian Approach to Time Series Smoothing via the Hodrick-Prescott Filter
Modelling
Adviser: Enrique de Alba Guerra
Focus Areas: Statistics, Probability, Linear and non-Linear Optimization
EXPERIENCE
Researcher
University of Helsinki
2018-2023
Development of models for natural language understanding (NLU) with a data-driven approach that incorporates massively multilingual parallel corpora and the use of diverse input signals.
Project Coordinator and Research Assistant
INEGI - Mexico's Official Statistics Bureau
2014-2015
I assisted the Institute’s vice-president of Economic Information, Enrique de Alba Guerra. Duties involved statistical analysis of time series and visualization of results, as well as application and implementation of research articles.
Data Science Lab Collaborator
Sedesol - Secretariat of Social Development
2017-2018
Directly involved in the development of a monitor of agricultural production via remote-sensing data as part of a hunger early-warning system for the Mexican government.
Logistics Coordinator Volunteer
TECHO | Mexico
2012
I collaborated on the planning and implementation of a project for the detection of irregular settlements, and the establishment of a first contact between said settlements and the NGO.
Research Collaboration
LFoundry and Università degli Studi dell'Aquila
2017
Development of an optimization model for improving LFoundry’s production line. Customization of the Vehicle Routing Problem through Gurobi and GLPK software to improve microchip pickup and delivery times.
Vehicle Inspection Intern
Volkswagen Hannover
2012
I created a graphical interface to improve the production line operation. I also prepared an automated infrared signal sender and receiver to facilitate interface usage and information extraction.
PUBLICATIONS
2025
Timothee Mickus, Aman Sinha, Raúl Vázquez (2025). Your Model is Overconfident, and Other Lies We Tell Ourselves
Proceedings of ACL 2025 (Volume 1: Long Papers)
Bryan Eikema, Raúl Vázquez, Jonathan Berant, Marie-Catherine de Marneffe, Barbara Plank, Artem Shelmanov, Swabha Swayamdipta, Jörg Tiedemann, Chrysoula Zerva, Wilker Aziz (2025). Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025)
EMNLP 2025.
Ona de Gibert, Joseph Attieh, Teemu Vahtola, Mikko Aulamo, Zihao Li, Raúl Vázquez, Tiancheng Hu, Jörg Tiedemann (2025). Scaling Low-Resource MT via Synthetic Data Generation with LLMs
Proceedings of EMNLP 2025
Ona De Gibert, Robert Pugh, Ali Marashian, Raúl Vázquez, Abteen Ebrahimi, Pavel Denisov, Enora Rice, Edward Gow-Smith, Juan Prieto, Melissa Robles, Rubén Manrique, Oscar Moreno, Angel Lino, Rolando Coto-Solano, Aldo Alvarez, Marvin Agüero-Torales, John E. Ortega, Luis Chiruzzo, Arturo Oncevay, Shruti Rijhwani, Katharina Von Der Wense, Manuel Mager (2025). Findings of the AmericasNLP 2025 Shared Tasks on Machine Translation, Creation of Educational Material, and Translation Metrics for Indigenous Languages of the Americas
Proceedings of AmericasNLP 2025
Raúl Vázquez, Timothee Mickus, Elaine Zosa, Teemu Vahtola, Jörg Tiedemann, Aman Sinha, Vincent Segonne, Fernando Sanchez-Vega, Alessandro Raganato, Jindřich Libovický, Jussi Karlgren, Shaoxiong Ji, Jindřich Helcl, Liane Guillou, Ona De Gibert, Jaione Bengoetxea, Joseph Attieh, Marianna Apidianaki (2025). SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes
Proceedings of SemEval-2025
Hengyu Luo, Zihao Li, Joseph Attieh, Sawal Devkota, Ona de Gibert, Xu Huang, Shaoxiong Ji, Peiqin Lin, Bhavani Sai Praneeth Varma Mantina, Ananda Sreenidhi, Raúl Vázquez, Mengjie Wang, Samea Yusofi, Fei Yuan, Jörg Tiedemann (2025). GlotEval: A Test Suite for Massively Multilingual Evaluation of Large Language Models
Proceedings of EMNLP 2025 (System Demonstrations)
2024
Timothee Mickus, Elaine Zosa, Raúl Vázquez, Teemu Vahtola, Jörg Tiedemann, Vincent Segonne, Alessandro Raganato, Marianna Apidianaki (2024). SemEval-2024 Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes
Proceedings of SemEval-2024
Raúl Vázquez, Hande Celikkanat, Dennis Ulmer, Jörg Tiedemann, Swabha Swayamdipta, Wilker Aziz, Barbara Plank, Joris Baan and Marie-Catherine de Marneffe (2024). Proceedings of the 1st Workshop on Uncertainty-Aware NLP (UncertaiNLP 2024)
Proceedings of UncertaiNLP2024
Timothee Mickus, Raúl Vázquez, Joseph Attieh (2024). I Have an Attention Bridge to Sell You: Generalization Capabilities of Modular Translation Architectures
Proceedings of the Fifth Workshop on Insights from Negative Results in NLP
Raúl Vázquez, Timothee Mickus, Jörg Tiedemann, Ivan Vulić and Ahmet Üstün (2024). Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024)
Proceedings of MOOMIN2024
Timothee Mickus, Stig-Arne Grönroos, Joseph Attieh, Michele Boggia, Ona De Gibert, Shaoxiong Ji, Niki Andreas Loppi, Alessandro Raganato, Raúl Vázquez, and Jörg Tiedemann (2024). MAMMOTH: Massively Multilingual Modular Open Translation @ Helsinki
Proceedings of EACL2024
Abteen Ebrahimi, Ona de Gibert, Raúl Vázquez, Rolando Coto-Solano, Pavel Denisov, Robert Pugh, Manuel Mager, Arturo Oncevay, Luis Chiruzzo, Katharina von der Wense, Shruti Rijhwani (2024). Findings of the AmericasNLP 2024 Shared Task on Machine Translation into Indigenous Languages
Proceedings of AmericasNLP 2024
2023
Raúl Vázquez. (2023). Representation Learning in Multilingual Neural Machine Translation. [Doctoral Thesis, University of Helsinki].
Helsingin yliopisto. https://hdl.handle.net/10138/566538
Jörg Tiedemann, Mikko Aulamo, Daria Bakshandaeva, Michele Boggia, Stig-Arne Grönroos, Tommi Nieminen, Alessandro Raganato, Yves Scherrer, Raúl Vázquez, Sami Virpioja. 2023. Democratizing neural machine translation with OPUS-MT
Language Resources and Evaluation, Volume 58, Issue 2
Ona de Gibert, Raúl Vázquez, Mikko Aulamo, Sami Virpioja and Jörg Tiedemann (2023). Four Approaches to Low-Resource Multilingual NMT: The Helsinki Submission to the AmericasNLP 2023 Shared Task
Proceedings of AmericasNLP2023
Timothee Mickus and Raúl Vázquez (2023). Why Bother with Geometry? On the Relevance of Linear Decompositions of Transformer Embeddings
Proceedings of BlackboxNLP2023
Michele Boggia, Stig-Arne Grönroos, Niki Loppi, Timothee Mickus, Alessandro Raganato, Jörg Tiedemann and Raúl Vázquez (2023). Dozens of Translation Directions or Millions of Shared Parameters? Comparing Two Types of Multilinguality in Modular Machine Translation
Proceedings of NoDaLiDa2023
2022
Raúl Vázquez, Michele Boggia, Alessandro Raganato, Niki A. Loppi, Stig-Arne Grönroos and Jörg Tiedemann (2022). Latest Development in the FoTran Project – Scaling Up Language Coverage in Neural Machine Translation Using Distributed Training with Language-Specific Components.
Proceedings of EAMT2022.
Raúl Vázquez, Hande Celikkanat, Vinit Ravishankar, Mathias Creutz and Jörg Tiedemann (2022). A Closer Look at Parameter Contributions When Training Neural Language and Translation Models.
Proceedings of COLING2022
2021
Alessandro Raganato, Raúl Vázquez, Mathias Creutz and Jörg Tiedemann. (2021). An Empirical Investigation of Word Alignment Supervision for Zero-shot Multilingual Neural Machine Translation.
Proceedings of EMNLP2021.
Raúl Vázquez, Hande Celikkanat, Mathias Creutz and Jörg Tiedemann (2021). On the differences between BERT and MT encoder spaces and how to address them in translation tasks.
ACL-SRW2021.
Raúl Vázquez, Yves Scherrer, Sami Virpioja and Jörg Tiedemann (2021). The Helsinki Submission to the AmericasNLP shared task.
Proceedings of NAACL2021 (AmericasNLP)
2020
Raúl Vázquez, Alessandro Raganato, Mathias Creutz, and Jörg Tiedemann (2020). A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation.
Computational Linguistics. Volume 46, Issue 2.
Raúl Vázquez, Mikko Aulamo, Umut Sulubacak, & Jörg Tiedemann (2020). The University of Helsinki Submission to the IWSLT2020 Offline Speech Translation Task.
Proceedings of IWSLT2020
2019
Raúl Vázquez, Alessandro Raganato, Jörg Tiedemann and Mathias Creutz. 2019. Multilingual NMT with a language-independent attention bridge.
Proceedings of RepL4NLP2019.
Alessandro Raganato, Raúl Vázquez, Mathias Creutz and Jörg Tiedemann. 2019. An Evaluation of Language-Agnostic Inner-Attention-Based Representations in Machine Translation.
Proceedings of RepL4NLP2019.
Raúl Vázquez, Umut Sulubacak and Jörg Tiedemann. 2019. The University of Helsinki Submission to the WMT2019 Parallel Corpus Filtering Task.
Proceedings of WMT2019.
Yves Scherrer, Raúl Vázquez, and Sami Virpioja. 2019. The University of Helsinki Submissions to the WMT2019 Similar Languages Translation Task.
Proceedings of WMT2019.
Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, and Jörg Tiedemann. 2019. The University of Helsinki Submission to the WMT2019 News Translation Task.
Proceedings of WMT2019.
2018
Raúl Vázquez, Alessandro Raganato, Jörg Tiedemann and Mathias Creutz. 2018. Multilingual NMT with a language-independent attention bridge.
arXiv preprint arXiv:1811.00498.
Stig-Arne Grönroos, Benoit Huet, Mikko Kurimo, Jorma Laaksonen, Bernard Merialdo, Phu Pham, Mats Sjöberg, Umut Sulubacak, Jörg Tiedemann, Raphael Troncy and Raúl Vázquez. 2018. The MeMAD Submission to the WMT18 Multimodal Translation Task.
Proceedings of WMT2018. Brussels, Belgium.
CONTACT
University email: raul.vazquez@helsinki.fi
Unioninkatu 40, Saali 36 (B602)
00014 - Helsinki, Finland.