PhD, University of Edinburgh Nov 2016 - Jun 2020 Thesis: Approximating Neural Machine Translation for Efficiency. Supervised by Kenneth Heafield and Rico Sennrich. Examiners: Graham Neubig and Barry Haddow.
MSc Artificial Intelligence, University of Edinburgh Sep 2014 - Aug 2015 With distinction. Final project: Haiku generator with word vector model.
BSc Computer Science, Universitas Indonesia Aug 2010 - Jul 2014 Final project: Earthquake detector using a phone’s accelerometer readings.
Work Experience
Visiting Research Scientist, Google Research Sep 2024 - Current
Adjunct Assistant Professor, Monash Indonesia Jan 2024 - Current
Assistant Professor, MBZUAI Jan 2023 - Current
Applied Scientist, Amazon Alexa AI Oct 2021 - Jan 2023
Postdoctoral Research Associate, University of Edinburgh Jun 2020 - Jul 2021
Research Scientist, Kata.ai Nov 2019 - Sep 2021
Engineering Intern, Google Research Jul 2017 - Nov 2017
Language Engineer, Apple Siri Oct 2015 - Oct 2016
Awards
MBZUAI’s “Early Career Researcher Award” 2025
Best Resource Paper Award, ACL 2025
Best Theme Paper Award, NAACL 2025
Best Resource Paper Award, EACL 2024
Best Resource Paper Award, AACL 2023
Outstanding Paper Award, EACL 2023
Outstanding Contribution Award, WNGT 2019
World Finalists, ACM-ICPC 2014
Silver Medalist, International Olympiad in Informatics (IOI) 2010
Professional Activities
Services to Scientific Communities
Advisory Board
The ACL Special Interest Group on SEA NLP (SIGSEA) (2025 - present)
Muhammad Ravi Sulthan Habibi (Universitas Indonesia), BSc, 2022 - 2023 | Co-advised with Rahmad Mahendra. Now: AI Engineer at Mekari
Grants and Funding
Project | Awarding Body | Amount | Dates | Info
Persuasive Booking Agent Chatbot | Etihad | $450,000 | 2025-2026 | Nils Lukas (PI), Salem Lahlou, Alham Fikri Aji, Mingming Gong, Martin Takac
Question Answering for Arabic Dialects | IBM-MBZUAI Collaboration | ~$150,000 | 2023-2027 | Part of the IBM-MBZUAI collaboration. No money changes hands between the parties, so the funding comes from MBZUAI for postdoc hiring and data-annotation costs.
Token-Order Prediction | Manifold Labs | ~$70,000 | 2025 | Alham Fikri Aji and Zayd Zuhri (Research Engineer, MBZUAI). Unlimited access to an 8xH200 node.
Sink-free Attention in Transformers | Fal.ai | $12,500 | 2025 | Alham Fikri Aji and Zayd Zuhri (Research Engineer, MBZUAI).
Lambda Multimodal AI Grand Challenge | Lambda | $10,000 | 2025 | Genta Indra Winata (Capital One), Patrick Amadeus Irawan (PhD student, MBZUAI), Alham Fikri Aji
Google Cloud Research Credit | Google | $55,000 | 2024-2025 | PI: Alham Fikri Aji.
SEACrowd: Consolidating South-east Asia NLP dataset | Cohere For AI | $3,000 | 2024 | In collaboration with the SEACrowd community
Teaching
Algorithm and Data Structure (UG) Spring 2026 MBZUAI (Co-instructor). Algorithms and data structures for first-year undergraduate students.
NLP702/NLP806: Advanced Natural Language Processing (for MSc and PhD) Spring 2026 MBZUAI (Co-instructor). Covered advanced NLP topics, including LLMs, distributed training, multilinguality, interpretability, and multimodality in NLP.
NLP702/NLP806: Advanced Natural Language Processing (for MSc and PhD) Spring 2025 MBZUAI (Main instructor). Covered advanced NLP topics, including LLMs, distributed training, multilinguality, interpretability, and multimodality in NLP.
FIT5145: Intro to Data Science (for MSc) Term 4 2024 Monash Indonesia (Main instructor). Introduction to Python, data science, and AI.
NLP702: Advanced Natural Language Processing (for MSc) Spring 2024 MBZUAI (Co-instructor). Covered efficient and large-scale NLP, including LLMs, distributed training, distillation, parameter-efficient fine-tuning, and linear Transformers.
NLP801: Deep Learning for Language Processing (for PhD) Fall 2023 MBZUAI (Main instructor). Designed and taught the module, covering various recent research topics and trends in NLP.
Talks
Code-Switching Thought Patterns in Multilingual Language Models 3rd May 2025 Keynote at CALCS, co-located with NAACL 2025
On Grassroots Effort for Low-Resource Data Collection 20th Jan 2025 Keynote at CLTW, co-located with COLING 2025
Collaborative Multilingual Data Collection 15th Nov 2024 Keynote at WiNLP, co-located with EMNLP 2024
Insights from Language Resource Collection in Linguistically Diverse Southeast Asian Languages 16th Aug 2024 Keynote at Field Matters Workshop, co-located with ACL 2024
Training Lightweight Model via Knowledge Distillation and Parameter Efficient Finetuning 14-15th Jun 2024 Mexican NLP Summer School, co-located with NAACL 2024
Consolidating NLP Resources for South-East Asian Languages 27th May 2024 Google Singapore, Invited Talk
Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages 21st Nov 2023 Google Singapore, Invited Talk
Building Multilingual & Multicultural LLMs: Methods and Challenges 20th Nov 2023 AI Singapore, Invited Talk
Q2AI: A Quick Course to Quick AI 17th Nov 2023 PRICAI, Tutorial
Current Status of NLP in South East Asia with Insights from Multilingualism and Language Diversity 1st Nov 2023 AACL, Tutorial
Surviving your PhD Study 2nd Aug 2023 Telkom University, Invited Talk
Generative AI with Large Language Models Workshop 1st Aug 2023 Institut Teknologi Bandung, Invited Talk
Multilingual and Low-Resource NLP 25th May 2023 Universitas Indonesia & Tokopedia AI Center, Invited Talk
Can AI Complete My Academic Writings? 14th May 2023 Doctrine UK, Online Talk
Multilingual NLP through Collaborative Research 23rd Feb 2023 The 2nd Composable, Automatic and Scalable Learning Workshop (CASL), Invited Talk
Sequence-to-Sequence and Neural Machine Translation Model 28th Apr 2021 Universitas Indonesia, Guest Lecture
Publications
I mainly publish at ACL conferences. You may also refer to my Google Scholar for an updated list of publications. ● denotes my role as (Co-)senior author(s), whereas ■ denotes my role as main author(s). Underline denotes MBZUAI students and researchers (including visiting/interns).
CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation. Emilio Villa-Cueva, Sholpan Bolatzhanova, Diana Turmakhan, Kareem Elzeky, Henok Biadglign Ademtew, Alham Fikri Aji, Israel Abebe Azime, Jinheon Baek, Frederico Belcavello, Fermin Cristobal, Jan Christian Blaise Cruz, Mary Dabre, Raj Dabre, Toqeer Ehsan, Naome A Etori, Fauzan Farooqui, Jiahui Geng, Guido Ivetta, Thanmay Jayakumar, Soyeong Jeong, Zheng Wei Lim, Aishik Mandal, Sofía Martinelli, Mihail Minkov Mihaylov, Daniil Orel, Aniket Pramanick, Sukannya Purkayastha, Israfel Salazar, Haiyue Song, Tiago Timponi Torrent, Debela Desalegn Yadeta, Injy Hamed, Atnafu Lambebo Tonja, Thamar Solorio (Findings of the Association for Computational Linguistics: EMNLP 2025, 2025)
MoMentS: A Comprehensive Multimodal Benchmark for Theory of Mind. Emilio Villa-Cueva, S M Masrur Ahmed, Rendi Chevi, Jan Christian Blaise Cruz, Kareem Elzeky, Fermin Cristobal, Alham Fikri Aji, Skyler Wang, Rada Mihalcea, Thamar Solorio (Findings of the Association for Computational Linguistics: EMNLP 2025, 2025)
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages. Shamsuddeen Hassan Muhammad, Nedjma Ousidhoum, Idris Abdulmumin, Jan Philip Wahle, Terry Ruas, Meriem Beloucif, Christine de Kock, Nirmal Surange, Daniela Teodorescu, Ibrahim Said Ahmad, David Ifeoluwa Adelani, Alham Fikri Aji, Felermino D. M. A. Ali, Ilseyar Alimova, Vladimir Araujo, Nikolay Babakov, Naomi Baes, Ana-Maria Bucur, Andiswa Bukula, Guanqun Cao, Rodrigo Tufiño, Rendi Chevi, Chiamaka Ijeoma Chukwuneke, Alexandra Ciobotaru, Daryna Dementieva, Murja Sani Gadanya, Robert Geislinger, Bela Gipp, Oumaima Hourrane, Oana Ignat, Falalu Ibrahim Lawan, Rooweither Mabuya, Rahmad Mahendra, Vukosi Marivate, Alexander Panchenko, Andrew Piper, Charles Henrique Porto Ferreira, Vitaly Protasov, Samuel Rutunda, Manish Shrivastava, Aura Cristina Udrea, Lilian Diana Awuor Wanzare, Sophie Wu, Florian Valentin Wunderlich, Hanif Muhammad Zhafran, Tianhui Zhang, Yi Zhou, Saif M. Mohammad (ACL, 2025)
-- Best Resource Paper🏅
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia. Samuel Cahyawijaya, Holy Lovenia, Joel Ruben Antony Moniz, Tack Hwa Wong, Mohammad Rifqi Farhansyah, Thant Thiri Maung, Frederikus Hudi, David Anugraha, Muhammad Ravi Shulthan Habibi, Muhammad Reza Qorib, Amit Agarwal, Joseph Marvin Imperial, Hitesh Laxmichand Patel, Vicky Feliren, Bahrul Ilmi Nasution, Manuel Antonio Rufino, Genta Indra Winata, Rian Adam Rajagede, Carlos Rafael Catalan, Mohamed Fazli Mohamed Imam, Priyaranjan Pattnayak, Salsabila Zahirah Pranida, ... , Jan Christian Blaise Cruz, Ming Shan Hee, Ikhlasul Akmal Hanif, M.Alif Al Hakim, Muhammad Rizky Sya'ban, Kun Kerdthaisong, Lester James Validad Miranda, Fajri Koto, Tirana Noor Fatyanosa, Alham Fikri Aji, Jostin Jerico Rosal, Jun Kevin, Robert Wijaya, Onno P. Kampman, Ruochen Zhang, Börje F. Karlsson, Peerat Limkonchotiwat (ACL, 2025)
From Multiple-Choice to Extractive QA: A Case Study for English and Arabic. Teresa Lynn, Malik H. Altakrori, Samar M. Magdy, Rocktim Jyoti Das, Chenyang Lyu, Mohamed Nasr, Younes Samih, Kirill Chirkunov, Alham Fikri Aji, Preslav Nakov, Shantanu Godbole, Salim Roukos, Radu Florian and Nizar Habash (COLING, 2025)
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages. Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James Validad Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Frederikus Hudi, Jann Railey Montalan, ... , Peerat Limkonchotiwat, Alham Fikri Aji, Sedrick Keh, Genta Indra Winata, Ruochen Zhang, Fajri Koto, Zheng Xin Yong, Samuel Cahyawijaya (EMNLP, 2024)
Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages. Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Rifki Afina Putri, Emmanuel Dave, Jhonson Lee, Nuur Shadieq, Wawan Cenggoro, Salsabil Maulana Akbar, Muhammad Ihza Mahendra, Dea Annisayanti Putri, Bryan Wilie, Genta Indra Winata, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung (ACL, 2024)
Nusawrites: Constructing high-quality corpora for underrepresented and extremely low-resource languages. Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Maulana Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Wahyuning Linuwih, Bryan Wilie, Galih Pradipta Muridan, Genta Indra Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung (AACL, 2023)
-- Best Resource Paper🏅
Crosslingual Generalization through Multitask Finetuning. Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M Saiful Bari, Sheng Shen, Zheng Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff and Colin Raffel (ACL, 2023)
BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting. Zheng-Xin Yong, Hailey Schoelkopf, Niklas Muennighoff, Alham Fikri Aji, David Ifeoluwa Adelani, Khalid Almubarak, M Saiful Bari, Lintang Sutawika, Jungo Kasai, Ahmed Baruwa, Genta Indra Winata, Stella Biderman, Edward Raff, Dragomir Radev, Vassilina Nikoulina (ACL, 2023)
Semi-Supervised Low-Resource Style Transfer of Indonesian Informal to Formal Language with Iterative Forward-Translation. Haryo Akbarianto Wibowo, Tatag Aziz Prawiro, Muhammad Ihsan, Alham Fikri Aji, Radityo Eko Prasojo, Rahmad Mahendra, Suci Fitriany (IALP, 2020)
Marian: Fast neural machine translation in C++. Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Grundkiewicz, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, Andre Martins, Alexandra Birch (ACL, 2018)
Toward a standardized and more accurate Indonesian part-of-speech tagging. Kemal Kurniawan, Alham Fikri Aji (IALP, 2018)
Efficient Machine Translation with Model Pruning and Quantization. Maximiliana Behnke, Nikolay Bogoychev, Alham Fikri Aji, Kenneth Heafield, Graeme Nail, Qianqian Zhu, Svetlana Tchistiakova, Jelmer van der Linde, Pinzhen Chen, Sidharth Kashyap, Roman Grundkiewicz (WMT at EMNLP, 2021)
Can smartphones be used to detect an earthquake? Using a machine learning approach to identify an earthquake event. Alham Fikri Aji, I Putu Edy Suardiyana Putra, Petrus Mursanto, Setiadi Yazid (SysCon, 2014)