Patrick Amadeus Irawan | MBZUAI

Patrick Amadeus Irawan

PhD Student
MBZUAI
patrick.irawan@mbzuai.ac.ae

About Me

I am a PhD student at MBZUAI 🤖, where I focus on multimodal NLP topics, particularly vision-language interaction. I also work on vision-language alignment, unified (any) multimodal models, and large-scale evaluations. I am advised by Alham Fikri Aji and Yova Kementchedjhieva.
Previously, I was a Research Engineer at SMU 🇸🇬 working at the intersection of multilingual and multimodal interpretation under Chong-Wah Ngo. Before that, I earned my bachelor’s degree in CS from Institut Teknologi Bandung, where I worked under Ayu Purwarianti on explainable multimodal synthetic data generation.
In addition to research, I also do AI engineering for various use cases. You can see my other experiences here.
Going forward, I plan to specialize in developing methods that better align different modalities, with the goal of mitigating modality imbalance, especially in unified multimodal models.

Research Interests

During my studies and prior experience, I have often worked on topics including, but not limited to, the following:

Multimodal Imbalance: I believe that imbalanced learning is a significant bottleneck that prevents us from obtaining reliable multimodal models, as modality shortcuts and biases can harm both performance and the objectivity of evaluation. My work focuses on discovering its root causes and exploring methods to better align models to prevent such issues.
LLM/VLM Alignment: I also work on both architectural and non-architectural adaptations (knowledge enrichment, data reformulation, RL) to address the above issues and/or improve multimodal language modeling in general.
Large-Scale Evaluations: I often question model robustness in scenarios with varying resource levels; however, probing this requires designing both broad and specific evaluation coverage. My work in this area aims to design benchmarks that assess the inclusivity of multimodal models, specifically by addressing concept underrepresentation through targeted data curation in multilingual and multicultural domains.

Updates

[Nov. 2025] Our study exposing the confusion of VLMs in cultural-conflict visual scenarios is up on arXiv!
[Dec. 2025] M4-RAG is out on arXiv! We present an evaluation of how multimodal knowledge enrichment helps models tackle multilingual queries. Spoiler: it does not always help… 🤯
[Oct. 2025] Entropy2Vec was accepted to the MRL Workshop @ EMNLP 2025 🌐🇨🇳!
[July 2025] The Seeing Culture benchmark is accepted to EMNLP 2025 🇨🇳! On to the next one with the SMU Multimedia team 💪
[May 2025] DataRubrics is now on arXiv! We propose a unified scorecard to evaluate data quality on multi-faceted metrics.
[Apr. 2025] WorldCuisines receives the Best Theme Paper award at NAACL 2025! 🎉🌏🍽️
[Mar. 2025] Admitted to the Fall 2025 cohort of the MBZUAI PhD program in NLP! 📚
[Jan. 2025] WorldCuisines and ProxyLM are accepted to NAACL 2025 🇺🇸 🎖️
[Nov. 2024] My first first-author paper, a VL synthetic data generation framework, is accepted to COLING 2025 🎉
[Oct. 2024] WorldCuisines, the largest multicultural VL food benchmark, is released. Honored to co-lead the project 🥘
[Sep. 2024] SEACrowd is accepted to EMNLP 2024! 🇺🇸

Publications

2025

Vision Language Models are Confused Tourists

Patrick Amadeus Irawan, Ikhlasul Akmal Hanif, Muhammad Dehan Al Kautsar, Genta Indra Winata, Fajri Koto, Alham Fikri Aji

PDF

M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG

David Anugraha, Patrick Amadeus Irawan, Anshul Singh, En-Shiun Annie Lee, Genta Indra Winata

PDF

EMNLP

Seeing Culture: A Benchmark for Visual Reasoning and Grounding

Burak Satar, Zhixin Ma, Patrick Amadeus Irawan, Wilfried A. Mulyawan, Jing Jiang, Ee-Peng Lim, Chong-Wah Ngo

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025.

PDF Code Poster

MRL @ EMNLP

Entropy2Vec: Crosslingual Language Modeling Entropy as End-to-End Learnable Language Representations

Patrick Amadeus Irawan, Ryandito Diandaru, Belati Jagad Bintang Syuhada, Randy Zakya Suchrady, Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya

Multilingual Representation Learning (MRL) Workshop @ EMNLP 2025

PDF Poster

NAACL

WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines

Genta Indra Winata, Frederikus Hudi, Patrick Amadeus Irawan, David Anugraha, Rifki Afina Putri, Yutong Wang, Adam Nohejl, Ubaidillah Ariq Prathama, Nedjma Ousidhoum, and others

North American Chapter of the Association for Computational Linguistics (NAACL), 2025.

PDF Code 🏆 Best Theme Paper 🏆

NAACL

ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models

David Anugraha, Genta Indra Winata, Chenyue Li, Patrick Amadeus Irawan, En-Shiun Annie Lee

North American Chapter of the Association for Computational Linguistics (NAACL), 2025.

PDF Code Poster

COLING

Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models

Patrick Amadeus Irawan, Genta Indra Winata, Samuel Cahyawijaya, Ayu Purwarianti

International Conference on Computational Linguistics (COLING), 2025.

PDF Code Main (Oral)

Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability

Genta Indra Winata, David Anugraha, Emmy Liu, Alham Fikri Aji, Shou-Yi Hung, Aditya Parashar, Patrick Amadeus Irawan, and others

PDF

2024

EMNLP

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V Miranda, Jennifer Santoso, Elyanah Aco, ..., Patrick Amadeus Irawan, and others

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.

PDF Code Poster

APSIPA ASC

Leveraging IoT and Machine Learning for Efficient Rice Stock Monitoring and Prediction

Nana Sutisna, Aditya Prawira Nugroho, Christopher Jeffrey, Patrick Amadeus Irawan, Rizky Ramadhana, Ronggur Mahendra, Michael Jonathan, Infall Syafalni, Trio Adiono

2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

PDF Main (Oral)

Misc.

Experience

Research Engineer @ Singapore Management University (2025 - Now)
Research Engineer Intern @ ai&you (2024 - 2024)
Data Scientist Intern @ Supertype (2023 - 2023)
Software Engineer Intern @ Blibli (2022 - 2022)
Software Engineer Intern @ Ruangguru (2022 - 2022)

Conference Reviewers
