Mohammad Aflah Khan
Hi, I’m Aflah, a research software engineer at the Max Planck Institute for Software Systems. My work centers on deepening our understanding of large language models (LLMs) and rigorously evaluating their capabilities. I’m also passionate about the systems side of LLMs, with hands-on experience in large-scale pretraining and inference. In the past, I’ve contributed to projects targeting hate speech reduction and other NLP applications for social good.
Open to roles in research, research engineering, or backend engineering
Experience
Ongoing first (rest sorted by end date)
Max Planck Institute for Software Systems (MPI-SWS)
Research Software Engineer • April, 2024 — Present [Full Time] | Nov, 2023 — March, 2024 [Part time] | Aug, 2023 — Oct, 2023 [Intern]
Working under Dr. Krishna Gummadi to explore different aspects of LLMs. Areas we have explored or are currently exploring include:
- Optimizing pre-training and inference for LLMs
- LLM memorization and the impact of Parameter-Efficient Fine-Tuning (PEFT) on memorization
- Knowledge acquisition and evaluation of factual knowledge in LLMs
- Built and currently maintain key internal tools: OpenChat (an internal chatbot), MaxCast (a research paper-to-podcast conversion service) & MaxChat (a document-based chat service). These services were developed from scratch, including hosting models on-premises and fine-tuning them for better performance.
- Published and submitted research to top-tier (A*) conferences
EleutherAI
Open Source Contributor • Dec, 2022 — Present
Currently working on the Multilingual Natural Instructions project to build a massive instruction-tuning corpus for Hindi. Previously worked on:
- Pythia - A Suite for Analyzing Large Language Models Across Training and Scaling (Accepted ICML'23) - Major contributor to the gender bias evaluations and the intervention case study. The models have over 18 million downloads (as of April 2025).
- Recite, Reconstruct, Recollect - Memorization in LMs as a Multifaceted Phenomenon (Accepted ICLR'25) - Proposed an intuitive taxonomy for classifying memorized sequences and built predictors based on these classes.
Laboratory for Computational Social Systems (LCS2)
Undergraduate Student Researcher • June, 2021 — May, 2024
I've worked on a variety of projects, from hate speech normalization to designing recommendations for fine-tuning improved hate speech detectors. I also led the QUENCH project, a benchmark aimed at evaluating advanced reasoning abilities in large language models, with a particular emphasis on Indic contexts.
Goldman Sachs
Summer Analyst • May, 2023 — July, 2023
Worked in the Finance, Planning & Analysis Engineering division on revamping the department's central hub. Built POCs based on user feedback to improve the search and access experience on the web app. Received a return offer to join full time as an Analyst.
Google Summer of Code - TensorFlow
Open Source Developer • May, 2022 — Sept, 2022
Worked with Matthew Watson & Chen Qian on adding data augmentation layers to KerasNLP, a library in the Keras/TensorFlow ecosystem for building industry-oriented NLP solutions. I also contributed several bug fixes and other utilities, such as tokenizers and the transformer encoder & decoder.
Publications
* indicates equal contribution
2026
In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations
Mohammad Aflah Khan, Mahsa Amani, Soumi Das, Bishwamittra Ghosh, Qinyuan Wu, Krishna P. Gummadi, Manish Gupta, Abhilasha Ravichander
IASEAI 2026 - The International Association for Safe & Ethical AI (An earlier version of this work was presented at R2-FM, ICML 2025)
Rote Learning Considered Useful: Generalizing over Memorized Data in LLMs
Qinyuan Wu, Soumi Das, Mahsa Amani, Bishwamittra Ghosh, Mohammad Aflah Khan, Krishna P. Gummadi, Muhammad Bilal Zafar
IASEAI 2026 - The International Association for Safe & Ethical AI (An earlier version of this work was presented at MemFM, ICML 2025)
Revisiting Privacy, Utility, and Efficiency Trade-offs when Fine-Tuning Large Language Models
Soumi Das, Camila Kolling, Mohammad Aflah Khan, Mahsa Amani, Bishwamittra Ghosh, Qinyuan Wu, Till Speicher, Krishna P. Gummadi
IASEAI 2026 - The International Association for Safe & Ethical AI
2025
TokenSmith: Streamlining Data Editing, Search, and Inspection for Large-Scale Language Model Training and Interpretability
Mohammad Aflah Khan*, Ameya Godbole*, Johnny Tian-Zheng Wei, Ryan Wang, James Flemings, Krishna Gummadi, Willie Neiswanger, Robin Jia
EMNLP 2025 System Demonstrations - The 2025 Conference on Empirical Methods in Natural Language Processing
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien, Jyothir S V, Mohammad Aflah Khan, Jaydeep Borkar, Christopher A. Choquette-Choo, Jacob Ray Fuehne, Stella Biderman, Tracy Ke, Katherine Lee, Naomi Saphra
ICLR 2025 - The Thirteenth International Conference on Learning Representations
Towards Reliable Latent Knowledge Estimation in LLMs: Zero-Prompt Many-Shot Based Factual Knowledge Extraction
Qinyuan Wu, Mohammad Aflah Khan, Soumi Das, Vedant Nanda, Bishwamittra Ghosh, Camila Kolling, Till Speicher, Laurent Bindschaedler, Krishna P Gummadi, Evimaria Terzi
WSDM 2025 - Proceedings of the 18th ACM International Conference on Web Search and Data Mining
QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs
Mohammad Aflah Khan*, Neemesh Yadav*, Sarah Masud, Md Shad Akhtar
COLING 2025 - Proceedings of the 31st International Conference on Computational Linguistics
In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations
Mohammad Aflah Khan, Mahsa Amani, Soumi Das, Bishwamittra Ghosh, Qinyuan Wu, Krishna P. Gummadi, Manish Gupta, Abhilasha Ravichander
R2-FM @ ICML 2025 - Workshop on Reliable and Responsible Foundation Models
Rethinking Memorization Measures in LLMs: Recollection vs. Counterfactual vs. Contextual Memorization
Bishwamittra Ghosh, Soumi Das, Qinyuan Wu, Mohammad Aflah Khan, Krishna P. Gummadi, Evimaria Terzi, Deepak Garg
MemFM @ ICML 2025 - The Impact of Memorization on Trustworthy Foundation Models
Rote Learning Considered Useful: Generalizing over Memorized Data in LLMs
Qinyuan Wu, Soumi Das, Mahsa Amani, Bishwamittra Ghosh, Mohammad Aflah Khan, Krishna P. Gummadi, Muhammad Bilal Zafar
MemFM @ ICML 2025 - The Impact of Memorization on Trustworthy Foundation Models
Preprints
Fractional Rotation, Full Potential? Investigating Performance and Convergence of Partial RoPE
Mohammad Aflah Khan, Krishna P. Gummadi, Manish Gupta, Abhilasha Ravichander
Under Review
Hubble: a Model Suite to Advance the Study of LLM Memorization
Johnny Tian-Zheng Wei*, Ameya Godbole*, Mohammad Aflah Khan*, Ryan Wang, Xiaoyuan Zhu, James Flemings, Nitya Kashyap, Krishna P. Gummadi, Willie Neiswanger, Robin Jia
Under Review
Understanding the Mechanics and Dynamics of Memorisation in Large Language Models: A Case Study with Random Strings
Till Speicher, Mohammad Aflah Khan, Qinyuan Wu, Vedant Nanda, Soumi Das, Bishwamittra Ghosh, Krishna P. Gummadi, Evimaria Terzi
2024
The Duality of Hope: A Critical Examination of Controversial Annotations in HopeEDI
Mohammad Aflah Khan*, Neemesh Yadav*, Diksha Sethi*, Raghav Sahni*
The Second Tiny Papers Track at ICLR 2024
Probing Critical Learning Dynamics of PLMs for Hate Speech Detection
Sarah Masud*, Mohammad Aflah Khan*, Vikram Goyal, Md Shad Akhtar, Tanmoy Chakraborty
EACL 2024 - Findings of the Association for Computational Linguistics
2023
Overview of the HASOC Subtracks at FIRE 2023: Detection of Hate Spans and Conversational Hate-Speech
Shrey Satapara, Sarah Masud, Hiren Madhu, Mohammad Aflah Khan, Md Shad Akhtar, Tanmoy Chakraborty, Sandip Modha, Thomas Mandl
FIRE 2023 - Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation
Overview of the HASOC Subtrack at FIRE 2023: Identification of Tokens Contributing to Explicit Hate in English by Span Detection
Sarah Masud, Mohammad Aflah Khan, Md. Shad Akhtar, Tanmoy Chakraborty
In Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Stella Biderman, Hailey Schoelkopf, Quentin Gregory Anthony, Herbie Bradley, Kyle O’Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal
ICML 2023 - The Fortieth International Conference on Machine Learning
The Art of Embedding Fusion: Optimizing Hate Speech Detection
Mohammad Aflah Khan*, Neemesh Yadav*, Mohit Jain, Sanyam Goyal
The First Tiny Papers Track at ICLR 2023
Beyond Negativity: Re-Analysis and Follow-Up Experiments on Hope Speech Detection
Neemesh Yadav*, Mohammad Aflah Khan*, Diksha Sethi, Raghav Sahni
The First Tiny Papers Track at ICLR 2023
2022
Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization
Sarah Masud, Manjot Bedi, Mohammad Aflah Khan, Md Shad Akhtar, Tanmoy Chakraborty
KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Talks
[Paper Reading & Discussion] Metadata Conditioning Accelerates Language Model Pre-training (MeCo)
Max Planck Institute for Software Systems: MPI-SWS (Internal Paper Reading Group) • September, 2025
[Talk] LLMs at Scale
Max Planck Computing and Data Facility: MPCDF (AI Kick-off Workshop) • April, 2025
Max Planck Institute for the Science of Light (Hosted by Florian Marquardt) • April, 2025
[Talk] An Overview of DeepSeek-{V3/R1}
Max Planck Institute for Software Systems: MPI-SWS (Part of AI, Computing & Society Initiative) • February, 2025
[Talk] Democratizing and Accelerating Research with LLMs: Making Science More Accessible Whilst Finding Interesting Research Problems
Max Planck Institute for Security and Privacy: MPI-SP (Hosted by Meeyoung Cha) • December, 2024
[Demo + Lightning Talk] Empowering Research with Open-Access LLMs: From Tools to Copilots
AI, Computing & Society Initiative Launch Event (At Max Planck Institute for Software Systems: MPI-SWS) • December, 2024
[Paper Reading & Discussion] Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
Max Planck Institute for Software Systems: MPI-SWS (Internal Paper Reading Group) • July, 2024
[Paper Reading & Discussion] Deduplicating Training Data Makes Language Models Better
Max Planck Institute for Software Systems: MPI-SWS (Internal Paper Reading Group) • May, 2024
[Talk] Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Max Planck Institute for Software Systems: MPI-SWS (Hosted by Krishna Gummadi) • July, 2023
Goldman Sachs (Internal NLP/IR Reading Group) • June, 2023
Organizing, Reviewing & Volunteering
The First Workshop on Large Language Model Memorization (L2M2) @ ACL'25
Program Committee Member • 2025
ACL Rolling Review (ARR), International Conference on Computational Linguistics (COLING), Workshop on Online Abuse and Harms (WOAH), The Technical Symposium on Computer Science Education (SIGCSE TS)
Served as a reviewer for the above-mentioned venues • 2023 Onwards
FIRE HASOC Task 3: Identification of Tokens Contributing to Explicit Hate in Text by Span Detection
Organizer • 2023
19th International Conference on Natural Language Processing (ICON)
Volunteer • 2022
Other Involvements
- Teaching Assistant - Machine Learning under Dr. Anubha Gupta
- Teaching Assistant - Data Structures and Algorithms under Dr. Piyus Kedia
Education
Indraprastha Institute of Information Technology (IIIT-D)
B.Tech. in Computer Science and Engineering • 2020 — 2024
- Dean's List for Academic Excellence (2022-23)
- Dean's List for Innovation in Research and Development (2022-23)
- Dean's List for Academic Excellence (2021-22)
GPA - 9.63/10 [Dept. Rank 2 & Batch Rank 3]
Awards, Achievements, and Recognitions
Featured in the EleutherAI Community Spotlight
EleutherAI • September, 2023
Selected for Amazon ML Summer School
Amazon • 2022
All India Rank 491
JEE Mains Paper 2 • 2020
Top 0.66 Percentile
JEE Mains Paper 1 • 2020
All India Rank 130
Undergraduate Entrance Examination (UGEE) • 2020
Outside Interests
- Chess
- Webtoons, Mangas, Manhwas, Manhuas and Anime
Social Links
- Github: https://github.com/aflah02
- Twitter: https://twitter.com/aflah02101
- LinkedIn: https://www.linkedin.com/in/mohammad-aflah-khan
- Website: https://aflah02.github.io/