MAXIMILIAN MOZES
X  /  BLUESKY  /  EMAIL  /  GOOGLE SCHOLAR  /  LINKEDIN
ABOUT
I'm a Senior Research Scientist at Cohere. I completed my PhD in Computer Science at University College London in 2024, under the supervision of Lewis Griffin and Bennett Kleinberg. My research focuses on the intersection of adversarial machine learning and natural language processing.
I have previously interned at Google Research, working with the PAIR Team on measuring dialog safety using large language models. Prior to that, I was a Research Scientist Intern at Spotify Research, where I focused on NLP-based content moderation in podcasts.
I obtained a Bachelor's degree in Computer Science (minor in Mathematics) from the Technical University of Munich (TUM) in March 2019. During my undergraduate studies, I worked as a visiting research scholar at the Language and Information Technologies Group of the University of Michigan's Artificial Intelligence Lab and as a research intern in the Department of Psychology at the University of Amsterdam.
PUBLICATIONS
Reverse Engineering Human Preferences with Reinforcement Learning
Alazraki, L., Tan, Y. C., Campos, J. A., Mozes, M., Rei, M. and Bartolo, M.
NeurIPS, 2025 (spotlight)
No Need for Explanations: LLMs can Implicitly Learn from Mistakes In-Context
Alazraki, L., Mozes, M., Campos, J. A., Tan, Y. C., Rei, M. and Bartolo, M.
EMNLP, 2025 (oral)
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
Ruis, L., Mozes, M., Bae, J., Kamalakara, S. R., Talupuru, D., Locatelli, A., Kirk, R., Rocktäschel, T., Grefenstette, E. and Bartolo, M.
ICLR, 2025
Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge
Arora, A., He, X., Mozes, M., Swain, S., Dras, M. and Xu, Q.
Findings of ACL, 2024
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Mozes, M., He, X., Kleinberg, B. and Griffin, L. D.
arXiv, 2023
Challenges and Applications of Large Language Models
Kaddour, J., Harris, J., Mozes, M., Bradley, H., Raileanu, R. and McHardy, R.
arXiv, 2023
Towards Agile Text Classifiers for Everyone
Mozes, M., Hoffmann, J., Tomanek, K., Kouate, M., Thain, N., Yuan, A., Bolukbasi, T. and Dixon, L.
Findings of EMNLP, 2023
Gradient-Based Automated Iterative Recovery for Parameter-Efficient Tuning
Mozes, M., Bolukbasi, T., Yuan, A., Liu, F., Thain, N. and Dixon, L.
arXiv, 2023
Large Language Models Respond to Influence like Humans
Griffin, L. D., Kleinberg, B., Mozes, M., Mai, K., Vau, M., Caldwell, M. and Mavor-Parker, A.
First Workshop on Social Influence in Conversations (SICon), ACL, 2023
Textwash -- Automated Open-Source Text Anonymization
Kleinberg, B., Davies, T. and Mozes, M.
arXiv, 2022
Identifying Human Strategies for Generating Word-Level Adversarial Examples
Mozes, M., Kleinberg, B. and Griffin, L. D.
Findings of EMNLP, 2022
Scene Graph Generation for Better Image Captioning?
Mozes, M., Schmitt, M., Golkov, V., Schütze, H. and Cremers, D.
arXiv, 2021
A Repeated-Measures Study on Emotional Responses After a Year in the Pandemic
Mozes, M., van der Vegt, I. and Kleinberg, B.
Scientific Reports, 11(1), 1-11, 2021
Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification
Mozes, M., Bartolo, M., Stenetorp, P., Kleinberg, B. and Griffin, L. D.
EMNLP, 2021
Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples
Mozes, M., Stenetorp, P., Kleinberg, B. and Griffin, L. D.
EACL, 2021
No Intruder, no Validity: Evaluation Criteria for Privacy-Preserving Text Anonymization
Mozes, M. and Kleinberg, B.
arXiv, 2021
The Grievance Dictionary: Understanding Threatening Language Use
van der Vegt, I., Mozes, M., Kleinberg, B. and Gill, P.
Behavior Research Methods, 2021
Online Influence, Offline Violence: Linguistic Responses to the "Unite the Right" Rally
van der Vegt, I., Mozes, M., Gill, P. and Kleinberg, B.
Journal of Computational Social Science, 2020
Measuring Emotions in the COVID-19 Real World Worry Dataset
Kleinberg, B., van der Vegt, I. and Mozes, M.
NLP COVID-19 Workshop, ACL 2020
Uphill From Here: Sentiment Patterns in Videos from Left- and Right-Wing YouTube News Channels
Soldner, F., Ho, J., Makhortykh, M., van der Vegt, I., Mozes, M. and Kleinberg, B.
Third Workshop on NLP and CSS, NAACL-HLT, 2019
Identifying the Sentiment Styles of YouTube's Vloggers
Kleinberg, B., Mozes, M. and van der Vegt, I.
EMNLP, 2018
Using Named Entities for Computer-Automated Verbal Deception Detection
Kleinberg, B., Mozes, M., Arntz, A. and Verschuere, B.
Journal of Forensic Sciences, 63(3), 714-723, 2018
Web-based Text Anonymization with Node.js: Introducing NETANOS
Kleinberg, B. and Mozes, M.
Journal of Open Source Software, 2(14), 293, 2017
NETANOS - Named Entity-based Text Anonymization for Open Science
Kleinberg, B., Mozes, M., van der Toolen, Y. and Verschuere, B.
OSF preprint, 2017
INVITED TALKS
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Machine Behaviour, University of Tilburg, November 2024
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Responsible AI Seminar Series, Nokia Bell Labs, January 2024
Adversarial Examples in Machine Learning
Crime Science, University of Amsterdam, November 2021
Examining Word-Level Adversarial Examples for Text Classification
AI Seminar Series, UCL Centre for Artificial Intelligence, September 2021
Recording available here.
Adversarial Examples in Machine Learning
Data Science for Crime Scientists and Applied Data Science, University College London, March 2021
Detecting Deception with AI: Promises and Pitfalls?
Current Topics: Psychology of AI, University of Amsterdam, November 2020
On the Robustness of Intelligent Systems
Data Science for Crime Scientists and Applied Data Science, University College London, March 2020
On the Robustness of Intelligent Systems
Foundations of Crime Science, University College London, December 2019
Assessing Potential Vulnerabilities of Emerging Artificial Intelligence Technologies
Crime Science, University of Amsterdam, May 2019
MEDIA COVERAGE
Podcast interview with Data Skeptic
Podcast discussion focussing on the paper "Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities", September 2023.
Available here.
News article: "Google's Jigsaw was trying to fight toxic speech with AI. Then the AI started talking"
Fast Company, July 2023.
Available here.
WORKSHOPS
9th Workshop on Representation Learning for NLP (RepL4NLP-2024)
Chen Zhao, Marius Mosbach, Pepa Atanasova, Seraphina Goldfarb-Tarrant, Peter Hase, Arian Hosseini, Maha Elbayad, Sandro Pezzelle, Maximilian Mozes
8th Workshop on Representation Learning for NLP (RepL4NLP-2023)
Burcu Can, Maximilian Mozes, Samuel Cahyawijaya, Naomi Saphra, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Chen Zhao
7th Workshop on Representation Learning for NLP (RepL4NLP-2022)
Spandana Gella, He He, Bodhisattwa Prasad Majumder, Burcu Can, Eleonora Giunchiglia, Samuel Cahyawijaya, Siwon Min, Maximilian Mozes, Xiang Lorraine Li
A Gentle Introduction to Word Embeddings for the Computational Social Sciences
Maximilian Mozes and Bennett Kleinberg
Linguistic Temporal Trajectory Analysis - a Dynamic Approach to Text Data
Bennett Kleinberg, Maximilian Mozes, and Isabelle van der Vegt
TEACHING
- Teaching assistant: Statistical Natural Language Processing, University College London, Academic year 2022/23
- Teaching assistant: Theory of Computation, University College London, Academic year 2021/22
- Teaching assistant: Introduction to Machine Learning, University College London, Academic year 2021/22
- Teaching assistant: Theory of Computation, University College London, Academic year 2020/21
- Teaching assistant: Introduction to Deep Learning, University College London, Academic year 2020/21
- Tutor: Analysis for Computer Science, Technical University of Munich, Winter term 2018/19