We're a group of MIT students conducting research to reduce risk from advanced AI.
We think that reducing risks from advanced artificial intelligence may be one of the most important problems of our time. We also think it's a highly interesting and exciting problem, with open opportunities for many more researchers to make progress on it.
MAIA supports undergraduate and graduate students in conducting research relevant to reducing risks from advanced AI.
We also run a semester-long introductory reading group on AI safety, including both a technical machine learning track and a policy track.
Selected Research

Robust Feature-Level Adversaries are Interpretability Tools
Stephen Casper
NeurIPS 2022
Adversarial Attacks
Open Problems and Fundamental Limitations of RLHF
Stephen Casper
Tony Wang
Eric J. Michaud
RLHF
Adversarial Policies Beat Superhuman Go AIs
Tony Wang
Adversarial Attacks
The Quantization Model of Neural Scaling
Eric J. Michaud
Uzay Girit
NeurIPS 2023
Scaling Laws
Forbidden Facts: An Investigation of Competing Objectives in Llama-2
Tony Wang
Kaivu Hariharan
NeurIPS 2023
Mech Interp
Red Teaming Deep Neural Networks with Feature Synthesis Tools
Stephen Casper
Kaivu Hariharan
NeurIPS 2023
Adversarial Attacks
Building an early warning system for LLM-aided biological threat creation
Neil Choudhury
OpenAI
Governance
Explore, Establish, Exploit: Red Teaming Language Models from Scratch
Stephen Casper
Gatlen Culp
Adversarial Attacks
Model Manipulation Attacks Enable More Rigorous Evaluations of LLM Capabilities
Stephen Casper
Stewy Slocum
Adversarial Attacks
ICLR 2025
Diverse Preference Learning for Capabilities and Alignment
Stewy Slocum
Asher Parker-Sartori
RLHF
ICLR 2025
An Assessment of Model-on-Model Deception
Laker Newhouse
Mech Interp
ICLR 2024
Harmonic Loss Trains Interpretable AI Models
Riya Yagi
David Baek
Mech Interp
Alignment faking in large language models
Benjamin Wright (MAIA Alum)
Adversarial Attacks
Organizations we work with
This is a list of some of the organizations our members have worked with. Not all organizations listed endorse or are affiliated with MAIA.







