| CARVIEW |
Sean Welleck
Assistant ProfessorCMU
wellecks {@ | at} cmu.edu
Office: GHC 6513
About Me
Assistant Professor
Carnegie Mellon University
School of Computer Science
Language Technologies Institute
PI: L3 Lab
I received a PhD at New York University, advised by Kyunghyun Cho, and did a postdoc at the University of Washington, advised by Yejin Choi.
I host the Thesis Review Podcast.
I am recruiting PhD students in the 2025-2026 application cycle. Please see joining the L3 Lab for more info.
Research
An overarching theme of my research is bridging informal and formal reasoning with AI. My work spans multiple areas within deep learning and generative models, including:
- Machine learning for mathematics and code
- Learning, inference, and evaluation algorithms
- AI reasoning and agents
- Science of neural language models
Some recent interests are combining AI and formal methods for mathematics [1][2][3] and code generation [1][2][3], inference algorithms and test-time scaling [1][2][3], and algorithms that enable AI to improve over time [1][2][3].
Group (L3 Lab)
PhD
- Pranjal Aggarwal
- Weihua Du (co-advised with Yiming Yang)
- Andre He (co-advised with Daniel Fried)
- Seungone Kim (co-advised with Graham Neubig)
MS
Undergraduate
Please see the CMU L3 Lab to learn more about our research.
Recent & Upcoming
- 08.2025: New NSF Institute for Computer-Aided Reasoning in Mathematics announced
- 08.2025: Teaching Advanced NLP in Fall 2025
- 06.2025: LeanHammer, a sledgehammer for Lean [project page][tool][paper]
- 05.2025: Invited talk at SciFM 2025 [slides]
- 05.2025: Invited tutorial on Test-Time Scaling at SciFM 2025 [slides]
- 04.2025: Guest lecture at Berkeley Advanced LLM Agents on Bridging Informal and Formal Mathematical Reasoning with AI [slides]
- 04.2025: Tutorial on Transformers for Mathematics at Simons Institute Workshop on AI for Mathematics [slides][code]
- 04.2025: Co-organizing Autoformalization for the Working Mathematician at ICERM
- 04.2025: Co-organizing VerifAI: AI Verification in the Wild at ICLR
- 02.2025: Invited talk at NVIDIA
- 01.2025: Invited talk at UCLA NLP Seminar
- 01.2025: Teaching Advanced NLP in Spring 2025
- 12.2024: NeurIPS tutorial on Inference-time Algorithms for Large Language Models
- 11.2024: Invited talk at Simons Institute [talk page]
- 09.2024: Invited talk on Reasoning with inference-time compute at CMU [slides]
- 09.2024: Invited talk on A few open problems in neural theorem proving at AITP [slides]
- 06.2024: Invited talk at Neurosymbolic Programming Summer School
- 05.2024: Invited talk at PNNL Mathematics for Artificial Reasoning in Science
- 04.2024: Version II of Neural Theorem Proving Tutorial [github][slides]
- 04.2024: LLMLean: LLMs+Lean on your laptop or in the cloud [github]
- 04.2024: Invited talk on Scaling laws of formal reasoning at SciFM [slides]
- 04.2024: Invited tutorial at Scientific Foundation Models Conference [slides][github]
- 01.2024: New Neural code generation course with Daniel Fried at CMU
Preprints
- RefineBench: Evaluating Refinement Capability of Language Models via Checklists
Y. Lee, S. Kim, B. Lee, M. Moon, Y. Hwang, J. M. Kim, G. Neubig, S. Welleck, H. Choi
arXiv 2025 - Training Proactive and Personalized LLM Agents
W. Sun, X. Zhou, W. Du, X. Wang, S. Welleck, G. Neubig, M. Sap, Y. Yang
arXiv 2025 - OptimalThinkingBench: Evaluating Over and Underthinking in LLMs
P. Aggarwal, S. Kim, J. Lanchantin, S. Welleck, J. Weston, I. Kulikov, S. Saha
arXiv 2025
[code] - Premise Selection for a Lean Hammer
T. Zhu*, J. Clune*, J. Avigad+, A. Q. Jiang+, S. Welleck+
arXiv 2025
[project page][code] - The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
S. Lee*, S. Kim* et al., …, S. Welleck, G. Neubig, M. Lee, M. Seo
arXiv 2025 - Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators
S. Kim, I. Wu, J. Lee, X. Yue, S. Lee, M. Moon, K. Gashteovski, C. Lawrence, J. Hockenmaier, G. Neubig, S. Welleck
arXiv 2025
[code] - Programming with Pixels: Can Computer-Use Agents do Software Engineering?
P. Aggarwal, S. Welleck
arXiv 2025
[project page][code/agent environment]
Publications
- Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening
A. He, D. Fried, S. Welleck
EMNLP 2025 (Oral)
[code] - Agentic-R1: Distilled Dual-Strategy Reasoning
W. Du, P. Aggarwal, S. Welleck, Y. Yang
EMNLP 2025
[code][dataset] - L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
P. Aggarwal, S. Welleck
COLM 2025
[project page][code][models][dataset] - AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement
P. Aggarwal, B. Parno, S. Welleck
ICML 2025
[project page][code] - Optimizing Temperature for Language Models with Multi-Sample Inference
W. Du, Y. Yang, S. Welleck
ICML 2025
[code] - Evaluating Language Models as Synthetic Data Generators
S. Kim, J. Suk, X. Yue, V. Viswanathan, S. Lee, Y. Wang, K. Gashteovski, C. Lawrence, S. Welleck, G. Neubig
ACL 2025
[code] - Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
Y. Wu, Z. Sun, S. Li, S. Welleck, Y. Yang
ICLR 2025
[project page][code] - miniCTX: Neural Theorem Proving with (Long-)Contexts
J. Hu, T. Zhu, S. Welleck
ICLR 2025 (Oral, (Top 2%))
[project page][eval code][ntp-toolkit] - ImProver: Agent-Based Automated Proof Optimization
R. Ahuja, J. Avigad, P. Tetali, S. Welleck
ICLR 2025
[project page][code] - Lean-STaR: Learning to Interleave Thinking and Proving
H. Lin, Z. Sun, Y. Yang, S. Welleck
ICLR 2025 (Spotlight, (Top 5%))
[project page][code] - The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
S. Kim, J. Suk, J. Cho, S. Longpre et al., …, S. Welleck, G. Neubig, M. Lee, K. Lee, M. Seo
NAACL 2025 (Best Paper)
[data][leaderboard] - From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
S. Welleck, A. Bertsch*, M. Finlayson*, H. Schoelkopf*, A. Xie, G. Neubig, I. Kulikov, Z. Harchaoui
TMLR 2024
[tutorial] - miniCodeProps: a Minimal Benchmark for Proving Code Properties
E. Lohn, S. Welleck
NeurIPS Safe Generative AI 2024
[code] - Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
Z. Sun*, L. Yu*, Y. Shen, W. Liu, Y. Yang+, S. Welleck+, C. Gan+
NeurIPS 2024
[code] - Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
S. Kim, J. Suk, S. Longpre, B. Lin, J. Shin, S. Welleck, G. Neubig, M. Lee, K. Lee, M. Seo
EMNLP 2024
[code] - Llemma: An Open Language Model for Mathematics
Z. Azerbayev, H. Schoelkopf, K. Paster, M. Dos Santos, S. McAleer, A. Jiang, J. Deng, S. Biderman, S. Welleck
ICLR 2024
[code][data][models][blog][sample explorer][poster] - LLMstep: LLM proofstep suggestions in Lean
S. Welleck, R. Saha
NeurIPS Math-AI Workshop 2023
[code][poster] - Inference-Time Policy Adaptors
X. Lu, F. Brahman, P. West, J. Jung, K. Chandu, A. Ravichander, L. Qin. P. Ammanabrolu, L. Jiang, S. Ramnath, N. Dziri, J. Fisher, B. Lin, S. Hallinan, X. Ren, S. Welleck, Y. Choi
EMNLP 2023 - Self-refine: Iterative refinement with self-feedback
A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, No. Dziri, S. Prabhumoye, Y. Yang, B. Prasad Majumder, S. Gupta, S. Welleck, A. Yazdanbakhsh, P. Clark
NeurIPS 2023 - Limits of Transformers on Compositionality
N. Dziri*, X. Lu*, M. Sclar*, X. Lorraine Li, L. Jiang, B. Lin, P. West, C. Bhagavatula, R. Le Bras, J. D Hwang, So. Sanyal, S. Welleck, X. Ren, A. Ettinger, Z. Harchaoui, Y. Choi
NeurIPS 2023 (Spotlight) - STEER: Unified Style Transfer with Expert Reinforcement
S. Hallinan, F. Brahman, X. Lu, J. Jung, S. Welleck, Y. Choi
EMNLP 2023 - A Survey of Deep Learning for Mathematical Reasoning
P. Lu, L. Qiu, W. Yu, S. Welleck*, K. Chang*
ACL 2023 - Generating Sequences by Learning to [Self-]Correct
S. Welleck*, X. Lu*, P. West+, F. Brahman+, T. Shen, D. Khashabi, Y. Choi
ICLR 2023.
[poster] - Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
A. Jiang*, S. Welleck*, J. Zhou*, T. Lacroix, J. Liu, W. Li, M. Jamnik, G. Lample+, Y. Wu+
ICLR 2023(Oral).
[data] - NaturalProver: Grounded Mathematical Proof Generation with Language Models
S. Welleck, J. Liu, X. Lu, H. Hajishirzi, Y. Choi.
NeurIPS 2022.
[data][code][slides][poster] - Quark: Controllable Text Generation with Reinforced [Un]learning
X. Lu, S. Welleck, L. Jiang, J. Hessel, L. Qin, P. West, P. Ammanabrolu, Y. Choi.
NeurIPS 2022 (Oral).
[poster] - COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics
L. Qin, S. Welleck, D. Khasabi, Y. Choi.
NeurIPS 2022 (Oral).
[code][poster] - Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
J. Jung, L. Qin, S. Welleck, F. Brahman, C. Bhagavatula, R. Le Bras, Y. Choi.
EMNLP 2022.
[code][slides] - Lila: A Unified Benchmark for Mathematical Reasoning
S. Mishra, M. Finlayson, P. Lu, L. Tang, S. Welleck, C. Baral, T. Rajpurohit, O. Tafjord, A. Sabharwal, P. Clark, A. Kalyan
EMNLP 2022.
[data][project page] - Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering
J. Liu, S. Hallinan, X. Lu, P. He, S. Welleck, H. Hajishirzi, Y. Choi.
EMNLP 2022. - NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics
X. Lu, S. Welleck, P. West, L. Jiang, J. Kasai, D. Khasabi, R. Le Bras, L. Qin, Y. Yu, R. Zellers, N. Smith, Y. Choi.
NAACL 2022.
[code][slides] - Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
P. West, C. Bhagavatula, J. Hessel, J. Hwang, L. Jiang, R. Le Bras, X. Lu, S. Welleck, Y. Choi.
NAACL 2022.
[code] - Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts
D. Khasabi, S. Lyu, S. Min, L. Qin, K. Richardson, S. Singh, S. Welleck, H. Hajishirzi, T. Khot, A. Sabharway, Y. Choi.
NAACL 2022. - Generated Knowledge Prompting for Commonsense Reasoning
J. Liu, A. Liu, X. Lu, S. Welleck, P. West, R. Le Bras, Y. Choi, H. Hajishirzi.
ACL 2022.
[code] - Symbolic Brittleness in Sequence Models: on Systematic Generalization in Symbolic Mathematics
S. Welleck, P. West, J. Cao, Y. Choi.
AAAI 2022.
[code][slides][talk] - Towards Grounded Natural Language Proof Generation
S. Welleck, J. Liu, J. Han, Y. Choi.
MathAI4Ed Workshop at NeurIPS 2021 (Contributed Talk).
[poster][slides] - NaturalProofs: Mathematical Theorem Proving in Natural Language
S. Welleck, J. Liu, R. Le Bras, H. Hajishirzi, Y. Choi, K. Cho.
NeurIPS 2021 Datasets and Benchmarks (Oral (Top 1%)).
[data/code][talk][related data] - MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
K. Pillutla, S. Swayamdipta, R. Zellers, J. Thickstun, S. Welleck, Y. Choi, Z. Harchaoui.
NeurIPS 2021 (Oral, Outstanding Paper Award (Top 0.1%)).
[code][press] - Divergence Frontiers for Generative Models: Sample Complexity, Quantization Level, and Frontier Integral
L. Liu, K. Pillutla, S. Welleck, S. Oh, Y. Choi, Z. Harchaoui.
NeurIPS 2021. - Mode recovery in neural autoregressive sequence modeling
I. Kulikov, S. Welleck, K. Cho.
SPNLP 2021.
[code] - Order and Learning in Sequential Neural Structured Prediction
S. Welleck.
PhD Thesis, New York University.
[slides] - MLE-guided parameter search for task loss minimization in neural sequence modeling
S. Welleck, K. Cho.
AAAI 2021.
[code][poster][talk] - Consistency of a Recurrent Language Model With Respect to Incomplete Decoding
S. Welleck, I. Kulikov, J. Kim, R. Pang, K. Cho.
EMNLP 2020.
[code][talk] - A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models
E. Mansimov, A. Wang, S. Welleck, K. Cho.
arXiv pre-print 2020.
[code] - Making Inconsistent Dialogue Unlikely with Unlikelihood Training
M. Li, S. Roller, I. Kulikov, S. Welleck, Y.L. Boureau, K. Cho, J. Weston.
ACL 2020. - Neural Text Generation with Unlikelihood Training
S. Welleck, I. Kulikov, S. Roller, E. Dinan, K. Cho, J. Weston.
ICLR 2020.
[code] - Non-Monotonic Sequential Text Generation
S. Welleck, K. Brantley, H. Daume III, K. Cho.
ICML 2019.
[code] [slides] [poster] - Sequential Graph Dependency Parser
S. Welleck, K. Cho.
RANLP 2019.
[slides] - Dialogue Natural Language Inference
S. Welleck, J. Weston, A. Szlam, K. Cho.
ACL 2019.
[dataset][poster][press] - Loss Functions for Multiset Prediction
S. Welleck, Z. Yao, Y. Gai, J. Mao, Z. Zhang, K. Cho.
NeurIPS 2018.
NVIDIA AI Labs Pioneering Research Award 2018.
[poster] - Saliency-based Sequential Image Attention with Multiset Prediction
S. Welleck, J. Mao, K. Cho, Z. Zhang.
NeurIPS 2017.
NVIDIA AI Labs Pioneering Research Award 2017.
[poster][press] - Efficient AUC Optimization for Information Ranking Applications
S. Welleck.
ECIR 2016.
Selected Talks
- Bridging Informal and Formal Mathematical Reasoning [slides]
Scientific Foundation Models (SciFM) Conference
05.2025 - Test-Time Scaling for Mathematical Reasoning [slides]
Scientific Foundation Models (SciFM) Conference
05.2025 - Bridging Informal and Formal Mathematical Reasoning with AI [slides]
Berkeley Advanced LLM Agents
04.2025 - Transformers for Mathematics [slides][code]
Simons Institute Workshop on AI for Mathematics
04.2025 - Reasoning with inference-time compute [slides]
UCLA NLP Seminar
01.2025
NVIDIA
02.2025 - NeurIPS 2024 Tutorial: Inference Algorithms for Large Language Models [slides]
NeurIPS 2024
12.2024 - Beyond Decoding: Meta-Generation Algorithms for Large Language Models [talk page]
Simons Institute for the Theory of Computing
11.2024 - Reasoning with inference-time compute [slides]
CMU
09.2024 - A few open problems in neural theorem proving [slides]
AITP 2024 Invited Talk
09.2024 - Scaling laws of formal reasoning [slides]
MICDE Scientific Foundation Models Conference Invited Talk
04.2024 - A tutorial on neural theorem proving VII [slides][github]
MICDE Scientific Foundation Models Conference Invited Tutorial
04.2024 - Language models and formal mathematics [slides]
UW Mathematics
11.2023 - Llemma [slides]
Harvard New Technologies in Mathematics
10.2023 - A tutorial on neural theorem proving [slides]
OIST
09.2023
IJCAI
08.2023 - Data for AI + mathematics [slides]
AI to Assist Mathematical Reasoning Workshop
06.2023 - Draft, Sketch, Prove [slides]
Hoskinson Center for Formal Mathematics
05.2023 - Bridging Informal and Formal Mathematical Reasoning [slides]
Microsoft Research
11.2022
NSF Expeditions: Understanding the World Through Code seminar
10.2022 - Maeutic Inference [slides]
Google
09.2022 - Constrained text generation [slides]
IFDS
05.2022
Cohere.ai
05.2022
Teaching
- Advanced Natural Language Processing [code][youtube]
Carnegie Mellon University
Fall 2025. - Advanced Natural Language Processing [code][youtube]
Carnegie Mellon University
Spring 2025. - Neural Code Generation
Carnegie Mellon University
Spring 2024. - Guest Lecture: Neural sequence generation (DATA 598) [slides]
University of Washington
March 2023. - Guest Lecture: Reliable text generation through graph search (CSE 373) [slides]
University of Washington
November 2022. - Guest Lecture: Neural sequence generation (DATA 598) [slides]
University of Washington
March 2022. - Deep Learning (DS-GA 1008)
New York University
Fall 2020 - Deep Learning for NLP
African Master’s Program in Machine Intelligence
March 2020 - Introduction to Machine Learning (CSCI-UA 0473)
New York University
Spring 2020 - NLP with Representation Learning (DS-GA 1011)
New York University
Fall 2019
Tutorials
- Test-Time Scaling for Mathematical Reasoning [slides]
SciFM 2025. - Beyond Decoding: Meta-Generation Algorithms for Large Language Models [tutorial site][slides][code]
NeurIPS 2024. - Neural theorem proving II [slides][github]
SciFM 2024. - Neural theorem proving [slides][github]
In Deep Learning in Mathematical Reasoning [tutorial site]
IJCAI 2023. - Neurosymbolic NLP: Modularity & Constraints for Neural Language Models [slides][tutorial site]
COLING 2022. - Denoising Diffusion Models [slides]
July 2022. - Generative Modeling with (W)GAN [slides]
NYU Shanghai
April 2018.
Workshops
- 1st Workshop on Test-Time Scaling and Reasoning Models (SCALR)
COLM 2025 - Autoformalization for the Working Mathematician
ICERM 2025 - VerifAI: AI Verification in the Wild
ICLR 2025 - AI for Math Workshop
ICML 2024 - Math-AI: 3rd Workshop on Mathematical Reasoning and AI
NeurIPS 2023 - Math-AI: Toward Human-Level Mathematical Reasoning
NeurIPS 2022 - MathNLP: 1st Workshop on Mathematical Natural Language Processing
EMNLP 2022 - Math AI for Education: Bridging the Gap Between Research and Smart Education
NeurIPS 2021
Past
- NYU (PhD), Sep. 2016 - Jan. 2021
- Facebook, AI Research Team (FAIR), May. 2019 - Sep. 2019
- Facebook, AI Research Team (FAIR), May. 2018 - Jan. 2019
- Primer AI, Feb. 2016 - Aug. 2016
- IBM, Sep. 2014 - Feb. 2016
- University of Pennsylvania (Computer Science, MSE), May. 2013 - May. 2014
- University of Pennsylvania (Computer Science, BSE), Sep. 2009 - Feb. 2013