David Heineman
Hey! I'm David.
I'm a pre-doctoral young investigator at the Allen Institute for AI, working to improve language model pre-training and evaluation.
This fall I am applying to Ph.D. programs. I'm currently interested in the science of language modeling, and I will be supported by the NSF CS Graduate Fellowship!
About Me
Building language models can, and should, be a rigorous science: I believe our field's biggest bottleneck in doing so is the quality of our experimentation methodology [1] and the power of our evaluation signal [2]. This requires better measures of capability [3], new tools for observing how language models express behavior [4], and connecting meaningful tasks to our ability to learn and generate language [5, 6]. More in my statement →
I work on these problems at Ai2 as part of the Open Language Model (OLMo) project, advised by Kyle Lo and Jesse Dodge. Previously, I completed my undergrad at Georgia Tech, where I was fortunate to be advised by Prof. Wei Xu and to work with Yao Dou and Mounica Maddela. I've also spent a few summers as an intern at AWS and at the healthcare startup Patientco. I enjoy reading, hiking, and making homebrew nitrogen cold brew.
Publications & Preprints
Olmo 3 [blog, code, models, data]
Olmo Team (incl. David Heineman)
preprint, 2025
Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation [code, data]
David Heineman, Valentin Hofmann, Ian Magnusson, Yuling Gu, Noah A. Smith, Hannaneh Hajishirzi, Kyle Lo, Jesse Dodge
NeurIPS, 2025 (Spotlight, Top 5%)
Fluid Language Model Benchmarking [code, models]
Valentin Hofmann, David Heineman, Ian Magnusson, Kyle Lo, Jesse Dodge, Maarten Sap, Pang Wei Koh, Chun Wang, Hannaneh Hajishirzi, Noah A. Smith.
COLM, 2025 (Oral, Top 5%)
Establishing Task Scaling Laws via Compute-Efficient Model Ladders [code]
Akshita Bhagia*, Jiacheng Liu*, Alexander Wettig, David Heineman, Oyvind Tafjord, Ananya Harsh Jha, Luca Soldaini, Noah A. Smith, Dirk Groeneveld, Pang Wei Koh, Jesse Dodge, Hannaneh Hajishirzi
COLM, 2025
2 OLMo 2 Furious [blog, code, models, data]
Pete Walsh*, Luca Soldaini*, Dirk Groeneveld*, Kyle Lo*, Shane Arora, Akshita Bhagia, Yuling Gu, Shengyi Huang, Matt Jordan, Nathan Lambert, Dustin Schwenk, Oyvind Tafjord, ..., David Heineman, ..., Ali Farhadi, Noah A. Smith, Hannaneh Hajishirzi
COLM, 2025
Evaluating LLMs on Chinese Idiom Translation
Cai Yang, Yao Dou, David Heineman, Xiaofeng Wu, Wei Xu
COLM, 2025
DataDecide: How to Predict Best Pretraining Data with Small Experiments [code, models]
Ian Magnusson*, Nguyen Tai*, Ben Bogin*, David Heineman, Jena D. Hwang, Luca Soldaini, Akshita Bhagia, Jiacheng Liu, Dirk Groeneveld, Oyvind Tafjord, Noah A. Smith, Pang Wei Koh, Jesse Dodge
ICML, 2025
Improving Minimum Bayes Risk Decoding with Multi-Prompt [code]
David Heineman, Yao Dou, Wei Xu
EMNLP, 2024
Towards a Path Dependent Account of Category Fluency [code]
David Heineman, Reba Koenen, Sashank Varma
CogSci, 2024
Thresh: Unified, Customizable and Deployable Fine-Grained Text Evaluation [live tool]
David Heineman, Yao Dou, Wei Xu
EMNLP Demo, 2023
Edit-level Simplification Evaluation using SALSA [code/data, metric]
David Heineman, Yao Dou, Mounica Maddela, Wei Xu
EMNLP, 2023
LENS: A Learnable Evaluation Metric for Text Simplification [code/data, metric]
Mounica Maddela*, Yao Dou*, David Heineman, Wei Xu
ACL, 2023
* = equal contribution
Some past work
- Winning submission to the Berghain challenge! [code] (1st of 1,300 submissions)
- Participated in Thinking Machines' Tinker beta, experimenting with RL training in terminal environments to reproduce empirical findings from ACL papers [code].
- I'm trying a new system for keeping up with fresh papers in our field [code] that updates every morning. It might be helpful for others; let me know if it works for you!
- Contributed to Terminal-Bench [leaderboard, docs], a challenging benchmark for language model agents using the CLI. I believe tbench's tmux environments are a unique, new construct for our field!
- A few mini-projects: a 500-line GRPO implementation (a sketch of the core advantage step follows this list); showing that LLM benchmark scores can improve by +2 points on MATH simply by changing the vLLM version; a reproduction of branching factor; custom PyTorch kernels for Fast FFNs; and evaluating LLMs on quant puzzles.
- Spent Summer '24 in the first US cohort of Entrepreneurs First in South Park, SF, as part of a residency program. I briefly worked on a few ideas using RL for tool use before moving to Ai2.
- Maintaining the Thresh platform, an all-purpose tool for fine-grained text generation evaluation, including an annotation tool builder and a Python library.
- Built a search engine [code] for ML / NLP conferences, indexed with ColBERT.
- Wrote an LLM-based Rubik's cube solver as a demonstration of explore/exploit behavior for reasoning (2nd place at the AGI House open-source hackathon).
- Awarded the GT College of Computing Outstanding Undergraduate Research Award (1 of 3,000+ CS students) for my undergraduate thesis work on fine-grained evaluation of LLMs.
- Designed new programming assignments for CS 4650, Natural Language Processing, as a teaching assistant (sampling algorithms & LLaMA fine-tuning with LoRA).
- Built an air pollution complaint tracker and classifier [code] for the Georgia Environmental Protection Division (part of a larger collaboration at GT).
- Awarded the PURA research grant to work on open problems in generation & evaluation (check out my Hugging Face decoding visualizer extension).
- Thoughts on approaching reasoning evaluation in LLMs using theories of human cognition.
- pip install lens-metric - A simple library to evaluate text simplification with our LENS and LENS-SALSA metrics on Hugging Face, using only 5 lines of Python [demo].
- Interned with AWS EC2 Enterprise Services, developing a prototype language model service that addressed problems in inference cost and deployment of open-source LLMs.
- Earned 4th place in Georgia Tech's Wrek CTF (one of the largest greyhat hackathons in the southern US) [answers].
- Helped lead Georgia Tech's CS 3510, Design and Analysis of Algorithms, as a teaching assistant in Fall '21 and '22.
- Interned at AWS CloudWatch Application Insights, where I built infrastructure to monitor and group telemetry data from processes running on EC2 instances to identify the root causes of problems on customers' AWS infrastructure.
- Interned at Patientco (now part of Waystar), where I invented and deployed new sequence-based models to predict when a patient will pay their healthcare bill from their payment history (used to customize ~5% of U.S. healthcare bills).
- Deployed an API to allow researchers to segment Twitter hashtags using a new segmentation model from Georgia Tech's NLP Lab.
- In the pre-GPT-3 times, worked on methods for automatically grading student essays [code].
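A minimal sketch of the group-relative advantage at the core of GRPO, as referenced in the mini-projects above. This is an illustrative PyTorch snippet with my own variable names, not an excerpt from the 500-line implementation:

import torch

def group_relative_advantage(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards has shape [num_prompts, group_size]: one row of sampled completions per prompt
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    # GRPO scores each completion against its own group's statistics,
    # removing the need for a learned value-function baseline
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.5, 0.2, 0.9, 0.1]])
advantages = group_relative_advantage(rewards)

These advantages then weight a clipped policy-ratio objective, as in PPO.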
Recommendations
A few interesting corners of the internet I find worth checking out!
...
... to flip through
Games, Puzzles, and Computation by Erik Demaine
The Corrections by Jonathan Franzen
Society Must be Defended by Michel Foucault
Oblivion by David Foster Wallace
I also enjoy trying new coffee shops. Here are some recommendations across Atlanta from my undergrad years, and a growing list across Seattle.
David Heineman
Last updated November 2025
[view source]
curl -s https://davidheineman.com/rick | bash