▸ We are a cross-lab MIT AI graduate student collective focusing on Algorithms That Learn and Scale.
▸ The group is open to all with an academic email; if you don't have one but are still interested, shoot us an email or message us via Twitter. We currently run bi-weekly seminars and will add hands-on sessions and research socials in the future.
▸ We are funded by generous donations from Pulkit Agrawal, Yoon Kim, and BVP.
▸ Please contact the organizers for inquiries.
▸ Join our next seminar on Zoom or in person: Click here to join the mailing list
▸ Check out our summer bootcamp: (link to the playlist)
11/25
LaTex: Interleave Latent and Text Chain-of-Thought for Efficient Reasoning
Shannon Shen (MIT)
11/17
Understanding Optimization in Deep Learning with Central Flows
Alex Damian (Harvard)
10/14
Minimally Training a Video Foundation Model in the Wild, from Scratch
Simo Ryu (Fal)
09/30
Model Scaling: Part 1
Kevin Wang, Kristine Lu (MIT)
09/15
Challenges Facing ML Compilers in Practice - Numerics and Other Stuff
Horace He (Thinking Machines)
08/29
ThunderKittens
Simran Arora (Caltech)
08/29
GPU Programming Fundamentals
William Brandon (Anthropic)
08/28
Positional Encodings and PaTH Attention
Songlin Yang (MIT)
08/27
Quantization in Large Models
Chris De Sa (Cornell)
08/26
Efficient & Effective Long-Context Modeling for Large Language Models
Guangxuan Xiao (MIT)
08/25
FlexOlmo: Open Language Models for Flexible Data Use
Sewon Min (UC Berkeley)
04/09
Biomolecular Modeling with Boltz-1
Jeremy Wohlwend (MIT)
03/26
Self-improvement of LLM agents through Reinforcement Learning at Scale
Yifei Zhou (BAIR)
03/12
Design and Optimization of Large-Scale Inference Systems at Kimi AI
Heyi Tang (Kimi)
03/05
What’s Next for Mamba? Towards More Expressive Recurrent Update Rules
Songlin Yang (MIT)
02/26
Towards End-to-end Cost-effective Pre-training for Large Language Models
Yikang Shen (IBM)
02/05
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Boyuan Chen (MIT)
01/22
Hymba: A Hybrid-head Architecture for Small Language Models
Xin Dong (NVIDIA)
12/04
Machete: A Mixed-Input GEMM Kernel Optimized for NVIDIA Hopper GPUs
Lucas Wilkinson (Neural Magic)
11/20
StreamingLLM and DuoAttention: Efficient and Effective Long Sequence Modeling for Large Language Models
Guangxuan Xiao (MIT)
11/13
Exocompilation for Productive Programming of Hardware Accelerators
Yuka Ikarashi (MIT)
10/30
u-μP: The Unit-Scaled Maximal Update Parametrization
Charlie Blake, Constantin Eichenberg (Graphcore)
10/28
ZipNN - A Lossless Compression Library tailored for AI models
Moshik Hershcovitch (IBM Research)
10/16
Transformers and Turing Machines
Eran Malach (Harvard)
09/04
A New Perspective on Shampoo's Preconditioner
Nikhil Vyas (Harvard)
08/22
1B parameter model training (hands-on session)
Aniruddha Nrusimha (MIT)
08/12
How to scale models with Modula in NumPy (hands-on session)
Jeremy Bernstein (MIT)
07/24
FineWeb: Creating a large dataset for pretraining LLMs
Guilherme Penedo (Hugging Face)
07/17
Hardware-aware Algorithms for Language Modeling
Tri Dao (Princeton)
07/10
LLM360: Towards Fully Transparent Open-Source LLMs
Hongyi Wang (CMU)
07/03
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish (Tel Aviv)
04/17
Adapting LLMs with Reinforcement Learning
Idan Shenfeld
04/03
The Quest to build an (O)pen (L)anguage (Mo)del
Luca Soldaini (AI2)
03/20
Efficient Deep Learning with Sparsity: Algorithms, Systems, and Applications
Zhijian Liu
03/12
Building and Deploying Large Language Model Applications Efficiently and Verifiably
Ying Sheng (Stanford)
03/06
In-Context Language Learning and N-gram Heads
Ekin Akyürek
02/21
Neurons, norms and number systems
Jeremy Bernstein
11/28
Sparsity in Transformers
Shobhita Sundaram
10/18
Large-Scale RNNs in the era of Transformers
Bailin Wang
11/01
Critical batch-size in deep learning
Minyoung Huh (Jacob)
10/18
Tensor Program Synthesis
Han Guo
10/04
Mixture of Experts (MoEs)
Jyo Pari
09/13
Speculative Decoding
Aniruddha Nrusimha