Ulyana Piterbarg
I'm a visiting researcher at Meta FAIR and a final-year Ph.D. at NYU CILVR, where I'm co-advised by Rob Fergus and Lerrel Pinto, and supported by the NSF GRFP and a scholarship from Google DeepMind.
My research focuses on the intersection of scaling and imitation/reinforcement learning for long & open-ended agent tasks. I'm especially interested in training sandboxes and modeling recipes that can enable LMs & VLMs to solve and/or provide assistance on tasks that take humans 10-1000s of hours to complete.
Previously, as a research scientist intern, I worked on agent post-training for production MoE LLMs (Meta Llama Team), efficient distillation algorithms for Phi models (Microsoft), and neural nets for solving PDEs (Google Research). Before that, I did my undergrad in mathematics and computer science at MIT, during which I was exceptionally lucky to be mentored by Kelsey R. Allen and Josh Tenenbaum.
Once upon a time, I was a design assistant to the director of the Exhibitions Lab of the American Museum of Natural History.
You can reach me at up2021 [at] nyu.edu.
Recent News
Research
LLMs are becoming increasingly capable agents. I'm interested in the data, algorithms, and environments that will enable models to autonomously complete and/or collaborate with humans on tasks that lie at the "edge of simulation," e.g. solving open problems in mathematics, developing maintainable & reliable software, or ascending in NetHack.
Unlike in question-answering and short-horizon tool use, for such tasks it is difficult (and in some cases impossible) to collect demonstration data or to train models with "vanilla" online RL.
Throughout my Ph.D., I've worked towards this setting by:
- developing methods for efficiently training production MoEs on multi-agent tool-use
- studying hierarchical policy learning & data scaling laws for tiny VLMs/LMs on the ultra long-horizon videogame NetHack (HiHack, diff History)
- showing that training code LMs to act and explore diff-by-diff can improve pass@ scaling laws (LintSeq)
- contributing to benchmarks and platforms for agent evaluations + RL on long tasks (BALROG, Gaia2/ARE)
- mid-training LMs for more human-like code exploration (D3, FAIR CodeGen)