Audrey Huang
I care about developing principled and implementable algorithms with provable guarantees.
Current/previous threads include:
Online finetuning (e.g., of large language models)
Imitation learning
Tractable online exploration
Offline RL and evaluation
I believe that theoretical insights into fundamental questions will lead to real-world algorithmic improvements, and vice versa.
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
(Preprint 2024) Audrey Huang*, Wenhao Zhan, Tengyang Xie, Jason D. Lee, Wen Sun, Akshay Krishnamurthy, Dylan J. Foster.
A one-line change to DPO derived from chi-squared regularization provably mitigates overoptimization.
Non-adaptive Online Finetuning for Offline Reinforcement Learning
(RLC 2024) Audrey Huang*, Mohammad Ghavamzadeh, Nan Jiang, Marek Petrik.
Given an offline dataset, how should online data be collected in order to maximize policy improvement?
Reinforcement Learning in Low-Rank MDPs with Density Features
(ICML 2023) Audrey Huang*, Jinglin Chen, Nan Jiang.
Offline and online RL via occupancy functions is sample-efficient in low-rank MDPs. A clean inductive error analysis prevents errors from compounding exponentially.
Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions
(NeurIPS 2022) Audrey Huang*, Nan Jiang.
Regularization is key for accurate offline value and density-ratio estimation from general function approximators.
Offline Reinforcement Learning with Realizability and Single-policy Concentrability
(COLT 2022) Wenhao Zhan, Baihe Huang, Audrey Huang*, Nan Jiang, Jason Lee.
With proper regularization, offline RL is sample-efficient given only realizable function classes and data with single-policy coverage.