Xiang Lisa Li
Hi! I am a final-year Ph.D. student at Stanford University, co-advised by Percy Liang and Tatsunori Hashimoto. My research is supported by the Stanford Graduate Fellowship and the Two Sigma PhD Fellowship.
I work on developing methods to overcome structural limitations of language models. My research encompasses many stages of language model development, including architecture (Diffusion-LM), adaptation (Prefix-Tuning), self-supervision (GV-consistency), decoding (Contrastive Decoding) and evaluation (AutoBencher).
Previously, I received undergraduate degrees from Johns Hopkins University, majoring in Computer Science and in Applied Mathematics and Statistics. I was fortunate to be advised by Prof. Jason Eisner.
If you're interested in getting started in research and think it'd be useful to chat, please feel free to email me.
Email: xlisali [at] stanford.edu
Links: [Github] [Google Scholar]
Selected Publications
(For the full publication list, please check out my [Google Scholar].)
Diffusion-LM Improves Controllable Text Generation
Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, Tatsunori B. Hashimoto
In NeurIPS 2022
[bib] [abstract] [arxiv]
Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation. While recent works have demonstrated successes on controlling simple sentence attributes (e.g., sentiment), there has been little progress on complex, fine-grained controls (e.g., syntactic structure). To address this challenge, we develop a new non-autoregressive language model based on continuous diffusions that we call Diffusion-LM. Building upon the recent successes of diffusion models in continuous domains, Diffusion-LM iteratively denoises a sequence of Gaussian vectors into word vectors, yielding a sequence of intermediate latent variables. The continuous, hierarchical nature of these intermediate variables enables a simple gradient-based algorithm to perform complex, controllable generation tasks. We demonstrate successful control of Diffusion-LM for six challenging fine-grained control tasks, significantly outperforming prior work.
@article{Li-2022-DiffusionLM,
  title   = {Diffusion-LM Improves Controllable Text Generation},
  author  = {Xiang Lisa Li and John Thickstun and Ishaan Gulrajani and Percy Liang and Tatsunori Hashimoto},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2205.14217}
}
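As a toy illustration of the gradient-based control idea (the function names, the shrink-toward-origin "denoiser", and the quadratic control objective below are my own stand-ins, not the paper's actual model):

```python
import numpy as np

def controlled_denoise(x_T, denoise_step, control_grad, steps=50, lam=0.1):
    """Iteratively denoise a Gaussian latent while nudging each intermediate
    latent along the gradient of a control objective (gradient-based control
    on the continuous diffusion latents)."""
    x = x_T
    for t in range(steps):
        x = denoise_step(x, t)          # one reverse-diffusion step
        x = x + lam * control_grad(x)   # gradient step on log p(control | x)
    return x

# Toy instantiation: "denoising" shrinks the noise toward the origin, and the
# control objective is a quadratic pulling the latent toward a target vector.
target = np.array([1.0, -1.0])
denoise_step = lambda x, t: 0.9 * x
control_grad = lambda x: -(x - target)   # grad of -0.5 * ||x - target||^2
x0 = controlled_denoise(np.array([5.0, 5.0]), denoise_step, control_grad)
```

In the real model the denoiser is a learned network over word-vector sequences and the control gradient comes from a classifier; here both are closed-form so the loop is easy to inspect.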
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Xiang Lisa Li and Percy Liang
In ACL 2021
[bib] [abstract]
Fine-tuning is the de facto way of leveraging large pretrained language models for downstream tasks. However, fine-tuning modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, which we call the prefix. Prefix-tuning draws inspiration from prompting for language models, allowing subsequent tokens to attend to this prefix as if it were "virtual tokens". We apply prefix-tuning to GPT-2 for table-to-text generation and to BART for summarization. We show that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting, outperforms fine-tuning in low-data settings, and extrapolates better to examples with topics that are unseen during training.
@inproceedings{li-liang-2021-prefix,
  title     = {Prefix-Tuning: Optimizing Continuous Prompts for Generation},
  author    = {Li, Xiang Lisa and Liang, Percy},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
  month     = aug,
  year      = {2021},
  address   = {Online},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2021.acl-long.353},
  doi       = {10.18653/v1/2021.acl-long.353},
  pages     = {4582--4597}
}
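A minimal numpy sketch of the "virtual tokens" mechanism, assuming a single attention head with random stand-in weights (the variable names and dimensions are mine): the pretrained projections stay frozen, and only the prefix key/value vectors would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)
d, prefix_len, seq_len = 8, 2, 4

# Frozen pretrained projections (stand-ins for the LM's attention weights).
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

# The only trainable parameters: continuous prefix key/value vectors.
prefix_k = rng.standard_normal((prefix_len, d))
prefix_v = rng.standard_normal((prefix_len, d))

def attention_with_prefix(x):
    """One attention head where every token can also attend to the prefix,
    as if prefix_k / prefix_v were the keys and values of virtual tokens."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    k = np.concatenate([prefix_k, k], axis=0)   # prepend prefix keys
    v = np.concatenate([prefix_v, v], axis=0)   # prepend prefix values
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

x = rng.standard_normal((seq_len, d))
out = attention_with_prefix(x)
```

Because the prefix enters only through the concatenated keys/values, no pretrained weight is modified; each task needs to store just `prefix_k` and `prefix_v`.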
Benchmarking and Improving Generator-Validator Consistency of Language Models
Xiang Lisa Li, Vaishnavi Shrivastava, Siyan Li, Tatsunori Hashimoto, and Percy Liang
In ICLR 2024
[bib] [abstract]
As of September 2023, ChatGPT correctly answers "what is 7+8" with 15, but when asked "7+8=15, True or False" it responds with "False". This inconsistency between generating and validating an answer is prevalent in language models (LMs) and erodes trust. In this paper, we propose a framework for measuring the consistency between generation and validation (which we call generator-validator consistency, or GV-consistency), finding that even GPT-4, a state-of-the-art LM, is GV-consistent only 76% of the time. To improve the consistency of LMs, we propose to finetune on the filtered generator and validator responses that are GV-consistent, and call this approach consistency fine-tuning. We find that this approach improves GV-consistency of Alpaca-30B from 60% to 93%, and the improvement extrapolates to unseen tasks and domains (e.g., GV-consistency for positive style transfers extrapolates to unseen styles like humor). In addition to improving consistency, consistency fine-tuning improves both generator quality and validator accuracy without using any labeled data. Evaluated across 6 tasks, including math questions, knowledge-intensive QA, and instruction following, our method improves the generator quality by 16% and the validator accuracy by 6.3% across all tasks.
@inproceedings{li2023gvconsistency,
  title     = {Benchmarking and Improving Generator-Validator Consistency of Language Models},
  author    = {Li, Xiang Lisa and Shrivastava, Vaishnavi and Li, Siyan and Hashimoto, Tatsunori and Liang, Percy},
  booktitle = {Proceedings of the International Conference on Learning Representations},
  year      = {2024},
  publisher = {International Conference on Learning Representations},
  url       = {https://arxiv.org/pdf/2310.01846}
}
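The filtering step behind consistency fine-tuning can be sketched in a few lines. This is a toy sketch, not the paper's pipeline: the real generator and validator are LM calls, whereas the stand-ins below are arithmetic lambdas I made up (mirroring the 7+8 example from the abstract).

```python
def consistency_filter(prompts, generate, validate):
    """Keep only (prompt, answer) pairs where the validator affirms the
    generator's own answer; these GV-consistent pairs form the
    fine-tuning data."""
    kept = []
    for p in prompts:
        answer = generate(p)
        if validate(p, answer):   # e.g., "p = answer, True or False?"
            kept.append((p, answer))
    return kept

# Toy stand-ins: a generator that errs on one input, and an exact checker.
generate = lambda p: 16 if p == (7, 8) else p[0] + p[1]   # errs on 7+8
validate = lambda p, a: a == p[0] + p[1]
data = consistency_filter([(7, 8), (2, 2)], generate, validate)
```

The inconsistent (7, 8) pair is filtered out, so only answers the model can also verify survive into the fine-tuning set.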
Contrastive Decoding: Open-ended Text Generation as Optimization
Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori Hashimoto, Luke Zettlemoyer, and Mike Lewis
In ACL 2023
[bib] [abstract]
Given a language model (LM), maximum probability is a poor decoding objective for open-ended generation, because it produces short and repetitive text. On the other hand, sampling can often produce incoherent text that drifts from the original topics. We propose contrastive decoding (CD), a reliable decoding approach that optimizes a contrastive objective subject to a plausibility constraint. The contrastive objective returns the difference between the likelihood under a large LM (called the expert, e.g., OPT-13B) and a small LM (called the amateur, e.g., OPT-125M), and the constraint ensures that the outputs are plausible. CD is inspired by the fact that the failures of larger LMs (e.g., repetition, incoherence) are even more prevalent in smaller LMs, and that this difference signals which texts should be preferred. CD requires zero additional training, and produces higher quality text than decoding from the larger LM alone. It also works across model scales (OPT-13B and GPT2-1.5B) and significantly outperforms four strong decoding algorithms (e.g., nucleus, top-k) in automatic and human evaluations across Wikipedia, news, and story domains.
@inproceedings{li2023contrastivedecoding,
  title     = {Contrastive Decoding: Open-ended Text Generation as Optimization},
  author    = {Li, Xiang Lisa and Holtzman, Ari and Fried, Daniel and Liang, Percy and Eisner, Jason and Hashimoto, Tatsunori and Zettlemoyer, Luke and Lewis, Mike},
  booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics},
  month     = jul,
  year      = {2023},
  address   = {Toronto, Canada},
  publisher = {Association for Computational Linguistics},
  url       = {https://arxiv.org/pdf/2210.15097}
}
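A per-step greedy sketch of the contrastive objective and plausibility constraint (the function name, `alpha` cutoff form, and toy probabilities below are my own illustration, not the paper's code):

```python
import numpy as np

def contrastive_decoding_scores(expert_logprobs, amateur_logprobs, alpha=0.1):
    """Score next-token candidates by the expert/amateur log-likelihood gap,
    restricted to a plausibility set: tokens whose expert probability is at
    least alpha times the expert's maximum probability."""
    expert_logprobs = np.asarray(expert_logprobs, dtype=float)
    amateur_logprobs = np.asarray(amateur_logprobs, dtype=float)
    threshold = np.log(alpha) + expert_logprobs.max()
    scores = expert_logprobs - amateur_logprobs
    scores[expert_logprobs < threshold] = -np.inf  # prune implausible tokens
    return scores

# Toy example with a 4-token vocabulary: token 1 is the one the expert
# prefers far more than the amateur, so it wins the contrastive objective.
expert = np.log([0.5, 0.3, 0.15, 0.05])
amateur = np.log([0.6, 0.1, 0.25, 0.05])
best = int(np.argmax(contrastive_decoding_scores(expert, amateur)))
```

The constraint matters: without it, a token the expert barely assigns any mass to could win just because the amateur dislikes it even more.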
AutoBencher: Towards Declarative Benchmark Construction
Xiang Lisa Li, Farzaan Kaiyom, Evan Zheran Liu, Yifan Mai, Percy Liang, and Tatsunori Hashimoto
arXiv 2024
[bib] [abstract]
We present AutoBencher, a declarative framework for automatic benchmark construction, and use it to scalably discover novel insights and vulnerabilities of existing language models. Concretely, given a few desiderata of benchmarks (e.g., question difficulty, topic salience), we operationalize each desideratum and cast benchmark creation as an optimization problem. Specifically, we experiment with two settings with different optimization objectives: (i) for capability evaluation, we declare the goal of finding a salient, difficult dataset that induces novel performance patterns; (ii) for safety evaluation, we declare the goal of finding a dataset of unsafe prompts that existing LMs fail to decline. To tackle this type of optimization problem, we propose to use a language model to automatically construct datasets and iteratively revise the dataset to optimize for the declared desiderata. We use AutoBencher (powered by GPT-4) to create datasets for math, multilinguality, knowledge, and safety. The scalability of AutoBencher allows it to test fine-grained categories and tail knowledge, creating datasets that are on average 27% more novel and 22% more difficult than existing benchmarks. AutoBencher also helps identify specific gaps not captured by existing benchmarks: e.g., Gemini-Pro has knowledge gaps on Permian Extinction and Fordism while GPT-4 fails to decline harmful requests about cryptocurrency scams.
@article{li2024autobencher,
  title   = {AutoBencher: Towards Declarative Benchmark Construction},
  author  = {Li, Xiang Lisa and Kaiyom, Farzaan and Liu, Evan Zheran and Mai, Yifan and Liang, Percy and Hashimoto, Tatsunori},
  journal = {arXiv preprint arXiv:2407.08351},
  year    = {2024},
  url     = {https://arxiv.org/abs/2407.08351}
}
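The propose-and-revise loop can be sketched as a simple hill climb. This is only a schematic of the optimization framing: in the real system the proposer is an LM (GPT-4) and the score operationalizes several desiderata, whereas the stand-ins below are toy lambdas of my own with a single "difficulty" desideratum.

```python
import random

def autobench_loop(propose, score, rounds=5):
    """Cast benchmark construction as optimization: repeatedly propose a
    revised candidate dataset from the best one so far, and keep it only if
    it scores higher on the declared desiderata."""
    best = propose(None)
    best_score = score(best)
    for _ in range(rounds):
        candidate = propose(best)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# Toy stand-ins: a candidate is a list of question "difficulties" in [0, 1];
# each proposal perturbs the current best, and the score is mean difficulty.
random.seed(0)
propose = lambda prev: ([0.5] * 5 if prev is None
                        else [min(1.0, d + random.uniform(-0.1, 0.2)) for d in prev])
score = lambda ds: sum(ds) / len(ds)
best, best_score = autobench_loop(propose, score, rounds=20)
```

Because revisions are only accepted when they improve the declared objective, the loop monotonically increases the desideratum, which is the core of the declarative framing.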
Posterior Control of Blackbox Generation
Xiang Lisa Li and Alexander Rush
In ACL 2020
[bib] [abstract] [appendix]
Text generation often requires high-precision output that obeys task-specific rules. This fine-grained control is difficult to enforce with off-the-shelf deep learning models. In this work, we consider augmenting neural generation models with discrete control states learned through a structured latent-variable approach. Under this formulation, task-specific knowledge can be encoded through a range of rich, posterior constraints that are effectively trained into the model. This approach allows users to ground internal model decisions based on prior knowledge, without sacrificing the representational power of neural generative models. Experiments consider applications of this approach for text generation. We find that this method improves over standard benchmarks, while also providing fine-grained control.
@inproceedings{li-rush-2020,
  author    = {Xiang Lisa Li and Alexander M. Rush},
  title     = {Posterior Control of Blackbox Generation},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  year      = {2020},
  month     = jul,
  address   = {Online},
  url       = {https://xiangli1999.github.io/pdf/control_gen.pdf}
}
Specializing Word Embeddings (for Parsing) by Information Bottleneck
Xiang Lisa Li and Jason Eisner
In EMNLP-IJCNLP 2019
Best Paper Award at EMNLP-IJCNLP 2019
[bib] [abstract] [appendix]
Pre-trained word embeddings like ELMo and BERT contain rich syntactic and semantic information, resulting in state-of-the-art performance on various tasks. We propose a very fast variational information bottleneck (VIB) method to nonlinearly compress these embeddings, keeping only the information that helps a discriminative parser. We compress each word embedding to either a discrete tag or a continuous vector. In the discrete version, our automatically compressed tags form an alternative tag set: we show experimentally that our tags capture most of the information in traditional POS tag annotations, but our tag sequences can be parsed more accurately at the same level of tag granularity. In the continuous version, we show experimentally that moderately compressing the word embeddings by our method yields a more accurate parser in 8 of 9 languages, unlike simple dimensionality reduction.
@inproceedings{li-eisner-2019,
  author    = {Xiang Lisa Li and Jason Eisner},
  title     = {Specializing Word Embeddings (for Parsing) by Information Bottleneck},
  booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing},
  year      = {2019},
  month     = nov,
  address   = {Hong Kong},
  url       = {https://cs.jhu.edu/~jason/papers/#li-eisner-2019}
}
Honors & Awards
- (Apr. 2023) Two Sigma PhD Fellowship
- (Sep. 2020) Stanford Graduate Fellowship
- (May 2020) Outstanding Senior Award
- (Dec. 2019) Outstanding Undergraduate Researcher Award (Computing Research Association)
- (Nov. 2019) Best Paper Award at EMNLP-IJCNLP
Teaching Experience
- (Spring 2023) TA @ CS 224U at Stanford
- (Winter 2023) TA @ CS 224N at Stanford
- (Spring 2020) TA @ Introduction to Statistics (AMS 553.430/630)
- (Spring 2019) TA @ Introduction to Probability (AMS 553.420/620)
- (Fall 2018) TA @ Introduction to Probability (AMS 553.420/620)
- (Fall 2017) TA @ Introduction to Probability (AMS 553.420/620)
- (Spring 2017) TA @ Introduction to Probability (AMS 553.420/620)

Interestingly, a perpetual prob TA is switching to stats... Hope we can have fun in 430 :)