Wei Xu
[phonetic pronunciation: way shoo]
Associate Professor
College of Computing
Georgia Institute of Technology
wei.xu@cc.gatech.edu
@cocoweixu
I am a faculty member in Computer Science at Georgia Tech’s School of Interactive Computing (one of four schools in the College of Computing) and Machine Learning Center. My research focuses on advancing large language models across three areas:
- (1) reinforcement learning & post-training: multilinguality, cultural adaptation, reasoning, temporal robustness;
- (2) evaluation: long-context, multi-turn interactions, user simulation, agents, and personalization;
- (3) interdisciplinary AI+X applications: education, privacy, law, healthcare, and beyond.
I plan to recruit 1–2 PhD students for Fall 2026 (please apply to the Machine Learning or CS PhD program and list me as a potential advisor). I also recruit research-oriented MS students (apply to the MSCS program and email me) and motivated undergraduates with sufficient time to commit to research. Although I do not normally respond to admission inquiries given the volume, a brief email after you submit your application can help ensure I don’t miss it in the system.
What's New
- Sep 2025, paper on LLM probabilistic reasoning accepted to NeurIPS 2025.
- Aug 2025, 4 papers accepted to EMNLP 2025 main conference
- Aug 2025, talk at Apple ML research (virtual) "Probabilistic Reasoning and Multicultural Alignment in LLMs"
- Jul 2025, talk at JPMorgan AI Research (virtual)
- May 2025, talk at Sungkyunkwan University, South Korea
- May 2025, keynote at PrivateNLP@NAACL "Empowering Everyday Users to Protect Their Privacy in the Age of AI".
- May 2025, co-organize the 10th Workshop for Noisy User-generated Text (WNUT) at NAACL 2025.
- Apr 2025, co-organize Human-centered Evaluation and Auditing of Language Models Workshop at CHI 2025.
- Mar 2025, talk at University of Pennsylvania
- Feb 2025, talk at University of Massachusetts, Lowell (virtual)
- Feb 2025, talk at Google Research, Mountain View
- Feb 2025, talk at University of California, Berkeley
- Dec 2024, Chao Jiang successfully defended his PhD thesis and will join Apple AI/ML Research
- Oct 2024, received an NIH R01 grant!
- Oct 2024, talk at Bloomberg's CTO Data Science Speaker Series
- Oct 2024, talk at Stony Brook University, New York
- Oct 2024, received the Google Academic Research Award 🏆 !
- Oct 2024, talk at Tokyo Institute of Technology on "Enhancing Multilingual Capabilities in LLMs" (slides)
- Sep 2024, 4 long papers and 1 short paper accepted to EMNLP main conference.
- Sep 2024, talk at MIT on "Cultural Biases, World Languages, and Privacy in Large Language Models" (slides).
- Sep 2024, talk at Northeastern on "Human-AI Collaboration in Evaluating LLMs".
- Aug 2024, 🏆 our paper on multicultural LLMs won the Best Social Impact Award at ACL 2024!
- Aug 2024, tutorial at ACL 2024 on "Automatic and Human-AI Interactive Text Generation" (slides)
- Aug 2024, my PhD advisor Ralph Grishman won the ACL Lifetime Achievement Award
- Aug 2024, talk at Megagon (virtual)
- July 2024, Yang Chen successfully defended his PhD thesis and will join NVIDIA as a research scientist.
- June 2024, talk at NSF workshop on AI Text Production (virtual)
- May 2024, 6 long papers accepted to ACL 2024 main conference!
- May 2024, keynote at CHI 2024 HEAL Workshop on "Human-AI Collaboration in Evaluating LLMs" (slides).
- May 2024, Yao Dou will start his summer internship at Microsoft Research; Chao Jiang will intern at Apple.
- Apr 2024, 🏆 David Heineman won the CoC Outstanding Undergraduate Research Award!
- Mar 2024, press coverage by VentureBeat on our new research about cultural biases in LLMs
- Mar 2024, talk at USC and UCLA on "Amazing Multilingual Capabilities and Concerning Cultural Biases in LLMs"
- Oct 2023, demo of Thresh 🌾 has been accepted to EMNLP 2023 -- a customizable tool for fine-grained human evaluation of LLM generated texts (e.g., MT, summarization, text revision, + more)
- Aug 2023, I was quoted in Business Insider about AI-generated content online.
- Aug 2023, Mounica Maddela defended her PhD thesis and will join Bloomberg AI's LLM group
- July 2023, our paper on multilingual text simplification received Honorable Mention Award at ACL 2023!
Research Highlights
Multilingual Multicultural LLMs
While LLMs have demonstrated impressive performance, their success is largely concentrated in English and other high-resource languages. In contrast, many non-English languages remain underrepresented and underserved. Moreover, these models often reflect Western cultural biases and struggle to capture the nuances of non-Western cultural contexts (Naous et al., ACL 2024; Naous et al., NAACL 2025). We work on identifying and closing these gaps in performance and cultural adaptation. Addressing these challenges calls for a deeper analysis of pre-training data to identify and mitigate representational gaps, as well as alignment (Guo et al., arXiv 2025) and inference-time algorithms (Le et al., ICLR 2024) that can dynamically adapt model behavior to diverse linguistic and cultural contexts.
Robustness and Reasoning of LLMs
Artificial General Intelligence (AGI) benchmarks seek to assess an AI system’s capacity to perform tasks that require human-level intelligence, including reasoning, learning, and adapting to novel situations (Zheng et al., ACL 2024; Mendes et al., EMNLP 2024). While current systems fall short of true AGI, there is growing interest in moving beyond static benchmarks toward more realistic, dynamic evaluations. Our research focuses on designing real-world tasks that better reflect practical challenges faced by LLMs, and on developing innovative methods (Zheng et al., arXiv 2025) to enhance their robustness and performance in these complex settings.
Interdisciplinary NLP+X Research
We actively collaborate with researchers to explore impactful real-world applications of large language models in Human-Computer Interaction, Security and Privacy, Healthcare, and Law (Jiang et al., EMNLP 2024; Dou et al., ACL 2024). As LLMs continue to advance, they offer exciting new capabilities across specialized domains. Many opportunities remain, as LLMs often exhibit promising but inconsistent performance in domain-specific tasks, where precision, context sensitivity, and domain knowledge are critical.
NLP X Lab
Tarek Naous (ECE ML PhD; multilingual multicultural LLM)
Duong Minh Le (CS PhD; multilingual LLM -- co-advisor: Alan Ritter)
Jonathan Zheng (ML PhD; reasoning, robustness of LLM -- co-advisor: Alan Ritter)
Geyang Guo (CS PhD; LLM alignment -- co-advisor: Alan Ritter)
Junmo Kang (CS PhD; efficiency -- co-advisor: Alan Ritter)
Usneek Singh (CS MS, autumn 2025 -- )
Yiren Wang (CS MS, autumn 2025 -- )
Zicong He (ECE MS, summer 2025 -- )
Govind Ramesh (BSMS, winter 2022 -- ; LLM safety)
Jerry Zheng (BSMS, autumn 2025 -- )
Julie Young (BSMS, autumn 2025 -- )
Rachel Choi (part-time, summer 2022 -- )
Oleksandr Lavreniuk (Undergrad, summer 2024 -- )
Sara Takagi (Undergrad, summer 2025 -- )
Katerina Addington (Undergrad, autumn 2025 -- )
Eric Kim (Undergrad, autumn 2025 -- )
Frank Chang (Undergrad, autumn 2025 -- )
Guanjun Yan (Undergrad, autumn 2025 -- )
Alexey Plagov (Undergrad, autumn 2025 -- )
Benjamin Mamut (Undergrad, autumn 2025 -- )
Jiayu Liu (Undergrad intern from UIUC, summer 2025 -- )
Alumni (with theses)
Chao Jiang (PhD 2025 → Apple AI/ML research)
Yang Chen (PhD 2024, co-advisor: Alan Ritter → Research Scientist at NVIDIA)
Mounica Maddela (PhD 2023 → Bloomberg AI)
Wuwei Lan (PhD 2021 → Applied Scientist at Amazon)
Xiaofeng Wu (MS 2025 → Baidu)
Marcus Ma (MS 2024 → PhD student at USC)
Anton Lavrouk (MS 2024 → Lockheed Martin)
David Heineman (BS 2024, CoC Outstanding Undergrad Research Award → Predoctoral young investigator at AI2)
Jonathan Zheng (BS 2023 → PhD student at Georgia Tech)
Michael Ryan (BS 2023 → PhD student at Stanford)
Publications
-
Flipping the Dialogue: Training and Evaluating User Language Models
Tarek Naous, Philippe Laban, Wei Xu, Jennifer Neville
arXiv, 2025 -
Camellia: Benchmarking Cultural Biases in LLMs for Asian Languages
Tarek Naous, Anagha Savit, Carlos Rafael Catalan, Geyang Guo, Jaehyeok Lee, Kyungdon Lee, Lheane Marie Dizon, Mengyu Ye, Neel Kothari, Sahajpreet Singh, Sarah Masud, Tanish Patwa, Trung Thanh Tran, Zohaib Khan, Alan Ritter, JinYeong Bak, Keisuke Sakaguchi, Tanmoy Chakraborty, Yuki Arase, Wei Xu
arXiv, 2025 -
Tabular Data Understanding with LLMs: A Survey of Recent Advances and Challenges
Xiaofeng Wu, Alan Ritter, Wei Xu
arXiv, 2025 -
Probabilistic Reasoning with LLMs for Privacy Risk Estimation
Jonathan Zheng, Sauvik Das, Alan Ritter, Wei Xu
NeurIPS 2025 -
CARE: Multilingual Human Preference Learning for Cultural Awareness
Geyang Guo, Tarek Naous, Hiromi Wakaki, Yukiko Nishimura, Yuki Mitsufuji, Alan Ritter, Wei Xu
EMNLP 2025 -
SimulatorArena: Are User Simulators Reliable Proxies for Multi-Turn Evaluation of AI Assistants?
Yao Dou, Michel Galley, Baolin Peng, Chris Kedzie, Weixin Cai, Alan Ritter, Chris Quirk, Wei Xu, Jianfeng Gao
EMNLP 2025 -
What are Foundation Models Cooking in the Post-Soviet World?
Anton Lavrouk, Tarek Naous, Alan Ritter, Wei Xu
EMNLP 2025 -
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
Ruohao Guo, Wei Xu, Alan Ritter
EMNLP 2025 -
Beyond the Reported Cutoff: Where Large Language Models Fall Short on Financial Knowledge
Agam Shah, Liqin Ye, Sebastian Jaskowski, Wei Xu, Sudheer Chava
COLM 2025 -
Evaluating LLMs on Chinese Idiom Translation
Cai Yang, Yao Dou, David Heineman, Xiaofeng Wu, Wei Xu
COLM 2025 -
On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena
Tarek Naous, Wei Xu
NAACL 2025 -
The Impact of Visual Information in Chinese Characters
Xiaofeng Wu, Karl Stratos, Wei Xu
NAACL 2025 -
Generating CAD Code with Vision-Language Models for 3D Designs
Kamel Alrashedy*, Pradyumna Tambwekar*, Zulfiqar Zaidi, Megan Langwasser, Wei Xu, Matthew Gombolay
(* equal contribution)
ICLR 2025 -
CROSSNEWS: A Cross-Genre Authorship Verification and Attribution Benchmark
Marcus Ma, Duong Minh Le, Junmo Kang, Yao Dou, John Cadigan, Dayne Freitag, Alan Ritter, Wei Xu
AAAI 2025 -
Measuring, Modeling, and Helping People Account for Privacy Risks in Online Self-Disclosures with AI
Isadora Krsek, Anubha Kabra, Yao Dou, Tarek Naous, Laura A. Dabbish, Alan Ritter, Wei Xu, Sauvik Das
CSCW 2025
2024
-
Granular Privacy Control for Geolocation with Vision Language Models
Ethan Mendes, Yang Chen, James Hays, Sauvik Das, Wei Xu, Alan Ritter
EMNLP 2024 -
MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain
Chao Jiang, Wei Xu
EMNLP 2024 -
ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment
Tarek Naous, Michael J. Ryan, Anton Lavrouk, Mohit Chandra, Wei Xu
EMNLP 2024 -
Improving Minimum Bayes Risk Decoding with Multi-Prompt
David Heineman, Yao Dou, Wei Xu
EMNLP 2024 -
GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation
Govind Ramesh, Yao Dou, Wei Xu
EMNLP 2024 -
ChatHF: Collecting Rich Human Feedback from Real-time Conversations [video]
Andrew Li, Zhenduo Wang, Ethan Mendes, Duong Minh Le, Wei Xu, Alan Ritter
EMNLP 2024 (Demo) -
Having Beer after Prayer? Measuring Cultural Bias in Large Language Models
Tarek Naous, Michael J. Ryan, Alan Ritter, Wei Xu
ACL 2024 🏆 Best Social Impact Award
Press Coverage by VentureBeat -
Reducing Privacy Risks in Online Self-Disclosures with Language Models
Yao Dou, Isadora Krsek, Tarek Naous, Anubha Kabra, Sauvik Das, Alan Ritter, Wei Xu
ACL 2024 -
NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms
Jonathan Zheng, Alan Ritter, Wei Xu
ACL 2024 -
Meta-Tuning LLMs to Leverage Lexical Knowledge for Generalizable Language Style Understanding
Ruohao Guo, Wei Xu, Alan Ritter
ACL 2024 -
FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence
Sebastian Antony Joseph, Lily Chen, Jan Trienes, Hannah Louisa Göke, Monika Coers, Wei Xu, Byron C Wallace, Junyi Jessy Li
ACL 2024 -
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
Jan Trienes, Sebastian Joseph, Jörg Schlötterer, Christin Seifert, Kyle Lo, Wei Xu, Byron C. Wallace, Junyi Jessy Li
ACL 2024 -
Automatic and Human-AI Interactive Text Generation (slides)
Yao Dou*, Philippe Laban*, Claire Gardent, Wei Xu (* equal contribution)
ACL 2024 (Tutorial) -
Constrained Decoding for Cross-lingual Label Projection
Duong Minh Le, Yang Chen, Alan Ritter, Wei Xu
ICLR 2024 -
Design and Evaluation of an Automatic Text Simplification Prototype with Deaf and Hard-of-hearing Readers
Oliver Alonzo, Sooyeon Lee, Akhter Al Amin, Mounica Maddela, Wei Xu, Matt Huenerfauth
ASSETS 2024 -
Stanceosaurus 2.0: Classifying Stance Towards Russian and Spanish Misinformation
Anton Lavrouk, Ian Ligon, Tarek Naous, Jonathan Zheng, Alan Ritter, Wei Xu
EACL 2024 Workshop on Noisy User-generated Text -
Thresh 🌾: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation [code/demo]
David Heineman, Yao Dou, Wei Xu
EMNLP 2023 (Demo) -
Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA
David Heineman, Yao Dou, Mounica Maddela, Wei Xu
EMNLP 2023 -
Multilingual Simplification of Medical Texts
Sebastian Joseph, Kathryn Kazanas, Keziah Reina, Vishnesh Ramanathan, Wei Xu, Byron Wallace, Junyi Jessy Li
EMNLP 2023 -
A Computational Interface to Infer Strategic Intent from Unstructured Language in a Low-Data Setting
Pradyumna Tambwekar, Lakshita Dodeja, Nathan Vaska, Wei Xu, Matthew Gombolay
EMNLP 2023 (Findings) -
LENS 🔎 - A Learnable Evaluation Metric for Text Simplification [code/demo]
Mounica Maddela*, Yao Dou*, David Heineman, Wei Xu (* equal contribution)
ACL 2023 -
Distill or Annotate? Cost-Efficient Fine-Tuning of Compact Models
Junmo Kang, Wei Xu, Alan Ritter
ACL 2023 -
Revisiting non-English Text Simplification: A Unified Multilingual Benchmark
Michael J. Ryan, Tarek Naous, Wei Xu
ACL 2023 🏆 Best Paper Award Honorable Mention -
Improved Instruction Ordering in Recipe-Grounded Conversation
Duong Minh Le, Ruohao Guo, Wei Xu, Alan Ritter
ACL 2023 Press Coverage by GT News -
Human-in-the-loop Evaluation for Early Misinformation Detection
Ethan Mendes, Yang Chen, Wei Xu, Alan Ritter
ACL 2023 -
Frustratingly Easy Label Projection for Cross-lingual Transfer
Yang Chen, Chao Jiang, Alan Ritter, Wei Xu
ACL 2023 (Findings) -
Teaching the Pre-trained Model to Generate Simple Texts for Text Simplification
Renliang Sun, Wei Xu, Xiaojun Wan
ACL 2023 (Findings) -
Can Language Models be Instructed to Protect Personal Information?
Yang Chen*, Ethan Mendes*, Sauvik Das, Wei Xu, Alan Ritter (* equal contribution)
arXiv 2310.02224 -
Improving Large-scale Paraphrase Acquisition and Generation [data/leaderboard]
Yao Dou, Chao Jiang, Wei Xu
EMNLP 2022 -
🦕 Stanceosaurus: Classifying Stance Towards Multicultural Misinformation [data]
Jonathan Zheng, Ashutosh Baheti, Tarek Naous, Wei Xu, Alan Ritter
EMNLP 2022 -
arXivEdits: Understanding the Human Revision Process in Scientific Writing [data]
Chao Jiang, Wei Xu, Sam Stevens
EMNLP 2022 -
A Dataset of Word-Complexity Judgements from Deaf and Hard-of-Hearing Adults for Text Simplification
Oliver Alonzo, Sooyeon Lee, Mounica Maddela, Wei Xu, Matt Huenerfauth
EMNLP TSAR Workshop 2022 -
Extracting a Knowledge Base of COVID-19 Events from Social Media [data]
Shi Zong, Ashutosh Baheti, Wei Xu, Alan Ritter
COLING 2022 -
BiSECT: Learning to Split and Rephrase Sentences with Bitexts [data/code]
Joongwon Kim*, Mounica Maddela*, Reno Kriz, Wei Xu, Chris Callison-Burch (* equal contribution)
EMNLP 2021 -
Pre-train or Annotate? Domain Adaptation with a Constrained Budget [data/code]
Fan Bai, Alan Ritter, Wei Xu
EMNLP 2021 -
WIKIBIAS: Detecting Multi-Span Subjective Biases in Language [data] [code]
Yang Zhong, Jingfeng Yang, Wei Xu, Diyi Yang
EMNLP 2021 (Findings) -
Neural semi-Markov CRF for Monolingual Word Alignment [code/data][slides][video]
Wuwei Lan*, Chao Jiang*, Wei Xu (* equal contribution)
ACL 2021 -
Controllable Text Simplification with Explicit Paraphrasing [data/code][slides] [poster]
Mounica Maddela, Fernando Alva-Manchego, Wei Xu
NAACL 2021 -
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics [project website]
Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh D Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa Prasad Majumder, Pedro Henrique Martins, Angelina McMillan-Major, Simon Mille, Emiel van Miltenburg, Moin Nadeem, Shashi Narayan, Vitaly Nikolaev, Rubungo Andre Niyongabo, Salomey Osei, Ankur Parikh, Laura Perez-Beltrachini, Niranjan Ramesh Rao, Vikas Raunak, Juan Diego Rodriguez, Sashank Santhanam, João Sedoc, Thibault Sellam, Samira Shaikh, Anastasia Shimorina, Marco Antonio Sobrevilla Cabezudo, Hendrik Strobelt, Nishant Subramani, Wei Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou
arXiv:2102.01672, ACL GEM Workshop 2021 -
An Empirical Study of Pre-trained Transformers for Arabic Information Extraction [pre-trained GigaBERT]
Wuwei Lan, Yang Chen, Wei Xu, Alan Ritter
EMNLP 2020 -
WNUT-2020 Task 1 Overview: Extracting Entities and Relations from Wet Lab Protocols [data]
Jeniya Tabassum, Sydney Lee, Wei Xu, Alan Ritter
EMNLP 2020 Workshop on Noisy User-generated Text (shared-task overview) -
Neural CRF Model for Sentence Alignment in Text Simplification [code/data][slides][video]
Chao Jiang, Mounica Maddela, Wuwei Lan, Yang Zhong, Wei Xu
ACL 2020 -
Code and Named Entity Recognition in StackOverflow [code/data][slides][video]
Jeniya Tabassum, Mounica Maddela, Wei Xu, Alan Ritter
ACL 2020 -
Generalizing Natural Language Analysis through Span-relation Representations [code/data]
Zhengbao Jiang, Wei Xu, Jun Araki, Graham Neubig
ACL 2020 -
Learning Relation Entailment with Structured and Textual Information
Zhengbao Jiang, Jun Araki, Donghan Yu, Ruohong Zhang, Wei Xu, Yiming Yang, Graham Neubig
AKBC 2020 -
Discourse Level Factors for Sentence Deletion in Text Simplification [poster][slides][data - email me]
Yang Zhong, Chao Jiang, Wei Xu, Junyi Jessy Li
AAAI 2020 -
Multi-task Pairwise Neural Ranking for Hashtag Segmentation [code/data][poster][bib][live demo]
Mounica Maddela, Wei Xu, Daniel Preoţiuc-Pietro
ACL 2019 -
A Word-Complexity Lexicon and a Neural Readability Ranking Model for Lexical Simplification [code/data][slides][video][bib]
Mounica Maddela, Wei Xu
EMNLP 2018 -
Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering [bib][code][slides]
Wuwei Lan, Wei Xu
COLING 2018 🏆 Best Paper Award -
Character-based Neural Networks for Sentence Pair Modeling [bib][code][poster]
Wuwei Lan, Wei Xu
NAACL 2018 -
An Annotated Corpus for Machine Reading of Instructions in Wet Lab Protocols [bib][data (improved version)][poster]
Chaitanya Kulkarni, Wei Xu, Alan Ritter, Raghu Machiraju
NAACL 2018 -
A Continuously Growing Dataset of Sentential Paraphrases [bib][data][slides]
Wuwei Lan, Siyu Qiu, Hua He, Wei Xu
EMNLP 2017 -
From Shakespeare to Twitter: What are Language Styles all about? [bib][slides]
Wei Xu
EMNLP 2017 Workshop on Stylistic Variation -
A Minimally Supervised Method for Recognizing and Normalizing Time Expressions in Twitter [bib][slides]
Jeniya Tabassum, Alan Ritter, Wei Xu
EMNLP 2016 -
Results of the WNUT16 Named Entity Recognition Shared Task [bib]
Benjamin Strauss, Bethany Toma, Alan Ritter, Marie-Catherine de Marneffe, Wei Xu
COLING 2016 Workshop on Noisy User-generated Text (shared-task overview) -
Optimizing Statistical Machine Translation for Text Simplification [bib][data/code][slides][video]
Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch
TACL 2016, oral presentation at ACL 2016 -
Discovering User Attribute Stylistic Differences via Paraphrasing [bib] [data]
Daniel Preoţiuc-Pietro, Wei Xu, Lyle Ungar
AAAI 2016 -
Problems in Current Text Simplification Research: New Data Can Help [bib][data][slides][video]
Wei Xu, Chris Callison-Burch, Courtney Napoles
TACL 2015, oral presentation at EMNLP 2015 -
Shared Tasks of the 2015 Workshop on Noisy User-generated Text: Twitter Lexical Normalization and Named Entity Recognition [bib]
Timothy Baldwin, Marie-Catherine de Marneffe, Bo Han, Young-Bum Kim, Alan Ritter, Wei Xu
ACL 2015 Workshop on Noisy User-generated Text (shared-task overview) -
Cost Optimization for Crowdsourcing Translation [bib]
Mingkun Gao, Wei Xu, Chris Callison-Burch
NAACL 2015 -
SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT) [bib][data & code - email me]
Wei Xu, Chris Callison-Burch, William B. Dolan
SemEval 2015 (shared-task overview) -
Extracting Lexically Divergent Paraphrases from Twitter [bib][code][video][data - email me]
Wei Xu, Alan Ritter, Chris Callison-Burch, William B. Dolan, Yangfeng Ji
TACL 2014, oral presentation at NAACL 2015 -
Poetry of the Crowd: A Human Computation Algorithm to Convert Prose into Rhyming Verse [bib]
Quanze Chen, Chenyang Lei, Wei Xu, Ellie Pavlick, Chris Callison-Burch
HCOMP 2014 (work-in-progress) -
Infusion of Labeled Data into Distant Supervision for Relation Extraction [bib]
Maria Pershina, Bonan Min, Wei Xu, Ralph Grishman
ACL 2014 -
Data-driven Approaches for Paraphrasing Across Language Variations [bib]
Wei Xu
PhD Thesis -
Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction [bib][data]
Wei Xu, Raphael Hoffmann, Le Zhao, Ralph Grishman
ACL 2013 -
Gathering and Generating Paraphrases from Twitter with Application to Normalization [bib][data]
Wei Xu, Alan Ritter, Ralph Grishman
ACL 2013 Workshop on Building and Using Comparable Corpora -
A Preliminary Study of Tweet Summarization using Information Extraction [bib][data]
Wei Xu, Ralph Grishman, Adam Meyers, Alan Ritter
NAACL 2013 Workshop on Language Analysis in Social Media -
Paraphrasing for Style [bib][data/code]
Wei Xu, Alan Ritter, Bill Dolan, Ralph Grishman, Colin Cherry
COLING 2012 -
Exploiting Syntactic and Distributional Information for Spelling Correction with Web-Scale N-gram Models [bib]
Wei Xu, Joel Tetreault, Martin Chodorow, Ralph Grishman, Le Zhao
EMNLP 2011 -
Passage Retrieval for Information Extraction using Distant Supervision
Wei Xu, Ralph Grishman, Le Zhao
IJCNLP 2011 -
New York University 2011 System for KBP Slot Filling
Ang Sun, Ralph Grishman, Wei Xu, Bonan Min
TAC 2011 -
Who, What, When, Where, Why? Comparing Multiple Approaches to the Cross-Lingual 5W Task
Kristen Parton, Kathleen R. McKeown, Bob Coyne, Mona T. Diab, Ralph Grishman, Dilek Hakkani-Tür, Mary Harper, Heng Ji, Wei Yun Ma, Adam Meyers, Sara Stolbach, Ang Sun, Gokhan Tur, Wei Xu, Sibel Yaman
ACL 2009 -
A Parse-and-Trim Approach with Information Significance for Chinese Sentence Compression
Wei Xu, Ralph Grishman
ACL Workshop on Language Generation and Summarisation 2009 -
Transducing Logical Relations from Automatic and Manual Annotation
Adam Meyers, Michiko Kosaka, Heng Ji, Nianwen Xue, Mary Harper, Ang Sun, Wei Xu, Shasha Liao
ACL Workshop on Linguistic Annotation 2009 -
Automatic Recognition of Logical Relations for English, Chinese and Japanese in the GLARF Framework
Adam Meyers, Michiko Kosaka, Nianwen Xue, Heng Ji, Ang Sun, Shasha Liao, Wei Xu
SemEval 2009 -
Using Non-Local Features to Improve Named Entity Recognition Recall
Xinnian Mao, Wei Xu, Yuan Dong, Haila Wang
PACLIC 2007 -
Domain Extension of Chinese Named Entity Recognition
Wei Xu, Bin Fu, Liu Liu, Chunfa Yuan, Wenjie Li
Frontiers of Content Computing 2007 -
Extractive Summarization using Inter- and Intra- Event Relevance
Wenjie Li, Wei Xu, Mingli Wu, Chunfa Yuan, Qin Lu
ACL 2006 -
Deriving Event Relevance from the Ontology Constructed with Formal Concept Analysis
Wei Xu, Wenjie Li, Mingli Wu, Wei Li, Chunfa Yuan
CICLing 2006 -
Building Document Graphs for Multiple News Articles Summarization: An Event-Based Approach
Wei Xu, Wenjie Li, Mingli Wu, Wei Li, Chunfa Yuan, Kam-Fai Wong
ICCPOL 2006 -
The Hong Kong Polytechnic University at ACE2005
Wenjie Li, Wei Li, Mingli Wu, Wei Xu
ACE 2005
Teaching
Current Offering:
- CS 7650 (Georgia Tech) - Natural Language Processing (graduate level - Autumn 2025)
- CS 8803-LLM (Georgia Tech) - Large Language Models (a new research-oriented class - Autumn 2024)
- CS 8803-NLP (Georgia Tech) - Advanced NLP (a research-oriented class - Autumn 2023)
- CS 7650 (Georgia Tech) - Natural Language Processing (graduate level - Spring 2024, Autumn 2022, 2021)
- CS 4650 (Georgia Tech) - Natural Language Processing (undergraduate level - Spring 2025, 2023, 2022, 2021)
- Speech and Language Processing (Spring 2020, 2017)
- Social Media and Text Analytics (Autumn 2019, 2017, 2016)
Service
- Executive board member: NAACL (2023-2024)
- Best paper award committee: EMNLP (2024, 2022)
- Senior area chair: NAACL (2025, 2022, 2021); EMNLP (2024, 2022); ACL (2020)
- Area chair: COLM (2024); ACL (2023, 2019); EMNLP (2021, 2020, 2018, 2016); AAAI (2020); NAACL (2019); COLING (2018)
- Workshop chair: ACL (2017)
- Publicity chair: ACL (2026), EMNLP (2019), NAACL (2018, 2016)
Miscellaneous
When I have spare time, I enjoy visiting art museums, hiking, biking, and snowboarding.
I wrote a biography of my PhD advisor Ralph Grishman along with some early history of Information Extraction research in 2017. Ralph was named an ACL Fellow and later received the ACL Lifetime Achievement Award.
I also photographed and made a list of the best dressed NLP researchers in 2016/17, 2015 and 2014.



