Tutorials of ACL, ADL, NeurIPS 2024
Watermarking for Large Language Models
Xuandong Zhao, UC Berkeley
Yu-Xiang Wang, UC San Diego
Lei Li, CMU
Abstract
Generative AI has advanced rapidly, particularly in natural language processing, as exemplified by models like ChatGPT. These advances have raised concerns about misuse, such as generating fake news or plagiarizing content. This tutorial explores text watermarking as a solution: embedding detectable patterns within AI-generated text to verify its origin. We will cover the evolution of text watermarking, modern techniques, and open challenges, along with model watermarking for copyright protection. Participants will gain a solid understanding of watermarking methods, their practical applications, and future research directions in this critical field.
Schedule
Part I: Introduction [slides]
Presenter: Xuandong Zhao
This section provides the background for the tutorial, presenting the challenges posed by machine-written text and the potential ethical issues arising from Large Language Models (LLMs). We will introduce two primary approaches to addressing these issues: post-hoc detection and watermarking methods.
Part II: Text Watermarking [slides]
Presenter: Xuandong Zhao, Yu-Xiang Wang
We delve into the process and evolution of watermarking for natural language text, then explore watermarking methods designed specifically for Large Language Models, covering the theoretical analysis behind each method. A minimal code sketch of the green-red approach follows the list below.
- Early stages of text watermarking
- Watermarking for Large Language Models:
- KGW (Green-Red) Watermark: Kirchenbauer et al. (2023) A Watermark for Large Language Models
- Unigram (Green-Red) Watermark: Zhao et al. (2023) Provable Robust Watermarking for AI-Generated Text
- Gumbel Watermark: Aaronson (2023) Watermarking of Large Language Models
- Undetectable Watermark: Christ et al. (2023) Undetectable Watermarks for Language Models
- Distortion-free Watermark: Kuditipudi et al. (2023) Robust Distortion-free Watermarks for Language Models
- PF Watermark: Zhao et al. (2024) Permute-and-Flip: An Optimally Robust and Watermarkable Decoder for LLMs
- Unbiased Watermark: Hu et al. (2023) Unbiased Watermark for Large Language Models
- Mark My Words: Piet et al. (2023) Mark My Words: Analyzing and Evaluating Language Model Watermarks
- PRC Watermark: Christ and Gunn (2024) Pseudorandom Error-Correcting Codes
- Other methods...
- Trade-offs among watermarking methods
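To make the green-red family concrete, here is a minimal, hypothetical sketch in the spirit of Kirchenbauer et al. (2023): a PRF keyed on the secret key and the previous token marks a gamma-fraction of the vocabulary "green", generation adds a logit bias delta to green tokens, and detection runs a one-proportion z-test on the green-token count. The SHA-256 hashing, constants, and function names here are illustrative assumptions, not the paper's exact construction.

```python
# A minimal sketch of a KGW-style green-red watermark (Kirchenbauer et al., 2023),
# assuming a SHA-256-based PRF and toy constants in place of the paper's scheme.
import hashlib
import math

GAMMA = 0.5  # fraction of the vocabulary marked "green" at each step
DELTA = 2.0  # logit bias added to green tokens during generation

def is_green(prev_token: int, token: int, key: str) -> bool:
    """PRF keyed on (secret key, previous token) partitions the vocabulary."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] < GAMMA * 256

def bias_logits(logits: list[float], prev_token: int, key: str) -> list[float]:
    """Generation side: softly promote green tokens before sampling."""
    return [l + DELTA if is_green(prev_token, t, key) else l
            for t, l in enumerate(logits)]

def detect(tokens: list[int], key: str, z_threshold: float = 4.0) -> bool:
    """Detection side: one-proportion z-test on the green-token count."""
    n = len(tokens) - 1  # number of scored (prev, current) pairs
    if n < 1:
        return False
    greens = sum(is_green(prev, tok, key)
                 for prev, tok in zip(tokens, tokens[1:]))
    z = (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
    return z > z_threshold
```

The trade-offs discussed in this part are visible in the constants: a larger DELTA makes detection easier on short texts but distorts the output more, while the Unigram watermark of Zhao et al. (2023) fixes one global green list to gain provable robustness to edits.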
Break (30 mins)
Part III: Model Watermarking [slides]
Presenter: Lei Li
This part explores related watermarking methods for AI models, discussing various approaches to safeguarding intellectual property with watermarks. A hedged sketch of trigger-based fingerprint verification follows the list below.
- Copyright protection against model extraction attacks:
- CATER: He et al. (2022) CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks
- DRW: Zhao et al. (2022) Distillation-Resistant Watermarking for Model Protection in NLP
- Ginsew: Zhao et al. (2023) Protecting Language Generation Models via Invisible Watermarking
- Radioactivity: Sander et al. (2024) Watermarking Makes Language Models Radioactive
- Model detection against finetuning or pruning:
- DeepJudge: Chen et al. (2021) Copy, Right? A Testing Framework for Copyright Protection of Deep Learning Models
- Instructional Fingerprinting: Xu et al. (2024) Instructional Fingerprinting of Large Language Models
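As a concrete illustration of the verification side, below is a hypothetical sketch of trigger-based ownership testing in the spirit of instructional fingerprinting (Xu et al., 2024): the owner plants secret (trigger, response) pairs during finetuning, then queries a suspect model and claims ownership if enough planted responses survive. The trigger strings, the `query_model` interface, and the match-rate threshold are all assumptions for illustration, not the paper's method.

```python
# Hypothetical sketch of trigger-based fingerprint verification, in the spirit
# of instructional fingerprinting (Xu et al., 2024). `query_model` stands in
# for any text-generation API; the (trigger -> response) pairs are assumed to
# have been planted by the owner during finetuning.
from typing import Callable

FINGERPRINTS = {
    # secret, low-probability trigger prompt -> planted response
    "ravencrest umbra 7f3k": "harbinger",
    "quiet meridian lattice 9q2z": "sundial",
    "vermilion acorn sonata 4w8d": "palimpsest",
}

def verify_ownership(query_model: Callable[[str], str],
                     min_match_rate: float = 0.8) -> bool:
    """Claim ownership if the suspect model reproduces enough planted pairs."""
    hits = sum(expected.lower() in query_model(trigger).lower()
               for trigger, expected in FINGERPRINTS.items())
    return hits / len(FINGERPRINTS) >= min_match_rate
```

The point of the threshold is robustness: finetuning or pruning may erase some planted pairs, so verification asks only that a sufficient fraction survive.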
Part IV: Post-Hoc Detection [slides]
Presenter: Lei Li
This part focuses on post-hoc detection methods, detailing the use of binary classifiers and statistical outlier detection techniques. We will discuss their theoretical and empirical limitations; a minimal sketch of a statistical detector appears below.
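As one example of statistical outlier detection, the sketch below scores a passage by its average token log-probability under a reference LM and flags unusually likely text as machine-generated. The `score_tokens` callable and the threshold value are illustrative assumptions; real systems must calibrate the threshold on held-out human and machine text, which is exactly where the empirical limitations bite.

```python
# A minimal sketch of zero-shot statistical detection: flag text whose mean
# token log-probability under a reference LM is suspiciously high. The
# `score_tokens` callable and the threshold are illustrative assumptions.
from typing import Callable

def mean_log_prob(text: str,
                  score_tokens: Callable[[str], list[float]]) -> float:
    """Average per-token log-probability assigned by the reference LM."""
    logps = score_tokens(text)
    return sum(logps) / len(logps) if logps else float("-inf")

def is_machine_generated(text: str,
                         score_tokens: Callable[[str], list[float]],
                         threshold: float = -2.5) -> bool:
    # LLM output tends to concentrate in high-probability regions, so a
    # mean log-prob above the calibrated threshold is treated as suspicious.
    return mean_log_prob(text, score_tokens) > threshold
```

Detectors of this kind degrade under paraphrasing and distribution shift, which motivates the proactive watermarking approaches of Part II.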
Part V: Conclusion and Future Directions [slides]
In the final section, we will summarize the tutorial, discuss best practices for implementing text watermarking, and explore potential future developments in the field.
Part VI: Q&A
BibTeX
@article{zhao2024tutorials,
  author  = {Zhao, Xuandong and Wang, Yu-Xiang and Li, Lei},
  title   = {Watermarking for Large Language Models},
  journal = {Tutorials of ACL},
  year    = {2024},
}