[2504.04377] PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
arXiv:2504.04377 (cs)
[Submitted on 6 Apr 2025 (v1), last revised 7 Aug 2025 (this version, v2)]
Title: PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
Authors: Priyanshu Kumar, Devansh Jain, Akhila Yerukola, Liwei Jiang, Himanshu Beniwal, Thomas Hartvigsen, Maarten Sap
Abstract: Truly multilingual safety moderation efforts for Large Language Models (LLMs) have been hindered by a narrow focus on a small set of languages (e.g., English, Chinese) and a limited scope of safety definitions, resulting in significant gaps in moderation capabilities. To bridge these gaps, we release POLYGUARD, a new state-of-the-art multilingual safety model for safeguarding LLM generations, along with the corresponding training and evaluation datasets. POLYGUARD is trained on POLYGUARDMIX, the largest multilingual safety training corpus to date, containing 1.91M samples across 17 languages (e.g., Chinese, Czech, English, Hindi). We also introduce POLYGUARDPROMPTS, a high-quality multilingual benchmark with 29K samples for evaluating safety guardrails. Created by combining naturally occurring multilingual human-LLM interactions with human-verified machine translations of an English-only safety dataset (WildGuardMix; Han et al., 2024), our datasets contain prompt-output pairs labeled for prompt harmfulness, response harmfulness, and response refusal. Through extensive evaluations across multiple safety and toxicity benchmarks, we demonstrate that POLYGUARD outperforms existing state-of-the-art open-weight and commercial safety classifiers by 5.5%. Our contributions advance efforts toward safer multilingual LLMs for all global users.
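The datasets described in the abstract pair each prompt and response with three binary labels: prompt harmfulness, response harmfulness, and response refusal. As an illustrative sketch only (the record fields and the exact-match metric below are assumptions for exposition, not the authors' released schema or code), one way to model such a labeled pair and score a classifier's predictions against gold labels is:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyRecord:
    """One prompt-output pair with the three labels the paper describes.

    Field names are hypothetical; they mirror the label types named in
    the abstract, not the released dataset's actual column names.
    """
    prompt: str
    response: str
    prompt_harmful: bool
    response_harmful: bool
    response_refusal: bool


def label_accuracy(gold: list[SafetyRecord], pred: list[SafetyRecord]) -> float:
    """Fraction of pairs whose entire label triple matches exactly.

    An illustrative metric only; the paper evaluates on established
    safety and toxicity benchmarks rather than this toy score.
    """
    assert len(gold) == len(pred), "gold and pred must align pairwise"
    hits = sum(
        (g.prompt_harmful, g.response_harmful, g.response_refusal)
        == (p.prompt_harmful, p.response_harmful, p.response_refusal)
        for g, p in zip(gold, pred)
    )
    return hits / len(gold)
```

Scoring on the full label triple (rather than each label independently) is the stricter choice; a per-label accuracy would credit partial matches.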
| Comments: | Accepted to COLM 2025 Main Conference |
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2504.04377 [cs.CL] (or arXiv:2504.04377v2 [cs.CL] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2504.04377 (arXiv-issued DOI via DataCite) |
Submission history
From: Priyanshu Kumar
[v1] Sun, 6 Apr 2025 06:09:21 UTC (7,556 KB)
[v2] Thu, 7 Aug 2025 14:05:19 UTC (1,413 KB)
Access Paper:
- View PDF
- HTML (experimental)
- TeX Source