[2504.04377] PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
arXiv:2504.04377 (cs)
[Submitted on 6 Apr 2025 (v1), last revised 7 Aug 2025 (this version, v2)]
Title: PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
Authors: Priyanshu Kumar, Devansh Jain, Akhila Yerukola, Liwei Jiang, Himanshu Beniwal, Thomas Hartvigsen, Maarten Sap
Abstract: Truly multilingual safety moderation efforts for Large Language Models (LLMs) have been hindered by a narrow focus on a small set of languages (e.g., English, Chinese) and a limited scope of safety definitions, resulting in significant gaps in moderation capabilities. To bridge these gaps, we release POLYGUARD, a new state-of-the-art multilingual safety model for safeguarding LLM generations, along with the corresponding training and evaluation datasets. POLYGUARD is trained on POLYGUARDMIX, the largest multilingual safety training corpus to date, containing 1.91M samples across 17 languages (e.g., Chinese, Czech, English, Hindi). We also introduce POLYGUARDPROMPTS, a high-quality multilingual benchmark with 29K samples for evaluating safety guardrails. Created by combining naturally occurring multilingual human-LLM interactions with human-verified machine translations of an English-only safety dataset (WildGuardMix; Han et al., 2024), our datasets contain prompt-output pairs labeled for prompt harmfulness, response harmfulness, and response refusal. Through extensive evaluations across multiple safety and toxicity benchmarks, we demonstrate that POLYGUARD outperforms existing state-of-the-art open-weight and commercial safety classifiers by 5.5%. Our contributions advance efforts toward safer multilingual LLMs for all global users.
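The datasets described in the abstract pair each prompt and response with three binary labels: prompt harmfulness, response harmfulness, and response refusal. As an illustrative sketch only (the record fields and the exact-match metric below are assumptions for exposition, not the authors' released schema or code), one way to model such a labeled pair and score a classifier's predictions against gold labels is:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyRecord:
    """One prompt-output pair with the three labels the paper describes.

    Field names are hypothetical; they mirror the label types named in
    the abstract, not the released dataset's actual column names.
    """
    prompt: str
    response: str
    prompt_harmful: bool
    response_harmful: bool
    response_refusal: bool


def label_accuracy(gold: list[SafetyRecord], pred: list[SafetyRecord]) -> float:
    """Fraction of pairs whose entire label triple matches exactly.

    An illustrative metric only; the paper evaluates on established
    safety and toxicity benchmarks rather than this toy score.
    """
    assert len(gold) == len(pred), "gold and pred must align pairwise"
    hits = sum(
        (g.prompt_harmful, g.response_harmful, g.response_refusal)
        == (p.prompt_harmful, p.response_harmful, p.response_refusal)
        for g, p in zip(gold, pred)
    )
    return hits / len(gold)
```

Scoring on the full label triple (rather than each label independently) is the stricter choice; a per-label accuracy would credit partial matches.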
| Comments: | Accepted to COLM 2025 Main Conference |
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2504.04377 [cs.CL] (or arXiv:2504.04377v2 [cs.CL] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2504.04377 (arXiv-issued DOI via DataCite) |
Submission history
From: Priyanshu Kumar
[v1] Sun, 6 Apr 2025 06:09:21 UTC (7,556 KB)
[v2] Thu, 7 Aug 2025 14:05:19 UTC (1,413 KB)
Access Paper:
- View PDF
- HTML (experimental)
- TeX Source