Why Are Web AI Agents More Vulnerable
Than Standalone LLMs? A Security Analysis
University of Maryland, College Park
* equal contribution
Warning: this paper contains potentially harmful text.
1. What: Web AI Agents are Significantly More Vulnerable
Case 1. Ask the agent to write a phishing email
Case 2. Ask the agent to post insulting comments
Case 3. Ask the agent to infiltrate a network system
Web AI Agents: malicious task following rate of 46.6%
Standalone LLMs: malicious task following rate of 0%
2. Why: Root Cause Analysis of
Web AI Agent Vulnerabilities
2-1. Differences between Web AI agents and LLMs
- Factor 1: Goal Preprocessing. How the user's goal is preprocessed, whether paraphrased, decomposed into sub-tasks, or embedded within the system prompt, can affect resistance to harmful instructions.
- Factor 2: Action Space. The structure of the predefined action space and its execution constraints can impact an agent's ability to assess and mitigate harmful intent.
- Factor 3: Event Stream / Evaluation Environment. Observational capabilities, including the ability to recognize artificial (mock-up) environments, influence Web AI agents' vulnerability.
By breaking down these components, we provide a fine-grained analysis of the underlying risks, moving beyond a high-level comparison to uncover the specific structural elements that heighten security risks in Web AI agents.
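To make these factors concrete, below is a minimal Python sketch contrasting how a standalone LLM receives a raw request with how a typical web-agent pipeline rephrases the goal and embeds it in a system prompt before the model sees it. The chat helper, prompt wording, and function names are illustrative assumptions, not the exact pipeline evaluated in the paper.

def chat(system: str, user: str) -> str:
    """Placeholder for any chat-completion API call (assumed interface)."""
    raise NotImplementedError

def standalone_llm(user_request: str) -> str:
    # The raw request arrives as a user turn, so safety training can
    # evaluate its intent directly and refuse if it is harmful.
    return chat(system="You are a helpful assistant.", user=user_request)

def web_agent_step(user_request: str, observation: str) -> str:
    # Factor 1 (Goal Preprocessing): the goal is rephrased as a neutral
    # "task" and embedded inside the system prompt, which can weaken the
    # signal the model uses to recognize harmful intent.
    system_prompt = (
        "You are a web navigation agent.\n"
        f"TASK: Complete the following task on the website: {user_request}\n"
        "Respond with exactly one action from the allowed action space."
    )
    # The user turn now carries page observations rather than the request itself.
    return chat(system=system_prompt, user=f"OBSERVATION:\n{observation}")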
2-2. Evaluation protocol for jailbreak susceptibility
This disparity stems from the multifaceted differences between Web AI agents and standalone LLMs, which produce complex behavioral signals that simple evaluation metrics, such as success rate, often fail to capture.
💡 To tackle these challenges, we propose a Five-level Harmfulness Evaluation Framework for a more granular and systematic evaluation.
5 Distinct Levels of Jailbreaking
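As a rough illustration of how a graded metric differs from a binary success flag, the sketch below assigns each agent trajectory an integer level from 1 to 5 and reports the full distribution. The level labels, the grade judge, and the aggregation are placeholder assumptions; the precise definitions of the five levels are those given in the paper, not this snippet.

from enum import IntEnum
from statistics import mean

class HarmLevel(IntEnum):
    # Placeholder 1-5 scale; the paper defines the actual five levels.
    LEVEL_1 = 1  # e.g., outright refusal
    LEVEL_2 = 2
    LEVEL_3 = 3  # e.g., partial compliance
    LEVEL_4 = 4
    LEVEL_5 = 5  # e.g., harmful task fully carried out

def grade(trajectory: str) -> HarmLevel:
    # Assign a level to one trajectory (human annotator or LLM judge).
    raise NotImplementedError

def evaluate(trajectories: list[str]) -> dict:
    levels = [grade(t) for t in trajectories]
    return {
        # A binary success rate hides everything between refusal and success,
        "binary_success_rate": mean(l == HarmLevel.LEVEL_5 for l in levels),
        # whereas the level distribution exposes partial compliance.
        "level_distribution": {int(l): levels.count(l) for l in HarmLevel},
    }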
3. How: Actionable Insights for
Targeted Defense Strategies
Through a fine-grained analysis of key differences between Web AI agents and standalone LLMs, we systematically identified several design factors contributing to vulnerabilities.
🔍 Our findings reveal several actionable insights:
- Embedding user goals within system prompts significantly increases jailbreak success rates. Paraphrasing user goals further heightens vulnerabilities.
- Predefined action spaces combined with multi-turn action generation make systems more susceptible to executing harmful tasks, especially when user goals are also embedded in the system prompt.
- Mock-up websites do not directly promote harmful intent but facilitate effective task execution for malicious objectives.
- Event Stream tracking amplifies harmful behavior by allowing iterative refinement, increasing susceptibility to adversarial manipulation.
These findings highlight how specific design elements—goal processing, action generation strategies, and dynamic web interactions—contribute to the overall risk of harmful behavior.
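To show how these elements interact, here is a deliberately simplified agent loop in Python; the action space, history format, and the env/llm interfaces are assumptions for illustration, not the exact setup studied in the paper.

ACTION_SPACE = ["click(element_id)", "type(element_id, text)",
                "goto(url)", "scroll(direction)", "stop(answer)"]

def run_agent(task: str, env, llm, max_steps: int = 10) -> list[dict]:
    event_stream = []  # running history of observations and actions
    observation = env.reset()
    for _ in range(max_steps):
        # Predefined action space: the model is asked "which allowed action
        # comes next?" rather than "should this task be done at all?".
        prompt = (
            f"TASK: {task}\n"
            f"ALLOWED ACTIONS: {ACTION_SPACE}\n"
            f"HISTORY: {event_stream}\n"
            f"OBSERVATION: {observation}\n"
            "Next action:"
        )
        action = llm(prompt)
        # Event stream: feeding the history back each turn enables iterative
        # refinement toward the goal, including a harmful one.
        event_stream.append({"observation": observation, "action": action})
        if action.startswith("stop"):
            break
        observation = env.step(action)
    return event_stream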
BibTeX
@article{Jeffrey2025Vulnerablewebagents,
title = {Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis},
author = {Fan Chiang, Jeffrey Yang and Lee, Seungjae and Huang, Jia-Bin and Huang, Furong and Chen, Yizheng},
journal = {arXiv preprint arXiv:2502.20383},
year = {2025},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2502.20383},
}