CARVIEW

MOTORHOMES

Select Language

HTTP/2 200 server: GitHub.com content-type: text/html; charset=utf-8 last-modified: Wed, 25 Sep 2024 19:19:13 GMT access-control-allow-origin: * strict-transport-security: max-age=31556952 etag: W/"66f46231-622f" expires: Mon, 29 Dec 2025 11:11:06 GMT cache-control: max-age=600 content-encoding: gzip x-proxy-cache: MISS x-github-request-id: F697:2D64E0:8A5D0C:9B751F:69525F72 accept-ranges: bytes age: 0 date: Mon, 29 Dec 2025 11:01:06 GMT via: 1.1 varnish x-served-by: cache-bom-vanm7210022-BOM x-cache: MISS x-cache-hits: 0 x-timer: S1767006066.454432,VS0,VE224 vary: Accept-Encoding x-fastly-request-id: 47c1581f92940c11cd425002c3deb112c4793abf content-length: 6158 Shadowcast

Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models

Yuancheng Xu¹, Jiarui Yao², Manli Shu³, Yanchao Sun⁴, Zichu Wu⁵
Ning Yu⁶, Tom Goldstein¹, Furong Huang¹

¹University of Maryland, College Park    ²University of Illinois Urbana-Champaign
    ³Salesforce Research     ⁴Apple
    ⁵University of Waterloo     ⁶Netflix Eyeline Studios

Neurips, 2024

Paper Code

Responses of the clean and poisoned LLaVA-1.5 models. The poisoned samples are crafted using a different VLM, MiniGPT-v2.

Abstract

Vision-Language Models (VLMs) excel in generating textual responses from visual inputs, yet their versatility raises significant security concerns. This study takes the first step in exposing VLMs' susceptibility to data poisoning attacks that can manipulate responses to innocuous, everyday prompts. We introduce Shadowcast, a stealthy data poisoning attack method where poison samples are visually indistinguishable from benign images with matching texts. Shadowcast demonstrates effectiveness in two attack types. The first is Label Attack, tricking VLMs into misidentifying class labels, such as confusing Donald Trump for Joe Biden. The second is Persuasion Attack, which leverages VLMs' text generation capabilities to craft narratives, such as portraying junk food as health food, through persuasive and seemingly rational descriptions. We show that Shadowcast are highly effective in achieving attacker's intentions using as few as 50 poison samples. Moreover, these poison samples remain effective across various prompts and are transferable across different VLM architectures in the black-box setting. This work reveals how poisoned VLMs can generate convincing yet deceptive misinformation and underscores the importance of data quality for responsible deployments of VLMs.

TL;DR: Shadowcast is the first stealthy data poisoning attack against Vision-Language Models (VLMs). The poisoned VLMs can disseminate misinformation coherently, subtly shifting users’ perceptions.

Method

The attacker’s goal is to manipulate the model into responding to original concept images with texts consistent with a destination concept, using stealthy poison samples that can evade human visual inspection.

A poison sample consists of a poison image that looks like a clean image from the destination concept, and a congruent text description. The text description is generated from the clean destination concept image using any off-the-shelf VLM. The poison image is crafted by introducing imperceptible perturbation to the clean destination concept image, to match an original concept image in the latent feature space.

When training on these poison samples, the VLM learns to associate the the original concept feature (in the poison image) with the destination concept texts, achieving the attacker's goal.

Illustration of how Shadowcast crafts a poison sample with visually matching image and text descriptions.

Below is another example of the poison sample where the original concept is "Junk Food" and the destination concept is "Healthy and Nutritious Food". The poison image looks like the clean destination concept image, and the text visually matches the image.

Experiment

We consider the following four tasks for poisoning attacks exemplifying the practical risks of VLMs, ranging from misidentifying political figures to disseminating healthcare misinformation.

The red ones are Label Attacks, tricking VLMs into misidentifying class labels, such as confusing Donald Trump for Joe Biden. The green ones Persuasion Attacks, which leverage VLMs’ text generation capabilities to craft narratives, such as portraying junk food as health food, through persuasive and seemingly rational descriptions.

Attack tasks and their associated concepts.

We study both grey-box and black-box scenarios. In the grey-box setting, the attacker only has access to the VLM’s vision encoder (no need to access the whole VLM as in the white-box setting). In the black-box setting, the adversary has no access to the specific VLM under attack and instead utilizes an alternate open-source VLM. We evaluate the attack success rates under different poison ratios.

Grey-box results

Attack success rate of Label Attack for LLaVA1.5.

Attack success rate of Persuasion Attack for LLaVA-1.5.

Shadowcast begins to demonstrate a significant impact (over 60% attack success rate) with a poison rate of under 1% (or 30 poison samples)!

Black-box results

(Architecture transferability) Attack success rate for LLaVA-1.5 when InstructBLIP (left) and MiniGPTv2 (right) are used to craft poison images.

Shadowcast is still effective even across different VLM architectures!

Attack Robustness

What if the VLM uses image data augmentation (as a defense method) during training? Will the poisoned VLM exhibit targeted behaviour when different text prompts are used? Our evaluation shows positive results.

Attack success rate for augmented LLaVA-1.5 with/without poison augmentation

(Data augmentation) Attack success rate for LLaVA-1.5 trained with data augmentation, when poison images are crafted without augmentation (left) and with augmentation (right).

Attack success rates with diverse prompts

(Generalization to diverse prompts) Attack success rates when diverse prompts are used during test time.

Ethics and Disclosure

This study uncovers a pivotal vulnerability in the visual instruction tuning of large vision language models (VLMs), demonstrating how adversaries might exploit data poisoning to disseminate misinformation undetected. While the attack methodologies and objectives detailed in this research introduce new risks to VLMs, the concept of data poisoning is not new, having been a topic of focus in the security domain for over a decade. By bringing these findings to light, our intent is not to facilitate attacks but rather to sound an alarm in the VLM community. Our disclosure aims to elevate vigilance among VLM developers and users, advocate for stringent data examination practices, and catalyze the advancement of robust data cleaning and defensive strategies. In doing so, we believe that exposing these vulnerabilities is a crucial step towards fostering comprehensive studies in defense mechanisms and ensuring the secure deployment of VLMs in various applications.

BibTeX

@article{
        xu2024shadowcast,
        title={Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models},
        author={Xu, Yuancheng and Yao, Jiarui and Shu, Manli and Sun, Yanchao and Wu, Zichu and Yu, Ning and Goldstein, Tom and Huang, Furong},
        journal={arXiv preprint arXiv:2402.06659},
        year={2024}
      }

Original Source | Taken Source