This repository contains the code and pretrained models necessary to replicate the results in the following paper:
**Poisoned classifiers are not only backdoored, they are fundamentally broken**
Mingjie Sun, Siddhant Agarwal, J. Zico Kolter
Paper: https://arxiv.org/abs/2010.09080
In this paper, we demonstrate a novel attack against backdoor-poisoned classifiers.
Figure 1: Overview of our attack. Given a poisoned classifier, we construct a *robustified smoothed* classifier using *Denoised Smoothing* (Salman et al., 2020). We then extract colors or cropped patches from adversarial examples of this robust smoothed classifier to construct novel triggers. These alternative triggers achieve attack success rates similar to, or even higher than, the original backdoor.
Figure 2: Results of attacking a poisoned classifier on ImageNet, where the attack success rate of the original backdoor is 72.60%.
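Conceptually, the pipeline in Figure 1 can be sketched in a few lines of PyTorch. The snippet below is only an illustration of the idea, not the repository's actual implementation: `poisoned_classifier` and `denoiser` are stand-in models you supply yourself, and the noise level, PGD settings, and patch location are arbitrary choices made for the sketch.

```python
import torch
import torch.nn as nn

class DenoisedSmoothedClassifier(nn.Module):
    """Rough approximation of a smoothed classifier: add Gaussian noise,
    denoise, classify, and average the logits over a few noise samples."""
    def __init__(self, denoiser, classifier, sigma=0.25, n_samples=8):
        super().__init__()
        self.denoiser, self.classifier = denoiser, classifier
        self.sigma, self.n_samples = sigma, n_samples

    def forward(self, x):
        logits = 0
        for _ in range(self.n_samples):
            noisy = x + self.sigma * torch.randn_like(x)
            logits = logits + self.classifier(self.denoiser(noisy))
        return logits / self.n_samples

def pgd_attack(model, x, y, eps=8 / 255, step=2 / 255, iters=20):
    """Plain L-infinity PGD against the (smoothed) classifier."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv.detach() + step * grad.sign()).clamp(0, 1)
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)  # project back into the eps-ball
    return x_adv.detach()

# Usage (assuming `denoiser`, `poisoned_classifier`, and a batch `x`, `y` are already loaded):
# smoothed = DenoisedSmoothedClassifier(denoiser, poisoned_classifier)
# x_adv = pgd_attack(smoothed, x, y)
# patch = x_adv[0, :, 16:48, 16:48]  # crop a region of the adversarial image as a candidate trigger
```

The cropped patch (or a dominant color extracted from the perturbation) can then be pasted onto clean images as an alternative trigger.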
To start with, clone the repository:
```
git clone git@github.com:Eric-mingjie/breaking-poisoned-classifier.git
cd breaking-poisoned-classifier
git submodule update --init --recursive
```
Remember to download the pretrained denoisers from the Denoised Smoothing repository (follow step 4 of its getting-started instructions).
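How the downloaded checkpoints are organized depends on the Denoised Smoothing release, so the snippet below is only a hedged sketch: the file paths, the `"state_dict"` key, and the choice of ResNet-50 for the poisoned classifier are assumptions made for illustration, not a layout this repository guarantees.

```python
import torch
from torchvision.models import resnet50

# Hypothetical paths -- point these at wherever you saved the downloaded files.
DENOISER_CKPT = "pretrained_denoisers/imagenet/dncnn.pth.tar"
POISONED_CKPT = "poisoned_models/badnet_resnet50.pth"

ckpt = torch.load(DENOISER_CKPT, map_location="cpu")
# Checkpoints often wrap the weights under a key such as "state_dict";
# inspect ckpt.keys() if this assumption does not hold for your download.
denoiser_state = ckpt.get("state_dict", ckpt)
# Instantiate the matching denoiser architecture from the denoised-smoothing
# submodule, then call denoiser.load_state_dict(denoiser_state).

# A poisoned ImageNet classifier (architecture assumed here to be ResNet-50).
classifier = resnet50(num_classes=1000)
classifier.load_state_dict(torch.load(POISONED_CKPT, map_location="cpu"))
classifier.eval()
```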
The `code` directory contains the code for breaking classifiers poisoned by three backdoor attack methods (BadNet, HTBA, and CLBD), as well as for attacking poisoned classifiers from the TrojAI dataset.
Inside `code` there are four directories, `badnet`, `clbd`, `htba`, and `trojAI`, each targeting a different family of poisoned classifiers. Every directory provides a notebook, `breaking-poisoned-classifier.ipynb`, that demonstrates our attack method; a simplified sketch of this kind of evaluation is shown below.
For detailed instructions, please go to the corresponding directories in `code`.
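As a rough, hedged illustration of what "attack success rate" means in Figure 2, the sketch below stamps a candidate trigger patch onto clean images and measures how often the poisoned classifier predicts the attacker's target class. All names here (`poisoned_classifier`, `clean_loader`, `patch`, `target_class`) are placeholders; the notebooks in `code` are the authoritative reference.

```python
import torch

@torch.no_grad()
def attack_success_rate(poisoned_classifier, clean_loader, patch, target_class, device="cpu"):
    """Paste `patch` (a C x h x w tensor) onto the top-left corner of each clean image
    and report the fraction of images classified as `target_class`."""
    poisoned_classifier.eval().to(device)
    hits, total = 0, 0
    ph, pw = patch.shape[-2:]
    for x, _ in clean_loader:
        x = x.to(device).clone()
        x[:, :, :ph, :pw] = patch.to(device)          # stamp the candidate trigger
        pred = poisoned_classifier(x).argmax(dim=1)
        hits += (pred == target_class).sum().item()
        total += x.size(0)
    return hits / total
```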
Citation
If you find this repository useful for your research, please consider citing our work:
```
@inproceedings{sun2020poisoned,
  title={Poisoned classifiers are not only backdoored, they are fundamentally broken},
  author={Sun, Mingjie and Agarwal, Siddhant and Kolter, J. Zico},
  booktitle={arXiv:2010.09080},
  year={2020}
}
```