Our paper is available at: “Ensemble of Narrow DNN Chains” (my Machine Learning course essay at Oxford).
Our code is publicly available at https://github.com/vtu81/ENDC.
We propose the Ensemble of Narrow DNN Chains (ENDC) framework:
- first train narrow DNN chains that perform well on one-vs-all binary classification tasks,
- then aggregate them by voting to predict on the multiclass classification task.
Our ensemble framework can:
- utilize the abstract interpretability of DNNs,
- significantly outperform traditional ML on CIFAR-10,
- be 2-4 orders of magnitude smaller than normal DNNs and 6+ times smaller than traditional ML models,
- and be fully parallelized in both the training and deployment stages.
Our empirical study shows that a narrow DNN chain can learn binary classification well. Moreover, our experiments on three datasets (MNIST, Fashion-MNIST, and CIFAR-10) confirm the potential power of ENDC. Compared with traditional ML models, ENDC, with the fewest parameters, achieves similar accuracy on MNIST and Fashion-MNIST, and significantly better accuracy on CIFAR-10.
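As a rough illustration of the voting aggregation step (a minimal sketch, not the paper's exact implementation; `chains` and `num_classes` below are hypothetical placeholders), the one-vs-all chains can be combined as follows:

```python
import torch

def endc_predict(chains, x, num_classes=10):
    """Aggregate one-vs-all binary chains by voting (illustrative sketch only).

    chains: a list of `num_classes` binary classifiers; chains[c](x) is assumed
            to return a confidence in [0, 1] that the input belongs to class c.
    x:      a batch of inputs, shape (B, ...).
    """
    assert len(chains) == num_classes
    # Each chain votes with its confidence for its own class.
    votes = torch.stack([chain(x).reshape(x.size(0)) for chain in chains], dim=1)  # (B, C)
    # The predicted class is the one whose chain is most confident.
    return votes.argmax(dim=1)
```

Since each chain is trained and evaluated independently, both training and inference parallelize trivially across the classes.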

Results
Overall Accuracy
| Dataset | Accuracy | Arch | #Param |
|---|---|---|---|
| MNIST | 93.40% | 1-channel | 1300 |
| Fashion-MNIST | 80.39% | 1-channel | 1300 |
| CIFAR-10 | 47.72% | 2-channel | 4930 |
- Each binary classifier has fewer parameters than a single input (130 < 28×28 for MNIST and Fashion-MNIST, 493 < 3×32×32 for CIFAR-10)!
Comparison
We compare ENDC with traditional ML models:
- Logistic Regression (LR)
- Support Vector Classifier (SVC)
and normal DNNs. Their results are referenced from the internet; see our paper for sources and details.
MNIST
| Method | Accuracy (%) | # Param |
|---|---|---|
| ENDC (ours) | 93.4 | 1.3K |
| LR | 91.7 | 7.7K+ |
| SVC | 97.8 | 7.7K+ |
| Normal DNN (LeNet) | 99.3 | 0.41M |
Fashion-MNIST
| Method | Accuracy (%) | # Param |
|---|---|---|
| ENDC (ours) | 80.4 | 1.3K |
| LR | 84.2 | 7.7K+ |
| SVC | 89.7 | 7.7K+ |
| Normal DNN (VGG-16) | 93.5 | 26M |
CIFAR-10
| Method | Accuracy (%) | # Param |
|---|---|---|
| ENDC (ours) | 47.7 | 4.8K |
| LR | 39.9 | 30.0K+ |
| SVC (PCA) | 40.2 | 0.44M+ |
| Normal DNN (VGG-16-BN) | 93.9 | 15M |
Per-class Accuracy
| Dataset | #0 (%) | #1 (%) | #2 (%) | #3 (%) | #4 (%) | #5 (%) | #6 (%) | #7 (%) | #8 (%) | #9 (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| MNIST | 97.04 | 97.53 | 96.51 | 88.91 | 95.52 | 92.38 | 90.29 | 94.55 | 88.71 | 91.67 |
| Fashion-MNIST | 80.60 | 92.90 | 77.60 | 77.60 | 75.50 | 92.30 | 40.70 | 81.30 | 90.00 | 95.50 |
| CIFAR-10 | 48.90 | 55.70 | 43.50 | 31.80 | 41.00 | 45.40 | 61.90 | 42.00 | 49.90 | 57.10 |
This is an on-going project of mine, shown here for demonstration only, advised by Prof. Ting Wang at PSU.
Introduction
This project diverged from Backdoor Certification; you may want to read that first.
Backdoors within DNN models are dangerous, and an important line of work focuses on detecting these potential backdoors. Some of these detection methods (e.g. Neural Cleanse) first reverse engineer (restore) the potential backdoor trigger, then use anomaly detection to tell whether there is indeed a backdoor.
We propose an efficient heuristic algorithm that focuses on restoring the potential backdoor trigger in a given DNN. Our algorithm requires no or very few clean inputs, while supporting both perturbation triggers (adding a pattern to an image) and patch triggers (stamping a pattern onto an image). Our restored triggers reach a high attack success rate (ASR) and match the real trigger well.
Method
Intuitively, for a batch of $N$ inputs, searching for the potential backdoor trigger is similar to the following optimization:
\[\text{trigger} = \text{argmin}_{r} \sum_{i=1}^N \Big(f_{source}(x_i + r) - f_{target}(x_i + r)\Big)\]

Nevertheless, directly optimizing this objective with Stochastic Gradient Descent is empirically difficult. As shown in the three figures below, the gradient information (orange) can be quite noisy:

Recall that CROWN relaxes the NN to a linear function. As shown in the figures above, we may view the CROWN weight for each input dimension (blue) as an “approximate gradient” in a certain vicinity, and this “approximate gradient” is usually less noisy.
So we simply replace the exact gradients with the “approximate gradients”:
\[\mathbf r_{t+1} = \mathbf r_t - \text{lr} \cdot \sum_{i=1}^N \nabla_{\mathbf x,\text{approx}} f(\mathbf x_i + \mathbf r_t)\]

This makes the optimization (restoring or searching for triggers) much easier, and our experiments have confirmed this.
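As a rough sketch of this update rule (not the project's actual code; `approx_grad` below is a hypothetical callback standing in for the CROWN-style linear-relaxation coefficients):

```python
import torch

def restore_trigger(approx_grad, xs, steps=200, lr=0.1):
    """Trigger restoration via surrogate gradient descent (illustrative sketch only).

    approx_grad(x): assumed to return the CROWN-style "approximate gradient"
                    of f_source - f_target w.r.t. the input x (a hypothetical
                    callback, not a real library function).
    xs:             a batch of inputs, shape (N, ...).
    """
    r = torch.zeros_like(xs[0])      # candidate trigger, same shape as one input
    for _ in range(steps):
        # Sum the approximate gradients over the batch, evaluated at x_i + r.
        g = sum(approx_grad(x + r) for x in xs)
        r = r - lr * g               # plain descent step with the surrogate gradient
        r = r.clamp(-0.3, 0.3)       # optional: keep the trigger within a small norm ball
    return r
```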
Results
Some restoration results:

- I’m still refining both the idea and experiments.
This is an on-going project of mine, shown here for demonstration only, advised by Prof. Ting Wang at PSU.
Introduction
In the field of DNN security, adversarial attacks and backdoor attacks are two of the most typical threats.
- Adversarial Attack: for a given input, the attacker adds an imperceptible noise (perturbation), leading the DNN to misclassify the perturbed input. The adversarial perturbation is input-specific and usually obtained via PGD (a minimal PGD sketch follows this list).
- Backdoor Attack: the attacker stamps a trigger pattern onto inputs, leading the DNN to misclassify all stamped inputs. There are a variety of trigger types and implantation strategies, and backdoors are usually injected via data poisoning at the training stage.
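For concreteness, here is a minimal untargeted $L_\infty$ PGD sketch in PyTorch (the standard textbook procedure, not code from this project; `model`, `x`, and `y` are assumed to be given):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Untargeted L-infinity PGD (textbook sketch, not code from this project)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Take a signed ascent step, then project back into the eps-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```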
Certified robustness has been widely discussed as a way to end the arms race between adversarial attacks and defenses. We aim to take the first step toward introducing certification to stop the arms race between backdoor attacks and defenses.
Method
We first formulate the backdoor certification problem. The statement that no (perturbation) backdoor exists within a norm ball $S$ can be expressed as the inequality:

\[\min_{r\in S}\max_i f_{source}(x_i + r) - f_{target}(x_i + r) > 0 \tag{1}\]

We base our work on an existing NN verifier, CROWN (LiRPA). As shown in the following figure, CROWN relaxes the non-convex NN function $f$ into a linear function $\underline f$ w.r.t. the input dimensions, where $f(x + r) \ge \underline f(x + r)$ for any $r\in S$.

We use the lower-bound linear function for certifying backdoors:
\[\min_{r\in S}\max_i \underline f_{source}(x_i + r) - \overline f_{target}(x_i + r) > 0 \tag{2}\]

Notice that (2) naturally yields a sufficient condition for (1). The following figure shows our backdoor certification process:

Each solid line corresponds to the linear relaxation $\underline f_{source}(x_i + r) - \overline f_{target}(x_i + r)$ of the NN given input $x_i$. After grouping the inputs, we are able to give a certification such as: there is no perturbation trigger $r \in S$ that would lead to $\rho\%$ of inputs being misclassified.
We could further introduce optimization and Branch and Bound to tighten the bounds.
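As a rough illustration of how the per-input margin lower bounds could be computed (a sketch assuming the auto_LiRPA API roughly as documented and a 10-class model; not this project's actual code, and without the extra optimization or Branch and Bound mentioned above):

```python
import torch
from auto_LiRPA import BoundedModule, BoundedTensor
from auto_LiRPA.perturbations import PerturbationLpNorm

def certified_fraction(model, xs, source, target, eps):
    """Fraction of inputs whose margin f_source - f_target is provably > 0 for
    every perturbation with ||r||_inf <= eps (illustrative sketch; assumes the
    auto_LiRPA API roughly as documented and a 10-class classifier)."""
    lirpa_model = BoundedModule(model, torch.empty_like(xs))
    ptb = PerturbationLpNorm(norm=float("inf"), eps=eps)
    x_bounded = BoundedTensor(xs, ptb)
    # Specification matrix C picks out the margin f_source - f_target.
    C = torch.zeros(xs.size(0), 1, 10)
    C[:, 0, source] = 1.0
    C[:, 0, target] = -1.0
    lb, _ = lirpa_model.compute_bounds(x=(x_bounded,), method="CROWN", C=C)
    return (lb > 0).float().mean().item()  # certified per-input, then aggregated
```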
Results
A metric for certified adversarial robustness is the $\textit{adversarial-attack-free radius}$, within which it is impossible to perform an adversarial attack. Likewise, we extend this metric to the $\textit{backdoor-free radius}$, within which it is impossible to perform a backdoor attack.
Obviously, \(\textit{adversarial-attack-free radius} \le \textit{backdoor-free radius}\), since any backdoor trigger within a given radius would also act as an adversarial perturbation for each input it fools. Our initial experimental results show that for the same NN, there can be a $>15\%$ gap between the two radii.
- I am still refining the experiments.
What’s VQA?
Visual Question Answering (VQA) is a type of task where, given an image and a question about the image, a model is expected to give a correct answer.
For example, an image might look like this:

The question is: What color is the girl’s necklace?
Our model would generate the answer ‘white’.
What’s MindSpore?
MindSpore is a new AI framework developed by Huawei.
NaiveVQA: MindSpore & PyTorch Implementations of a Strong VQA Baseline
This repository contains a naive VQA model, which is our final project (MindSpore implementation) for the course DL4NLP at ZJU. It is a reimplementation of the paper Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering.
Check out the branch `pytorch` for our PyTorch implementation:
git checkout pytorch
Performance
| Framework | Y/N (%) | Num (%) | Other (%) | All (%) |
|---|---|---|---|---|
| MindSpore | 62.2 | 7.5 | 2.4 | 25.8 |
| PyTorch | 66.3 | 24.5 | 25.0 | 40.6 |
- Per Question Type Accuracy (MindSpore)

- Per Question Type Accuracy (PyTorch)

File Directory
- `data/`
  - `annotations/` – annotations data (ignored)
  - `images/` – images data (ignored)
  - `questions/` – questions data (ignored)
  - `results/` – contains evaluation results when you evaluate a model with `./evaluate.ipynb`
  - `clean.py` – a script to clean up `train.json` in both `data/annotations/` and `data/questions/`
  - `align.py` – a script to sort and align up the annotations and questions
- `resnet/` – resnet directory, cloned from pytorch-resnet
- `logs/` – should contain saved `.pth` model files
- `config.py` – global configure file
- `train.py` – training
- `view-log.py` – a tool for visualizing an accuracy\epoch figure
- `val_acc.png` – a demo for the accuracy\epoch figure
- `model.py` – the major model
- `preprocess-image.py` – preprocess the images, using ResNet152 to extract features for further usages
- `preprocess-image-test.py` – to extract images in the test set
- `preprocess-vocab.py` – preprocess the questions and annotations to get their vocabularies for further usages
- `data.py` – dataset, dataloader and data processing code
- `utils.py` – helper code
- `evaluate.ipynb` – evaluate a model and visualize the result
- `cover_rate.ipynb` – calculate the selected answers’ coverage
- `assets/`
  - `PythonHelperTools/` (currently not used)
    - `vqaDemo.py` – a demo for VQA dataset APIs
    - `vqaTools/`
  - `PythonEvaluationTools/` (currently not used)
    - `vqaEvalDemo.py` – a demo for VQA evaluation
    - `vaqEvaluation/`
- `README.md`
Prerequisite
- Free disk space of at least 60GB
- Nvidia GPU / Ascend Platform
Notice: We have successfully tested our code with MindSpore 1.2.1 on an Nvidia RTX 2080 Ti, so we strongly suggest you use the MindSpore 1.2.1 GPU version. Since MindSpore is definitely not stable yet, any version other than 1.2.1 might cause failures.
Also, due to incompatibilities among different versions of MindSpore, we still cannot manage to run the code on Ascend for now. Fortunately, people are more likely to have an Nvidia GPU than an Ascend chip :)
Quick Begin
Get and Prepare the Dataset
Get our VQA dataset (a small subset of VQA 2.0) from here. Unzip the file and move the subdirectories
- `annotations/`
- `images/`
- `questions/`
into the repository directory data/.
Prepare your dataset with:
# Only run the following command once!
cd data
# Save the original json files
cp annotations/train.json annotations/train_backup.json
cp questions/train.json questions/train_backup.json
cp annotations/val.json annotations/val_backup.json
cp questions/val.json questions/val_backup.json
cp annotations/test.json annotations/test_backup.json
cp questions/test.json questions/test_backup.json
python clean.py # run the clean up script
mv annotations/train_cleaned.json annotations/train.json
mv questions/train_cleaned.json questions/train.json
python align.py # run the aligning script
mv annotations/train_cleaned.json annotations/train.json
mv annotations/val_cleaned.json annotations/val.json
mv annotations/test_cleaned.json annotations/test.json
mv questions/train_cleaned.json questions/train.json
mv questions/val_cleaned.json questions/val.json
mv questions/test_cleaned.json questions/test.json
The scripts above will:
- clean up your dataset (some image ids are referenced in the annotation & question files while the images themselves don’t exist; see the sketch after this list)
- align the questions’ ids for convenience while training
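The clean-up idea can be sketched as follows (a simplified illustration, not the repository's clean.py; the `image_id` field and the file naming pattern are assumptions):

```python
import os

def drop_missing_images(entries, image_dir):
    """Simplified sketch of the clean-up step: keep only entries whose
    referenced image file actually exists. The 'image_id' field and the
    .jpg naming pattern are assumptions, not the repository's exact format."""
    return [e for e in entries
            if os.path.exists(os.path.join(image_dir, f"{e['image_id']}.jpg"))]
```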
Preprocess Images
You actually don’t have to preprocess the images yourself. We have prepared the preprocessed feature file for you; feel free to download it from here (the passcode is ‘dl4nlp’). You should download the `resnet-14x14.h5` (42GB) file and place it at the repository root directory. Once you’ve done that, skip this section!
Preprocess the images with:
python preprocess-images.py
- If you want to accelerate it, tune up `preprocess_batch_size` in `config.json`.
- If you run out of CUDA memory, tune down `preprocess_batch_size` in `config.json`.
The output should be ./resnet-14x14.h5.
Preprocess Vocabulary
The vocabulary only depends on the train set and on the `config.max_answers` (the number of selected candidate answers) you choose.
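Conceptually, the answer vocabulary keeps the `max_answers` most frequent training answers, roughly as in the following simplified sketch (not the repository's preprocess-vocab.py; the JSON field names are assumptions based on the VQA 2.0 format):

```python
import json
from collections import Counter

def build_answer_vocab(train_annotations_path, max_answers=3000):
    """Keep the most frequent training answers (simplified sketch; the JSON
    layout with an 'annotations' list and a 'multiple_choice_answer' field is
    an assumption based on the VQA 2.0 format)."""
    with open(train_annotations_path) as f:
        annotations = json.load(f)["annotations"]
    counts = Counter(a["multiple_choice_answer"] for a in annotations)
    top = [ans for ans, _ in counts.most_common(max_answers)]
    return {ans: idx for idx, ans in enumerate(top)}  # answer -> index
```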
Preprocess the questions and annotations to get their vocabularies with:
python preprocess-vocab.py
The output should be ./vocab.json.
Train
Now, you can train the model with:
python train.py
During training, a `.ckpt` file and a `.json` file will be saved under ./logs. The `.ckpt` file contains the parameters of your model and can be reloaded. The `.json` file contains training meta-information records.
View the training process with:
python view-log.py <path to .json train record>
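A minimal sketch of what such a visualization might do (hypothetical; it assumes the training log JSON stores per-epoch validation accuracies under a 'val_acc' key, which may differ from the repository's actual format):

```python
import json
import matplotlib.pyplot as plt

def plot_val_acc(log_path, out_path="val_acc.png"):
    """Plot validation accuracy per epoch (hypothetical sketch, not view-log.py;
    assumes the log JSON has a 'val_acc' list, which is an assumption)."""
    with open(log_path) as f:
        log = json.load(f)
    plt.plot(range(1, len(log["val_acc"]) + 1), log["val_acc"])
    plt.xlabel("epoch")
    plt.ylabel("validation accuracy")
    plt.savefig(out_path)
```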
The output val_acc.png should look like these:

(a real train of PyTorch implementation)

(a real train of MindSpore implementation)
To continue training from a pretrained model, set the correct `pretrained_model_path` and set `pretrained` to True in `config.py`.
Test Your Model
Likewise, you need to preprocess the test set’s images before testing. Run
python preprocess-images-test.py
to extract features from test/images. The output should be ./resnet-14x14-test.h5.
Likewise, we have prepared `resnet-14x14-test.h5` for you. Download it here (the passcode is ‘dl4nlp’).
We provide evaluate.ipynb to test/evaluate the model. Open the notebook and set the correct `eval_config`, and you’re good to go! Run the cells one by one, and you should be able to visualize the performance of your trained model.
More Things
- To calculate the selected answers’ cover rate (determined by `config.max_answers`), check `cover_rate.ipynb`.
Acknowledgement
The current version of the code is translated from the pytorch branch, where some code is borrowed from the repository pytorch-vqa.
Authors: Haoyang Shi, Tinghao Xie
This repository contains our course project for Compiler Principle at ZJU.
Differences with C
- type system: char, int, double, and n-dimensional array types; pointers and struct types are not supported in this version
- no controlled jumps, gotos, or labels, i.e. break, continue, and switch statements are not supported
- preprocessor macros are not supported
- `scanf` and `printf` are automatically declared and linked with libc at runtime
- the calling convention of `scanf` is modified, e.g. you shall use `scanf("%d", i)` to read a value into variable i and drop the `&` symbol
- the `for` loop is switched to a Pascal-like `for (i: 0 to n) {}`, where i is only visible within the scope of this loop
- unary operators are not supported
Try out the test samples to get a better understanding of the grammar.
Prerequisites
- flex 2.5+
- bison 3.0+
- clang 7.0+
- llvm 7.0+
all of which are easily accessible via apt and other package managers.
It has been successfully tested with
- flex 2.6.4 + bison 3.0.4 + llvm-12 on Ubuntu 18.04 (x86_64)
- flex 2.5.35 + bison 3.7.6 + llvm-12 on MacOS (x86_64)
Install
Clean the directory with:
make clean
Install with:
make
If you want to install with a specific version of bison, install with:
make BISON=[YOUR-BISON-PATH]
If you are installing RCC with LLVM12 on MacOS, install with:
make DEFINE='-D MACOS'
Usage
./rcc src_file
./a.out
The generated ELF object file and executable are named output.o and a.out respectively by default.
An encrypted (enclave-based) heterogeneous calculation protocol based on Nvidia CUDA and Intel SGX, with a simple sample of matrix multiplication using cuBLAS, designed and implemented by Tinghao Xie, Haoyang Shi, Zihang Li.
Enchecap illustration:

Enchecap illustration (with protected and trusted regions):

Enchecap performance:

To build the project, you’ll need to install and configure:
- SGX SDK
- CUDA Toolkit
- CUDA Samples
Then set your CUDA_PATH and INCLUDES in the Makefile, and make sure your SGX environment is activated by:
source /PATH_OF_SGXSDK/environment
(check SGX SDK official site for more details)
Then build with:
make # SGX hardware mode
make SGX_MODE=SIM # SGX simulation mode
(check README_SGX.txt for more details)
Your Linux OS version might be limited by the SGX SDK; check https://01.org/intel-software-guard-extensions for more details. We’re using Ubuntu 18.04 x86_64 and cannot guarantee it works successfully on other platforms. We are also compiling with gcc 7.5.0 and nvcc v11.1, which do not pose such strict limitations compared to Intel SGX.
To run the project, you’ll need to install and configure correctly:
- SGX PSW
- SGX driver, if you build in hardware mode and your CPU & BIOS support SGX
- CUDA Driver (of course you must have an Nvidia GPU)
Run with:
./app
TODO
Notice: We have not yet implemented the user-server part of the protocol in the library/sample, since it is similar to the host-device part; for now, we only implement the host-device part. In this repository, we show how to wrap cudaMemcpy() into secureCudaMemcpy(), performing implicit en/decryption for handy secure deployment.
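To illustrate the idea only, here is a toy Python sketch with an XOR stand-in “cipher” (the real implementation is C/CUDA with SGX and a naive RSA scheme; `device_copy` is a hypothetical placeholder for the raw cudaMemcpy transfer):

```python
import os

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric stand-in for the protocol's en/decryption (NOT secure;
    the real project uses a naive RSA scheme with exchanged public keys)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def secure_memcpy(plaintext: bytes, key: bytes, device_copy) -> bytes:
    """Sketch of the secureCudaMemcpy idea: data is encrypted before leaving a
    trusted region and decrypted only inside the receiver's trusted region.
    `device_copy` is a hypothetical placeholder for the raw transfer
    (cudaMemcpy in the real C/CUDA code)."""
    ciphertext = xor_cipher(plaintext, key)   # encrypt inside the enclave
    transferred = device_copy(ciphertext)     # raw copy across the untrusted host
    return xor_cipher(transferred, key)       # decrypt inside the GPU's trusted region

# Toy usage: a random key stands in for the keys exchanged in Phase I below.
key = os.urandom(16)
assert secure_memcpy(b"matrix block", key, lambda buf: buf) == b"matrix block"
```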
Phase I: Initialization
- Create an enclave
- The enclave generates its own keys (key generation is still an empty shell for now), then broadcasts its public key to the user & device
- The GPU generates its own keys (key generation is still an empty shell for now), then broadcasts its public key to the host & user
Phase II: Calculation
- En/Decrypt in enclave (decrypt with SGX’s private key, encrypt with GPU’s public key)
- En/Decrypt on GPU (decrypt with GPU’s private key, encrypt with SGX’s public key)
Future Work
- The GPU’s and SGX’s keys are both simply hard-coded currently; this needs to be fixed
- The current RSA en/decryption algorithm is still extremely naive! (further work includes regrouping, big-number support…)
- Add the user-server part into the sample, including
- Remote attestation with Intel SGX
- Broadcast his/her public key to the enclave and GPU, meanwhile record their public keys
- Send encrypted data to the server
- Receive encrypted results from the server
- Integration with real industrial workloads based on CUDA (like PyTorch)
- Integration with a real trusted GPU (far beyond our reach now)
A group project for the Computer Graphics course, including a simple but fully featured 3D engine based on native WebGL and a wonderful flying game demo, live here. Feel free to check out the source code on GitHub.
A screenshot in navigation mode:
