This repository contains re-implemented code for the paper "Identity Mappings in Deep Residual Networks" (https://arxiv.org/abs/1603.05027). This work enables training high-quality 1000-layer deep networks in a very simple way.
Acknowledgement: This code is re-implemented by Xiang Ming from Xi'an Jiaotong University for ease of release.
[a] @article{He2016,
        author  = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
        title   = {Identity Mappings in Deep Residual Networks},
        journal = {arXiv preprint arXiv:1603.05027},
        year    = {2016}
    }

[b] @article{He2015,
        author  = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
        title   = {Deep Residual Learning for Image Recognition},
        journal = {arXiv preprint arXiv:1512.03385},
        year    = {2015}
    }
The experiments in the paper were conducted in Caffe, whereas this code is re-implemented in Torch. We observed similar results within reasonable statistical variations.
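For context, the key idea of [a] is the "pre-activation" residual unit: batch normalization and ReLU come *before* each convolution, and the shortcut remains a pure identity. Below is a minimal Torch sketch of such a unit. This is an illustration only; `preActResidualUnit` is a hypothetical helper name, not this repository's actual model-construction code.

```lua
require 'nn'

-- Pre-activation residual unit [a]:
-- residual branch: BN -> ReLU -> 3x3 conv -> BN -> ReLU -> 3x3 conv,
-- shortcut branch: identity; output = x + F(x).
local function preActResidualUnit(nChannels)
   local branch = nn.Sequential()
      :add(nn.SpatialBatchNormalization(nChannels))
      :add(nn.ReLU(true))
      :add(nn.SpatialConvolution(nChannels, nChannels, 3, 3, 1, 1, 1, 1))
      :add(nn.SpatialBatchNormalization(nChannels))
      :add(nn.ReLU(true))
      :add(nn.SpatialConvolution(nChannels, nChannels, 3, 3, 1, 1, 1, 1))

   return nn.Sequential()
      :add(nn.ConcatTable()
         :add(branch)          -- residual function F(x)
         :add(nn.Identity()))  -- identity shortcut x
      :add(nn.CAddTable())     -- elementwise sum: x + F(x)
end

-- sanity check: shapes are preserved, so units can be stacked very deep
local unit = preActResidualUnit(16)
print(unit:forward(torch.randn(8, 16, 32, 32)):size())  -- 8x16x32x32
```

Because the shortcut carries the input through unchanged, stacking many such units keeps a clean signal path from input to output, which is what makes 1000-layer training tractable.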
To fit the 1k-layer models into memory without modifying much code, we simply reduced the mini-batch size to 64; note that the results in the paper were obtained with a mini-batch size of 128. Somewhat unexpectedly, the results with a mini-batch size of 64 are slightly better:
| mini-batch size | CIFAR-10 test error (%), median (mean ± std) |
|:---------------------|:--------------------------------------------|
| 128 (as in [a])      | 4.92 (4.89 ± 0.14) |
| 64 (as in this code) | 4.62 (4.69 ± 0.20) |
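For concreteness, the batch-size change amounts to nothing more than iterating the training set in smaller chunks. Below is a minimal sketch, assuming a conventional `opt.batchSize` option and a stand-in tensor in place of CIFAR-10; neither is taken from this repository's actual training scripts.

```lua
require 'torch'

local opt = { batchSize = 64 }  -- reduced from the 128 used in [a]

-- stand-in for CIFAR-10 training images (3x32x32); the real loader differs
local trainData = torch.randn(512, 3, 32, 32)

local nBatches = math.floor(trainData:size(1) / opt.batchSize)
for i = 1, nBatches do
   -- halving the mini-batch roughly halves activation memory per step,
   -- which is what lets the 1001-layer model fit in GPU memory
   local batch = trainData:narrow(1, (i - 1) * opt.batchSize + 1, opt.batchSize)
   -- the forward/backward pass on `batch` would go here
end
```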
Curves obtained by running this code with a mini-batch size of 64 (training loss: y-axis on the left; test error: y-axis on the right):