CNN Model Compression Techniques (Hardware-oriented Approximation of Convolutional Neural Networks)

Uploaded 2018-06-13 11:13:02 · 1.88 MB PDF
High computational complexity hinders the widespread usage of Convolutional Neural Networks (CNNs), especially in mobile devices. Hardware accelerators are arguably the most promising approach for reducing both execution time and power consumption. One of the most important steps in accelerator design is hardware-oriented approximation of the network, which enables energy-efficient inference.
To my family

CONTENTS
List of Figures
List of Tables
Abstract
Acknowledgments
1 Introduction
2 Convolutional Neural Networks
2.1 Training and Inference
2.2 Layer Types
2.3 Applications
2.4 Computational Complexity and Memory Requirements
2.5 ImageNet Competition
2.6 Neural Networks with Limited Numerical Precision
3 Related Work
3.1 Network Approximation
3.2 Accelerators
4 Fixed Point Approximation
4.1 Baseline Convolutional Neural Networks
4.2 Fixed Point Format
4.3 Dynamic Range of Parameters and Layer Outputs
4.4 Results
5 Dynamic Fixed Point Approximation
5.1 Mixed Precision Fixed Point
5.2 Dynamic Fixed Point
5.3 Results
6 Minifloat Approximation
6.1 Motivation
6.2 IEEE-754 Single Precision Standard
6.3 Minifloat Number Format
6.4 Data Path for Accelerator
6.5 Results
6.6 Comparison to Previous Work
7 Turning Multiplications Into Bit Shifts
7.1 Multiplier-free Arithmetic
7.2 Maximal Number of Shifts
7.3 Data Path for Accelerator
7.4 Results
8 Comparison of Different Approximations
8.1 Fixed Point Approximation
8.2 Dynamic Fixed Point Approximation
8.3 Minifloat Approximation
8.4 Summary
9 Ristretto: An Approximation Framework for Deep CNNs
9.1 From Caffe to Ristretto
9.2 Quantization Flow
9.3 Fine-tuning
9.4 Fast Forward and Backward Propagation
9.5 Ristretto From a User Perspective
9.6 Release of Ristretto
9.7 Future Work

LIST OF FIGURES
2.1 Network architecture of AlexNet
2.2 Convolution between input feature maps and filters
2.3 Pseudo-code for convolutional layer
2.4 Fully connected layer with activation
2.5 Pooling layer
2.6 Parameter size and arithmetic operations in CaffeNet and VGG-16
2.7 ImageNet networks: accuracy vs size
2.8 Inception architecture
2.9 Data path with limited numerical precision
2.10 Quantized layer
3.1 ASIC vs FPGA vs GPU
4.1 Dynamic range of values in LeNet
4.2 Dynamic range of values in CaffeNet
4.3 Fixed point results
5.1 Fixed point data path
5.2 Dynamic fixed point representation
5.3 Static vs dynamic fixed point
6.1 Minifloat number representation
6.2 Minifloat data path
6.3 Minifloat results
7.1 Representation for integer-power-of-two parameters
7.2 Multiplier-free data path
8.1 Approximation of LeNet
8.2 Approximation of CIFAR-10
8.3 Approximation of CaffeNet
9.1 Quantization flow
9.2 Fine-tuning with full precision weights
9.3 Network brewing with Caffe
9.4 Network quantization with Ristretto

LIST OF TABLES
3.1 ASIC vs FPGA vs GPU
4.1 Fixed point results
5.1 Dynamic fixed point quantization
5.2 Dynamic fixed point results
6.1 Minifloat results
7.1 Multiplier-free arithmetic results

ABSTRACT
Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks

Convolutional neural networks (CNNs) have achieved major breakthroughs in recent years. Their performance in computer vision has matched, and in some areas even surpassed, human capabilities. Deep neural networks can capture complex non-linear features; however, this ability comes at the cost of high computational and memory requirements. State-of-the-art networks require billions of arithmetic operations and millions of parameters. To enable embedded devices such as smartphones, Google Glass and monitoring cameras with the astonishing power of deep learning, dedicated hardware accelerators can be used to decrease both execution time and power consumption. In applications where a fast connection to the cloud is not guaranteed or where privacy is important, computation needs to be done locally. Many hardware accelerators for deep neural networks have been proposed recently. A first important step of accelerator design is hardware-oriented approximation of deep networks, which enables energy-efficient inference.

We present Ristretto, a fast and automated framework for CNN approximation. Ristretto simulates the hardware arithmetic of a custom hardware accelerator. The framework reduces the bit-width of network parameters and of the outputs of resource-intense layers, which significantly reduces the chip area required for multiplication units. Alternatively, Ristretto can remove the need for multipliers altogether, resulting in an adder-only arithmetic. The tool fine-tunes trimmed networks to achieve high classification accuracy. Since training of deep neural networks can be time-consuming, Ristretto uses highly optimized routines which run on the GPU. This enables fast compression of any given network. Given a maximum accuracy tolerance of 1%, Ristretto can successfully condense CaffeNet and SqueezeNet to 8-bit. The code for Ristretto is available.

ACKNOWLEDGMENTS
First and foremost, I want to thank my major advisor, Professor Soheil Ghiasi, for the guidance, inspiration and encouragement he gave me during my graduate studies. Thanks to him, I had the privilege to do research in a dynamic research group with excellent students. He provided me with all the ideas, equipment and mentorship I needed for writing this thesis.

Second, I would like to thank the graduate students at UC Davis who contributed to my research. I was fortunate to work together with members of the LEPS Group, the Architecture Group, as well as the VLSI Computation Lab. Most notably, Mohammad Motamedi, Terry O'Neill, Dan Fong and John Pimentel helped me with advice, technical knowledge and paper editing. I consider myself extremely lucky that I had their support during my graduate studies, and I look forward to continuing our friendship in the years to come. I'm humbled by the opportunity to do research with Mohammad Motamedi, a truly bright PhD student. Our early joint research projects motivated me to solve challenging problems and to strive for extraordinary research results.

Third, I am grateful to all members of my thesis committee: Professors John Owens, Venkatesh Akella, and Yong J. Lee. Professor Owens spurred me on to conduct an in-depth analysis of related work; additionally, he gave me valuable suggestions for improving my thesis. Early in this research project, Professor Akella guided me in reading papers on hardware acceleration of neural networks. Professor Lee helped me significantly to improve the final version of this document.

Finally, I'd like to thank my family for supporting my studies abroad, and especially my girlfriend Thirza. I am grateful to my family and friends for always motivating me to pursue my academic goals; without them I would not have come this far.
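The abstract's core idea — reducing the bit-width of parameters and layer outputs to a fixed point format — can be illustrated with a minimal sketch. The function name, the 8-bit word and the 4-bit fractional split are illustrative assumptions for this sketch, not Ristretto's actual API; the thesis chooses the fractional length per layer from the observed dynamic range.

```python
import numpy as np

def quantize_fixed_point(x, total_bits=8, frac_bits=4):
    """Simulate signed fixed point arithmetic: round x to the nearest
    value representable with `frac_bits` fractional bits, then saturate
    to the range of a signed `total_bits`-bit word.
    Note: np.round uses round-half-to-even; real hardware may differ."""
    scale = 2.0 ** frac_bits
    max_val = (2 ** (total_bits - 1) - 1) / scale   # largest positive code
    min_val = -(2 ** (total_bits - 1)) / scale      # most negative code
    return np.clip(np.round(x * scale) / scale, min_val, max_val)

weights = np.array([0.7512, -1.3049, 0.0331, 3.9])
# Quantized values: 0.75, -1.3125, 0.0625, 3.875
print(quantize_fixed_point(weights, total_bits=8, frac_bits=4))
```

In a simulation of a whole network, a pass like this would be applied to the weights and outputs of each convolutional and fully connected layer before evaluating classification accuracy.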

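The multiplier-free arithmetic of Chapter 7 rests on constraining each parameter to a signed integer power of two, so that a multiplication by a weight becomes a bit shift. The following is a minimal sketch of that idea, assuming a hypothetical exponent range; it rounds in the log domain, which is one simple choice, not necessarily the thesis' exact rounding rule.

```python
import numpy as np

def round_to_power_of_two(w, min_exp=-4, max_exp=0):
    """Round each weight to a signed power of two with exponent clamped
    to [min_exp, max_exp]. A multiplication x * w then reduces to an
    arithmetic shift of x by |exp| bit positions (plus a sign flip)."""
    mag = np.maximum(np.abs(w), 2.0 ** (min_exp - 1))  # avoid log2(0)
    exp = np.clip(np.round(np.log2(mag)), min_exp, max_exp)
    return np.sign(w) * 2.0 ** exp

# 0.3 -> 0.25 (shift right 2), -0.8 -> -1.0 (shift 0), 0.04 -> 0.0625
print(round_to_power_of_two(np.array([0.3, -0.8, 0.04])))
```

Because only the exponent must be stored, each weight needs just a few bits, and the costly multiplier array in the accelerator's data path can be replaced by shifters and adders.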
Preview: 73 pages
Comment by lxk2017 (2018-06-15): This is only the thesis, not the code.