Imperial College London
Department of Electrical and Electronic Engineering
Final Year Project Report 2017
Project Title: PYNQ Classification - Python on Zynq FPGA for Neural Networks
Student: Erwei Wang
CID: 00816456
Course: 4T
Project Supervisor: Prof P.Y.K. Cheung
Second Marker: Prof G.A. Constantinides
ABSTRACT
Convolutional Neural Networks (CNNs) have achieved significant success in solving a wide range of classification problems. Traditionally, embedded CNN application prototypes have been implemented on CPU or GPU based machines because of the short development time, at the cost of performance and energy efficiency. However, recent advancements in high level synthesis (HLS) tools and PYNQ development boards are making the prototyping effort on FPGAs comparable to that on CPUs or GPUs, making FPGAs a good option for prototyping embedded CNN applications. This report presents an Open Source framework designed to enable fast deployment of embedded CNN applications on FPGA platforms. My framework provides HLS CNN layers, which can be parameterised for a wide range of network specifications and provide state-of-the-art performance at low power consumption. Compared with an implementation on the PYNQ ARM CPU, my CIFAR-10 prototype shows up to 43x acceleration, while maintaining a 73.7% classification accuracy and an energy efficiency of 1.953 frames/J.
ACKNOWLEDGEMENT
I would like to express my sincere gratitude to my supervisor Professor P.Y.K. Cheung, who not only provided unstinting support and invaluable guidance throughout my four years of study at Imperial College, but also sparked my motivation to pursue a career in scientific research.
I would like to thank Dr. Peter Ogden for providing the PYNQ FPGA data transfer API design, which became the backbone of my project's architecture.
I would also like to thank Michaela Blott, Cathal McCabe, Giulio Gambardella and Andrea Solazzo from Xilinx Ireland Lab for their warm hospitality during our visit, as well as invaluable guidance on techniques for optimising CNN implementations on FPGA. Thanks also to Patrick Lysaght from Xilinx Lab, San Jose, for initiating the PYNQ project and providing all the support I needed to make this a successful project.
Special thanks to Stylianos Venieris and Junyi Liu from the Circuits and Systems lab, as well as Aaron Zhao and Daryl Mah, who provided insightful ideas on the project and report.
ACRONYMS AND ABBREVIATIONS
AI Artificial Intelligence
API Application Programming Interface
ASIC Application-specific Integrated Circuit
BLAS Basic Linear Algebra Subprograms
BRAM Block Random Access Memory
BNN Binarised Neural Network
CNN Convolutional Neural Network
CPU Central Processing Unit
DAG Directed Acyclic Graph
DSP Digital Signal Processor
DMA Direct Memory Access
FPGA Field-programmable Gate Array
GPU Graphics Processing Unit
HLS High Level Synthesis
HPC High-performance Computing
HTC High-throughput Computing
IP Intellectual Property
NIN Network in Network
NN Neural Network
OS Operating System
PYNQ Python Productivity for Zynq
RAM Random Access Memory
ReLU Rectified Linear Unit
RTL Register Transfer Level
SDF Synchronous Dataflow
SDFG Synchronous Dataflow Graph
SoC System on a Chip
CONTENTS
1. Introduction
2. Project Scope
3. Background
3.1. What is a Convolutional Neural Network (CNN)?
3.2. Framework High Level Interface Design
3.2.1. Why Do We Need Another High Level CNN Framework?
3.2.2. Existing Field-programmable Gate Array (FPGA) CNN Frameworks and Their High Level Interfaces
3.2.3. Installing CNN Frameworks on Embedded ARM Chipset
3.3. FPGA Layer Intellectual Property (IP) Library Design
3.3.1. Why Is FPGA Good at Accelerating CNN?
3.3.2. Related Works on Accelerating 2D Convolution on FPGAs
3.4. Data Quantisation
3.4.1. Why Do We Quantise CNN?
3.4.2. Existing CNN Quantisation Schemes
3.5. PYNQ Platform
3.5.1. What is Python Productivity for Zynq (PYNQ) Platform?
3.5.2. Alternative Platforms for CNN Deployment
3.6. Summary
4. Implementation - Overall Architecture
5. Implementation - ARM Linux Operating System (OS) Side
5.1. Framework Installation and Setup on PYNQ Linux OS
5.1.1. Caffe
5.1.2. TensorFlow
5.1.3. Theano
5.1.4. Evaluation
5.2. Application Programming Interface (API) for FPGA-Central Processing Unit (CPU) Data Transmission
6. Implementation - Zynq FPGA Side
6.1. Data Streaming Architecture
6.2. Quantisation
6.2.1. 32-bit Floating-point Format
6.2.2. Fixed-point Data Quantisation
6.3. FPGA Layer IPs
6.3.1. Convolution Layer
6.3.2. Pooling Layer
6.3.3. Fully-connected Layer