An Energy-efficient Speech Classification Convolution Neural
Network Accelerator Based on FPGA and Quantization
Abstract
Deep convolutional neural networks (CNNs) have been shown to hold unique advantages over recurrent
neural networks (RNNs) in acoustic tasks. However, activation data in convolutional neural networks is
usually represented in floating-point format, which is both time-consuming and power-consuming to
compute. Quantization can convert activations to fixed-point format, replacing floating-point arithmetic
with faster and more energy-efficient fixed-point arithmetic. Based on this method, this article presents a
design-space search method to quantize a binary weight neural network with minimal accuracy loss. We
also design a dedicated accelerator on an FPGA platform that is high-throughput and energy-efficient
compared with CPU- or RNN-based accelerators.
Keywords: energy-efficient; reconfigurable computing; FPGA; quantization; sound classification
1. Introduction
Sound classification is a typical information
analysis task widely used in military applications
and speech control. Since Deng, Yu et al.
introduced RNN and LSTM (Long Short-Term
Memory) acoustic models into speech recognition
and sound classification, LSTM has achieved a
series of excellent results in this area [1][2].
However, deep neural networks based on RNNs
are hard to train and parallelize due to their
complex structure and recurrent computation.
When applied to real tasks, RNN models usually
demand high-performance GPUs and CPUs to
satisfy their computing power needs. Such
hardware platforms consume up to hundreds of
watts, which cannot meet the requirements of
energy-sensitive environments. In contrast,
CNNs have been found to achieve excellent
performance as acoustic models [3][4][5]. Audio
files can be transformed into feature maps or
feature matrices by filtering algorithms (such as
the Mel Frequency Cepstral Coefficients
algorithm) [6]; acoustic CNN models then run on
these maps just as computer vision models run on
input images. With small 3x3 or 5x5 convolution
kernels, CNNs can be trained and run forward
faster than RNNs, because convolution is easier
to parallelize and accelerate than recurrent
computation. This advantage makes it possible to
accelerate an acoustic CNN model on dedicated
power-efficient hardware platforms.
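As context for the MFCC step mentioned above, the following is a minimal NumPy sketch of the classic MFCC pipeline (framing, power spectrum, mel filterbank, DCT-II). The frame length, hop size, and filterbank construction here are illustrative assumptions, not the exact settings of [6] or of the model in Section 3; the output is cropped to the 20x49 shape used later in this paper.

```python
import numpy as np

def mfcc_feature_map(signal, sr=16000, n_fft=400, hop=160,
                     n_mels=20, n_frames=49):
    # Frame the signal and apply a Hann window.
    frames = [signal[s:s + n_fft] * np.hanning(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2  # power spectrum

    # Triangular mel filterbank (simplified construction).
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, spec.shape[1]))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(spec @ fbank.T + 1e-10)      # (frames, n_mels)

    # DCT-II over the mel axis yields the cepstral coefficients.
    k = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(k, 2 * k + 1) / (2 * n_mels))
    mfcc = (logmel @ dct.T).T                    # (n_mels, frames)
    return mfcc[:, :n_frames]                    # crop to a 20x49 map
```

One second of 16 kHz audio yields 98 frames with these settings, which is then cropped to the 49-frame map the model consumes.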
Gradient descent [7], which is sensitive to
numerical fluctuation [8], is widely used to train
deep neural networks (DNNs). To pursue the best
DNN performance, it is necessary to store data in
full-precision format during training. However,
floating-point data requires a longer word length
to store and more circuitry to compute, leading to
higher energy consumption and larger circuit area
[9]. The computational complexity of floating-
point arithmetic also makes it difficult to reduce
cycle counts. Floating-point computation,
although it preserves precision well, has become
the bottleneck of power-efficient high-
performance computing.
Fortunately, some works [10][11] have
shown that floating-point data is unnecessary for
CNN inference; low-precision computing can
achieve similar performance. These works
provide quantization methods that turn weights
and activations into fixed-point, integer, or even
binary data with little accuracy loss. Building on
such quantized CNN models, BNN (Binary
Neural Network) accelerators [12][13][14] and
GPUs supporting 8-bit integer data have emerged.
These designs greatly reduce power consumption
and achieve speedups of up to hundreds of times
over CPU platforms. It turns out that hardware
paired with correspondingly quantized CNN
models can deliver excellent computing
performance as well as high energy efficiency.
Compared with CPUs and GPUs, ASICs and
FPGAs are more suitable for accelerating a
specific task because they can be customized for
it. These platforms can be tailored by arranging
pipelines and expanding parallelism, lowering
power consumption while raising computing
performance. Although ASICs hold large
advantages over FPGAs in power and speed,
expensive design and manufacturing costs limit
their general application. In contrast, FPGAs
keep a good balance among performance, power,
flexibility, and expense thanks to their
programmability and mature industrial design
flow. FPGAs are now widely used in cloud
computing and intelligent computing by
Microsoft, Amazon, and Alibaba [15][16],
becoming an important part of high-performance
computing.
The sound classification model, which
focuses on specific speech instructions or
acoustic signals, is a basic component of
intelligent scenario analysis in both the cloud and
at the edge. Such applications especially need a
low-power but high-performance computing
platform. Typical deep convolutional neural
networks handle coarse sound classification well.
However, there is still room to accelerate CNN
models and reduce the computing platform's
energy consumption through quantization and
customized hardware design. To implement such
a power-efficient sound classification platform,
we choose a typical CNN-based speech
classification model whose weights are +1 or -1
and whose activation data is in full-precision
floating-point format [17]. We design an
accelerator based on the Xilinx XCKU-115
FPGA platform and run this BWN (Binary
Weight Network) model on it. Compared with a
state-of-the-art CPU platform, our accelerator
achieves an 18-300x throughput speedup and
high energy efficiency. The main contributions of
this work are as follows:
1. We convert float-type feature data into
fixed-point data at each layer through a
design space exploration method. The
model's accuracy loss is compensated by the
performance of fixed-point computing on
FPGA platforms.
2. We design a multi-PE BWN accelerator
on FPGA with shared weight storage, a
balanced pipeline structure, and low-delay
pipelining between the CNN's layers. The
performance, power consumption, and
energy efficiency of this accelerator are also
discussed.
3. The target speech classification model is
tested in single-thread, multi-thread, and
multi-node environments to establish a
sufficient performance baseline. Compared
with these test results, our design has a clear
advantage in performance per watt and
throughput.
2. Neural Network Forwarding Quantization
When training a deep neural network,
researchers usually choose a full-precision data
format to ensure the best model accuracy. In
inference, however, these parameters no longer
change, so we can compress them offline. [17]
proposes an algorithm to compress floating-point
DNN parameters into binary data consisting of
+1 and -1. Compared with a common DNN with
floating-point weights and activations, this
compression not only sharply reduces parameter
storage, but also replaces multiplication and
division with addition and subtraction. Less
storage and fewer multiplications mean less
memory-access energy and fewer computing
cycles, leading to lower power consumption and
faster operation.
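To make the multiply-to-add/subtract substitution concrete, here is a minimal NumPy sketch (our illustration, not the exact algorithm of [17]): once every weight collapses to +1 or -1, a dot product reduces to summing the inputs at +1 positions and subtracting those at -1 positions.

```python
import numpy as np

def binarize_weights(w):
    """Deterministic binarization: every weight becomes +1 or -1,
    so each weight fits in a single bit of storage."""
    return np.where(w >= 0, 1, -1).astype(np.int8)

def binary_dot(x, wb):
    """Dot product against binary weights wb in {+1, -1}:
    no multiplications, only additions and subtractions."""
    return x[wb == 1].sum() - x[wb == -1].sum()
```

For example, `binarize_weights([0.3, -1.2, 0.05, -0.4])` yields `[1, -1, 1, -1]`, and `binary_dot` over it agrees with the ordinary multiply-accumulate dot product.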
[18] introduces a method to turn activations
into binary format. Unlike the parameters of a
neural network, activation data fluctuates
numerically with different inputs (such as an
input image or input audio feature map).
Although [18] still maintains good model
accuracy on very deep CNNs like VGGNet,
binary activation data introduces a large loss of
numerical precision, which can seriously affect
some small CNNs [19][20]. In this situation,
turning floating-point data into fixed-point
format keeps a good balance between computing
performance and model accuracy: fixed-point
computation needs fewer computing cycles than
floating-point, and fixed-point formats can adapt
to the data's numerical distribution through
flexible allocation of integer and fractional bits.
As Fig 1 shows, when the integer part is
allocated more bits, it can represent larger values;
when the fractional part is given a longer word
length, numerical precision improves
correspondingly. However, increasing the length
of the fixed-point format adds cycles to the
computation or exceeds the hardware's limits, so,
taking both speed and accuracy into account, it is
important to find the best practical data format.
[Figure: two fixed-point encodings of true values, annotated with integer and decimal parts of the binary representation. With a short fractional field, a true value near 10.06 is stored as the fixed-point value 10.0625, giving an accuracy loss of 0.025; with extra fractional bits, a true value near 20.06 is stored as 20.060546875, reducing the accuracy loss to 0.000546875.]
Fig.1 How Bitwise Influences Numerical Precision
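The trade-off of Fig 1 can be sketched in a few lines of Python. The bit split and the round-to-nearest rule below are our illustrative assumptions (a real design might truncate instead); the point is only that more fractional bits shrink the quantization error, while the integer bits bound the representable magnitude.

```python
def to_fixed_point(x, int_bits, frac_bits):
    """Quantize x to an unsigned fixed-point grid with int_bits
    integer bits and frac_bits fractional bits (sign bit omitted
    for brevity), rounding to the nearest representable value."""
    scale = 2.0 ** frac_bits
    max_val = 2.0 ** int_bits - 1.0 / scale  # largest representable value
    q = round(x * scale) / scale             # nearest grid point
    return min(q, max_val)                   # saturate on overflow
```

With 4 fractional bits, 20.06 quantizes to 20.0625 (error 0.0025); with 10 fractional bits the error drops below 0.0005, mirroring the contrast in Fig 1.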
3. Speech Classification Model
3.1 Model Architecture and Weight
Binarization
This CNN-based speech recognition model
is trained on the TensorFlow speech commands
dataset. It recognizes six sorts of short speech
segments: "up", "down", "yes", "right", "left",
and "unknown words". The model first uses the
MFCC algorithm to turn an audio file into a
float-type tensor of dimension 20x49x1. This
tensor is then fed into a convolutional neural
network consisting of two convolution layers,
three fully connected layers, and binary weight
parameters. The detailed model architecture is
shown in Fig 2. All convolution kernels are 3x3
and the convolution stride is 1. There is no
padding or dilation in this network, which makes
it convenient for us to accelerate. Note that
activations are still in float format at this stage.
Through a softmax function, the model outputs
the probability of each of the six labels.
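A shape-level NumPy sketch of such a network follows. The channel counts and fully connected widths are hypothetical placeholders (Fig 2 holds the actual configuration), and the binary weights are random rather than trained; the sketch only demonstrates the data flow: two valid 3x3 convolutions shrink the 20x49 map, then three fully connected layers and a softmax produce six class probabilities.

```python
import numpy as np

def conv2d_valid(x, w):
    """'Valid' 2-D convolution: no padding, stride 1.
    x: (in_c, H, W), w: (out_c, in_c, k, k) -> (out_c, H-k+1, W-k+1)."""
    out_c, _, k, _ = w.shape
    h, wd = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((out_c, h, wd))
    for o in range(out_c):
        for i in range(h):
            for j in range(wd):
                out[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[o])
    return out

def bwn_forward(x, seed=0):
    """Forward pass with random binary weights in {+1, -1};
    layer widths here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    sign = lambda shape: np.where(rng.standard_normal(shape) >= 0, 1.0, -1.0)
    h = np.maximum(conv2d_valid(x, sign((8, 1, 3, 3))), 0)    # -> 8 x 18 x 47
    h = np.maximum(conv2d_valid(h, sign((16, 8, 3, 3))), 0)   # -> 16 x 16 x 45
    a = h.reshape(-1)                                         # flatten
    for d_out in (128, 64, 6):                                # three FC layers
        a = sign((d_out, a.size)) @ a
        if d_out != 6:
            a = np.maximum(a, 0)
    e = np.exp(a - a.max())                                   # stable softmax
    return e / e.sum()
```

Running `bwn_forward` on a random 1x20x49 input returns a length-6 probability vector that sums to one, matching the model's six output labels.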
After the model parameters are fixed by the
training process, we can convert the float weight
values into {-1, +1}. Fig 3 shows how we process
the weight data. We assume the primitive
parameters follow a normal distribution; the
numerical range is then reshaped by a tanh
function and a series of scaling steps. Finally, all
parameters are discretized to -1 or +1.
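The described pipeline can be sketched as follows. The particular tanh and scaling steps are our reading of the text; note that tanh and a positive rescaling both preserve sign, so the final discretization coincides with taking the sign of each weight, and the intermediate squash/scale steps matter mainly for how the binarized model is trained.

```python
import numpy as np

def binarize_via_tanh(w):
    """Squash weights with tanh, rescale into [-1, 1] by the maximum
    magnitude (one plausible scaling choice), then discretize to +1/-1."""
    t = np.tanh(w)                     # squash the normal-like distribution
    t = t / np.max(np.abs(t))          # scale the range to [-1, 1]
    return np.where(t >= 0, 1.0, -1.0) # discretize
```

For example, weights `[0.7, -0.1, 2.3, -1.5]` discretize to `[+1, -1, +1, -1]`.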
This stage’s BWN model (activation data is still