Network) accelerators [12][13][14] and GPUs supporting 8-bit integer data [15][16]. These designs greatly reduce power consumption and achieve speedups of up to hundreds of times over CPU platforms. Hardware paired with correspondingly quantized CNN models can thus deliver excellent computing performance as well as high energy efficiency.
Compared with CPUs and GPUs, ASICs and FPGAs are better suited to accelerating a specific task because their hardware can be customized for it. These platforms can be tailored by building pipelines and widening parallelism, lowering power consumption while raising computing performance. Although ASICs hold large advantages over FPGAs in power and speed, their expensive design and manufacturing costs limit their general application. In contrast, FPGAs strike a good balance among performance, power, flexibility and cost thanks to their programmability and a mature design ecosystem. FPGAs are now widely used in cloud and intelligent computing by Microsoft, Amazon and Alibaba [15][16], and have become an important part of high-performance computing.
Sound classification models, which focus on specific speech commands or acoustic signals, are a basic component of intelligent scene analysis in both the cloud and at the edge. Such application scenarios particularly demand a low-power yet high-performance computing platform. Typical deep convolutional neural networks handle coarse sound classification well, but there is still room to accelerate the CNN model and to reduce the platform's energy consumption through quantization and customized hardware design. To build such a power-efficient sound classification platform, we choose a typical CNN-based speech classification model whose weights are constrained to +1 or -1 while activations remain in full-precision floating-point format [17]. We design an accelerator on a Xilinx XCKU-115 FPGA and run this BWN (Binary Weight Network) model on it. Compared with a state-of-the-art CPU platform, our accelerator achieves an 18-300x throughput speedup together with high energy efficiency. The main contributions of this work are as follows:
1. We convert the floating-point feature data of each layer into fixed-point data through a design space exploration method (a minimal sketch of this conversion follows the list). The resulting loss in model accuracy is offset by the performance gain of fixed-point computation on FPGA platforms.
2. We design a multi-PE BWN accelerator on an FPGA, featuring shared weight storage, a balanced pipeline structure and low-latency pipelining between CNN layers. The accelerator's performance, power consumption and energy efficiency are also discussed.
3. The target speech classification model is benchmarked in single-thread, multi-thread and multi-node environments to establish a thorough performance baseline. Against these results, our design shows a clear advantage in both throughput and performance per watt.
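As a rough illustration of the per-layer fixed-point conversion in contribution 1, the sketch below sweeps candidate fractional bit widths and keeps the one with the smallest quantization error on a layer's feature data. The function names, the 16-bit word length and the mean-squared-error criterion are illustrative assumptions, not the exact exploration procedure used in this work.

```python
import numpy as np

def to_fixed_point(x, frac_bits, total_bits=16):
    """Quantize float features to signed fixed-point with `frac_bits`
    fractional bits, saturating to the representable range."""
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale  # de-quantized value seen by the next layer

def choose_frac_bits(features, total_bits=16):
    """Tiny design-space sweep: pick the fractional width that minimizes
    mean-squared quantization error for one layer's feature data."""
    errors = [np.mean((features - to_fixed_point(features, f, total_bits)) ** 2)
              for f in range(total_bits)]
    return int(np.argmin(errors))

# Example: per-layer choice for hypothetical activation tensors of
# different dynamic ranges.
rng = np.random.default_rng(1)
layer_feats = [rng.standard_normal(1000) * s for s in (0.5, 4.0, 32.0)]
print([choose_frac_bits(f) for f in layer_feats])
```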
2. Neural Network Forwarding Quantization
When training a deep neural network, researchers usually use a full-precision data format to ensure the best model accuracy. In inference, however, the parameters no longer change, so they can be compressed offline. [17] proposes an algorithm that compresses floating-point DNN parameters into binary values consisting of +1 and -1. Compared with a common DNN using floating-point weights and activations, this compression not only sharply reduces parameter storage but also replaces multiplications with additions and subtractions. Less storage and fewer multiplications mean lower memory-access energy and fewer computing cycles, leading to lower power consumption and faster execution.
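To make this concrete, the NumPy sketch below (an illustration, not the accelerator's implementation) binarizes a weight vector with the sign function and evaluates a dot product with ±1 weights using only additions and subtractions; the per-filter scaling factor that some BWN variants keep is omitted here.

```python
import numpy as np

def binarize_weights(w):
    """Binarize floating-point weights to +1/-1 with the sign function.
    Zeros are mapped to +1 so every weight stays in {+1, -1}."""
    return np.where(w >= 0, 1.0, -1.0).astype(np.float32)

def binary_dot(x, w_bin):
    """Dot product with +/-1 weights: no multiplications are needed,
    activations are added or subtracted according to the weight sign."""
    pos = x[w_bin > 0].sum()   # +1 weights -> additions
    neg = x[w_bin < 0].sum()   # -1 weights -> subtractions
    return pos - neg

# Toy check against the ordinary floating-point dot product.
rng = np.random.default_rng(0)
x = rng.standard_normal(8).astype(np.float32)   # full-precision activations
w = rng.standard_normal(8).astype(np.float32)   # full-precision weights
w_bin = binarize_weights(w)
assert np.isclose(binary_dot(x, w_bin), np.dot(x, w_bin))
```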