Network) accelerators [12][13][14] and GPUs supporting 8-bit integer data [15][16]. These designs greatly reduce power consumption and achieve speedups of up to hundreds of times over CPU platforms. Hardware paired with correspondingly quantized CNN models can thus deliver excellent computing performance as well as high energy efficiency.
Compared with CPUs and GPUs, ASICs and FPGAs are better suited to accelerating a specific task because their hardware can be customized for it. These platforms can be tailored by building pipelines and widening parallelism, lowering power consumption while raising computing performance. Although ASICs hold large advantages over FPGAs in power and speed, their expensive design and manufacturing costs limit their general application. In contrast, FPGAs strike a good balance among performance, power, flexibility and cost thanks to their programmability and a mature design ecosystem. FPGAs are now widely used in cloud and intelligent computing by Microsoft, Amazon and Alibaba [15][16], and have become an important part of high-performance computing.
Sound classification models, which focus on specific speech commands or acoustic signals, are a basic component of intelligent scene analysis in both the cloud and at the edge. Such application scenarios particularly demand a low-power yet high-performance computing platform. Typical deep convolutional neural networks handle coarse sound classification well, but there is still room to accelerate the CNN model and to reduce the platform's energy consumption through quantization and customized hardware design. To build such a power-efficient sound classification platform, we choose a typical CNN-based speech classification model whose weights are constrained to +1 or -1 while activations remain in full-precision floating-point format [17]. We design an accelerator on a Xilinx XCKU-115 FPGA and run this BWN (Binary Weight Network) model on it. Compared with a state-of-the-art CPU platform, our accelerator achieves an 18-300x throughput speedup together with high energy efficiency. The main contributions of this work are as follows:
1. We convert the floating-point feature data of each layer into fixed-point data through a design space exploration method (a minimal sketch of this conversion follows the list). The resulting loss in model accuracy is offset by the performance gain of fixed-point computation on FPGA platforms.
2. We design a multi-PE BWN accelerator on an FPGA, featuring shared weight storage, a balanced pipeline structure and low-latency pipelining between CNN layers. The accelerator's performance, power consumption and energy efficiency are also discussed.
3. The target speech classification model is benchmarked in single-thread, multi-thread and multi-node environments to establish a thorough performance baseline. Against these results, our design shows a clear advantage in both throughput and performance per watt.
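As a rough illustration of the per-layer fixed-point conversion in contribution 1, the sketch below sweeps candidate fractional bit widths and keeps the one with the smallest quantization error on a layer's feature data. The function names, the 16-bit word length and the mean-squared-error criterion are illustrative assumptions, not the exact exploration procedure used in this work.

```python
import numpy as np

def to_fixed_point(x, frac_bits, total_bits=16):
    """Quantize float features to signed fixed-point with `frac_bits`
    fractional bits, saturating to the representable range."""
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale  # de-quantized value seen by the next layer

def choose_frac_bits(features, total_bits=16):
    """Tiny design-space sweep: pick the fractional width that minimizes
    mean-squared quantization error for one layer's feature data."""
    errors = [np.mean((features - to_fixed_point(features, f, total_bits)) ** 2)
              for f in range(total_bits)]
    return int(np.argmin(errors))

# Example: per-layer choice for hypothetical activation tensors of
# different dynamic ranges.
rng = np.random.default_rng(1)
layer_feats = [rng.standard_normal(1000) * s for s in (0.5, 4.0, 32.0)]
print([choose_frac_bits(f) for f in layer_feats])
```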
2. Neural Network Forwarding Quantization
When training a deep neural network, researchers usually use a full-precision data format to ensure the best model accuracy. In inference, however, the parameters no longer change, so they can be compressed offline. [17] proposes an algorithm that compresses floating-point DNN parameters into binary values consisting of +1 and -1. Compared with a common DNN using floating-point weights and activations, this compression not only sharply reduces parameter storage but also replaces multiplications with additions and subtractions. Less storage and fewer multiplications mean lower memory-access energy and fewer computing cycles, leading to lower power consumption and faster execution.
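To make this concrete, the NumPy sketch below (an illustration, not the accelerator's implementation) binarizes a weight vector with the sign function and evaluates a dot product with ±1 weights using only additions and subtractions; the per-filter scaling factor that some BWN variants keep is omitted here.

```python
import numpy as np

def binarize_weights(w):
    """Binarize floating-point weights to +1/-1 with the sign function.
    Zeros are mapped to +1 so every weight stays in {+1, -1}."""
    return np.where(w >= 0, 1.0, -1.0).astype(np.float32)

def binary_dot(x, w_bin):
    """Dot product with +/-1 weights: no multiplications are needed,
    activations are added or subtracted according to the weight sign."""
    pos = x[w_bin > 0].sum()   # +1 weights -> additions
    neg = x[w_bin < 0].sum()   # -1 weights -> subtractions
    return pos - neg

# Toy check against the ordinary floating-point dot product.
rng = np.random.default_rng(0)
x = rng.standard_normal(8).astype(np.float32)   # full-precision activations
w = rng.standard_normal(8).astype(np.float32)   # full-precision weights
w_bin = binarize_weights(w)
assert np.isclose(binary_dot(x, w_bin), np.dot(x, w_bin))
```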