FPGA 2017, the top conference in the FPGA field, concluded on February 24 in Monterey, California. At the conference, DeePhi Tech's paper "ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA" won the Best Paper Award.
ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA

Song Han^{1,2}, Junlong Kang^2, Huizi Mao^{1,2}, Yiming Hu^{2,3}, Xin Li^2, Yubin Li^2, Dongliang Xie^2, Hong Luo^2, Song Yao^2, Yu Wang^{2,3}, Huazhong Yang^3 and William J. Dally^{1,4}

^1 Stanford University, ^2 DeePhi Tech, ^3 Tsinghua University, ^4 NVIDIA
^1 {songhan,dally}@stanford.edu, ^2 song.yao@deephi.tech, ^3 yu-wang@mail.tsinghua.edu.cn
Abstract

Long Short-Term Memory (LSTM) is widely used in speech recognition. To achieve higher prediction accuracy, machine learning scientists have built larger and larger models. Such large models are both computation-intensive and memory-intensive. Deploying such bulky models results in high power consumption under latency constraints and leads to a high total cost of ownership (TCO) for a data center. To speed up prediction and make it energy efficient, we first propose a load-balance-aware pruning method that can compress the LSTM model size by 20× (10× from pruning and 2× from quantization) with negligible loss of prediction accuracy. The pruned model is friendly for parallel processing. Next, we propose a scheduler that encodes and partitions the compressed model across PEs for parallelism and schedules the complicated LSTM data flow. Finally, we design the hardware architecture, named Efficient Speech Recognition Engine (ESE), that works directly on the compressed model. Implemented on a Xilinx XCKU060 FPGA running at 200 MHz, ESE achieves a performance of 282 GOPS working directly on the compressed LSTM network, corresponding to 2.52 TOPS on the uncompressed one, and processes a full LSTM for speech recognition with a power dissipation of 41 Watts. Evaluated on the LSTM speech recognition benchmark, ESE is 43× and 3× faster than Core i7 5930k CPU and Pascal Titan X GPU implementations. It achieves 40× and 11.5× higher energy efficiency compared with the CPU and GPU, respectively.
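The idea behind load-balance-aware pruning is that pruning a matrix globally can leave some PEs with many more nonzeros than others, so the slowest PE bounds throughput. Pruning each PE's row group to the same sparsity avoids this. The sketch below illustrates the idea under stated assumptions: the function name, the interleaved row-to-PE assignment, and the per-group magnitude threshold are illustrative, not the paper's exact implementation.

```python
import numpy as np

def load_balance_prune(W, sparsity=0.9, num_pes=4):
    """Prune W so every PE's row group hits the same target sparsity,
    keeping the sparse matrix-vector workload balanced across PEs."""
    W = W.copy()
    for pe in range(num_pes):
        # Assume rows are interleaved across PEs (row i -> PE i % num_pes).
        rows = W[pe::num_pes]              # view into W: edits apply in place
        k = int(sparsity * rows.size)      # number of weights to zero out
        if k > 0:
            # Per-group threshold: zero the k smallest-magnitude weights.
            thresh = np.sort(np.abs(rows), axis=None)[k - 1]
            rows[np.abs(rows) <= thresh] = 0.0
    return W

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
Wp = load_balance_prune(W, sparsity=0.5, num_pes=4)
# Each PE's group now holds the same number of nonzeros, so no PE
# becomes a straggler during the sparse matrix-vector multiply.
for pe in range(4):
    print(pe, np.count_nonzero(Wp[pe::4]))
```

A globally pruned matrix would reach the same overall sparsity but could assign uneven nonzero counts to the PE groups; the per-group threshold trades a tiny amount of pruning freedom for balanced latency.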
1 Introduction

Deep neural networks have surpassed traditional acoustic models and become the state-of-the-art method for speech recognition [1, 2]. Long Short-Term Memory (LSTM) [3], Gated Recurrent Unit (GRU) [4] and vanilla recurrent neural networks (RNNs) are popular in speech recognition. In this work, we designed a hardware accelerator called ESE for the most complex one: the LSTM.

ESE takes the approach of EIE [5] one step further to address the more general problem of accelerating not only feed-forward neural networks but also recurrent neural networks and LSTMs. The recurrent nature of RNNs produces complicated data dependencies, which are more challenging than those of feed-forward neural nets. To deal with this problem, we designed a data flow that can effectively schedule the complex RNN operations using multiple EIE cores.
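The data dependency in question is the serial chain through time: every gate at step t consumes h_{t-1}, so step t cannot begin before step t-1 completes. A minimal sketch of one standard LSTM step makes this visible (this is the textbook formulation, not ESE's hardware data flow; shapes and names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One standard LSTM step. All four gates read h_prev, which is
    the recurrent dependency a hardware scheduler must respect."""
    n = h_prev.size
    z = W @ x_t + U @ h_prev + b      # fused gate pre-activations
    i = sigmoid(z[0:n])               # input gate
    f = sigmoid(z[n:2*n])             # forget gate
    o = sigmoid(z[2*n:3*n])           # output gate
    g = np.tanh(z[3*n:4*n])           # candidate cell state
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Hidden size n=3, input size d=2; W: (4n, d), U: (4n, n).
rng = np.random.default_rng(1)
n, d, T = 3, 2, 4
W = rng.standard_normal((4 * n, d))
U = rng.standard_normal((4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for t in range(T):                    # inherently serial loop over time
    h, c = lstm_step(rng.standard_normal(d), h, c, W, U, b)
```

The matrix-vector products W @ x_t and U @ h_prev dominate the compute, which is why working directly on the pruned, compressed weight matrices pays off.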
Among all factors contributing to the monthly bill of a data center, power consumption is a major one. Since a memory reference consumes more than two orders of magnitude more energy than an ALU operation, we focus on reducing the memory footprint.
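A back-of-envelope check of the "two orders of magnitude" claim, using commonly cited 45 nm per-operation energy estimates (the specific picojoule figures are an assumption here, not numbers from this paper):

```python
# Commonly cited 45 nm energy estimates (assumed, not from this paper):
DRAM_ACCESS_PJ = 640.0   # 32-bit off-chip DRAM access
SRAM_ACCESS_PJ = 5.0     # 32-bit on-chip SRAM access
FP32_MULT_PJ = 3.7       # 32-bit floating-point multiply

# One DRAM fetch costs on the order of 100x an arithmetic operation,
# which is the "two orders of magnitude" gap.
print(f"DRAM access / multiply: ~{DRAM_ACCESS_PJ / FP32_MULT_PJ:.0f}x")

# A 20x-compressed model that fits in on-chip SRAM replaces most DRAM
# traffic with far cheaper SRAM accesses -- the payoff of pruning plus
# quantization.
print(f"DRAM access / SRAM access: {DRAM_ACCESS_PJ / SRAM_ACCESS_PJ:.0f}x")
```

Under these assumed figures, shrinking the model until its weights stay on chip attacks the dominant energy term directly, rather than merely reducing the number of multiplies.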
To achieve this, we design a novel method that optimizes across the algorithm, software and hardware. At the algorithm level, ESE revisits the pruning algorithm from the hardware efficiency
1st International Workshop on Efficient Methods for Deep Neural Networks at NIPS 2016, Barcelona, Spain.
Full paper to appear at FPGA 2017.
arXiv:1612.00694v1 [cs.CL] 1 Dec 2016