Darwin: A Neuromorphic Hardware Co-processor Based on Spiking Neural Networks


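A neuromorphic hardware co-processor is an electronic device designed on the principles of biological nervous systems, intended to emulate the structure and function of neurons and synapses in the brain. Research in this area continues to deepen, and it has become an important direction for high-performance, low-power computing. This work presents Darwin, a neuromorphic hardware co-processor based on Spiking Neural Networks (SNN). An SNN is a biologically inspired neural network model that processes information using discrete-time spikes, which makes it fundamentally different from a traditional Artificial Neural Network (ANN). Implementing SNNs in hardware is key to achieving high performance and low power. The Darwin Neural Processing Unit (NPU) proposed in the work is a highly configurable neuromorphic hardware co-processor. It is implemented in digital logic and supports a configurable number of neurons, synaptic connections and synaptic delays. The Darwin NPU was fabricated in a standard 180 nm CMOS process, with a die size of 5 × 5 mm² and a worst-case clock frequency of 70 MHz. In typical applications it consumes 0.84 mW/MHz at a supply voltage of 1.8 V. To demonstrate the performance and efficiency of the Darwin NPU, the researchers tested it with two prototype applications.

The main difference between an SNN and a traditional ANN lies in how information is transmitted. In an SNN, neurons communicate and compute by means of spikes. When an SNN neuron receives an input spike from another neuron, its membrane potential rises momentarily and then gradually decays over time because of leakage through ion channels. When several input spikes arrive in rapid succession, the membrane potential may reach a threshold voltage, which triggers an output spike that travels through the connecting synapses to downstream neurons. This is the basic way a neuron processes information. Spike events are transmitted using Address-Event Representation (AER), an encoding that carries information efficiently between event-driven neural chips. The work also notes the importance of digital VLSI (very-large-scale integration) technology for implementing SNNs: it makes complex network structures feasible and is compatible with conventional digital circuit design and integration, which helps with large-scale production of neuromorphic chips.

The hardware implementation described in this work matters for SNN performance. Implementing the network in hardware gives researchers finer control over the parameters of neurons and synapses, as well as over their connectivity and delay characteristics, which is essential for studying and applying the computational principles of biological nervous systems. Through the performance tests of the Darwin NPU, the researchers aim to show its potential on real-world problems and the advantages of neuromorphic hardware on specific tasks. In summary, the work presents a high-performance, low-power SNN hardware co-processor implemented in 180 nm CMOS. Its design and implementation are valuable both for research on neuromorphic computing and for practical applications. As CMOS processes advance and new computing paradigms are explored, SNN-based hardware is expected to find wide use across intelligent computing domains.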


Journal of Systems Architecture 77 (2017) 43–51

Contents lists available at ScienceDirect

Journal of Systems Architecture

journal homepage: www.elsevier.com/locate/sysarc

Darwin: A neuromorphic hardware co-processor based on spiking neural networks

De Ma (a,b), Juncheng Shen, Zonghua Gu (b,∗), Ming Zhang, Xiaolei Zhu (b,∗), Xiaoqiang Xu, Qi Xu, Yangjing Shen, Gang Pan

MOE Key Laboratory of RF Circuits and Systems, Hangzhou Dianzi University, 310018, China

College of Computer Science/College of Microelectronics, Zhejiang University, Hangzhou, 310027, China

Article info

Article history:

Received 2 February 2016

Revised 4 June 2016

Accepted 9 January 2017

Available online 17 January 2017

Keywords:

Neuromorphic computing

Spiking neural networks (SNN)

Address-event representation (AER)

Digital VLSI

Abstract

A Spiking Neural Network (SNN) is a type of biologically-inspired neural network that performs information processing based on discrete-time spikes, different from a traditional Artificial Neural Network (ANN). Hardware implementation of SNNs is necessary for achieving high performance and low power. We present the Darwin Neural Processing Unit (NPU), a highly-configurable neuromorphic hardware co-processor based on SNN implemented with digital logic, supporting a configurable number of neurons, synapses and synaptic delays. The Darwin NPU was fabricated in standard 180 nm CMOS technology with an area of 5 × 5 mm² and a worst-case clock frequency of 70 MHz. It consumes 0.84 mW/MHz with a 1.8 V power supply for typical applications. Two prototype applications are used to demonstrate the performance and efficiency of the Darwin NPU.

1. Introduction and related work

A Spiking Neural Network (SNN) is a type of biologically-inspired neural network that performs information communication and computation based on discrete-time spikes. When an SNN neuron receives an input spike, its soma's membrane potential is increased momentarily, but gradually drops due to leakage of ion channels. When multiple input spikes are received in rapid succession, the membrane potential may increase to a certain threshold voltage, triggering an output spike to its downstream neurons through connecting synapses. There are multiple possible levels of abstraction for SNN modeling, ranging from the most biologically-realistic Hodgkin-Huxley model to the most simplified Leaky Integrate and Fire (LIF) model, with models of intermediate complexity including the Quadratic Integrate and Fire (QIF), Adaptive Exponential Integrate and Fire (AdEx IF), Izhikevich, FitzHugh-Nagumo, Hindmarsh-Rose, and many others [1].

There are a number of well-known software simulators for SNN based on conventional CPUs, including NEST [2], Brian [3], NEURON [4], Nengo [5] and others. Since software simulation can be quite slow, researchers have developed simulation acceleration solutions based on high-performance parallel computing platforms, e.g., the Bluebrain project [6] uses an IBM BlueGene/Q supercomputer with 65,536 cores to simulate large-scale SNNs based on the multi-compartment Hodgkin-Huxley model; the SpiNNaker system [7] is a custom multicore ARM-based supercomputer for real-time and low-power simulation of simplified SNN models such as LIF or Izhikevich. There are also GPU-based simulators, e.g., the CarlSim3 simulator [8] runs on high-performance GPUs to achieve large-scale SNN simulation.

∗ Corresponding authors. E-mail addresses: zgu@zju.edu.cn (Z. Gu), zhuxl@vlsi.zju.edu.cn (X. Zhu).

In order to achieve real-time and low-power implementation of SNN simulation, it is necessary to use (digital, analog or mixed-signal) hardware to implement SNN instead of (CPU or GPU-based) software simulation. Hardware implementation of SNN for machine learning applications has the potential to achieve much lower power consumption compared to traditional Artificial Neural Networks (ANN) [9]. There are a number of different approaches to developing hardware acceleration of SNN models based on different types of hardware platforms, including digital logic implementation with FPGA or ASIC, e.g., the IBM TrueNorth processor developed in the DARPA SyNAPSE project [9], which supports a maximum of 1 million neurons and 256 million synapses; or Analog and Mixed-Signal circuit implementation, e.g., the ROLLS processor [10], which supports a maximum of 256 neurons and 128 K synapses with on-line learning algorithms such as Spike Timing Dependent Plasticity (STDP); the NeuroGrid system [11] with the aim of simulating large-scale neural models in real time, and the HICANN wafer-scale system [12] for 10,000-times faster emulation of SNN based on the AdEx IF neuron model. Besides conventional CMOS-based implementations, there are also attempts at building systems based on emerging devices such as memristors [13].

http://dx.doi.org/10.1016/j.sysarc.2017.01.003

For embedded systems, it is typical for neuromorphic processors to be used as co-processors integrated with the CPU in a master-slave configuration; the neuromorphic processor is used to accelerate computation-intensive machine learning algorithms such as image recognition, image segmentation etc., and the CPU is used to run the typical operating system tasks, including graphical user interface, networking stack, file systems, etc. Compared to special-purpose hardware accelerators designed for a specific function, the neuromorphic co-processor has the advantage of being configurable to support any function that can be implemented with neural networks, which are universal approximators of arbitrary continuous functions. For example, the Qualcomm Zeroth co-processor [14] integrated with the Snapdragon 820A processor implements deep Artificial Neural Networks that can be configured to support any function.

In this paper, we present the Darwin Neural Processing Unit (NPU), a highly-configurable neuromorphic hardware co-processor based on the Leaky Integrate and Fire (LIF) SNN model [15], implemented with digital logic. It is designed for resource-constrained embedded applications, hence the hardware resource used by the design is very limited. We reduce the computation resource cost by time-multiplexing the physical neuron units, and minimize the memory resource cost by the design of a reconfigurable memory subsystem. It has been prototyped on FPGA, and fabricated as an ASIC in SMIC's 180 nm process. Since different applications have very different requirements, the Darwin NPU is designed to be highly configurable, with the maximum number of neurons, synapses and synaptic delays all being configurable parameters.

Preliminary results have been reported in our short paper [16]. In this paper, we present more details on the architectural design of the Darwin NPU, including the overall architecture design, main parameters and variables, the off-chip and on-chip memory system design, as well as details on the SNN architecture for the demonstration applications. The rest of the paper is structured as follows: Section 2 presents the neuron model and its optimizations for implementation with digital logic; Section 3 presents the hardware architecture of the Darwin NPU; Section 4 presents two demonstration applications; finally, Section 5 presents conclusions.

2. The neuron model

The Leaky Integrate and Fire (LIF) model is a simplified model of the biological neuron, widely used in neuromorphic engineering projects. It represents a good tradeoff between computational complexity and biological realism. The membrane potential $V$ of a LIF neuron is described by the following equation:

$$C_m \frac{dV}{dt} = g_L \left( V_{rest} - V \right) + I, \tag{1}$$

where $V_{rest}$ is the resting membrane potential; $C_m$ is the membrane capacitance; $g_L$ is the membrane conductance; $I$ is the input current. When the membrane potential $V$ rises to the firing threshold $V_{th}$, a spike (also called an Action Potential) is triggered, and $V$ rapidly rises to a large value, then resets to $V = V_{reset}$. Afterwards, there is a refractory period of length $T_{ref}$, during which the neuron is not responsive to input spikes. At the end of the refractory period, the membrane potential $V$ returns to the resting membrane potential $V_{rest}$, and the neuron becomes responsive to input spikes again.
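As a bridge from the continuous model to a discrete-time update, the following sketch applies a forward-Euler step to Eq. (1). It assumes $V_{rest} = 0$ and uses the subscripted names $C_m$, $g_L$ and $\tau_m$ as notation; the value $\tau_m = 10$ ms is an illustrative assumption, not a figure from the paper.

```latex
% Forward-Euler discretization of Eq. (1), assuming V_rest = 0.
\begin{align*}
C_m \frac{V(t) - V(t - \Delta t)}{\Delta t}
  &\approx -g_L\, V(t - \Delta t) + I(t) \\
V(t)
  &\approx V(t - \Delta t)\left(1 - \frac{\Delta t}{\tau_m}\right)
     + \frac{\Delta t}{C_m}\, I(t),
  \qquad \tau_m = \frac{C_m}{g_L} \\
% Example with assumed values \Delta t = 0.1 ms and \tau_m = 10 ms:
1 - \frac{\Delta t}{\tau_m} &= 1 - \frac{0.1}{10} = 0.99
\end{align*}
```

With these assumed values the membrane potential loses 1% of its value per time step when no input arrives.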

To implement the model with digital logic, it is necessary to have a discrete-time version of the LIF model. Consider a post-synaptic neuron with index $j$, connected to possibly multiple pre-synaptic neurons with indices denoted as $i$. The membrane potential of neuron $j$ satisfies the following discrete-time equations:

$$V_j(t) \leftarrow V_j(t-1)\left(1 - \frac{\Delta t}{\tau_m}\right) + \sum_i w_{ij} S_i(t)\, V_{max}, \tag{2}$$

$$V_j(t) \leftarrow \begin{cases} 0, & \text{if } t \in \left[ T_s,\; T_s + T_{ref} \right] \\ H\!\left( V_{th} - V_j(t) \right) \cdot V_j(t), & \text{otherwise} \end{cases}, \tag{3}$$

$$S_j(t) \leftarrow H\!\left( V_j(t) - V_{th} \right), \tag{4}$$

where $V_j(t)$ is the membrane potential of neuron $j$ at time step $t$; $V_{max}$ is the maximum voltage change that a single incoming spike can contribute to the membrane potential (occurring when $w_{ij} = 1$); the term $\sum_i w_{ij} S_i(t) V_{max}$ denotes the input current $I$, equal to the sum of each input spike current multiplied by the respective synapse weight (we use a per-neuron Weight-Sum Queue to store this term at different time steps); $\Delta t$ is the simulation time step size, with a typical value of 0.1 ms; $\tau_m = C_m / g_L$ is the time constant of the RC circuit model of the cell membrane; $S_i(t) \in \{0, 1\}$ denotes whether neuron $i$ fires a spike at time step $t$; $w_{ij}$ is the weight of the synapse that connects pre-synaptic neuron $i$ to post-synaptic neuron $j$, positive if the synapse is excitatory and negative if it is inhibitory; $V_{th}$ is the firing threshold; $H(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$ is the unit step function; $V_{rest}$ and $V_{reset}$ are both assumed to be 0. If the neuron fires an output spike at $t = T_s$, then it remains quiescent for the length of the refractory period during the time interval $[T_s, T_s + T_{ref}]$, when its membrane potential stays at $V_{reset} = 0$ and is not responsive to input spikes. (The synapse delay does not appear explicitly in Eqs. (2)–(4), but is modeled as a circular buffer, as shown in Fig. 2 later.)
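The discrete-time update in Eqs. (2)–(4) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the NPU's implementation: the function names, the parameter values, the step-counting treatment of the refractory period, and the choice to evaluate the threshold test before the reset are all assumptions made here for clarity.

```python
import numpy as np


def lif_step(v_prev, s_in, w, refractory_left, *, dt=0.1e-3, tau=10e-3,
             v_max=0.5, v_th=1.0, t_ref_steps=20):
    """Advance a layer of LIF neurons by one time step.

    v_prev          : (n_post,) membrane potentials V_j(t-1)
    s_in            : (n_pre,)  0/1 input spikes S_i(t)
    w               : (n_pre, n_post) synaptic weights w_ij
    refractory_left : (n_post,) int, refractory steps remaining per neuron
    """
    # Eq. (2): leak, then add the weighted input spikes.
    v = v_prev * (1.0 - dt / tau) + v_max * (s_in @ w)
    # Eq. (3), first case: neurons inside the refractory window stay at 0.
    in_ref = refractory_left > 0
    v = np.where(in_ref, 0.0, v)
    # Eq. (4): threshold test, evaluated before the reset (assumption).
    s_out = (v >= v_th) & ~in_ref
    # Eq. (3), second case: a neuron that fired is reset to 0.
    v = np.where(s_out, 0.0, v)
    # Start or count down the refractory window (hypothetical bookkeeping).
    refractory_left = np.where(s_out, t_ref_steps,
                               np.maximum(refractory_left - 1, 0))
    return v, s_out.astype(np.int8), refractory_left


def run_demo(n_steps=8):
    """Drive one neuron with an input spike on every step."""
    v = np.zeros(1)
    ref = np.zeros(1, dtype=int)
    w = np.ones((1, 1))
    out = []
    for _ in range(n_steps):
        v, s, ref = lif_step(v, np.ones(1), w, ref, t_ref_steps=2)
        out.append(int(s[0]))
    return out
```

With the assumed values, each input spike adds 0.5 to the membrane potential and the leak factor is 0.99, so the neuron needs three consecutive input spikes to cross the threshold of 1.0, after which it stays silent for the two refractory steps.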

To reduce the computation cost, the floating-point variables in Eqs. (2)–(4) need to be converted to fixed-point integer variables. We first simplify the status update Eq. (2) by merging parameters. We define the leakage constant $N_{leak} = 1 - \Delta t/\tau_m$, and the equivalent synapse weight $W_{ij} = w_{ij} V_{max}$. We then perform floating-to-fixed-point conversion by defining $v_j(t) = V_j(t) \cdot 2^{\beta}$ as the neuron status and $N_{decay} = N_{leak} \cdot 2^{\gamma}$ as the decay constant; $\beta$ and $\gamma$ are integers in the range [0, 31]. Eqs. (2)–(4) are converted into the following fixed-point equations:

$$v_j(t) \leftarrow v_j(t-1) \cdot N_{decay} / 2^{\gamma} + \sum_i W_{ij} S_i(t) \cdot 2^{\beta} \tag{5}$$

$$v_j(t) \leftarrow \begin{cases} 0, & \text{if } t \in \left[ T_s,\; T_s + T_{ref} \right] \\ H\!\left( V_{th} \cdot 2^{\beta} - v_j(t) \right) \cdot v_j(t), & \text{otherwise} \end{cases} \tag{6}$$

$$S_j(t) \leftarrow H\!\left( v_j(t) - V_{th} \cdot 2^{\beta} \right) \tag{7}$$

Since the membrane potential $v_j(t)$ and synapse weights $W_{ij}$ have significantly different dynamic ranges, we apply different scaling factors during floating-point to fixed-point conversion, $\beta_v$ and $\beta_w$, respectively. We define $\Delta\beta = \beta_v - \beta_w$ as the difference between scaling factors, then Eq. (5) turns into:

$$v_j(t) \leftarrow v_j(t-1) \cdot N_{decay} / 2^{\gamma} + \sum_i W_{ij} S_i(t) \cdot 2^{\beta_w} \cdot 2^{\Delta\beta} \tag{8}$$

Eqs. (6)–(8) form the set of kernel equations that are executed by the NPU to perform simulation of a network of LIF neurons.
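The fixed-point kernel can be illustrated with plain Python integers, where the division by $2^{\gamma}$ becomes a right shift and the factor $2^{\Delta\beta}$ becomes a left shift. The sketch below is an assumption-laden illustration rather than the hardware's datapath: the helper names, the bit widths (16 fractional bits for the potential, 8 for weights and the decay constant), the requirement that $\Delta\beta \ge 0$, and the evaluation of the threshold test before the reset are choices made here.

```python
def to_fixed(x, frac_bits):
    """Quantize a real value to an integer with `frac_bits` fractional bits."""
    return int(round(x * (1 << frac_bits)))


def lif_step_fixed(v_prev, weighted_sum, n_decay, v_th_fixed,
                   gamma, delta_beta, in_refractory=False):
    """One fixed-point LIF update; returns (v, spike) as Python ints.

    weighted_sum is the integer sum of the fixed-point weights (each scaled
    by 2**beta_w) over the inputs that spiked in this step.
    """
    # Eq. (6), first case: hold the potential at 0 while refractory.
    if in_refractory:
        return 0, 0
    # Eq. (8): leak via integer multiply and right shift (floors the result),
    # then align the weight sum to the potential's scale with a left shift.
    v = (v_prev * n_decay) >> gamma
    v += weighted_sum << delta_beta  # assumes delta_beta >= 0
    # Eq. (7): compare against the threshold in the potential's scale.
    spike = 1 if v >= v_th_fixed else 0
    # Eq. (6), second case: reset to 0 after firing.
    if spike:
        v = 0
    return v, spike


def run_fixed_demo(n_steps=6, beta_v=16, beta_w=8, gamma=8):
    """Drive one neuron with an input spike of weight 0.5 on every step."""
    n_decay = to_fixed(0.99, gamma)   # N_leak = 0.99 (assumed)
    weight = to_fixed(0.5, beta_w)    # W = 0.5 (assumed)
    v_th = to_fixed(1.0, beta_v)      # V_th = 1.0 (assumed)
    v, spikes = 0, []
    for _ in range(n_steps):
        v, s = lif_step_fixed(v, weight, n_decay, v_th, gamma, beta_v - beta_w)
        spikes.append(s)
    return spikes
```

Keeping the weights at a coarser scale than the membrane potential is what lets the stored weights use fewer bits; the shift by `delta_beta` restores a common scale only at the moment of accumulation.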

3. Architectural design of the NPU

3.1. Architecture overview

Fig. 1 shows the overall microarchitecture of the Darwin NPU. Due to its limited area size, the NPU supports 8 physical neuron units on the chip, which are used to implement logical neurons
