44 D. Ma et al. / Journal of Systems Architecture 77 (2017) 43–51
attempts at building systems based on emerging devices such as memristors [13].
For embedded systems, neuromorphic processors are typically used as co-processors integrated with the CPU in a master-slave configuration: the neuromorphic processor accelerates computation-intensive machine learning algorithms such as image recognition and image segmentation, while the CPU runs the typical operating system tasks, including the graphical user interface, networking stack and file systems. Compared to special-purpose hardware accelerators designed for a specific function, a neuromorphic co-processor has the advantage of being configurable to support any function that can be implemented with neural networks, which are universal approximators of arbitrary continuous functions. For example, the Qualcomm Zeroth co-processor [14] integrated with the Snapdragon 820A processor implements deep Artificial Neural Networks that can be configured to support any function.
In this paper, we present the Darwin Neural Processing Unit (NPU), a highly configurable neuromorphic hardware co-processor based on the Leaky Integrate and Fire (LIF) SNN model [15] and implemented with digital logic. It is designed for resource-constrained embedded applications, so the hardware resources used by the design are kept very limited. We reduce the computation resource cost by time-multiplexing the physical neuron units, and minimize the memory resource cost with a reconfigurable memory subsystem. The design has been prototyped on an FPGA and fabricated as an ASIC in SMIC's 180 nm process. Since different applications have very different requirements, the Darwin NPU is designed to be highly configurable, with the maximum number of neurons, synapses and synaptic delays all being configurable parameters.
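The idea of time-multiplexing physical neuron units can be illustrated with a small Python sketch. It assumes a simple round-robin scheme in which the logical neurons are split into consecutive batches, one batch per pass through the physical units; the functions `schedule_passes` and `run_time_step` are hypothetical, and the Darwin NPU's actual scheduling may differ.

```python
from typing import Callable, List


def schedule_passes(num_logical: int, num_physical: int) -> List[List[int]]:
    """Split logical neuron indices into passes of at most num_physical neurons.

    Hypothetical round-robin batching: pass k handles logical neurons
    k*num_physical .. (k+1)*num_physical - 1.
    """
    if num_physical <= 0:
        raise ValueError("need at least one physical neuron unit")
    return [
        list(range(start, min(start + num_physical, num_logical)))
        for start in range(0, num_logical, num_physical)
    ]


def run_time_step(states: List[float], num_physical: int,
                  update: Callable[[int, float], float]) -> int:
    """Update every logical neuron once and return the number of passes used."""
    passes = schedule_passes(len(states), num_physical)
    for batch in passes:
        # On hardware, the neurons of one batch would occupy the physical
        # units concurrently; here they are simply updated in sequence.
        for j in batch:
            states[j] = update(j, states[j])
    return len(passes)


# 20 logical neurons sharing 8 physical units need 3 passes per time step.
states = [0.0] * 20
print(run_time_step(states, 8, lambda j, v: v + 1.0))  # 3
```

The return value shows the trade-off: fewer physical units save area, but each simulated time step then takes proportionally more passes.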
Preliminary results have been reported in our short paper [16]. In this paper, we present more details on the architectural design of the Darwin NPU, including the overall architecture, the main parameters and variables, the off-chip and on-chip memory system design, and the SNN architectures used in the demonstration applications. The rest of the paper is structured as follows: Section 2 presents the neuron model and its optimizations for implementation with digital logic; Section 3 presents the hardware architecture of the Darwin NPU; Section 4 presents two demonstration applications; finally, Section 5 presents conclusions.
2. The neuron model
The Leaky Integrate and Fire (LIF) model is a simplified model of the biological neuron that is widely used in neuromorphic engineering projects. It represents a good tradeoff between computational complexity and biological realism. The membrane potential V of a LIF neuron is described by the following equation:
$$C_m \frac{dV}{dt} = g_l \left( V_{rest} - V \right) + I, \tag{1}$$
where $V_{rest}$ is the resting membrane potential, $C_m$ is the membrane capacitance, $g_l$ is the membrane conductance, and $I$ is the input current. When the membrane potential $V$ rises to the firing threshold $V_{th}$, a spike (also called an action potential) is triggered: $V$ rapidly rises to a large value and is then reset to $V = V_{reset}$. Afterwards, there is a refractory period of length $T_{ref}$ during which the neuron is not responsive to input spikes. At the end of the refractory period, the membrane potential $V$ returns to the resting membrane potential $V_{rest}$, and the neuron becomes responsive to input spikes again.
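The continuous-time behaviour described by Eq. (1), including threshold, reset and refractory period, can be reproduced with a minimal forward-Euler sketch in Python. All parameter values below are illustrative assumptions in arbitrary units (not values from the Darwin NPU), and `LIFParams` and `simulate_lif` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LIFParams:
    # Illustrative values in arbitrary units (assumptions, not from the paper).
    c_m: float = 1.0      # membrane capacitance C_m
    g_l: float = 0.05     # membrane conductance g_l, so tau_m = C_m / g_l = 20 ms
    v_rest: float = 0.0   # resting potential V_rest
    v_reset: float = 0.0  # reset potential V_reset
    v_th: float = 1.0     # firing threshold V_th
    t_ref: float = 2.0    # refractory period T_ref in ms
    dt: float = 0.1       # integration step in ms


def simulate_lif(current: List[float], p: LIFParams) -> List[int]:
    """Forward-Euler integration of Eq. (1); returns the steps at which spikes occur."""
    v = p.v_rest
    ref_steps = round(p.t_ref / p.dt)  # refractory period measured in steps
    ref_left = 0
    spikes: List[int] = []
    for step, i_in in enumerate(current):
        if ref_left > 0:
            # Refractory: input is ignored and V is held at V_reset.
            ref_left -= 1
            if ref_left == 0:
                v = p.v_rest  # end of refractory period: back to V_rest
            continue
        # C_m dV/dt = g_l (V_rest - V) + I, discretized with step dt.
        v += p.dt / p.c_m * (p.g_l * (p.v_rest - v) + i_in)
        if v >= p.v_th:
            spikes.append(step)
            v = p.v_reset
            ref_left = ref_steps
    return spikes


p = LIFParams()
print(simulate_lif([0.04] * 5000, p))          # steady state 0.8 < V_th: []
print(len(simulate_lif([0.1] * 5000, p)) > 0)  # steady state 2.0 > V_th: True
```

With a constant input $I$, the potential approaches $V_{rest} + I/g_l$, so the neuron fires only when this steady-state value exceeds $V_{th}$.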
To implement the model with digital logic, a discrete-time version of the LIF model is needed. Consider a post-synaptic neuron with index $j$, connected to possibly multiple pre-synaptic neurons with indices denoted as $i$. The membrane potential of neuron $j$ satisfies the following discrete-time equations:
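$$V_j(t) \leftarrow V_j(t-1)\left(1 - \Delta t/\tau_m\right) + \sum_i S_{ij} V_{max} w_{ij}, \tag{2}$$

$$V_j(t) \leftarrow \begin{cases} 0, & \text{if } t \in \left[T_f,\; T_f + T_{ref}\right] \\ H\left(V_{th} - V_j(t)\right) \cdot V_j(t), & \text{otherwise} \end{cases} \tag{3}$$

$$S_i(t) \leftarrow H\left(V_i(t) - V_{th}\right), \tag{4}$$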
where $V_j(t)$ is the membrane potential of neuron $j$ at time step $t$; $V_{max}$ is the maximum change in membrane potential caused by a single incoming spike (occurring when $w_{ij} = 1$); the term $\sum_i S_{ij} V_{max} w_{ij}$ denotes the input current $I$, equal to the sum of the input spike currents, each multiplied by the respective synapse weight (we use a per-neuron Weight-Sum Queue to store this term at different time steps); $\Delta t$ is the simulation time step size, typically 0.1 ms; $\tau_m = C_m/g_l$ is the time constant of the RC circuit model of the cell membrane; $S_{ij} \in \{0, 1\}$ denotes whether neuron $i$ fires a spike at time step $t$; $w_{ij}$ is the weight of the synapse that connects pre-synaptic neuron $i$ to post-synaptic neuron $j$, positive if the synapse is excitatory and negative if it is inhibitory; $V_{th}$ is the firing threshold; and

$$H(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

is the unit step function. $V_{rest}$ and $V_{reset}$ are both assumed to be 0. If the neuron fires an output spike at $t = T_f$, it remains quiescent for the refractory period $[T_f, T_f + T_{ref}]$, during which its membrane potential stays at $V_{reset} = 0$ and it does not respond to input spikes. (The synapse delay does not appear explicitly in Eqs. (2)–(4); it is modeled as a circular buffer, as shown later in Fig. 2.)
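A floating-point Python sketch can make the update order of Eqs. (2)–(4) concrete. It assumes that each neuron's Weight-Sum Queue is a circular buffer indexed by the time step modulo the maximum delay, that refractory periods and delays are expressed in whole time steps, and that input arriving during the refractory period is discarded. The class `DiscreteLIFNetwork` and its methods are hypothetical and do not describe the Darwin NPU's actual memory layout.

```python
from typing import List, Tuple


class DiscreteLIFNetwork:
    """Floating-point sketch of Eqs. (2)-(4) with delayed spike delivery.

    Each neuron owns a circular buffer of max_delay slots (a stand-in for
    the per-neuron Weight-Sum Queue); slot t % max_delay holds the summed
    input that arrives at time step t. Layout and names are assumptions.
    """

    def __init__(self, n: int, n_leak: float, v_th: float, v_max: float,
                 t_ref_steps: int, max_delay: int) -> None:
        self.n_leak = n_leak            # 1 - dt / tau_m
        self.v_th = v_th
        self.v_max = v_max
        self.t_ref_steps = t_ref_steps  # T_ref expressed in time steps
        self.max_delay = max_delay
        self.v = [0.0] * n
        self.queue = [[0.0] * max_delay for _ in range(n)]
        self.refractory_until = [-1] * n
        self.edges: List[List[Tuple[int, float, int]]] = [[] for _ in range(n)]
        self.t = 0

    def connect(self, i: int, j: int, w: float, delay: int) -> None:
        """Add synapse i -> j with weight w and a delay of 1..max_delay steps."""
        if not 1 <= delay <= self.max_delay:
            raise ValueError("delay must lie in 1..max_delay")
        self.edges[i].append((j, w, delay))

    def inject(self, j: int, amount: float) -> None:
        """Add external input (already in volts) to neuron j for the current step."""
        self.queue[j][self.t % self.max_delay] += amount

    def step(self) -> List[int]:
        """Advance one time step; return the indices of neurons that fired."""
        slot = self.t % self.max_delay
        fired: List[int] = []
        for j in range(len(self.v)):
            inp = self.queue[j][slot]
            self.queue[j][slot] = 0.0  # free the slot for step t + max_delay
            if self.t <= self.refractory_until[j]:
                self.v[j] = 0.0  # Eq. (3), refractory branch: input ignored
                continue
            self.v[j] = self.v[j] * self.n_leak + inp  # Eq. (2)
            if self.v[j] >= self.v_th:  # Eq. (4): S = H(V - V_th)
                fired.append(j)
                self.refractory_until[j] = self.t + self.t_ref_steps
                self.v[j] = 0.0  # t = T_f starts the refractory interval
        for i in fired:
            for j, w, delay in self.edges[i]:
                # Contribution V_max * w_ij lands `delay` steps in the future.
                self.queue[j][(self.t + delay) % self.max_delay] += self.v_max * w
        self.t += 1
        return fired


net = DiscreteLIFNetwork(n=2, n_leak=0.9, v_th=1.0, v_max=1.0,
                         t_ref_steps=2, max_delay=4)
net.connect(0, 1, w=1.5, delay=3)
net.inject(0, 2.0)
history = [net.step() for _ in range(5)]
print(history)  # [[0], [], [], [1], []]
```

Neuron 0 fires at step 0, and its spike reaches neuron 1 three steps later, where the delayed weight sum of 1.5 pushes neuron 1 over threshold.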
To reduce the computational cost, the floating-point variables in Eqs. (2)–(4) need to be converted to fixed-point integer variables. We first simplify the status update Eq. (2) by merging parameters: we define the leakage constant $N_{leak} = 1 - \Delta t/\tau_m$ and the equivalent synapse weight $W_{ij} = V_{max} w_{ij}$. We then perform the floating-point to fixed-point conversion by defining $v_j(t) = V_j(t) \cdot 2^{\beta_v}$ as the neuron status and $N_{decay} = N_{leak} \cdot 2^{\gamma}$ as the decay constant, where $\beta_v$ and $\gamma$ are integers in the range [0, 31]. Eqs. (2)–(4) are thus converted into the following fixed-point equations:
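$$v_j(t) \leftarrow v_j(t-1) \cdot N_{decay}/2^{\gamma} + \sum_i S_{ij} W_{ij} \cdot 2^{\beta_v} \tag{5}$$

$$v_j(t) \leftarrow \begin{cases} 0, & \text{if } t \in \left[T_f,\; T_f + T_{ref}\right] \\ H\left(V_{th} \cdot 2^{\beta_v} - v_j(t)\right) \cdot v_j(t), & \text{otherwise} \end{cases} \tag{6}$$

$$S_i(t) \leftarrow H\left(v_i(t) - V_{th} \cdot 2^{\beta_v}\right) \tag{7}$$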
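To see where Eqs. (5)–(7) come from, substitute the merged parameters $N_{leak}$ and $W_{ij}$ into Eq. (2) and multiply both sides by $2^{\beta_v}$. Because $2^{\beta_v} > 0$, the same scaling leaves the sign of every step-function argument unchanged:

$$
\begin{aligned}
v_j(t) = V_j(t)\cdot 2^{\beta_v}
  &= \Big( V_j(t-1)\, N_{leak} + \sum_i S_{ij} W_{ij} \Big)\cdot 2^{\beta_v} \\
  &= v_j(t-1)\cdot \frac{N_{leak}\, 2^{\gamma}}{2^{\gamma}} + \sum_i S_{ij} W_{ij}\cdot 2^{\beta_v}
   = v_j(t-1)\cdot N_{decay}/2^{\gamma} + \sum_i S_{ij} W_{ij}\cdot 2^{\beta_v}, \\
H\big(V_i(t) - V_{th}\big) &= H\big(2^{\beta_v}\,(V_i(t) - V_{th})\big) = H\big(v_i(t) - V_{th}\cdot 2^{\beta_v}\big).
\end{aligned}
$$

The refractory branch of Eq. (3) stays 0 after scaling, and its other branch becomes $H(V_{th}\cdot 2^{\beta_v} - v_j(t))\cdot v_j(t)$, which is Eq. (6).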
Since the membrane potential $v_j(t)$ and the synapse weights $W_{ij}$ have significantly different dynamic ranges, we apply different scaling factors, $\beta_v$ and $\beta_w$ respectively, during the floating-point to fixed-point conversion. Defining $\beta_d = \beta_v - \beta_w$ as the difference between the scaling factors, Eq. (5) becomes:
$$v_j(t) \leftarrow v_j(t-1) \cdot N_{decay}/2^{\gamma} + \sum_i S_{ij} \left(W_{ij} \cdot 2^{\beta_w}\right) \cdot 2^{\beta_d} \tag{8}$$
Eqs. (6)–(8) form the set of kernel equations that the NPU executes to simulate a network of LIF neurons.
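The integer-only form of the kernel can be sketched in Python. Since $2^{\beta_w} \cdot 2^{\beta_d} = 2^{\beta_v}$, Eq. (8) is algebraically equivalent to Eq. (5), but it lets the weights be stored at their own precision $\beta_w$. The sketch assumes $\beta_v \ge \beta_w$ (so $\beta_d \ge 0$ and the scaling is a left shift), implements division by $2^{\gamma}$ as an arithmetic right shift, and uses illustrative values ($\tau_m$ = 20 ms, $\gamma$ = 16, $\beta_v$ = 12, $\beta_w$ = 8) that are not taken from the paper; the function names are hypothetical.

```python
from typing import Tuple


def decay_constant(dt_ms: float, tau_m_ms: float, gamma: int) -> int:
    """N_decay = N_leak * 2^gamma, rounded to the nearest integer."""
    n_leak = 1.0 - dt_ms / tau_m_ms
    return round(n_leak * (1 << gamma))


def to_fixed(x: float, beta: int) -> int:
    """Scale a real value by 2^beta and round to an integer."""
    return round(x * (1 << beta))


def kernel_step(v: int, in_refractory: bool, weight_sum: int, n_decay: int,
                gamma: int, beta_d: int, v_th_fixed: int) -> Tuple[int, bool]:
    """One integer-only update of Eqs. (6)-(8) for a single neuron.

    weight_sum is sum_i S_ij * (W_ij * 2^beta_w), already in fixed point;
    shifting the sum left by beta_d equals shifting each term, because
    integer left shifts distribute exactly over addition.
    Returns (new_v, spiked).
    """
    if in_refractory:
        return 0, False  # Eq. (6), refractory branch
    # Eq. (8): division by 2^gamma as an arithmetic right shift, which rounds
    # toward minus infinity (the hardware's rounding mode is an assumption).
    v = ((v * n_decay) >> gamma) + (weight_sum << beta_d)
    if v >= v_th_fixed:  # Eq. (7): S = H(v - V_th * 2^beta_v)
        return 0, True   # the firing step T_f starts the refractory interval
    return v, False


gamma, beta_v, beta_w = 16, 12, 8           # assumed scaling factors
beta_d = beta_v - beta_w
n_decay = decay_constant(0.1, 20.0, gamma)  # dt = 0.1 ms, tau_m = 20 ms (assumed)
v_th_fixed = to_fixed(1.0, beta_v)          # V_th = 1 in fixed point
w_fixed = to_fixed(0.3, beta_w)             # one active synapse, W_ij = 0.3
v, trace = 0, []
for _ in range(4):
    v, spiked = kernel_step(v, False, w_fixed, n_decay, gamma, beta_d, v_th_fixed)
    trace.append(spiked)
print(n_decay, trace)  # 65208 [False, False, False, True]
```

The floating-point recurrence of Eq. (2) with the same input also crosses $V_{th}$ = 1 on the fourth step ($V_j$ = 0.3, 0.5985, 0.8955, 1.1910); the small difference in $v_j$ comes from rounding $W_{ij}$ to 77/256.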
3. Architectural design of the NPU
3.1. Architecture overview
Fig. 1 shows the overall microarchitecture of the Darwin NPU.
Due to its limited area size, the NPU supports 8 physical neuron
units on the chip, which are used to implement logical neurons