The derivation of the algorithm below can be found in Brierley [1] and Brierley and Batty [2]; please refer to those sources for a hard copy.
Back Propagation Weight Update Rule
This idea was first described by Werbos [3] and popularised by Rumelhart et al. [4].
Fig. 1: A multilayer perceptron
Consider the network above, with one layer of hidden neurons and one output neuron. When an input vector is propagated through the network, the current set of weights produces an output Pred. The objective of supervised training is to adjust the weights so that the difference between the network output Pred and the required output Req is reduced. This requires an algorithm that reduces the absolute error, which is equivalent to reducing the squared error, where:
$$\text{Network Error} = \text{Pred} - \text{Req} = E \tag{1}$$
The algorithm should adjust the weights so that $E^2$ is minimised. Back-propagation is such an algorithm: it performs a gradient descent minimisation of $E^2$.
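Concretely, a gradient descent step changes each weight by an amount proportional to the negative gradient of $E^2$ with respect to that weight. The learning-rate constant $\eta$ below is standard notation assumed here, not named in the source:

$$\Delta w = -\eta \frac{\partial E^2}{\partial w} = -2\eta E \frac{\partial \text{Pred}}{\partial w},$$

since Req is fixed and so $\partial E/\partial w = \partial \text{Pred}/\partial w$.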
In order to minimise $E^2$, its sensitivity to each of the weights must be calculated. In other words, we need to know what effect changing each of the weights will have on $E^2$. If this is known, then the weights can be adjusted in the direction that reduces the absolute error.
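As a minimal sketch of this idea (assumed for illustration, not taken from [1] or [2]), the loop below applies the update $w \leftarrow w - \eta \, \partial E^2/\partial w$ to a single linear neuron; the input vector, target value Req, and learning rate eta are all hypothetical choices, and the multilayer case is derived in the sections that follow.

```python
import numpy as np

# Sketch: gradient descent on the squared error E^2 for one linear neuron.
# All values here (inputs, target, learning rate) are illustrative assumptions.
rng = np.random.default_rng(0)

inputs = rng.normal(size=3)   # a single input vector
req = 0.5                     # required output, Req
weights = rng.normal(size=3)  # current set of weights
eta = 0.05                    # learning rate (assumed)

for step in range(100):
    pred = weights @ inputs   # network output, Pred
    error = pred - req        # E = Pred - Req, equation (1)
    # For a linear neuron, dPred/dw_i = inputs_i, so dE^2/dw_i = 2*E*inputs_i.
    grad = 2.0 * error * inputs
    weights -= eta * grad     # step against the gradient, reducing E^2

print(f"final squared error: {(weights @ inputs - req) ** 2:.2e}")
```

Each pass reduces $E^2$ provided eta is small enough; back-propagation extends the same gradient computation through the hidden layer of Fig. 1.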
The notation for the following description of the back-propagation rule is based on the diagram
below.