Notes on Convolutional Neural Networks
Jake Bouvrie
Center for Biological and Computational Learning
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology
Cambridge, MA 02139
November 22, 2006
1 Introduction
This document discusses the derivation and implementation of convolutional neural networks (CNNs) [3, 4], followed by a few straightforward extensions. Convolutional neural networks involve many more connections than weights; the architecture itself realizes a form of regularization. In addition, a convolutional network automatically provides some degree of translation invariance. This particular kind of neural network assumes that we wish to learn filters, in a data-driven fashion, as a means to extract features describing the inputs. The derivation we present is specific to two-dimensional data and convolutions, but can be extended without much additional effort to an arbitrary number of dimensions.
We begin with a description of classical backpropagation in fully connected networks, followed by a derivation of the backpropagation updates for the filtering and subsampling layers in a 2D convolutional neural network. Throughout the discussion, we emphasize efficiency of the implementation, and give small snippets of MATLAB code to accompany the equations. The importance of writing efficient code when it comes to CNNs cannot be overstated. We then turn to the topic of learning how to combine feature maps from previous layers automatically, and consider, in particular, learning sparse combinations of feature maps.
Disclaimer: This rough note could contain errors, exaggerations, and false claims.
2 Vanilla Back-propagation Through Fully Connected Networks
In the typical convolutional neural networks you might find in the literature, the early stages consist of alternating convolution and sub-sampling operations, while the last stage of the architecture is a generic multi-layer network: the last few layers (closest to the outputs) are fully connected 1-dimensional layers. When you're ready to pass the final 2D feature maps as inputs to the fully connected 1-D network, it is often convenient to just concatenate all the features present in all the output maps into one long input vector, and we're back to vanilla backpropagation. The standard backprop algorithm will be described before going on to specialize the algorithm to the case of convolutional networks (see e.g. [1] for more details).
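As a concrete illustration of the concatenation step, consider the following MATLAB sketch. The variable names here are assumptions for the sake of the example: maps is a cell array in which maps{j} holds the j-th 2D output map of the final convolutional stage, and fv is the resulting feature vector.

    % Flatten every final-stage output map into one long feature vector.
    % maps is an assumed cell array; maps{j} is the j-th 2D feature map.
    fv = [];
    for j = 1:numel(maps)
        fv = [fv; maps{j}(:)];   % column-wise vectorization, then stack
    end
    % fv can now be fed as input to the fully connected 1-D network.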
2.1 Feedforward Pass
In the derivation that follows, we will consider the squared-error loss function. For a multiclass
problem with c classes and N training examples, this error is given by
$$E^N = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{c} \left( t_k^n - y_k^n \right)^2 .$$
Here $t_k^n$ is the $k$-th dimension of the $n$-th pattern's corresponding target (label), and $y_k^n$ is similarly the value of the $k$-th output layer unit in response to the $n$-th input pattern. For multiclass classification problems, the targets will typically be organized as a "one-of-$c$" code where the $k$-th element of $t^n$ is positive if the pattern $x^n$ belongs to class $k$.