

Neural Networks for Pattern Recognition
CHRISTOPHER M. BISHOP
Department of Computer Science and Applied Mathematics
Aston University
Birmingham, UK
CLARENDON PRESS • OXFORD
1995


FOREWORD
Geoffrey Hinton
Department of Computer Science
University of Toronto
For those entering the field of artificial neural networks, there has been an acute
need for an authoritative textbook that explains the main ideas clearly and
consistently using the basic tools of linear algebra, calculus, and simple probability
theory. There have been many attempts to provide such a text, but until now,
none has succeeded. Some authors have failed to separate the basic ideas and
principles from the soft and fuzzy intuitions that led to some of the models as
well as to most of the exaggerated claims. Others have been unwilling to use the
basic mathematical tools that are essential for a rigorous understanding of the
material. Yet others have tried to cover too many different kinds of neural
network without going into enough depth on any one of them. The most successful
attempt to date has been "Introduction to the Theory of Neural Computation"
by Hertz, Krogh and Palmer. Unfortunately, this book started life as a graduate
course in statistical physics and it shows. So despite its many admirable qualities
it is not ideal as a general textbook.
Bishop is a leading researcher who has a deep understanding of the material
and has gone to great lengths to organize it into a sequence that makes sense. He
has wisely avoided the temptation to try to cover everything and has therefore
omitted interesting topics like reinforcement learning, Hopfield Networks and
Boltzmann machines in order to focus on the types of neural network that are
most widely used in practical applications. He assumes that the reader has the
basic mathematical literacy required for an undergraduate science degree, and
using these tools he explains everything from scratch. Before introducing the
multilayer perceptron, for example, he lays a solid foundation of basic statistical
concepts. So the crucial concept of overfitting is first introduced using easily
visualised examples of one-dimensional polynomials and only later applied to
neural networks. An impressive aspect of this book is that it takes the reader all
the way from the simplest linear models to the very latest Bayesian multilayer
neural networks without ever requiring any great intellectual leaps.
Although Bishop has been involved in some of the most impressive applications
of neural networks, the theme of the book is principles rather than applications.
Nevertheless, it is much more useful than any of the applications-oriented
texts in preparing the reader for applying this technology effectively. The crucial
issues of how to get good generalization and rapid learning are covered in great
depth and detail and there are also excellent discussions of how to preprocess
the input and how to choose a suitable error function for the output.
It is a sign of the increasing maturity of the field that methods which were
once justified by vague appeals to their neuron-like qualities can now be given a
solid statistical foundation. Ultimately, we all hope that a better statistical
understanding of artificial neural networks will help us understand how the brain
actually works, but until that day comes it is reassuring to know why our current
models work and how to use them effectively to solve important practical
problems.