computation) in a network of weighted directed graphs
in which the nodes are artificial neurons and directed
edges (with weights) are connections between neuron
outputs and neuron inputs. The main characteristics of
neural networks are that they have the ability to learn
complex nonlinear input-output relationships, use se-
quential training procedures, and adapt themselves to
the data.
The most commonly used family of neural networks for
pattern classification tasks [83] is the feed-forward network,
which includes multilayer perceptron and Radial-Basis
Function (RBF) networks. These networks are organized
into layers and have unidirectional connections between the
layers. Another popular network is the Self-Organizing
Map (SOM), or Kohonen-Network [92], which is mainly
used for data clustering and feature mapping. The learning
process involves updating network architecture and con-
nection weights so that a network can efficiently perform a
specific classification/clustering task. The increasing popu-
larity of neural network models to solve pattern recognition
problems has been primarily due to their seemingly low
dependence on domain-specific knowledge (relative to
model-based and rule-based approaches) and due to the
availability of efficient learning algorithms for practitioners
to use.
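To make the feed-forward model described above concrete, the following is a minimal sketch (ours, not from the literature surveyed here) of a one-hidden-layer multilayer perceptron whose connection weights are updated by gradient descent on a squared-error loss; the XOR toy task illustrates learning a nonlinear input-output relationship that no single-layer (linear) network can represent.

```python
import numpy as np

# Illustrative sketch of a feed-forward multilayer perceptron:
# one hidden layer of sigmoid units, weights trained by gradient
# descent (backpropagation) on a squared-error loss.

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: a simple nonlinear input-output relationship
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# connection weights of a 2-4-1 network
W1 = rng.normal(scale=1.0, size=(2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(scale=1.0, size=(4, 1)); b2 = np.zeros((1, 1))

def forward(X):
    h = sigmoid(X @ W1 + b1)          # hidden-layer activations
    return h, sigmoid(h @ W2 + b2)    # network output

_, out = forward(X)
loss_before = float(((out - y) ** 2).mean())

lr = 0.5
for _ in range(10000):
    h, out = forward(X)
    # backpropagate the squared-error gradient through both layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0, keepdims=True)

_, out = forward(X)
loss_after = float(((out - y) ** 2).mean())
print(loss_before, "->", loss_after)
```

The "sequential training procedure" here is the repeated weight-update loop; the network adapts itself to the data without any domain-specific rules being supplied.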
Neural networks provide a new suite of nonlinear
algorithms for feature extraction (using hidden layers)
and classification (e.g., multilayer perceptrons). In addition,
existing feature extraction and classification algorithms can
also be mapped on neural network architectures for
efficient (hardware) implementation. In spite of the see-
mingly different underlying principles, most of the well-
known neural network models are implicitly equivalent or
similar to classical statistical pattern recognition methods
(see Table 3). Ripley [136] and Anderson et al. [5] also
discuss this relationship between neural networks and
statistical pattern recognition. Anderson et al. point out that
"neural networks are statistics for amateurs... Most NNs
conceal the statistics from the user." Despite these
similarities, neural networks do offer several advantages, such as
unified approaches for feature extraction and classification
and flexible procedures for finding good, moderately
nonlinear solutions.
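One such equivalence can be made explicit: a single-layer feed-forward network with a sigmoid output unit, trained under the cross-entropy criterion, computes exactly the logistic regression model of classical statistics. The sketch below (toy data and all names are ours) fits such a network/model by gradient descent.

```python
import numpy as np

# A one-layer "neural network" with sigmoid output and cross-entropy
# loss is identical to classical logistic regression: both estimate
# P(class = 1 | x) = sigmoid(w.x + b).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
# toy linearly separable two-class data
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)
    # gradient of the average cross-entropy loss
    grad_w = X.T @ (p - y) / len(y)
    grad_b = (p - y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

acc = float(((sigmoid(X @ w + b) > 0.5) == y).mean())
print("training accuracy:", acc)
```

Viewed as a network, the model is a single neuron; viewed statistically, it is a linear discriminant fit by maximum likelihood, which is the sense in which the network "conceals the statistics" from its user.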
1.6 Scope and Organization
In the remainder of this paper we will primarily review
statistical methods for pattern representation and classifica-
tion, emphasizing recent developments. Whenever appro-
priate, we will also discuss closely related algorithms from
the neural networks literature. We omit the whole body of
literature on fuzzy classification and fuzzy clustering which
are in our opinion beyond the scope of this paper.
Interested readers can refer to the well-written books on
fuzzy pattern recognition by Bezdek [15], [16]. In most
of the sections, the various approaches and methods are
summarized in tables as an easy and quick reference for the
reader. Due to space constraints, we are not able to provide
many details and we have to omit some of the approaches
and the associated references. Our goal is to emphasize
those approaches which have been extensively evaluated
and demonstrated to be useful in practical applications,
along with the new trends and ideas.
The literature on pattern recognition is vast and
scattered in numerous journals in several disciplines
(e.g., applied statistics, machine learning, neural net-
works, and signal and image processing). A quick scan of
the table of contents of all the issues of the IEEE
Transactions on Pattern Analysis and Machine Intelligence,
since its first publication in January 1979, reveals that
approximately 350 papers deal with pattern recognition.
Approximately 300 of these papers covered the statistical
approach and can be broadly categorized into the
following subtopics: curse of dimensionality (15), dimen-
sionality reduction (50), classifier design (175), classifier
combination (10), error estimation (25) and unsupervised
classification (50). In addition to the excellent textbooks
by Duda and Hart [44],1 Fukunaga [58], Devijver and
Kittler [39], Devroye et al. [41], Bishop [18], Ripley [137],
Schürmann [147], and McLachlan [105], we should also
point out two excellent survey papers written by Nagy
[111] in 1968 and by Kanal [89] in 1974. Nagy described
the early roots of pattern recognition, which at that time
was shared with researchers in artificial intelligence and
perception. A large part of Nagy's paper introduced a
number of potential applications of pattern recognition
and the interplay between feature definition and the
application domain knowledge. He also emphasized the
linear classification methods; nonlinear techniques were
based on polynomial discriminant functions as well as on
potential functions (similar to what are now called the
kernel functions). By the time Kanal wrote his survey
paper, more than 500 papers and about half a dozen
books on pattern recognition were already published.
Kanal placed less emphasis on applications, but more on
modeling and design of pattern recognition systems. The
discussion on automatic feature extraction in [89] was
based on various distance measures between class-
conditional probability density functions and the result-
ing error bounds. Kanal's review also contained a large
section on structural methods and pattern grammars.
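The potential-function methods of that era can be sketched in modern kernel terms: each training sample contributes a "potential" K(x, x_i) at a test point x, and the point is assigned to the class with the larger summed potential. The polynomial kernel and toy data below are illustrative choices of ours, not taken from [89] or [111].

```python
import numpy as np

# Potential-function classifier: sum signed kernel contributions
# from the training samples and take the sign of the total.

def poly_kernel(a, b, degree=2):
    # polynomial "potential", an early example of a kernel function
    return (1.0 + a @ b) ** degree

# toy two-class training set (labels +1 / -1)
X = np.array([[0.0, 1.0], [1.0, 1.5], [-1.0, -1.0], [-0.5, -2.0]])
y = np.array([1, 1, -1, -1])

def classify(x):
    score = sum(yi * poly_kernel(x, xi) for xi, yi in zip(X, y))
    return 1 if score > 0 else -1

print(classify(np.array([0.5, 1.0])))    # point near the +1 samples
print(classify(np.array([-0.8, -1.2])))  # point near the -1 samples
```

Replacing the polynomial with a Gaussian recovers a Parzen-window-style rule, which is one reason these potential functions are now viewed as ancestors of kernel methods.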
In comparison to the state of the pattern recognition field
as described by Nagy and Kanal in the 1960s and 1970s,
today a number of commercial pattern recognition systems
are available which even individuals can buy for personal
use (e.g., machine printed character recognition and
isolated spoken word recognition). This has been made
possible by various technological developments resulting in
the availability of inexpensive sensors and powerful desk-
top computers. The field of pattern recognition has become
so large that in this review we had to skip detailed
descriptions of various applications, as well as almost all
the procedures which model domain-specific knowledge
(e.g., structural pattern recognition, and rule-based sys-
tems). The starting point of our review (Section 2) is the
basic elements of statistical methods for pattern recognition.
It should be apparent that a feature vector is a representa-
tion of real world objects; the choice of the representation
strongly influences the classification results.
JAIN ET AL.: STATISTICAL PATTERN RECOGNITION: A REVIEW 7
1. Its second edition by Duda, Hart, and Stork [45] is in press.