Pattern Classification, 2nd Edition (模式分类)

Required points/C-coins: 11 · 2015-06-09 18:42:49 · 14.42 MB · PDF

The first edition, published in 1973, has become a classic reference in the field. Now with the second edition, readers will find information on key new topics such as neural networks and statistical pattern recognition, the theory of machine learning, and the theory of invariances. Also included are worked examples, comparisons between different methods, extensive graphics, expanded exercises, and computer project topics.
CONTENTS

PREFACE

1 INTRODUCTION, 1
  1.1 Machine Perception, 1
  1.2 An Example, 1
    1.2.1 Related Fields, 8
  1.3 Pattern Recognition Systems, 9
    1.3.1 Sensing, 9
    1.3.2 Segmentation and Grouping, 9
    1.3.3 Feature Extraction, 11
    1.3.4 Classification, 12
    1.3.5 Post Processing, 13
  1.4 The Design Cycle, 14
    1.4.1 Data Collection, 14
    1.4.2 Feature Choice, 14
    1.4.3 Model Choice, 15
    1.4.4 Training, 15
    1.4.5 Evaluation, 15
    1.4.6 Computational Complexity, 16
  1.5 Learning and Adaptation, 16
    1.5.1 Supervised Learning, 16
    1.5.2 Unsupervised Learning, 17
    1.5.3 Reinforcement Learning, 17
  1.6 Conclusion, 17
  Summary by Chapters, 17
  Bibliographical and Historical Remarks, 18
  Bibliography, 19

2 BAYESIAN DECISION THEORY, 20
  2.1 Introduction, 20
  2.2 Bayesian Decision Theory - Continuous Features, 24
    2.2.1 Two-Category Classification, 25
  2.3 Minimum-Error-Rate Classification, 26
    *2.3.1 Minimax Criterion, 27
    2.3.2 Neyman-Pearson Criterion, 28
  2.4 Classifiers, Discriminant Functions, and Decision Surfaces, 29
    2.4.1 The Multicategory Case, 29
    2.4.2 The Two-Category Case, 30
  2.5 The Normal Density, 31
    2.5.1 Univariate Density, 32
    2.5.2 Multivariate Density, 33
  2.6 Discriminant Functions for the Normal Density, 36
    2.6.1 Case 1: Σ_i = σ²I, 36
    2.6.2 Case 2: Σ_i = Σ, 39
    2.6.3 Case 3: Σ_i = arbitrary, 41
    Example 1 Decision Regions for Two-Dimensional Gaussian Data, 41
  2.7 Error Probabilities and Integrals, 45
  *2.8 Error Bounds for Normal Densities, 46
    2.8.1 Chernoff Bound, 46
    2.8.2 Bhattacharyya Bound, 47
    Example 2 Error Bounds for Gaussian Distributions, 48
    2.8.3 Signal Detection Theory and Operating Characteristics, 48
  2.9 Bayes Decision Theory - Discrete Features, 51
    2.9.1 Independent Binary Features, 52
    Example 3 Bayesian Decisions for Three-Dimensional Binary Data, 53
  *2.10 Missing and Noisy Features, 54
    2.10.1 Missing Features, 54
    2.10.2 Noisy Features, 55
  *2.11 Bayesian Belief Networks, 56
    Example 4 Belief Network for Fish, 59
  *2.12 Compound Bayesian Decision Theory and Context, 62
  Summary, 63
  Bibliographical and Historical Remarks, 64
  Problems, 65
  Computer exercises, 80
  Bibliography, 82

3 MAXIMUM-LIKELIHOOD AND BAYESIAN PARAMETER ESTIMATION, 84
  3.1 Introduction, 84
  3.2 Maximum-Likelihood Estimation, 85
    3.2.1 The General Principle, 85
    3.2.2 The Gaussian Case: Unknown μ, 88
    3.2.3 The Gaussian Case: Unknown μ and Σ, 88
    3.2.4 Bias, 89
  3.3 Bayesian Estimation, 90
    3.3.1 The Class-Conditional Densities, 91
    3.3.2 The Parameter Distribution, 91
  3.4 Bayesian Parameter Estimation: Gaussian Case, 92
    3.4.1 The Univariate Case: p(μ | D), 92
    3.4.2 The Univariate Case: p(x | D), 95
    3.4.3 The Multivariate Case, 95
  3.5 Bayesian Parameter Estimation: General Theory, 97
    Example 1 Recursive Bayes Learning, 98
    3.5.1 When Do Maximum-Likelihood and Bayes Methods Differ?, 100
    3.5.2 Noninformative Priors and Invariance, 101
    3.5.3 Gibbs Algorithm, 102
  *3.6 Sufficient Statistics, 102
    3.6.1 Sufficient Statistics and the Exponential Family, 106
  3.7 Problems of Dimensionality, 107
    3.7.1 Accuracy, Dimension, and Training Sample Size, 107
    3.7.2 Computational Complexity, 111
    3.7.3 Overfitting, 113
  3.8 Component Analysis and Discriminants, 114
    3.8.1 Principal Component Analysis (PCA), 115
    3.8.2 Fisher Linear Discriminant, 117
    3.8.3 Multiple Discriminant Analysis, 121
  3.9 Expectation-Maximization (EM), 124
    Example 2 Expectation-Maximization for a 2D Normal Model, 126
  3.10 Hidden Markov Models, 128
    3.10.1 First-Order Markov Models, 128
    3.10.2 First-Order Hidden Markov Models, 128
    3.10.3 Hidden Markov Model Computation, 129
    3.10.4 Evaluation, 131
    Example 3 Hidden Markov Model, 133
    3.10.5 Decoding, 135
    Example 4 HMM Decoding, 136
    3.10.6 Learning, 137
  Summary, 139
  Bibliographical and Historical Remarks, 139
  Problems, 140
  Computer exercises, 155
  Bibliography, 159

4 NONPARAMETRIC TECHNIQUES, 161
  4.1 Introduction, 161
  4.2 Density Estimation, 161
  4.3 Parzen Windows, 164
    4.3.1 Convergence of the Mean, 167
    4.3.2 Convergence of the Variance, 167
    4.3.3 Illustrations, 168
    4.3.4 Classification Example, 168
    4.3.5 Probabilistic Neural Networks (PNNs), 172
    4.3.6 Choosing the Window Function, 174
  4.4 k_n-Nearest-Neighbor Estimation, 174
    4.4.1 k_n-Nearest-Neighbor and Parzen-Window Estimation, 176
    4.4.2 Estimation of A Posteriori Probabilities, 177
  4.5 The Nearest-Neighbor Rule, 177
    4.5.1 Convergence of the Nearest Neighbor, 179
    4.5.2 Error Rate for the Nearest-Neighbor Rule, 180
    4.5.3 Error Bounds, 180
    4.5.4 The k-Nearest-Neighbor Rule, 182
    4.5.5 Computational Complexity of the k-Nearest-Neighbor Rule, 184
  4.6 Metrics and Nearest-Neighbor Classification, 187
    4.6.1 Properties of Metrics, 187
    4.6.2 Tangent Distance, 188
  *4.7 Fuzzy Classification, 192
  4.8 Reduced Coulomb Energy Networks, 195
  4.9 Approximations by Series Expansions, 197
  Summary, 199
  Bibliographical and Historical Remarks, 200
  Problems, 201
  Computer exercises, 209
  Bibliography, 213

5 LINEAR DISCRIMINANT FUNCTIONS, 215
  5.1 Introduction, 215
  5.2 Linear Discriminant Functions and Decision Surfaces, 216
    5.2.1 The Two-Category Case, 216
    5.2.2 The Multicategory Case, 218
  5.3 Generalized Linear Discriminant Functions, 219
  5.4 The Two-Category Linearly Separable Case, 223
    5.4.1 Geometry and Terminology, 224
    5.4.2 Gradient Descent Procedures, 224
  5.5 Minimizing the Perceptron Criterion Function, 227
    5.5.1 The Perceptron Criterion Function, 227
    5.5.2 Convergence Proof for Single-Sample Correction, 229
    5.5.3 Some Direct Generalizations, 232
  5.6 Relaxation Procedures, 235
    5.6.1 The Descent Algorithm, 235
    5.6.2 Convergence Proof, 237
  5.7 Nonseparable Behavior, 238
  5.8 Minimum Squared-Error Procedures, 239
    5.8.1 Minimum Squared-Error and the Pseudoinverse, 240
    Example 1 Constructing a Linear Classifier by Matrix Pseudoinverse, 241
    5.8.2 Relation to Fisher's Linear Discriminant, 242
    5.8.3 Asymptotic Approximation to an Optimal Discriminant, 243
    5.8.4 The Widrow-Hoff or LMS Procedure, 245
    5.8.5 Stochastic Approximation Methods, 246
  5.9 The Ho-Kashyap Procedures, 249
    5.9.1 The Descent Procedure, 250
    5.9.2 Convergence Proof, 251
    5.9.3 Nonseparable Behavior, 253
    5.9.4 Some Related Procedures, 253
  5.10 Linear Programming Algorithms, 256
    5.10.1 Linear Programming, 256
    5.10.2 The Linearly Separable Case, 257
    5.10.3 Minimizing the Perceptron Criterion Function, 258
  5.11 Support Vector Machines, 259
    5.11.1 SVM Training, 263
    Example 2 SVM for the XOR Problem, 264
  5.12 Multicategory Generalizations, 265
    5.12.1 Kesler's Construction, 266
    5.12.2 Convergence of the Fixed-Increment Rule, 266
    5.12.3 Generalizations for MSE Procedures, 268
  Summary, 269
  Bibliographical and Historical Remarks, 270
  Problems, 271
  Computer exercises, 278
  Bibliography, 281

6 MULTILAYER NEURAL NETWORKS, 282
  6.1 Introduction, 282
  6.2 Feedforward Operation and Classification, 284
    6.2.1 General Feedforward Operation, 286
    6.2.2 Expressive Power of Multilayer Networks, 287
  6.3 Backpropagation Algorithm, 288
    6.3.1 Network Learning, 289
    6.3.2 Training Protocols, 293
    6.3.3 Learning Curves, 295
  6.4 Error Surfaces, 296
    6.4.1 Some Small Networks, 296
    6.4.2 The Exclusive-OR (XOR), 298
    6.4.3 Larger Networks, 298
    6.4.4 How Important Are Multiple Minima?, 299
  6.5 Backpropagation as Feature Mapping, 299
    6.5.1 Representations at the Hidden Layer - Weights, 302
  6.6 Backpropagation, Bayes Theory and Probability, 303
    6.6.1 Bayes Discriminants and Neural Networks, 303
    6.6.2 Outputs as Probabilities, 304
  *6.7 Related Statistical Techniques, 305
  6.8 Practical Techniques for Improving Backpropagation, 306
    6.8.1 Activation Function, 307
    6.8.2 Parameters for the Sigmoid, 308
    6.8.3 Scaling Input, 308
    6.8.4 Target Values, 309
    6.8.5 Training with Noise, 310
    6.8.6 Manufacturing Data, 310
    6.8.7 Number of Hidden Units, 310
    6.8.8 Initializing Weights, 311
    6.8.9 Learning Rates, 312
    6.8.10 Momentum, 313
    6.8.11 Weight Decay, 314
    6.8.12 Hints, 315
    6.8.13 On-Line, Stochastic or Batch Training?, 316
    6.8.14 Stopped Training, 316
    6.8.15 Number of Hidden Layers, 317
    6.8.16 Criterion Function, 318
  6.9 Second-Order Methods, 318
    6.9.1 Hessian Matrix, 318
    6.9.2 Newton's Method, 319
    6.9.3 Quickprop, 320
    6.9.4 Conjugate Gradient Descent, 321
    Example 1 Conjugate Gradient Descent, 322
  6.10 Additional Networks and Training Methods, 324
    6.10.1 Radial Basis Function Networks (RBFs), 324
    6.10.2 Special Bases, 325
    6.10.3 Matched Filters, 325
    6.10.4 Convolutional Networks, 326
    6.10.5 Recurrent Networks, 328
    6.10.6 Cascade-Correlation, 329
  6.11 Regularization, Complexity Adjustment and Pruning, 330
  Summary, 333
  Bibliographical and Historical Remarks, 333
  Problems, 335
  Computer exercises, 343
  Bibliography, 347

7 STOCHASTIC METHODS, 350
  7.1 Introduction, 350
  7.2 Stochastic Search, 351
    7.2.1 Simulated Annealing, 351
    7.2.2 The Boltzmann Factor, 352
    7.2.3 Deterministic Simulated Annealing, 357
  7.3 Boltzmann Learning, 360
    7.3.1 Stochastic Boltzmann Learning of Visible States, 360
    7.3.2 Missing Features and Category Constraints, 365
    7.3.3 Deterministic Boltzmann Learning, 366
    7.3.4 Initialization and Setting Parameters, 367
  7.4 Boltzmann Networks and Graphical Models, 370
    7.4.1 Other Graphical Models, 372
  *7.5 Evolutionary Methods, 373
    7.5.1 Genetic Algorithms, 373
    7.5.2 Further Heuristics, 377
    7.5.3 Why Do They Work?, 378
  *7.6 Genetic Programming, 378
  Summary, 381
  Bibliographical and Historical Remarks, 381
  Problems, 383
  Computer exercises, 388
  Bibliography, 391

8 NONMETRIC METHODS, 394
  8.1 Introduction, 394
  8.2 Decision Trees, 395
  8.3 CART, 396
    8.3.1 Number of Splits, 397
    8.3.2 Query Selection and Node Impurity, 398
    8.3.3 When to Stop Splitting, 402
    8.3.4 Pruning, 403
    8.3.5 Assignment of Leaf Node Labels, 404
    Example 1 A Simple Tree, 404
    8.3.6 Computational Complexity, 406
    8.3.7 Feature Choice, 407
    8.3.8 Multivariate Decision Trees, 408
    8.3.9 Priors and Costs, 409
    8.3.10 Missing Attributes, 409
    Example 2 Surrogate Splits and Missing Attributes, 410
  8.4 Other Tree Methods, 411
    8.4.1 ID3, 411
    8.4.2 C4.5, 411
    8.4.3 Which Tree Classifier Is Best?, 412
  *8.5 Recognition with Strings, 413
    8.5.1 String Matching, 415
    8.5.2 Edit Distance, 418
    8.5.3 Computational Complexity, 420
    8.5.4 String Matching with Errors, 420
    8.5.5 String Matching with the "Don't-Care" Symbol, 421
  8.6 Grammatical Methods, 421
    8.6.1 Grammars, 422
    8.6.2 Types of String Grammars, 424
    Example 3 A Grammar for Pronouncing Numbers, 425
    8.6.3 Recognition Using Grammars, 426
  8.7 Grammatical Inference, 429
    Example 4 Grammatical Inference, 431
  *8.8 Rule-Based Methods, 431
    8.8.1 Learning Rules, 433
  Summary, 434
  Bibliographical and Historical Remarks, 435
  Problems, 437
  Computer exercises, 446
  Bibliography, 450

9 ALGORITHM-INDEPENDENT MACHINE LEARNING, 453
  9.1 Introduction, 453
  9.2 Lack of Inherent Superiority of Any Classifier, 454
    9.2.1 No Free Lunch Theorem, 454
    Example 1 No Free Lunch for Binary Data, 457
    *9.2.2 Ugly Duckling Theorem, 458
    9.2.3 Minimum Description Length (MDL), 461
    9.2.4 Minimum Description Length Principle, 463
    9.2.5 Overfitting Avoidance and Occam's Razor, 464
  9.3 Bias and Variance, 465
    9.3.1 Bias and Variance for Regression, 466
    9.3.2 Bias and Variance for Classification, 468
  9.4 Resampling for Estimating Statistics, 471
    9.4.1 Jackknife, 472
    Example 2 Jackknife Estimate of Bias and Variance of the Mode, 473
    9.4.2 Bootstrap, 474
  9.5 Resampling for Classifier Design, 475
    9.5.1 Bagging, 475
    9.5.2 Boosting, 476
    9.5.3 Learning with Queries, 480
    9.5.4 Arcing, Learning with Queries, Bias and Variance, 482
  9.6 Estimating and Comparing Classifiers, 482
    9.6.1 Parametric Models, 483
    9.6.2 Cross-Validation, 483
    9.6.3 Jackknife and Bootstrap Estimation of Classification Accuracy, 485
    9.6.4 Maximum-Likelihood Model Comparison, 486
    9.6.5 Bayesian Model Comparison, 487
    9.6.6 The Problem-Average Error Rate, 489
    9.6.7 Predicting Final Performance from Learning Curves, 492
    9.6.8 The Capacity of a Separating Plane, 494
  9.7 Combining Classifiers, 495
    9.7.1 Component Classifiers with Discriminant Functions, 496
    9.7.2 Component Classifiers without Discriminant Functions, 498
  Summary, 499
  Bibliographical and Historical Remarks, 500
  Problems, 502
  Computer exercises, 508
  Bibliography, 513

10 UNSUPERVISED LEARNING AND CLUSTERING, 517
  10.1 Introduction, 517
  10.2 Mixture Densities and Identifiability, 518
  10.3 Maximum-Likelihood Estimates, 519
  10.4 Application to Normal Mixtures, 521
    10.4.1 Case 1: Unknown Mean Vectors, 522
    10.4.2 Case 2: All Parameters Unknown, 524
    10.4.3 k-Means Clustering, 526
    10.4.4 Fuzzy k-Means Clustering, 528
  10.5 Unsupervised Bayesian Learning, 530
    10.5.1 The Bayes Classifier, 530
    10.5.2 Learning the Parameter Vector, 531
    Example 1 Unsupervised Learning of Gaussian Data, 534
    10.5.3 Decision-Directed Approximation, 536
  10.6 Data Description and Clustering, 537
    10.6.1 Similarity Measures, 538
  10.7 Criterion Functions for Clustering, 542
    10.7.1 The Sum-of-Squared-Error Criterion, 542
    10.7.2 Related Minimum Variance Criteria, 543
    10.7.3 Scatter Criteria, 544
    Example 2 Clustering Criteria, 546
  10.8 Iterative Optimization, 548
  10.9 Hierarchical Clustering, 550
    10.9.1 Definitions, 551
    10.9.2 Agglomerative Hierarchical Clustering, 552
    10.9.3 Stepwise Optimal Hierarchical Clustering, 555
    10.9.4 Hierarchical Clustering and Induced Metrics, 556
  *10.10 The Problem of Validity, 557

...
