Machine Learning, Tom M. Mitchell

所需积分/C币:13 2012-06-28 22:00:52 34.84MB PDF

机器学习经典著作,作者Tom M .Mitchell是卡内基梅隆大学教授,美国《machine learning》杂志,国籍机器学习年度会议创始人,多种技术杂志攥稿人。
PREFACE The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. In recent years many successful machine learning applications have been developed, ranging fro data-mining programs that learn to detect fraudulent credit card transactions, to information-filtering systems that learn users' reading preferences, to autonomous vehicles that learn to drive on public highways. At the same time, there have been important advances in the theory and algorithms that form the foundations of this field The goal of this textbook is to present the key algorithms and theory that form the core of machine learning. Machine learning draws on concepts and results from many fields, including statistics, artificial intelligence, philosophy information theory, biology, cognitive science, computational complexity, and control theory. My belief is that the best way to learn about machine learning is to view it from all of these perspectives and to understand the problem settings, algorithms, and assumptions that underlie each. In the past, this has been difficult due to the absence of a broad-based single source introduction to the field. The primary goal of this book is to provide such an introduction. Because of the interdisciplinary nature of the material, this book makes few assumptions about the background of the reader. Instead, it introduces basic concepts from statistics, artificial intelligence, information theory and other disci- plines as the need arises, focusing on just those concepts most relevant to machine learning. The book is intended for both undergraduate and graduate students in fields such as computer science, engineering, statistics, and the social sciences and as a reference for software professionals and practitioners. two principles that guided the writing of the book were that it should be accessible to undergrad- uate students and that it should contain the material i would want my own Ph. D students to learn before beginning their doctoral research in machine learning XVI PREFACE a third principle that guided the writing of this book was that it should present a balance of theory and practice. Machine learning theory attempts to an- swer questions such as"How does learning performance vary with the number of training examples presented? and"Which learning algorithms are most appropri- ate for various types of learning tasks? "This book includes discussions of these and other theoretical issues drawing on theoretical constructs from statistics, com putational complexity, and Bayesian analysis. The practice of machine learning s covered by presenting the major algorithms in the field, along with illustrative traces of their operation. Online data sets and implementations of several algo rithmsareavailableviatheWorldwidewebat mbook. html. These include neural network code and data for face recognition decision tree learning code and data for financial loan analysis, and Bayes clas sifier code and data for analyzing text documents. i am grateful to a number of colleagues who have helped to create these on line resources, including Jason Ren nie, Paul Hsiung, Jeff Shufelt, Matt Glickman, Scott Davies, Joseph O'Sullivan Ken Lang, Andrew McCallum, and Thorsten Joachims ACKNOWLEDGMENTS In writing this book, I have been fortunate to be assisted by technical experts in many of the subdisciplines that make up the field of machine learning. This book could not have been written without their help. I am deeply indebted to the following scientists who took the time to review chapter drafts and, in many cases, to tutor me and help organize chapters in their individual areas of expertis Avrim Blum, Jaime Carbonell, William Cohen, Greg Cooper, Mark Craven, Ken DeJong, Jerry DeJong, Tom Dietterich, Susan Epstein, Oren Etzioni Scott Fahlman, Stephanie Forrest, David Haussler, Haym Hirsh, Rob Holte, Leslie Pack Kaelbling, Dennis Kibler, Moshe Koppel, John Koza, Miroslav Kubat, John Lafferty, Ramon Lopez de mantaras, Sridhar Mahadevan, Stan Matwin, Andrew McCallum, Raymond Mooney, Andrew Moore, Katharina Morik, Steve Muggleton, Michael Pazzani, David Poole, Armand Prieditis Jim Reggia, Stuart Russell, Lorenza Saitta, Claude Sammut, Jeff Schneider, Jude Shavlik, Devika Subramanian, Michael Swain, Gheorgh Tecuci, Se bastian Thrun, Peter Turney, Paul Utgoff, Manuela Veloso, Alex Waibel Stefan Wrobel, and Yiming Yang I am also grateful to the many instructors and students at various universi ties who have field tested various drafts of this book and who have contributed their suggestions. Although there is no space to thank the hundreds of students instructors, and others who tested earlier drafts of this book i would like to thank the following for particularly helpful comments and discussions Shumeet Baluja, Andrew Banas, Andy Barto, Jim Blackson, Justin Boyan Rich Caruana, Philip Chan, Jonathan Cheyer, Lonnie Chrisman, Dayne Frei tag, Geoff Gordon, Warren Greiff, Alexander Harm, Tom loerger, Thorsten PREFACE X Joachim, Atsushi Kawamura, Martina Klose, Sven Koenig, Jay modi, An- drew Ng, Joseph O'Sullivan, Patrawadee Prasangsit, Doina Precup, Bob Price Choon quek, Sean slattery Belinda Thom. Astro Teller. Will tracz would like to thank Joan Mitchell for creating the index for the book. I also would like to thank Jean Harpley for help in editing many of the figures Jane Loftus from ETP Harrison improved the presentation significantly through her copyediting of the manuscript and generally helped usher the manuscript through the intricacies of final production. Eric Munson, my editor at McGraw Hill, provided encouragement and expertise in all phases of this project. As always, the greatest debt one owes is to one' s colleagues, friends, and family. In my case, this debt is especially large. I can hardly imagine a more intellectually stimulating environment and supportive set of friends than those I have at Carnegie Mellon. Among the many here who helped, i would especially like to thank sebastian Thrun, who throughout this project was a constant source of encouragement, technical expertise, and support of all kinds. My parents,as always, encouraged and asked"Is it done yet? " at just the right times. Finally, I must thank my family: Meghan, Shannon, and Joan. They are responsible for this book in more ways than even they know. This book is dedicated to them Tom M. Mitchell Copyrighted material; sample page 2 of 30 CONTENTS Preface 再 Acknowledgments xvi 1 Introduction 1.1 Well-Posed Leaming Problems 1.2 Designing a Learming System 1.2.1 Choosing the Tmining Experience 1. 2.2 Choosing the Target Function 7 1.2.3 Choosing a Representation for the Target Function 1.21 4 Choosing a Function Approximation Algorthim 9 1.2.5 The Final Design 1.3 Perspectives and Issues in Machine Leaming 1.3.1 Issues in Machine Learning 1 4 How to Read This book 6 1.5 Summary and Further Reading Exercises References 19 2 Concept Learning and the General-to-Specific Ordering 20 21 Introduction 20 2. 2 A Concept Learning Task 21 ?21 Notation 2.2.2 The Inductive Learning Hy pothesis 23 2 3 Concept Leaming as Searth 23 2.3. 1 Gencral-to-Specitic Ordering of Hypotheses 24 2.4 FIND- S: Finding a Maximally Specitic Hypothesis 26 2.5 Version Spaces and the CANDIDATE-ELIMINATION Algorithm 2.5.1 Representation 29 2.5.2 The LIST-THEN-ELIMINATE Algorithm 2.5.3 A More Compact Representation for Version Spaces Copyrighted material; sample page 3 of 30 vi CONTENTS 2.5.4 CANDIDATE-ELIMINATION Leaming Algorithm 2.5-5 An Illustrative Example 6 Remarks on Version Spaces and CANDIDATE-ELIMINATION 261 Will the CANDIDATE- ELIMINATIN Algrithm Converge to the Correct Hypothesis? 2.6.2 What Training Example Should the Leamer Reques Next! 2.6.3 How Can Partially Leamed Concepts Be Used? g 2.7 Inductive Bias 39 2.7.1 A Biased Hypothesis Spice 40 2.7.2 An Unbiased Learner 2.7.3 The Futility of Bias-Free Learning 42 2.8 Summary and Further Reading 45 Exercises 47 Refercnces 50 3 Decision Tree Learming 3.1 Introduction 5 3.2 Decision Tree RepresentatioN 52 3.3 Appropriate Problems for Decision Tree Learming 3.4 The Basic Decision Tree Learning Algorithm 3. 4.1 Which Attribute Is the Best Classifier? 3.4.2 An Illustrative Example 3.5 Hypothesis Space Search in Decision Tree Learnin 3.6 Induetive Bias in Deision Tree Leiming 63 3. 6.1 Restriction Biases and Preference Biases 3.6.2 Why Prefer Short Hypotheses? 5 3.7 Issues in Decision Tree Learning 3.7.1 Awoiding Overfitting the Data 3.7.2 Incorporating Continuous-Valued Attributes 72 3.7.3 Alternative Measures for Selecting Attributes 3.7.4 Handling Training Examples with Missing Attribute Vallies 3.75 Handling Attributes with DifFering Costs 75 5.8 Summary and Further Reading 76 Exercises References 78 4 Artificial Neural Networks 4,1 Initinduetian 4. I. Biological Motivation 82 4. 2 Neural Network Representations 2 4.3 Appropriate Prublelis for Neural Network Leaning 83 44 Pereeptroris s4s 4.4.1 Representational Power of Perceptrons 4.4.2 The Perceptron Training Rule 4.4.3 Gradient Dement arid the Delta Rule 444 Remarks Copyrighted material; sample page 4 of 30 4.5 Multilayer Networks and the BACKPROPAGATION Algonthm 4. A Differentiable Threshold Uril 4.5. 2 The BACKPROPAGATION Algorithm 9了 4.5.3 Derivation of the BACKPROPAGATION Rule 4 t Remarks on the BACkPRoPAGATION Algorithm 04 4. 6.1 Convergence and Local Minima 4 4.6.2 RepresenTational Power of Feedforward Nerworks 4. 6, 3 Hypothesis Space Search and Inductive Bias 4.6.4 Hidden Layer Representations 4.6.5 Generalization, Overfitting, and Stopping Cnterion IOH 4.7 An Illustrative Example: Face Recognition 12 4. 7.1 The Task 112 4.7.2 Design Choices 113 4.7.3 Lead Hidden Representations l16 4.8 Advanced Topics in Arificial Neural Networks 117 a 8 Ahemative error fuiscriots 4.8.2 Altenative error minimization procedures 19 4. 5.3 Recurrent Networks l19 4. 8.4 Dynamically Modifying Network Structure 12I 4, 9 Summary and Further Reading Exercises 124 References 26 5 Evaluating Hypotheses 128 5.1 Motivation 128 5.2 Estimating Hypothesis Accuracy 129 5.2.1 Sample Error and True Error 130 5.2.2 Confidence Intervals for Discrete-valued Hypotheses 13L 5.3 Basics of Sampling Theory 132 5.3.1 Error Estimation and Estimating Binomial Proportions 133 5.3.2 The Binomial Distribution 135 5,33 Mean and variance 136 53.4 Estimators, Bias, and variance 137 5.3.5 Confidence Intervals 138 5.3.6 Two-Sided and One-Sided bounds 14I 4 A General Approach for Deriving Confidence Intervals 142 5.4.1 Central Limit Theorem 142 5.5 Difference in Error of Two Hypotheses 143 5.5.1 Hypothesis Testing 144 5.t Comparing Leaming Algorithms 145 5.6.1 Paired r Tests 148 5.4.2 Practical Considerations 149 5.7 Summary and Further Reading 50 Exercises 52 References 152 6 Bayesian Learning 154 6.1 Introduction 154 6.2 Bayes Theorem 156 6. 2.1 An Example 157 Copyrighted material; sample page 5 of 30 CwT上NTs h3 Bayes Theorem and Concept Learming 58 5.3.1 Brute-Force Bayes Coneept Learning 159 6,3, 2 MAP Hypotheses and Consistent Leamers 6.4 Maximum Likelihood and Least-Squared Error Hypotheses 6.5 Maximum Likelihood Hypotheses for Predicting Probabilities 167 6.5 Gradient search to maximize likelihood in a neurial Nel 70 6.6 Minimum Description Length Principle 6.7 Bayes Optimal Classifier 174 6. s Gibbs Algorithm l76 6.9 Naive Bayes Classifier l77 6.9.1 An Illustrative Example 78 6.10 An Example: Learning lo Classify Text I80 6.10.1 Experimental Results 82 6. 1I Bayesian Belie Networks 6.I1I Conditional Independence l85 6.112 Representation 86 b.l3 nference l87 6. I 1.4 Leiming Bayesian Belief Networks 88 6.11.5 Gradient Ascent Training of Bayesian Networks 88 6.11. 6 Leiming the Structure of Bayesian Networks 6 1? Th eM alsorithm 191 6.12.1 Estimating means of k gaussians 191 h.12,2 General Statement of EM Algorithm 6.123 Derivation of the k Means Aleonthii 195 0. 13 Summary and Further Reading 97 E cilS 19s References computational Learning Theory 201 7。 InTroduction 201 7.2 Probably Leaming an Approximately Correct Hypothesis 203 1.2.1 The Problem Setting 203 7.2.2 Error of a Hypothesis 20 3.2.3 PAC Learnability 105 7.3 Sample Complexity for Finite Hypothesis Spaces 27 1.3.I Agnoslic Learning and Inconsistent Hypotheses 210 7.3, 2 Conjunctions of Boolean Literals Are PAC-Leamable 211 1.3.3 PAC-Learnability of Other Coneept Classes 212 7,4 Sample Complexity for Infinite Hypothesis Spaces 214 7.4.1 Shattering a Sel of Instances 214 7,4,2 The Vapnik-Chervonenkis Dimension 215 7.4.3 Sample Complexity and the vC Dimension 217 T, 4 vC Dimension for Neural Networks 218 7.5 The Mistake Bound Model of Learning 0 7.5.1 Mistake Bound for the FIND-S Algorithm 220 7.5.2 Mistake Bound for the HALVING Algorithm 71 7.5.3 OpIimal Mistake Boinds 77 7.5.4 WEIGH TED-MAJOMITY Algorithm Copyrighted material; sample page 6 of 30 ITHIIE 7.6 Summary and Further Reading Exercises 27 References 8 Instance- Based Le earning 230 8.1 Introduction 230 82 NEAREST NEIGHBOR LEARNING 231 8.2. Distance- Weighted NEAREST NEIGHBOR Algorithm 233 8.22 Remarks on k-NEAREST NEIGHBOR Algorithm 234 8. 2.3 A Note on Terminology 236 8.3 Locally Weighted Regression 236 3.3. Locally Weighted Linear Regression 237 8.3.2 Remarks on Locally Weighted Regression 23 8 4 Radial Basis functions 238 8.5 Case- Based Resoning 240 8.6 Remarks on Lazy and Eager Learning 244 8.7 Summary and Further Reading 245 吧rses 247 References 247 9 Genetic Algorithms 249 9. 1 Motivation 249 9. 2 Genetic algorithms 250 9.2.1 Representing Hypotheses 25 92 Genetie Operators 253 9.24 Fitness function and selection 255 9.3 An Illustrative Example 256 93. Extensions 258 9 4 Hypothesis Space Search 159 9. 4. Population Evolution and the Schema Theorem 260 9.s Genetic Pr imint 262 9.5.1 Representing Programs 262 3.5.2 Illustrative Example 263 9.5.3 Remarks on Genetic Programining 后5 9.6 Molels of Evolution and learning 266 9.6.1 Lamarckian Evolution 266 9.6.2 Baldwin eltect 267 9.7 Parallelizing Genetic AlgorithIns 26 9.8 Summary and Further Reading 268 Exercises 270 References 270 10 Learning Scts of Rules 274 10.1 Introduction 274 10.2 Sequential Covering Algorithms 275 10.2.1 General to Specific Beam Search 277 10.22 variations 279 10.3 Leaming Rule Sets: Surnmary 280


评论 下载该资源后可以进行评论 8

physicst 不是很清晰,很经典的书,多谢
engineerxc 这本书算是机器学习领域的经典教材,可惜是扫描版,没有那么清晰
bianwencyy 终于下到此书了,机器学习的经典教材
cosecant_csc 这学期的教材,感谢楼主
jassicachen 除了目录页,其他地方都挺清楚的,辛苦了~很感谢
meto1111 好东西经典教材
wind23_mn 经典的机器学习教材,不过扫描版不太清晰,还是建议到书店去买
Shual 不清楚,扫面的

关注 私信 TA的资源