The Elements of Statistical Learning (统计学习基础:数据挖掘、推理与预测, Foundations of Statistical Learning: Data Mining, Inference, and Prediction)

Points required to download: 18 · Uploaded 2015-06-09 18:55:55 · 12.69 MB · PDF

During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics.
Preface to the Second Edition

    In God we trust, all others bring data.
        –William Edwards Deming (1900–1993)¹

We have been gratified by the popularity of the first edition of The Elements of Statistical Learning. This, along with the fast pace of research in the statistical learning field, motivated us to update our book with a second edition. We have added four new chapters and updated some of the existing chapters. Because many readers are familiar with the layout of the first edition, we have tried to change it as little as possible. Here is a summary of the main changes:

    Chapter                                          What's new
    1.  Introduction
    2.  Overview of Supervised Learning
    3.  Linear Methods for Regression                LAR algorithm and generalizations
    4.  Linear Methods for Classification            Lasso path for logistic regression
    5.  Basis Expansions and Regularization          Additional illustrations of RKHS
    6.  Kernel Smoothing Methods
    7.  Model Assessment and Selection               Strengths and pitfalls of
                                                     cross-validation
    8.  Model Inference and Averaging
    9.  Additive Models, Trees, and Related Methods
    10. Boosting and Additive Trees                  New example from ecology; some
                                                     material split off to Chapter 16
    11. Neural Networks                              Bayesian neural nets and the
                                                     NIPS 2003 challenge
    12. Support Vector Machines and                  Path algorithm for SVM classifier
        Flexible Discriminants
    13. Prototype Methods and Nearest-Neighbors
    14. Unsupervised Learning                        Spectral clustering, kernel PCA,
                                                     sparse PCA, non-negative matrix
                                                     factorization, archetypal analysis,
                                                     nonlinear dimension reduction,
                                                     Google PageRank algorithm, a
                                                     direct approach to ICA
    15. Random Forests                               New
    16. Ensemble Learning                            New
    17. Undirected Graphical Models                  New
    18. High-Dimensional Problems                    New

Some further notes:

- Our first edition was unfriendly to colorblind readers; in particular, we tended to favor red/green contrasts, which are particularly troublesome. We have changed the color palette in this edition to a large extent, replacing the above with an orange/blue contrast.

- We have changed the name of Chapter 6 from "Kernel Methods" to "Kernel Smoothing Methods", to avoid confusion with the machine-learning kernel method that is discussed in the context of support vector machines (Chapter 12) and more generally in Chapters 5 and 14.

- In the first edition, the discussion of error-rate estimation in Chapter 7 was sloppy, as we did not clearly differentiate the notions of conditional error rates (conditional on the training set) and unconditional rates. We have fixed this in the new edition.

- Chapters 15 and 16 follow naturally from Chapter 10, and the chapters are probably best read in that order.

- In Chapter 17, we have not attempted a comprehensive treatment of graphical models, and discuss only undirected models and some new methods for their estimation. Due to a lack of space, we have specifically omitted coverage of directed graphical models.

- Chapter 18 explores the p ≫ N problem, which is learning in high-dimensional feature spaces. These problems arise in many areas, including genomic and proteomic studies, and document classification.

We thank the many readers who have found the (too numerous) errors in the first edition. We apologize for those and have done our best to avoid errors in this new edition. We thank Mark Segal, Bala Rajaratnam, and Larry Wasserman for comments on some of the new chapters, and many Stanford graduate and post-doctoral students who offered comments, in particular Mohammed AlQuraishi, John Boik, Holger Hoefling, Arian Maleki, Donal McMahon, Saharon Rosset, Babak Shahbaba, Daniela Witten, Ji Zhu and Hui Zou. We thank John Kimmel for his patience in guiding us through this new edition. RT dedicates this edition to the memory of Anna McPhee.

Trevor Hastie
Robert Tibshirani
Jerome Friedman

Stanford, California
August 2008

¹ On the Web, this quote has been widely attributed to both Deming and Robert W. Hayden; however, Professor Hayden told us that he can claim no credit for this quote, and ironically we could find no "data" confirming that Deming actually said this.

Preface to the First Edition

    We are drowning in information and starving for knowledge.
        –Rutherford D. Roger

The field of Statistics is constantly challenged by the problems that science and industry bring to its door. In the early days, these problems often came from agricultural and industrial experiments and were relatively small in scope. With the advent of computers and the information age, statistical problems have exploded both in size and complexity. Challenges in the areas of data storage, organization and searching have led to the new field of "data mining"; statistical and computational problems in biology and medicine have created "bioinformatics." Vast amounts of data are being generated in many fields, and the statistician's job is to make sense of it all: to extract important patterns and trends, and to understand "what the data says." We call this learning from data.

The challenges in learning from data have led to a revolution in the statistical sciences. Since computation plays such a key role, it is not surprising that much of this new development has been done by researchers in other fields such as computer science and engineering.

The learning problems that we consider can be roughly categorized as either supervised or unsupervised. In supervised learning, the goal is to predict the value of an outcome measure based on a number of input measures; in unsupervised learning, there is no outcome measure, and the goal is to describe the associations and patterns among a set of input measures.

This book is our attempt to bring together many of the important new ideas in learning, and explain them in a statistical framework.
While some mathematical details are needed, we emphasize the methods and their conceptual underpinnings rather than their theoretical properties. As a result, we hope that this book will appeal not just to statisticians but also to researchers and practitioners in a wide variety of fields.

Just as we have learned a great deal from researchers outside of the field of statistics, our statistical viewpoint may help others to better understand different aspects of learning.

    There is no true interpretation of anything; interpretation is a vehicle in the service of human comprehension. The value of interpretation is in enabling others to fruitfully think about an idea.
        –Andreas Buja

We would like to acknowledge the contribution of many people to the conception and completion of this book. David Andrews, Leo Breiman, Andreas Buja, John Chambers, Bradley Efron, Geoffrey Hinton, Werner Stuetzle, and John Tukey have greatly influenced our careers. Balasubramanian Narasimhan gave us advice and help on many computational problems, and maintained an excellent computing environment. Shin-Ho Bang helped in the production of a number of the figures. Lee Wilkinson gave valuable tips on color production. Ilana Belitskaya, Eva Cantoni, Maya Gupta, Michael Jordan, Shanti Gopatam, Radford Neal, Jorge Picazo, Bogdan Popescu, Olivier Renaud, Saharon Rosset, John Storey, Ji Zhu, Mu Zhu, two reviewers and many students read parts of the manuscript and offered helpful suggestions. John Kimmel was supportive, patient and helpful at every phase; Mary Ann Brickner and Frank Ganz headed a superb production team at Springer. Trevor Hastie would like to thank the statistics department at the University of Cape Town for their hospitality during the final stages of this book. We gratefully acknowledge NSF and NIH for their support of this work.
Finally, we would like to thank our families and our parents for their love and support.

Trevor Hastie
Robert Tibshirani
Jerome Friedman

Stanford, California
May 2001

    The quiet statisticians have changed our world; not by discovering new facts or technical developments, but by changing the ways that we reason, experiment and form our opinions.
        –Ian Hacking

Contents

Preface to the Second Edition
Preface to the First Edition

1  Introduction
2  Overview of Supervised Learning
   2.1  Introduction
   2.2  Variable Types and Terminology
   2.3  Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
        2.3.1  Linear Models and Least Squares
        2.3.2  Nearest-Neighbor Methods
        2.3.3  From Least Squares to Nearest Neighbors
   2.4  Statistical Decision Theory
   2.5  Local Methods in High Dimensions
   2.6  Statistical Models, Supervised Learning and Function Approximation
        2.6.1  A Statistical Model for the Joint Distribution Pr(X, Y)
        2.6.2  Supervised Learning
        2.6.3  Function Approximation
   2.7  Structured Regression Models
        2.7.1  Difficulty of the Problem


Comments (2)

alto1394: The Elements of Statistical Learning, 2nd Edition (统计学习基础:数据挖掘、推理与预测), original edition.
adidashuhuhu: Very helpful for machine learning.
