Springer Series in Statistics
Advisors:
P. Bickel, P. Diggle, S. Fienberg, U. Gather,
I. Olkin, S. Zeger
For other titles published in this series, go to
http://www.springer.com/series/692
Bertrand Clarke · Ernest Fokoué · Hao Helen Zhang
Principles and Theory
for Data Mining
and Machine Learning
Bertrand Clarke
University of Miami
120 NW 14th Street
CRB 1055 (C-213)
Miami, FL, 33136
bclarke2@med.miami.edu
Ernest Fokoué
Center for Quality and Applied Statistics
Rochester Institute of Technology
98 Lomb Memorial Drive
Rochester, NY 14623
ernest.fokoue@gmail.com
Hao Helen Zhang
Department of Statistics
North Carolina State University
Genetics
P.O.Box 8203
Raleigh, NC 27695-8203
USA
hzhang2@stat.ncsu.edu
ISSN 0172-7397
ISBN 978-0-387-98134-5 e-ISBN 978-0-387-98135-2
DOI 10.1007/978-0-387-98135-2
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2009930499
© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York,
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or not
they are subject to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The idea for this book came from the time the authors spent at the Statistics and
Applied Mathematical Sciences Institute (SAMSI) in Research Triangle Park in North
Carolina starting in fall 2003. The first author was there for a total of two years, the
first year as a Duke/SAMSI Research Fellow. The second author was there for a year
as a Post-Doctoral Scholar. The third author has the great fortune to be in RTP per-
manently. SAMSI was – and remains – an incredibly rich intellectual environment
with a general atmosphere of free-wheeling inquiry that cuts across established fields.
SAMSI encourages creativity: It is the kind of place where researchers can be found at
work in the small hours of the morning – computing, interpreting computations, and
developing methodology. Visiting SAMSI is a unique and wonderful experience.
The people most responsible for making SAMSI the great success it is include Jim
Berger, Alan Karr, and Steve Marron. We would also like to express our gratitude to
Dalene Stangl and all the others from Duke, UNC-Chapel Hill, and NC State, as well
as to the visitors (short and long term) who were involved in the SAMSI programs. It
was a magical time we remember with ongoing appreciation.
While we were there, we participated most in two groups: Data Mining and Machine
Learning, for which Clarke was the group leader, and a General Methods group run
by David Banks. We thank David for being a continual source of enthusiasm and
inspiration. The first chapter of this book is based on the outline of the first part of
his short course on Data Mining and Machine Learning. Moreover, David graciously
contributed many of his figures to us. Specifically, we gratefully acknowledge that
Figs. 1.1–1.6, Figs. 2.1, 2.3, 2.4, 2.5, and 2.7, Fig. 4.2, Figs. 8.3 and 8.6, and Figs. 9.1 and 9.2 were either done by
him or prepared under his guidance.
On the other side of the pond, the Newton Institute at Cambridge University provided
invaluable support and stimulation to Clarke when he visited for three months in 2008.
While there, he completed the final versions of Chapters 8 and 9. Like SAMSI, the
Newton Institute was an amazing, wonderful, and intense experience.
This work was also partially supported by Clarke’s NSERC Operating Grant
2004–2008. In the USA, Zhang’s research has been supported over the years by two
grants from the National Science Foundation. Some of the research those grants sup-
ported is in Chapter 10.
We hope that this book will be of value as a graduate text for a PhD-level course on data
mining and machine learning (DMML). However, we have tried to make it comprehen-
sive enough that it can be used as a reference or for independent reading. Our paradigm
reader is someone in statistics, computer science, or electrical or computer engineering
who has taken advanced calculus and linear algebra, a strong undergraduate probability course, and basic undergraduate mathematical statistics. Someone whose expertise is in one of the topics covered here will likely find that chapter routine but will hopefully find the other chapters at a comfortable level.
The book roughly separates into three parts. Part I consists of Chapters 1 through 4:
This is mostly a treatment of nonparametric regression, assuming a mastery of linear
regression. Part II consists of Chapters 5, 6, and 7: This is a mix of classification, recent
nonparametric methods, and computational comparisons. Part III consists of Chapters
8 through 11. These focus on high dimensional problems, including clustering, di-
mension reduction, variable selection, and multiple comparisons. We suggest that a
selection of topics from the first two parts would be a good one semester course and a
selection of topics from Part III would be a good follow-up course.
There are many topics left out: proper treatments of information theory, VC dimension,
PAC learning, Oracle inequalities, hidden Markov models, graphical models, frames,
and wavelets are the main absences. We regret this, but no book can be everything.
The main perspective undergirding this work is that DMML is a fusion of large sectors
of statistics, computer science, and electrical and computer engineering. The DMML
fusion rests on good prediction and a complete assessment of modeling uncertainty
as its main organizing principles. The assessment of modeling uncertainty ideally in-
cludes all of the contributing factors, including those commonly neglected, in order to
be valid. Given this, other aspects of inference – model identification, parameter esti-
mation, hypothesis testing, and so forth – can largely be regarded as a consequence of
good prediction. We suggest that the development and analysis of good predictors is
the paradigm problem for DMML.
Overall, for students and practitioners alike, DMML is an exciting context in which
whole new worlds of reasoning can be productively explored and applied to important
problems.
Bertrand Clarke
University of Miami, Miami, FL
Ernest Fokoué
Kettering University, Flint, MI
Hao Helen Zhang
North Carolina State University,
Raleigh, NC