Springer Series in Statistics
Advisors:
P. Bickel, P. Diggle, S. Fienberg, U. Gather,
I. Olkin, S. Zeger
For other titles published in this series, go to
http://www.springer.com/series/692
Bertrand Clarke · Ernest Fokoué · Hao Helen Zhang
Principles and Theory
for Data Mining
and Machine Learning
Bertrand Clarke
University of Miami
120 NW 14th Street
CRB 1055 (C-213)
Miami, FL, 33136
bclarke2@med.miami.edu
Ernest Fokoué
Center for Quality and Applied Statistics
Rochester Institute of Technology
98 Lomb Memorial Drive
Rochester, NY 14623
ernest.fokoue@gmail.com
Hao Helen Zhang
Department of Statistics
North Carolina State University
Genetics
P.O.Box 8203
Raleigh, NC 27695-8203
USA
hzhang2@stat.ncsu.edu
ISSN 0172-7397
ISBN 978-0-387-98134-5 e-ISBN 978-0-387-98135-2
DOI 10.1007/978-0-387-98135-2
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2009930499
© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York,
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or not
they are subject to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The idea for this book came from the time the authors spent at the Statistics and
Applied Mathematical Sciences Institute (SAMSI) in Research Triangle Park in North
Carolina starting in fall 2003. The first author was there for a total of two years, the
first year as a Duke/SAMSI Research Fellow. The second author was there for a year
as a Post-Doctoral Scholar. The third author has the great fortune to be in RTP per-
manently. SAMSI was – and remains – an incredibly rich intellectual environment
with a general atmosphere of free-wheeling inquiry that cuts across established fields.
SAMSI encourages creativity: It is the kind of place where researchers can be found at
work in the small hours of the morning – computing, interpreting computations, and
developing methodology. Visiting SAMSI is a unique and wonderful experience.
The people most responsible for making SAMSI the great success it is include Jim
Berger, Alan Karr, and Steve Marron. We would also like to express our gratitude to
Dalene Stangl and all the others from Duke, UNC-Chapel Hill, and NC State, as well
as to the visitors (short and long term) who were involved in the SAMSI programs. It
was a magical time we remember with ongoing appreciation.
While we were there, we participated most in two groups: Data Mining and Machine
Learning, for which Clarke was the group leader, and a General Methods group run
by David Banks. We thank David for being a continual source of enthusiasm and
inspiration. The first chapter of this book is based on the outline of the first part of
his short course on Data Mining and Machine Learning. Moreover, David graciously
contributed many of his figures to us. Specifically, we gratefully acknowledge that
Figs. 1.1–1.6, Figs. 2.1, 2.3, 2.4, 2.5, and 2.7, Fig. 4.2, Figs. 8.3 and 8.6, and Figs. 9.1 and 9.2 were either done by
him or prepared under his guidance.
On the other side of the pond, the Newton Institute at Cambridge University provided
invaluable support and stimulation to Clarke when he visited for three months in 2008.
While there, he completed the final versions of Chapters 8 and 9. Like SAMSI, the
Newton Institute was an amazing, wonderful, and intense experience.
This work was also partially supported by Clarke’s NSERC Operating Grant
2004–2008. In the USA, Zhang’s research has been supported over the years by two
grants from the National Science Foundation. Some of the research those grants sup-
ported is in Chapter 10.
We hope that this book will be of value as a graduate text for a PhD-level course on data
mining and machine learning (DMML). However, we have tried to make it comprehen-
sive enough that it can be used as a reference or for independent reading. Our paradigm
reader is someone in statistics, computer science, or electrical or computer engineering
who has taken advanced calculus and linear algebra, a strong undergraduate probability course, and basic undergraduate mathematical statistics. Someone whose expertise is in one of the topics covered here will likely find that chapter routine but will hopefully find the other chapters at a comfortable level.
The book roughly separates into three parts. Part I consists of Chapters 1 through 4:
This is mostly a treatment of nonparametric regression, assuming a mastery of linear
regression. Part II consists of Chapters 5, 6, and 7: This is a mix of classification, recent
nonparametric methods, and computational comparisons. Part III consists of Chapters
8 through 11. These focus on high dimensional problems, including clustering, di-
mension reduction, variable selection, and multiple comparisons. We suggest that a
selection of topics from the first two parts would be a good one semester course and a
selection of topics from Part III would be a good follow-up course.
There are many topics left out: proper treatments of information theory, VC dimension,
PAC learning, Oracle inequalities, hidden Markov models, graphical models, frames,
and wavelets are the main absences. We regret this, but no book can be everything.
The main perspective undergirding this work is that DMML is a fusion of large sectors
of statistics, computer science, and electrical and computer engineering. The DMML
fusion rests on good prediction and a complete assessment of modeling uncertainty
as its main organizing principles. The assessment of modeling uncertainty ideally in-
cludes all of the contributing factors, including those commonly neglected, in order to
be valid. Given this, other aspects of inference – model identification, parameter esti-
mation, hypothesis testing, and so forth – can largely be regarded as a consequence of
good prediction. We suggest that the development and analysis of good predictors is
the paradigm problem for DMML.
Overall, for students and practitioners alike, DMML is an exciting context in which
whole new worlds of reasoning can be productively explored and applied to important
problems.
Bertrand Clarke
University of Miami, Miami, FL
Ernest Fokoué
Kettering University, Flint, MI
Hao Helen Zhang
North Carolina State University,
Raleigh, NC