Mining of Massive Datasets (Big Data Mining)



Mining of Massive Datasets

Anand Rajaraman

Kosmix, Inc.

Jeffrey D. Ullman

Stanford Univ.

Copyright © 2010 Anand Rajaraman and Jeffrey D. Ullman

Contents

1 Data Mining
1.1 What is Data Mining?
1.1.1 Statistical Modeling
1.1.2 Machine Learning
1.1.3 Computational Approaches to Modeling
1.1.4 Summarization
1.1.5 Feature Extraction
1.2 Statistical Limits on Data Mining
1.2.1 Total Information Awareness
1.2.2 Bonferroni's Principle
1.2.3 An Example of Bonferroni's Principle
1.2.4 Exercises for Section 1.2
1.3 Things Useful to Know
1.3.1 Importance of Words in Documents
1.3.2 Hash Functions
1.3.3 Indexes
1.3.4 Secondary Storage
1.3.5 The Base of Natural Logarithms
1.3.6 Power Laws
1.3.7 Exercises for Section 1.3
1.4 Outline of the Book
1.5 Summary of Chapter 1
1.6 References for Chapter 1

2 Large-Scale File Systems and Map-Reduce
2.1 Distributed File Systems
2.1.1 Physical Organization of Compute Nodes
2.1.2 Large-Scale File-System Organization
2.2 Map-Reduce
2.2.1 The Map Tasks
2.2.2 Grouping and Aggregation
2.2.3 The Reduce Tasks
2.2.4 Combiners


蜗牛也给劲

2015-10-17

A very good book, the work of real experts.
wangjinbao123456

2012-12-02

Being able to read documentation in English really is so important. Sigh, I still need to study harder!

