Mining Big Data (Web, Distributed Data): Book
PDF, 2.31 MB, 398 pages. Uploaded 2012-11-05 10:22:17.
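This book offers a distinctive perspective on mining big data and gives excellent guidance to learners of data mining, especially those focusing on big data.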
Mining of Massive Datasets
Anand Rajaraman
Jure Leskovec
Stanford Univ.
Jeffrey D. Ullman
Stanford Univ.
Copyright © 2010, 2011, 2012 Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman
Preface
This book evolved from material developed over several years by Anand
Rajaraman and Jeff Ullman for a one-quarter course at Stanford. The course
CS345A, titled “Web Mining,” was designed as an advanced graduate course,
although it has become accessible and interesting to advanced undergraduates.
When Jure Leskovec joined the Stanford faculty, we reorganized the material
considerably. He introduced a new course CS224W on network analysis and
added material to CS345A, which was renumbered CS246. The three authors
also introduced a large-scale data-mining project course, CS341. The book now
contains material taught in all three courses.
What the Book Is About
At the highest level of description, this book is about data mining. However,
it focuses on data mining of very large amounts of data, that is, data so large
it does not fit in main memory. Because of the emphasis on size, many of our
examples are about the Web or data derived from the Web. Further, the book
takes an algorithmic point of view: data mining is about applying algorithms
to data, rather than using data to “train” a machine-learning engine of some
sort. The principal topics covered are:
1. Distributed file systems and map-reduce as a tool for creating parallel
algorithms that succeed on very large amounts of data.
2. Similarity search, including the key techniques of minhashing and
locality-sensitive hashing.
3. Data-stream processing and specialized algorithms for dealing with data
that arrives so fast it must be processed immediately or lost.
4. The technology of search engines, including Google’s PageRank, link-spam
detection, and the hubs-and-authorities approach.
5. Frequent-itemset mining, including association rules, market-baskets, the
A-Priori Algorithm and its improvements.
6. Algorithms for clustering very large, high-dimensional datasets.
7. Two key problems for Web applications: managing advertising and
recommendation systems.
8. Algorithms for analyzing and mining the structure of very large graphs,
especially social-network graphs.
Prerequisites
To appreciate fully the material in this book, we recommend the following
prerequisites:
1. An introduction to database systems, covering SQL and related
programming systems.
2. A sophomore-level course in data structures, algorithms, and discrete
math.
3. A sophomore-level course in software systems, software engineering, and
programming languages.
Exercises
The book contains extensive exercises, with some for almost every section. We
indicate harder exercises or parts of exercises with an exclamation point. The
hardest exercises have a double exclamation point.
Support on the Web
You can find materials from past offerings of CS345A at:
http://infolab.stanford.edu/~ullman/mining/mining.html
There, you will find slides, homework assignments, project requirements, and
in some cases, exams.
Acknowledgements
Cover art is by Scott Ullman.
We would like to thank Foto Afrati and Arun Marathe for critical readings
of the draft of this manuscript.
Errors were also reported by Apoorv Agarwal, Susan Biancani, Leland Chen,
Shrey Gupta, Xie Ke, Haewoon Kwak, Ellis Lau, Ethan Lozano, Justin Meyer,
Brad Penoff, Philips Kokoh Prasetyo, Angad Singh, Sandeep Sripada, Dennis
Sidharta, Mark Storus, Roshan Sumbaly, Tim Triche Jr., and Robert West.
The remaining errors are ours, of course.
A. R.
J. L.
J. D. U.
Palo Alto, CA
July, 2012
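Comments
- mc1984 (2013-01-08): Well written and a very useful reference.
- master198223 (2012-11-24): English edition and hard to follow. If your English is only average, don't download it.
- liveboyo (2013-07-08): A book worth having. Thanks for sharing!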
Uploader: keepgulp