
Mining of Massive Datasets (2nd edition)


Big Data: Large-Scale Internet Data Mining and Distributed Processing (2nd edition), English version


Mining of Massive Datasets

Jure Leskovec

Stanford Univ.

Anand Rajaraman

Milliway Labs

Jeffrey D. Ullman

Stanford Univ.

 2010, 2011, 2012, 2013, 2014 Anand Rajaraman, Jure Leskovec,

and Jeﬀrey D. Ullman

Preface

This book evolved from material developed over several years by Anand Rajaraman and Jeff Ullman for a one-quarter course at Stanford. The course CS345A, titled “Web Mining,” was designed as an advanced graduate course, although it has become accessible and interesting to advanced undergraduates.

When Jure Leskovec joined the Stanford faculty, we reorganized the material considerably. He introduced a new course CS224W on network analysis and added material to CS345A, which was renumbered CS246. The three authors also introduced a large-scale data-mining project course, CS341. The book now contains material taught in all three courses.

What the Book Is About

At the highest level of description, this book is about data mining. However, it focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort. The principal topics covered are:

1. Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data.

2. Similarity search, including the key techniques of minhashing and locality-sensitive hashing.

3. Data-stream processing and specialized algorithms for dealing with data that arrives so fast it must be processed immediately or lost.

4. The technology of search engines, including Google’s PageRank, link-spam detection, and the hubs-and-authorities approach.

5. Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements.

6. Algorithms for clustering very large, high-dimensional datasets.


7. Two key problems for Web applications: managing advertising and recommendation systems.

8. Algorithms for analyzing and mining the structure of very large graphs, especially social-network graphs.

9. Techniques for obtaining the important properties of a large data set by dimensionality reduction, including singular-value decomposition and latent semantic indexing.

10. Machine-learning algorithms that can be applied to very large data, such as perceptrons, support-vector machines, and gradient descent.

Prerequisites

To appreciate fully the material in this book, we recommend the following prerequisites:

1. An introduction to database systems, covering SQL and related programming systems.

2. A sophomore-level course in data structures, algorithms, and discrete math.

3. A sophomore-level course in software systems, software engineering, and programming languages.

Exercises

The book contains extensive exercises, with some for almost every section. We indicate harder exercises or parts of exercises with an exclamation point. The hardest exercises have a double exclamation point.

Support on the Web

Go to http://www.mmds.org for slides, homework assignments, project requirements, and exams from courses related to this book.

Gradiance Automated Homework

There are automated exercises based on this book, using the Gradiance root-question technology, available at www.gradiance.com/services. Students may enter a public class by creating an account at that site and entering the class with code 1EDD8A1D. Instructors may use the site by making an account there and then emailing support at gradiance dot com with their login name, the name of their school, and a request to use the MMDS materials.

Acknowledgements

Cover art is by Scott Ullman.

We would like to thank Foto Afrati, Arun Marathe, and Rok Sosic for critical readings of a draft of this manuscript.

Errors were also reported by Rajiv Abraham, Ruslan Aduk, Apoorv Agarwal, Aris Anagnostopoulos, Yokila Arora, Stefanie Anna Baby, Atilla Soner Balkir, Arnaud Belletoile, Robin Bennett, Susan Biancani, Amitabh Chaudhary, Leland Chen, Hua Feng, Marcus Gemeinder, Anastasios Gounaris, Clark Grubb, Shrey Gupta, Waleed Hameid, Saman Haratizadeh, Julien Hoachuck, Przemyslaw Horban, Jeff Hwang, Rafi Kamal, Lachlan Kang, Ed Knorr, Haewoon Kwak, Ellis Lau, Greg Lee, David Z. Liu, Ethan Lozano, Yunan Luo, Michael Mahoney, Justin Meyer, Bryant Moscon, Brad Penoff, John Phillips, Philips Kokoh Prasetyo, Qi Ge, Harizo Rajaona, Timon Ruban, Rich Seiter, Hitesh Shetty, Angad Singh, Sandeep Sripada, Dennis Sidharta, Krzysztof Stencel, Mark Storus, Roshan Sumbaly, Zack Taylor, Tim Triche Jr., Wang Bin, Weng Zhen-Bin, Robert West, Oscar Wu, Xie Ke, Christopher T.-R. Yeh, Nicolas Zhao, and Zhou Jingbo. The remaining errors are ours, of course.

J. L.

A. R.

J. D. U.

Palo Alto, CA

March, 2014
